Journal Description

The Journal of Medical Internet Research (JMIR) is the pioneer open access eHealth journal and the flagship journal of JMIR Publications. It is a leading health services and digital health journal globally in terms of quality and visibility, with a Journal Impact Factor™ of 5.8 (Clarivate, 2024), ranking Q1 in both the 'Medical Informatics' and 'Health Care Sciences & Services' categories, and it is also the largest journal in the field. The journal is ranked #1 on Google Scholar in the 'Medical Informatics' discipline. The journal focuses on emerging technologies, medical devices, apps, engineering, telehealth, and informatics applications for patient education, prevention, population health, and clinical care.

JMIR is indexed in all major literature indices, including the National Library of Medicine (NLM)/MEDLINE, Sherpa/Romeo, PubMed, PMC, Scopus, PsycINFO, Clarivate (which includes Web of Science (WoS)/ESCI/SCIE), EBSCO/EBSCO Essentials, DOAJ, GoOA, and others. The Journal of Medical Internet Research received a CiteScore of 14.4, placing it in the 95th percentile (#7 of 138) as a Q1 journal in the field of Health Informatics. It is a selective journal complemented by almost 30 specialty JMIR sister journals, which have a broader scope and together receive over 10,000 submissions a year.

As an open access journal, we are read by clinicians, allied health professionals, informal caregivers, and patients alike, and have (as with all JMIR journals) a focus on readable and applied science reporting the design and evaluation of health innovations and emerging technologies. We publish original research, viewpoints, and reviews (both literature reviews and medical device/technology/app reviews). Peer-review reports are portable across JMIR journals and papers can be transferred between them, so authors save time by not having to resubmit a paper to a different journal.

We are also a leader in participatory and open science approaches, and offer the option to publish new submissions immediately as preprints, which receive DOIs for immediate citation (eg, in grant proposals), and for open peer-review purposes. We also invite patients to participate (eg, as peer-reviewers) and have patient representatives on editorial boards.

As with all JMIR journals, the journal encourages Open Science principles and strongly encourages publication of a protocol before data collection. Authors who have published a protocol in JMIR Research Protocols get a 20% discount on the Article Processing Fee when publishing a subsequent results paper in any JMIR journal.

Be a widely cited leader in the digital health revolution and submit your paper today!

 

Recent Articles:

  • Prompt: “Create an image illustrating an emerging adult using chatbot via mobile phone, warm background, cozy comfortable environment”
AI-generated image further edited by the author using Adobe Photoshop (ver 2021).
Date of request: 22/12/2024. Source: Image generated by OpenAI’s ChatGPT Version 4o and further edited by the authors in Photoshop; Copyright: N/A (AI-generated image); URL: https://jmir.org/2025/1/e70436/; License: Public Domain (CC0).

    Effectiveness of Topic-Based Chatbots on Mental Health Self-Care and Mental Well-Being: Randomized Controlled Trial

    Abstract:

    Background: The global surge in mental health challenges has placed unprecedented strain on health care systems, highlighting the need for scalable interventions to promote mental health self-care. Chatbots have emerged as promising tools by providing accessible, evidence-based support. While chatbots have shown promise in delivering mental health interventions, most studies have only focused on clinical populations and symptom reduction, leaving a critical gap in understanding their preventive potential for self-care and mental health literacy in the general population. Objective: This study evaluated the effectiveness of a rule-based, topic-specific chatbot intervention in improving self-care efficacy, mental health literacy, self-care intention, self-care behaviors, and mental well-being immediately after 10 days and 1 month of its use. Methods: A 2-arm, assessor-blinded randomized controlled trial was conducted. A total of 285 participants were randomly assigned to the chatbot intervention group (n=140) and a waitlist control group (n=145). The chatbot intervention consisted of 10 topic-specific sessions targeting stress management, emotion regulation, and value clarification, delivered over 10 days with a 7-day free-access period. Primary outcomes included self-care self-efficacy, behavioral intentions, self-care behaviors, and mental health literacy. Secondary outcomes included depressive symptoms, anxiety symptoms, and mental well-being. Assessments were self-administered on the web at baseline, 10 days after the intervention, and at a 1-month follow-up. All outcomes were analyzed using linear mixed models with an intention-to-treat approach, and effect sizes were calculated using Cohen d. Results: Participants in the chatbot group demonstrated significantly greater improvements in behavioral intentions (F2,379.74=15.02; P<.001) and mental health literacy (F2,423.57=4.27; P=.02) compared to the control group. 
The chatbot also produced significant improvements in self-care behaviors (Cohen d=0.36, 95% CI 0.08-0.30; P<.001), mindfulness (Cohen d=0.37, 95% CI 0.14-0.38; P<.001), depressive symptoms (Cohen d=–0.26, 95% CI –1.77 to –0.26; P=.004), overall well-being (Cohen d=0.22, 95% CI 0.02-0.42; P=.02), and positive emotions (Cohen d=0.28, 95% CI 0.08-0.54; P=.004) after 10 days. However, these improvements did not differ significantly from the waitlist control group at 1 month. Adherence was higher among participants who received push notifications (t138=–4.91; P<.001). Conclusions: This study highlights the potential of rule-based chatbots in promoting mental health literacy and fostering short-term self-care intentions. However, the lack of sustained effects points to improvements required in chatbot design, including greater personalization and interactive features to enhance self-efficacy and long-term mental health outcomes. Future research should explore hybrid approaches that combine rule-based and generative artificial intelligence systems to optimize intervention effectiveness. Trial Registration: ClinicalTrials.gov NCT05694507; https://clinicaltrials.gov/ct2/show/NCT05694507
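For readers unfamiliar with the effect size reported throughout this abstract, a between-group Cohen d is the difference in group means divided by the pooled standard deviation. A minimal sketch — the scores below are hypothetical, not study data:

```python
from statistics import mean, stdev

def cohens_d(group_a, group_b):
    """Between-group Cohen d using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    sa, sb = stdev(group_a), stdev(group_b)
    pooled_sd = (((na - 1) * sa**2 + (nb - 1) * sb**2) / (na + nb - 2)) ** 0.5
    return (mean(group_a) - mean(group_b)) / pooled_sd

# Hypothetical self-care behavior scores (illustrative only)
intervention = [14, 16, 15, 18, 17, 15]
control = [13, 14, 12, 15, 14, 13]
d = cohens_d(intervention, control)
```

By the usual convention, d around 0.2 is a small effect and around 0.5 a medium one, which is why the 10-day effects above are described as modest.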

  • Source: Freepik; Copyright: gpointstudio; URL: https://www.freepik.com/free-photo/doctor-showing-examination-results-digital-tablet_11981847.htm; License: Licensed by JMIR.

    New Performance Measurement Framework for Realizing Patient-Centered Clinical Decision Support: Qualitative Development Study

    Abstract:

    Background: Patient-centered clinical decision support (PC CDS) exists on a continuum that reflects the degree to which its knowledge base, data, delivery, and use focus on patient needs and experiences. A new focus on value-based, whole-person care has resulted in broader development of PC CDS technologies, yet there is limited information on how to measure their performance and effectiveness. To address these gaps, there is a need for more measurement guidance to assess PC CDS interventions. Objective: This paper presents a new framework that incorporates patient-centered principles into traditional health IT and clinical decision support (CDS) evaluation frameworks to create a unified guide to PC CDS performance measurement. Methods: We conducted a targeted literature review of 147 sources on health IT, CDS, and PC CDS measurement and evaluation to develop the framework. Sources were reviewed if they included the sociotechnical components relevant to PC CDS, covered the full IT life cycle of PC CDS, and addressed measurement considerations at different user and system levels. We then validated and refined the measurement framework through key informant interviews with 6 experts in measurement, CDS, and clinical informatics. Throughout the framework development, we gathered feedback from a 7-member expert committee on the methods, findings, and the framework’s relevance and application. Results: The PC CDS performance measurement framework includes 6 domains: safe, timely, effective, efficient, equitable, and patient centered. The 6 domains contain 34 subdomains that can be selected to assess performance, depending on the type of PC CDS intervention or the specific research focus. In addition, there are 4 levels of aggregation at which subdomains can be measured (individual, population, organization, or IT system) that account for the multilevel impact of PC CDS. 
We provide examples of measures and approaches to patient centeredness for each subdomain, followed by 2 illustrative use cases demonstrating the framework application. Conclusions: This framework can be used by researchers, health system leaders, informaticians, and patients to understand the full breadth of performance and impact of PC CDS technology. The framework is significant in that it (1) covers the entire PC CDS life cycle, (2) has a direct focus on the patient, (3) covers measurement at different levels, (4) encompasses 6 independent but related domains, and (5) requires additional research and development to fully characterize all domains and subdomains. As the field of PC CDS matures, researchers and evaluators can build upon the framework to assess which components of PC CDS technologies work; whether PC CDS technologies are being used as anticipated; and whether the intended outcomes of delivering evidence-based, patient-centered care are being achieved.

  • Source: freepik; Copyright: freepik; URL: https://www.freepik.com/free-photo/medium-shot-woman-looking-out-window_24238579.htm; License: Licensed by JMIR.

    Investigating Protective and Risk Factors and Predictive Insights for Aboriginal Perinatal Mental Health: Explainable Artificial Intelligence Approach

    Abstract:

    Background: Perinatal depression and anxiety significantly impact maternal and infant health, potentially leading to severe outcomes like preterm birth and suicide. Aboriginal women, despite their resilience, face elevated risks due to the long-term effects of colonization and cultural disruption. The Baby Coming You Ready (BCYR) model of care, centered on a digitized, holistic, strengths-based assessment, was co-designed to address these challenges. The successful BCYR pilot demonstrated its ability to replace traditional risk-based screens. However, some health professionals still overrely on psychological risk scores, often overlooking the contextual circumstances of Aboriginal mothers, their cultural strengths, and mitigating protective factors. This highlights the need for new tools to improve clinical decision-making. Objective: We explored different explainable artificial intelligence (XAI)–powered machine learning techniques for developing culturally informed, strengths-based predictive modeling of perinatal psychological distress among Aboriginal mothers. The model identifies and evaluates influential protective and risk factors while offering transparent explanations for AI-driven decisions. Methods: We used deidentified data from 293 Aboriginal mothers who participated in the BCYR program between September 2021 and June 2023 at 6 health care services in Perth and regional Western Australia. The original dataset includes variables spanning cultural strengths, protective factors, life events, worries, relationships, childhood experiences, family and domestic violence, and substance use. After applying feature selection and expert input, 20 variables were chosen as predictors. The Kessler-5 scale was used as an indicator of perinatal psychological distress. 
Several machine learning models, including random forest (RF), CatBoost (CB), light gradient-boosting machine (LightGBM), extreme gradient boosting (XGBoost), k-nearest neighbor (KNN), support vector machine (SVM), and explainable boosting machine (EBM), were developed and compared for predictive performance. To make the black-box model interpretable, post hoc explanation techniques including Shapley additive explanations and local interpretable model-agnostic explanations were applied. Results: The EBM outperformed other models (accuracy=0.849, 95% CI 0.8170-0.8814; F1-score=0.771, 95% CI 0.7169-0.8245; area under the curve=0.821, 95% CI 0.7829-0.8593) followed by RF (accuracy=0.829, 95% CI 0.7960-0.8617; F1-score=0.736, 95% CI 0.6859-0.7851; area under the curve=0.795, 95% CI 0.7581-0.8318). Explanations from EBM, Shapley additive explanations, and local interpretable model-agnostic explanations identified consistent patterns of key influential factors, including questions related to “Feeling Lonely,” “Blaming Herself,” “Makes Family Proud,” “Life Not Worth Living,” and “Managing Day-to-Day.” At the individual level, where responses are highly personal, these XAI techniques provided case-specific insights through visual representations, distinguishing between protective and risk factors and illustrating their impact on predictions. Conclusions: This study shows the potential of XAI-driven models to predict psychological distress in Aboriginal mothers and provide clear, human-interpretable explanations of how important factors interact and influence outcomes. These models may help health professionals make more informed, non-biased decisions in Aboriginal perinatal mental health screenings.
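The case-specific protective/risk breakdowns described above rest on additive feature attributions. As an illustrative sketch only (not the authors' pipeline): for a linear model, the exact Shapley value of a feature is its weight times the feature's deviation from its baseline expectation. Feature names, weights, and responses here are invented for illustration:

```python
def linear_shap(weights, x, baseline):
    """Exact Shapley contributions for a linear model: w_i * (x_i - E[x_i])."""
    return {f: weights[f] * (x[f] - baseline[f]) for f in weights}

# Positive weight = risk item; negative weight = strengths/protective item.
weights = {"feeling_lonely": 0.8, "makes_family_proud": -0.6, "managing_day_to_day": -0.5}
baseline = {"feeling_lonely": 0.4, "makes_family_proud": 0.7, "managing_day_to_day": 0.6}
respondent = {"feeling_lonely": 1.0, "makes_family_proud": 0.2, "managing_day_to_day": 0.9}

contributions = linear_shap(weights, respondent, baseline)
risk = {f: c for f, c in contributions.items() if c > 0}        # pushes distress score up
protective = {f: c for f, c in contributions.items() if c < 0}  # pushes distress score down
```

Because the contributions sum to the model's deviation from its baseline prediction, they can be plotted per case, which is the kind of visual, individual-level explanation the abstract describes.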

  • Source: freepik.com; Copyright: DCStudio; URL: https://www.freepik.com/free-photo/senior-pediatrician-woman-discussing-sickness-treatment-using-clipboard-medical-presentation_16273933.htm; License: Licensed by JMIR.

    Integrating a Mobile App to Enhance Atrial Fibrillation Care: Key Insights From an Implementation Study Guided by the Consolidated Framework for Implementation Research

    Abstract:

    Background: Despite the growing use of mobile health apps in managing chronic heart disease, their integration into routine care remains challenging due to dynamic, context-specific barriers. Objective: This study aimed to identify the key enablers and challenges of implementing a mobile app for cardiac rehabilitation and healthy lifestyles in patients with atrial fibrillation at an Australian cardiology clinic. Methods: We interviewed both clinicians and patients to understand their perspectives about the mobile app and what factors affected the implementation. The 2 semistructured interview guides, one for clinicians and one for patients, were developed based on the Consolidated Framework for Implementation Research (CFIR) and the nonadoption, abandonment, scale-up, spread, and sustainability complexity assessment tool. All interviews were recorded and transcribed, and the transcripts were analyzed inductively to generate codes using a constructionist perspective. These codes were subsequently mapped onto the constructs within the CFIR across its 5 domains. This framework analysis was followed by examining the interconnections among the constructs to understand their collective impact on the implementation process, thereby identifying key enablers and challenges for the integration efforts. Results: We interviewed 24 participants, including 18 patients, whose mean age was 69 (SD 9.2) years, and 6 clinicians, comprising 4 specialist cardiac electrophysiologists and 2 nurses. Patient engagement with the app varied: 3 participants completed the cardiac rehabilitation plan, 1 participant was still actively engaged, 2 participants had partial use, 10 participants downloaded but never used the app, and 2 participants did not download the app.
We identified a complex interplay between key determinants across all 5 CFIR domains, collectively impacting 2 main elements in the implementation process: (1) acceptability and user engagement with the app and (2) the clinic’s implementation readiness. The app was more likely to be accepted and used by patients who needed to establish healthy lifestyle habits. Those with established healthy lifestyle habits did not indicate that the app provided sufficient added value to justify adoption. Interoperability with other devices and design issues, for example, limited customization options, also negatively impacted the uptake. The clinic’s implementation readiness was limited by various challenges including limited staff availability, insufficient internal communication processes, the absence of an implementation evaluation plan, and lack of clarity around who is funding the app’s use beyond the initial trial. Despite the clinicians’ overall inclination toward technology use, diverse opinions on the evidence for short-term cardiac rehabilitation programs in atrial fibrillation critically reduced their commitment to app integration. Conclusions: Mobile health apps have seen rapid expansion and offer clear benefits, yet their integration into complex health systems remains challenging. While our findings are from a single app implementation, they highlight the importance of embedding contextual analysis and proactive strategic planning in the integration process.

  • Source: Freepik; Copyright: Freepik; URL: https://www.freepik.com/free-photo/friends-with-social-distance-concept_11382491.htm; License: Licensed by JMIR.

    Use of Retrieval-Augmented Large Language Model for COVID-19 Fact-Checking: Development and Usability Study

    Abstract:

    Background: The COVID-19 pandemic has been accompanied by an “infodemic,” where the rapid spread of misinformation has exacerbated public health challenges. Traditional fact-checking methods, though effective, are time-consuming and resource-intensive, limiting their ability to combat misinformation at scale. Large language models (LLMs) such as GPT-4 offer a more scalable solution, but their susceptibility to generating hallucinations—plausible yet incorrect information—compromises their reliability. Objective: This study aims to enhance the accuracy and reliability of COVID-19 fact-checking by integrating a retrieval-augmented generation (RAG) system with LLMs, specifically addressing the limitations of hallucination and context inaccuracy inherent in stand-alone LLMs. Methods: We constructed a context dataset comprising approximately 130,000 peer-reviewed papers related to COVID-19 from PubMed and Scopus. This dataset was integrated with GPT-4 to develop multiple RAG-enhanced models: the naïve RAG, Lord of the Retrievers (LOTR)–RAG, corrective RAG (CRAG), and self-RAG (SRAG). The RAG systems were designed to retrieve relevant external information, which was then embedded and indexed in a vector store for similarity searches. One real-world dataset and one synthesized dataset, each containing 500 claims, were used to evaluate the performance of these models. Each model’s accuracy, F1-score, precision, and sensitivity were compared to assess their effectiveness in reducing hallucination and improving fact-checking accuracy. Results: The baseline GPT-4 model achieved an accuracy of 0.856 on the real-world dataset. The naïve RAG model improved this to 0.946, while the LOTR-RAG model further increased accuracy to 0.951. The CRAG and SRAG models outperformed all others, achieving accuracies of 0.972 and 0.973, respectively. The baseline GPT-4 model reached an accuracy of 0.960 on the synthesized dataset. 
The naïve RAG model increased this to 0.972, and the LOTR-RAG, CRAG, and SRAG models achieved an accuracy of 0.978. These findings demonstrate that the RAG-enhanced models consistently maintained high accuracy levels, closely mirroring ground-truth labels and significantly reducing hallucinations. The CRAG and SRAG models also provided more detailed and contextually accurate explanations, further establishing the superiority of agentic RAG frameworks in delivering reliable and precise fact-checking outputs across diverse datasets. Conclusions: The integration of RAG systems with LLMs substantially improves the accuracy and contextual relevance of automated fact-checking. By reducing hallucinations and enhancing transparency by citing retrieved sources, this method holds significant promise for rapid, reliable information verification to combat misinformation during public health crises.
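The retrieve-then-generate loop that all of the RAG variants above build on can be sketched in a few lines. This is not the authors' implementation: a token-overlap score stands in for vector-store similarity search, the corpus snippets are invented, and `generate` is a stub marking where the GPT-4 call would go:

```python
# Toy corpus standing in for the ~130,000-paper evidence base.
corpus = [
    "Vaccines underwent randomized controlled trials before authorization.",
    "Masks reduce droplet transmission in indoor settings.",
    "Vitamin C has not been shown to cure COVID-19.",
]

def score(query, doc):
    """Crude token-overlap relevance; real RAG uses embedding similarity."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d)

def retrieve(query, k=2):
    """Return the k most relevant corpus snippets for the claim."""
    return sorted(corpus, key=lambda doc: score(query, doc), reverse=True)[:k]

def generate(claim, evidence):
    # Placeholder for the LLM call: build the grounded prompt an LLM would receive.
    context = "\n".join(f"- {e}" for e in evidence)
    return f"Claim: {claim}\nEvidence:\n{context}\nVerdict:"

prompt = generate("Vitamin C cures COVID-19", retrieve("vitamin c cure covid-19"))
```

Grounding the model's answer in retrieved text, rather than in its parametric memory alone, is what drives the reduction in hallucination the results report; the corrective and self-reflective variants add checks on retrieval quality before generating.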

  • AI-generated image, in response to the request:

 "Generate an image of: A colorful photograph capturing a dynamic scene within a hospital command center. Three individuals: an engineer, a nurse, and a doctor, are collaborating intently. They stand before a large, multi-screen display showcasing a complex visualization of real-time data and analytics. The engineer, focused on the technical aspects, points towards a specific area on the display, while the nurse and doctor lean in attentively, discussing the implications. The vibrant colors of the hospital environment and the technology create a sense of urgency and innovation. The image emphasizes the collaborative spirit and the crucial role of interdisciplinary teamwork in modern healthcare. Make sure to avoid badges or lab coats that contain text." (Generator: Google Gemini 2.0 Flash April 15th, 2025; Requestor: Fernando Acosta Perez). Source: Google Gemini 2.0 Flash; Copyright: N/A (AI Generated Image); URL: https://jmir.org/2025/1/e63765/; License: Public Domain (CC0).

    Toward Real-Time Discharge Volume Predictions in Multisite Health Care Systems: Longitudinal Observational Study

    Abstract:

    Background: Emergency department (ED) admissions are one of the most critical decisions made in health care, with 40% of ED visits resulting in inpatient hospitalization for Medicare patients. A main challenge with the ED admissions process is the inability to move patients from the ED to an inpatient unit quickly. Identifying hospital discharge volume in advance may be valuable in helping hospitals determine capacity management mechanisms to reduce ED boarding, such as transferring low-complexity patients to neighboring hospitals. Although previous research has studied the prediction of discharges in the context of inpatient care, most of the work is on long-term predictions (ie, discharges within the next 24 to 48 hours) in single-site health care systems. In this study, we approach the problem of inpatient discharge prediction from a system-wide lens and evaluate the potential interactions between the two facilities in our partner multisite system to predict short-term discharge volume. Objective: The objective of this paper was to predict discharges from the general care units within a large tertiary teaching hospital network in the Midwest and evaluate the impact of external information from other hospitals on model performance. Methods: We conducted 2 experiments with 174,799 discharge records from 2 hospitals. In Experiment 1, we predicted the number of discharges across 2 time points (within the hour and the next 4 hours) using random forest (RF) and linear regression (LR) models. Models with access to internal hospital data (ie, system-agnostic) were compared with models with access to additional data from the other hospitals in the network (ie, system-aware). In Experiment 2, we evaluated the performance of an RF model to predict afternoon discharges (ie, 12 PM to 4 PM) 1 to 4 hours in advance. Results: In Experiment 1 and Hospital 1, RF and LR models performed equivalently, with R2 scores varying from 0.76 (hourly) to 0.89 (4 hours). 
In Hospital 2, the RF model performed best, with scores varying from 0.68 (hourly) to 0.84 (4 hours), while scores for LR models ranged from 0.63 to 0.80. There was no significant difference in performance between a system-aware approach and a system-agnostic one. In experiment 2, the mean absolute percentage error increased from 11% to 16% when predicting 4 hours in advance relative to zero hours in Hospital 1 and from 24% to 35% in Hospital 2. Conclusions: Short-term discharges in multisite hospital systems can be locally predicted with high accuracy, even when predicting hours in advance.
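The short-term framing in Experiment 2 amounts to supervised learning on lagged counts. A minimal sketch under stated assumptions — illustrative counts, a naive moving-average baseline in place of the random forest, and the same mean absolute percentage error metric:

```python
# Hypothetical hourly discharge counts for one hospital (illustrative only).
hourly_discharges = [4, 6, 5, 7, 9, 8, 10, 12, 11, 9, 7, 6]

def make_lagged(series, n_lags=3):
    """Turn a count series into rows of (previous n_lags counts) -> next count."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y

def moving_average_forecast(X):
    """Naive baseline: predict the mean of the lag window."""
    return [sum(row) / len(row) for row in X]

def mape(actual, predicted):
    """Mean absolute percentage error, as used in Experiment 2."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)

X, y = make_lagged(hourly_discharges)
preds = moving_average_forecast(X)
error = mape(y, preds)
```

A system-aware variant would simply append the other hospital's lagged counts to each feature row; the finding above is that this added little over the system-agnostic features.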

  • Source: The Authors / Placeit; Copyright: The Authors / Placeit; URL: https://www.jmir.org/2025/1/e63687/; License: Licensed by JMIR.

    Challenging the Continued Usefulness of Social Media Recruitment for Surveys of Hidden Populations of People Who Use Opioids

    Abstract:

    Historically, recruiting research participants through social media facilitated access to people who use opioids, capturing a range of drug use behaviors. The current rapidly changing online landscape, however, casts doubt on social media’s continued usefulness for study recruitment. In this viewpoint paper, we assessed social media recruitment for people who use opioids and described challenges and potential solutions for effective recruitment. As part of a study on barriers to harm reduction health services, we recruited people who use opioids in New York City to complete a REDCap (Research Electronic Data Capture; Vanderbilt University) internet-based survey using Meta (Facebook and Instagram), X (formerly known as Twitter), Reddit, and Discord. Eligible participants must have reported using opioids (heroin, prescription opioids, or fentanyl) for nonprescription purposes in the past 90 days and lived or worked in New York City. Data collection took place from August 2023 to November 2023. Including the study purpose, compensation, and inclusion criteria in our ads caused Meta’s social media platforms and X to flag them as “discriminatory” and “spreading false information.” Listing incentives increased bot traffic across all platforms despite bot prevention activities (eg, reCAPTCHA and counting items in an image). We instituted a rigorous post hoc data cleaning protocol (eg, investigating duplicate IP addresses, participants reporting use of a fictitious drug, invalid ZIP codes, and improbable drug use behaviors) to identify bot submissions and repeat participants. Participants received a US $20 gift card if still deemed eligible after post hoc data inspection. There were 2560 submissions, 93.2% (n=2387) of which were determined to be from bots or malicious responders.
Of these, 23.9% (n=571) showed evidence of a duplicate IP or email address, 45.9% (n=1095) reported consuming a fictitious drug, 15.8% (n=378) provided an invalid ZIP code, and 9.4% (n=225) reported improbable drug use behaviors. The majority of responses deemed legitimate (n=173) were collected from Meta (n=79, 45.7%) and Reddit (n=48, 27.8%). X’s ads were the most expensive (US $1.96/click) and yielded the fewest participants (3 completed surveys). Social media recruitment of hidden populations is challenging but not impossible. Rigorous data collection protocols and post hoc data inspection are necessary to ensure the validity of findings. These methods may counter previous best practices for researching stigmatized behaviors.
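The post hoc screening steps described above can be sketched as a simple filter over submissions. The field names, decoy drug name, and records below are hypothetical, not study data:

```python
import re
from collections import Counter

FICTITIOUS_DRUG = "zanolex"  # hypothetical decoy item no real respondent should endorse

responses = [
    {"ip": "10.0.0.1", "zip": "10027", "drugs": ["heroin"]},
    {"ip": "10.0.0.1", "zip": "10027", "drugs": ["fentanyl"]},  # duplicate IP
    {"ip": "10.0.0.2", "zip": "abcde", "drugs": ["heroin"]},    # invalid ZIP
    {"ip": "10.0.0.3", "zip": "11215", "drugs": ["zanolex"]},   # decoy drug endorsed
    {"ip": "10.0.0.4", "zip": "11206", "drugs": ["prescription opioids"]},
]

ip_counts = Counter(r["ip"] for r in responses)

def flags(r):
    """Return the screening rules a submission trips, if any."""
    reasons = []
    if ip_counts[r["ip"]] > 1:
        reasons.append("duplicate_ip")
    if not re.fullmatch(r"\d{5}", r["zip"]):
        reasons.append("invalid_zip")
    if FICTITIOUS_DRUG in r["drugs"]:
        reasons.append("fictitious_drug")
    return reasons

legitimate = [r for r in responses if not flags(r)]
```

In practice each flag triggers manual review rather than automatic exclusion, since legitimate respondents can share an IP address or mistype a ZIP code.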

  • Source: freepik.com; Copyright: Freepik; URL: https://www.freepik.com/free-photo/medium-shot-smiley-woman-with-cup_13435959.htm; License: Licensed by JMIR.

    Low Earth Orbit Communication Satellites: A Positively Disruptive Technology That Could Change the Delivery of Health Care in Rural and Northern Canada

    Abstract:

    Canada is a progressive nation that endeavors to provide comprehensive, universal, and portable health care to all its citizens. This is a challenge for a country with a population of 40 million living within a land expanse of 10 million km2 and where 18% live in rural or highly remote locations. The combined population of Yukon, Northwest Territories, and Nunavut is only 128,959 (0.32% of the population), living within 3.92 million km2, and many of these citizens live in isolated communities with unique health needs and social issues. The current solution to providing health care in the most remote locations has been to transport the patient to the health care provider or vice versa, which incurs considerable financial strain on our health care system and personal stress to the patient and provider. The recent global deployment of low Earth orbit communication satellites (LEO-ComSats) will change the practice and availability of online medicine everywhere, especially in northern Canada. The deployment of LEO-ComSats could result in disruptive but positive changes in medical care for underserved communities in remote geographic locations across Canada. LEO-ComSats can be used to demonstrate online medical encounters between a patient and a doctor in Canada, separated by thousands of kilometers. Most certainly, the academic medical centers in lower Canada could perform online telementored medical care to our northern communities like the remote care provided to many Canadians during the COVID-19 pandemic. An online health care model requires effective design, testing, and validation of the policies, standards, requirements, procedures, and protocols. 
Although the COVID-19 pandemic was the initial prime mover across all of Canada in the use of online medical encounters and in rapidly devised reimbursement models, that response was created reactively, using real-time managerial fiat and poorly defined procedures based on minimal pedagogical experience, which made it “difficult to prove it was universally safe.” It is essential to proactively derive the medical policies, standards, and procedures for telementored medicine and “prove it is safe” before LEO-ComSat technology is ubiquitously deployed in northern Canada. This viewpoint was written by subject matter experts who have researched online and internet-based medicine for many years, in some cases for 3 decades. A literature review was often unnecessary because the authors already had the relevant articles and knowledge in their possession; where needed, internet search engines (eg, Google and PubMed) and Canadian government documents were used to provide corroborating evidence.

  • AI-generated image, in response to the request "An abstract image illustrating 'Readdressing the Ongoing Challenge of Missing Data in Youth Ecological Momentary Assessment Studies: A Meta-Analysis Update.'" (Generator: DALL-E 3/OpenAI, August 23, 2024; Requestor: Konstantin Drexl). Source: Created with DALL-E, an AI system by OpenAI; Copyright: N/A (AI-generated image); URL: https://www.jmir.org/2025/1/e65710/; License: Public Domain (CC0).

    Readdressing the Ongoing Challenge of Missing Data in Youth Ecological Momentary Assessment Studies: Meta-Analysis Update

    Abstract:

    Background: Ecological momentary assessment (EMA) is pivotal in longitudinal health research in youth, but potential bias associated with nonparticipation, omitted reports, or dropout threatens its clinical validity. Previous meta-analytic evidence is inconsistent regarding specific determinants of missing data. Objective: This meta-analysis aimed to update and expand upon previous research by examining key participation metrics—acceptance, compliance, and retention—in youth EMA studies. In addition, it sought to identify potential moderators among sample and design characteristics, with the goal of better understanding and mitigating the impact of missing data. Methods: We used a bibliographic database search to identify EMA studies involving children and adolescents published from 2001 to November 2023. Eligible studies used mobile-delivered EMA protocols in samples with an average age up to 18 years. We conducted separate meta-analyses for acceptance, compliance, and retention rates, and performed meta-regressions to address sample and design characteristics. Furthermore, we extracted and pooled sample-level effect sizes related to correlates of response compliance. Risk of publication bias was assessed using funnel plots, regression tests, and sensitivity analyses targeting inflated compliance rates. Results: We identified 285 samples, including 17,441 participants aged 5 to 17.96 years (mean age 14.22, SD 2.24 years; mean percentage of female participants 55.7%). Pooled estimates were 67.27% (k=88, 95% CI 62.39-71.96) for acceptance, 71.97% (k=216, 95% CI 69.83-74.11) for compliance, and 96.57% (k=169, 95% CI 95.42-97.56) for retention. 
Although moderators explained little of the variance in participation metrics overall, acceptance rates decreased as the number of EMA items increased (log-transformed b=−0.115, SE 0.036; 95% CI −0.185 to −0.045; P=.001; R2=19.98), compliance rates declined by 0.8% per year of publication (SE 0.25, 95% CI −1.3 to −0.3; P=.002; R2=4.17), and retention rates dropped with increasing study duration (log-transformed b=−0.061, SE 0.015; 95% CI −0.091 to −0.032; P<.001; R2=10.06). The benefits of monetary incentives on response compliance diminished as the proportion of female participants increased (b=−0.002, SE 0.001; 95% CI −0.003 to −0.001; P=.003; R2=9.47). Within-sample analyses showed a small but significant effect indicating higher compliance in girls compared to boys (k=25; g=0.18; 95% CI 0.06-0.31; P=.003), but no significant age-related effects were found (k=14; z score=0.05; 95% CI −0.01 to 0.16). Conclusions: Despite a 5-fold increase in included effect sizes compared to the initial review, the variability in rates of missing data that one can expect based on specific sample and design characteristics remains substantial. The inconsistency in identifying robust moderators highlights the need for greater attention to missing data and its impact on study results. To minimize health-related bias in EMA studies, researchers should collectively increase transparent reporting practices, intensify primary methodological research, and involve participants’ perspectives on missing data. Trial Registration: PROSPERO CRD42022376948; https://www.crd.york.ac.uk/PROSPERO/view/CRD42022376948

  • Source: Freepik; Copyright: pressfoto; URL: https://www.freepik.com/free-photo/top-view-unrecognizable-hacker-performing-cyberattack-night_5698343.htm; License: Licensed by JMIR.

    Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis

    Abstract:

    Background: Large language models (LLMs) have flourished and gradually become an important research and application direction in the medical field. However, due to the high degree of specialization, complexity, and specificity of medicine, which results in extremely high accuracy requirements, controversy remains about whether LLMs can be used in the medical field. A growing number of studies have evaluated the performance of various types of LLMs in medicine, but the conclusions are inconsistent. Objective: This study uses a network meta-analysis (NMA) to assess the accuracy of LLMs when answering clinical research questions, to provide high-level evidence for their future development and application in the medical field. Methods: In this systematic review and NMA, we searched PubMed, Embase, Web of Science, and Scopus from inception until October 14, 2024. Studies on the accuracy of LLMs when answering clinical research questions were included and screened by reading published reports. The systematic review and NMA were conducted to compare the accuracy of different LLMs when answering clinical research questions, including objective questions, open-ended questions, top 1 diagnosis, top 3 diagnosis, top 5 diagnosis, and triage and classification. The NMA was performed using Bayesian frequency theory methods. Indirect intercomparisons between programs were performed using a grading scale. A larger surface under the cumulative ranking curve (SUCRA) value indicates a higher ranking of the corresponding LLM accuracy. Results: The systematic review and NMA examined 168 articles encompassing 35,896 questions and 3063 clinical cases. Of the 168 studies, 40 (23.8%) were considered to have a low risk of bias, 128 (76.2%) had a moderate risk, and none were rated as having a high risk. ChatGPT-4o (SUCRA=0.9207) demonstrated strong performance in terms of accuracy for objective questions, followed by Aeyeconsult (SUCRA=0.9187) and ChatGPT-4 (SUCRA=0.8087). 
ChatGPT-4 (SUCRA=0.8708) excelled at answering open-ended questions. In terms of accuracy for top 1 diagnosis and top 3 diagnosis of clinical cases, human experts (SUCRA=0.9001 and SUCRA=0.7126, respectively) ranked the highest, while Claude 3 Opus (SUCRA=0.9672) performed well at the top 5 diagnosis. Gemini (SUCRA=0.9649) had the highest rated SUCRA value for accuracy in the area of triage and classification. Conclusions: Our study indicates that ChatGPT-4o has an advantage when answering objective questions. For open-ended questions, ChatGPT-4 may be more credible. Humans are more accurate at the top 1 diagnosis and top 3 diagnosis. Claude 3 Opus performs better at the top 5 diagnosis, while for triage and classification, Gemini is more advantageous. This analysis offers valuable insights for clinicians and medical practitioners, empowering them to effectively leverage LLMs for improved decision-making in learning, diagnosis, and management of various clinical scenarios. Trial Registration: PROSPERO CRD42024558245; https://www.crd.york.ac.uk/PROSPERO/view/CRD42024558245
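The SUCRA metric used to rank models in this review has a simple closed form: given a treatment's probabilities of occupying each rank among a competitors, SUCRA is the average of the cumulative rank probabilities over ranks 1 through a−1, so 1 means certain to rank best and 0 certain to rank worst. A minimal sketch of that calculation (the rank probabilities below are hypothetical, not figures from the review):

```python
def sucra(rank_probs):
    """Surface under the cumulative ranking curve for one treatment.

    rank_probs[r] is the probability that the treatment is ranked
    (r+1)-th best among len(rank_probs) competitors; values sum to 1.
    """
    a = len(rank_probs)
    cumulative = 0.0
    total = 0.0
    for p in rank_probs[:-1]:  # cumulative probabilities for ranks 1 .. a-1
        cumulative += p
        total += cumulative
    return total / (a - 1)

# Hypothetical rank probabilities for one LLM among 3 competitors:
# 60% chance of ranking 1st, 30% of 2nd, 10% of 3rd.
print(sucra([0.6, 0.3, 0.1]))  # 0.75 — nearer 1 means more likely to rank best
```

A model that is certain to rank first gets SUCRA 1.0, which is why values such as 0.9207 above indicate near-top performance.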

  • Source: freepik; Copyright: freepik; URL: https://www.freepik.com/free-photo/anxious-man-indoors-front-view_32407617.htm; License: Licensed by JMIR.

    Potential Harms of Feedback After Web-Based Depression Screening: Secondary Analysis of Negative Effects in the Randomized Controlled DISCOVER Trial

    Abstract:

    Background: Web-based depression screening followed by automated feedback of results is frequently used and promoted by mental health care providers. However, criticism points to potential associated harms. Systematic empirical evidence on postulated negative effects is missing. Objective: We aimed to examine whether automated feedback after web-based depression screening is associated with misdiagnosis, mistreatment, deterioration in depression severity, deterioration in emotional response to symptoms, and deterioration in suicidal ideation at 1 and 6 months after screening. Methods: This is a secondary analysis of the German-wide, web-based, randomized controlled DISCOVER trial. Affected but undiagnosed individuals screening positive for depression (9-item Patient Health Questionnaire [PHQ-9] ≥10 points) were randomized 1:1:1 to receive nontailored feedback, tailored feedback, or no feedback on their screening result. Misdiagnosis and mistreatment were operationalized as having received a depression diagnosis by a health professional and as having started guideline-based depression treatment since screening (self-report), respectively, while not having met the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) criteria of a major depressive disorder at baseline (Structured Clinical Interview for DSM-5 Disorders). Deterioration in depression severity was defined as a pre-post change of ≥4.4 points in the PHQ-9, deterioration in emotional response to symptoms as a pre-post change of ≥3.1 points in a composite scale of the Brief Illness Perception Questionnaire, and deterioration in suicidal ideation as a pre-post change of ≥1 point in the PHQ-9 suicide item. Outcome rates were compared between each feedback arm and the no feedback arm in terms of relative risks (RRs). 
Results: In the per protocol sample of 948 participants (n=685, 72% female; mean age 37.3, SD 14.1 years), there was no difference in rates of misdiagnosis (ranging from 3.5% to 4.9% across all study arms), mistreatment (7.2%-8.3%), deterioration in depression severity (2%-6.8%), deterioration in emotional response (0.7%-2.9%), and deterioration in suicidal ideation at 6 months (6.8%-13.1%) between the feedback arms and the no feedback arm (RRs ranging from 0.46 to 1.96; P values ≥.13). The rate for deterioration in suicidal ideation at 1 month was increased in the nontailored feedback arm (RR 1.92; P=.01) but not in the tailored feedback arm (RR 1.26; P=.43), with rates of 12.3%, 8.1%, and 6.4% in the nontailored, tailored, and no feedback arms, respectively. All but 1 of the sensitivity analyses, as well as subgroup analyses for false-positive screens, supported the findings. Conclusions: The results indicate that feedback after web-based depression screening is not associated with negative effects such as misdiagnosis, mistreatment, and deterioration in depression severity or in emotional response to symptoms. However, it cannot be ruled out that nontailored feedback may increase the risk of deterioration in suicidal ideation. Robust prospective research on negative effects and particularly suicidal ideation is needed and should inform current practice. Trial Registration: ClinicalTrials.gov NCT04633096; https://clinicaltrials.gov/study/NCT04633096; Open Science Framework 10.17605/OSF.IO/TZYRD; https://osf.io/tzyrd
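The relative risks reported here are simple rate ratios, and the nontailored-arm figure can be reproduced directly from the arm-level rates given in the abstract. A quick sketch:

```python
def relative_risk(rate_exposed: float, rate_control: float) -> float:
    """Relative risk: outcome rate in the exposed arm divided by the
    outcome rate in the comparison arm."""
    return rate_exposed / rate_control

# Deterioration in suicidal ideation at 1 month, rates from the abstract:
# 12.3% (nontailored feedback), 8.1% (tailored), 6.4% (no feedback).
print(round(relative_risk(0.123, 0.064), 2))  # 1.92, matching the reported RR
```

The tailored-arm ratio computed from these rounded percentages comes out near the reported 1.26; small discrepancies of this kind reflect rounding of the published rates rather than the underlying counts.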

  • Source: Freepik; Copyright: freepik; URL: https://www.freepik.com/free-photo/medical-record-report-healthcare-document-concept_17076148.htm; License: Licensed by JMIR.

    Harnessing an Artificial Intelligence–Based Large Language Model With Personal Health Record Capability for Personalized Information Support in Postsurgery...

    Abstract:

    Background: Myocardial infarction (MI) remains a leading cause of morbidity and mortality worldwide. Although postsurgical cardiac interventions have improved survival rates, effective management during recovery remains challenging. Traditional informational support systems often provide generic guidance that does not account for individualized medical histories or psychosocial factors. Recently, artificial intelligence (AI)–based large language model (LLM) tools have emerged as promising interventions to deliver personalized health information to post-MI patients. Objective: We aimed to explore the user experiences and perceptions of an AI-based LLM tool (iflyhealth) with integrated personal health record functionality in post-MI care, assess how patients and their family members engaged with the tool during recovery, identify the perceived benefits and challenges of using the technology, and understand the factors promoting or hindering continued use. Methods: A purposive sample of 20 participants (12 users and 8 nonusers) who underwent MI surgery within the previous 6 months was recruited between July and August 2024. Data were collected through semistructured, face-to-face interviews conducted in a private setting, using an interview guide to address participants’ first impressions, usage patterns, and reasons for adoption or nonadoption of the iflyhealth app. The interviews were audio-recorded, transcribed verbatim, and analyzed using the Colaizzi method. 
Results: Four key themes emerged: (1) participants’ experiences varied based on digital literacy, prior exposure to health technologies, and individual recovery needs; (2) users appreciated the app’s enhanced accessibility to professional health information, personalized advice tailored to their clinical conditions, and the tool’s responsiveness to health status changes; (3) challenges such as difficulties with digital literacy, usability concerns, and data privacy issues were significant barriers; and (4) nonusers and those who discontinued use primarily cited complexity of the interface and perceived limited relevance of the advice as major deterrents. Conclusions: iflyhealth, an AI-based LLM app with built-in personal health record functionality, shows significant potential in assisting post-MI patients. The main benefits reported by iflyhealth users include improved access to personalized health information and an enhanced ability to respond to changing health conditions. However, challenges such as digital literacy, usability, and privacy and security concerns persist. Overcoming these barriers may further enhance the use of the iflyhealth app, which can play an important role in patient-centered, personalized post-MI management. Trial Registration:


Latest Submissions Open for Peer-Review:

  • Bridging the AI-literacy gap in healthcare: a qualitative analysis of the Flanders case-study

    Date Submitted: Apr 29, 2025

    Open Peer Review Period: Apr 30, 2025 - Jun 25, 2025

    Background: The integration of Artificial Intelligence (AI) in healthcare is advancing rapidly, promising to enhance clinical decision-making, streamline administrative tasks, and personalize patient care. However, many healthcare professionals report a lack of confidence in understanding, critically evaluating, and ethically applying AI technologies. In regions like Flanders, Belgium—recognized for innovation yet facing moderate lifelong learning participation—these challenges are pronounced, especially amid an aging healthcare workforce and resource disparities between professions. Objective: This study aimed to explore the requirements, obstacles, and prospects of AI adoption among healthcare professionals, and to identify the specific training priorities needed to bridge the AI-literacy gap in clinical practice in the Flanders region. Methods: A multi-stage qualitative methodology was employed. First, 15 semi-structured interviews with key informants were conducted to inform the survey design. Then, a survey was distributed to healthcare professionals across Flanders, gathering 134 valid responses. Finally, three focus groups involving 39 participants were conducted to co-interpret the survey findings. Thematic analysis and descriptive statistics were used to synthesize insights across stages. Results: Healthcare professionals recognized AI’s potential to reduce administrative burdens and enhance clinical care but reported low self-perceived AI literacy, especially among older and non-physician staff. Interest in AI training was high, particularly for practical applications and basic AI knowledge, rather than technical coding or standalone ethics courses. Differences emerged based on occupation, age, and perceived job security. Nurses and younger professionals were especially concerned about the risks and opportunities of AI adoption. A lack of legally approved AI tools and practical hands-on training were identified as major barriers. 
Focus group discussions highlighted disparities in access to AI training between doctors and nurses, skepticism about private-sector-led courses, and the need for hospital management support in facilitating AI education. Conclusions: A one-size-fits-all approach to AI training in healthcare is inadequate. Training programs must be stratified by occupation, age, and resource availability, emphasizing immediate practical applications while embedding ethical considerations within broader curricula. Addressing barriers to training accessibility and clarifying regulatory frameworks will be crucial to scaling AI integration in healthcare systems, starting in Flanders and potentially informing broader European initiatives under frameworks like the EU AI Act.

  • The feasibility and effectiveness of a smartwatch device for adherence to home-based cardiac rehabilitation in patients with coronary heart disease: A randomized controlled trial

    Date Submitted: Apr 29, 2025

    Open Peer Review Period: Apr 30, 2025 - Jun 25, 2025

    Background: Digital technologies have the potential to address many of the challenges associated with traditional center-based cardiac rehabilitation (CBCR), but the remote home-based cardiac rehabilitation (HBCR) model remains a challenge. Objective: This study was designed to investigate the feasibility and efficacy of a smartwatch-facilitated HBCR model in patients with coronary heart disease (CHD). Methods: This was a single-center, randomized, nonblinded, parallel-controlled study. We recruited patients aged 18 years or older with coronary heart disease from a tertiary hospital in Jilin Province, China. The intervention group received a 3-month smartwatch-based HBCR program involving remotely delivered real-time feedback, supervision, and education. The control group received conventional HBCR. Adherence was the primary outcome of the trial, assessed by the Home-Based Cardiac Rehabilitation Exercise Adherence Scale. The secondary outcomes included cardiopulmonary function, measured by cardiopulmonary exercise testing, anxiety (General Anxiety Disorder-7), depression (Patient Health Questionnaire-9), and quality of life (36-Item Short Form Health Survey) at 3 months. Results: Between January 1, 2023, and December 30, 2023, 62 patients (mean age 59.93±10.06 years; 33.3% female, 66.7% male) were recruited and subsequently randomly assigned to the smartwatch group (n=32) or control group (n=30). No difference was detected in the baseline characteristics between the two groups. After the intervention, the subjects in the smartwatch group performed significantly better in peak VO2, home-based cardiac rehabilitation adherence, GAD-7, PHQ-9, and some other parameters than those in the control group. Conclusions: This feasibility study showed that the smartwatch device was well-accepted and effective in supporting a home-based cardiac rehabilitation model for patients with coronary heart disease (CHD). 
Clinical Trial: ChiCTR2400088039; https://www.chictr.org.cn/bin/project/edit?pid=215602

  • Medico-economic Evaluation of a Telehealth Platform for Elective Outpatient Surgeries: A Randomized Controlled Trial

    Date Submitted: Apr 29, 2025

    Open Peer Review Period: Apr 30, 2025 - Jun 25, 2025

    Background: The increasing prevalence of ambulatory surgeries has highlighted the need for effective postoperative follow-up. While telemedicine represents a promising option for perioperative support and postoperative monitoring, evidence of its actual benefits remains limited. Objective: To evaluate the medico-economic impact of a personalized telemedicine platform for postoperative follow-up in day-surgery patients in terms of cost-effectiveness and cost-utility. Methods: Design and Setting: This single-blinded, two-group randomized controlled trial was conducted at the Centre hospitalier de l’Université de Montréal (CHUM) from August 2022 to September 2023. Participants: Adults undergoing elective day surgery were randomized into two groups: the intervention group, which received postoperative follow-up via the LeoMed® telemedicine platform, and the control group, which received standard care. The study adhered to ethical standards and was registered with ClinicalTrials.gov (NCT04948632). Intervention: The intervention group used a personalized telehealth platform offering preoperative education, psychological support, and postoperative monitoring through daily follow-up forms sent to patients’ smartphones. Alerts generated by patient responses were reviewed by CHUM’s telehealth support unit. Main Outcomes and Measures: The primary outcome was unanticipated healthcare utilization, including emergency visits, readmissions, and medical consultations within 30 days post-procedure. Secondary outcomes included gained quality-adjusted life years (QALY), patient satisfaction, healthcare costs, and greenhouse gas emissions. Results: Of 1,411 patients screened, 1,214 were randomized, with 436 in the intervention group and 445 in the control group analyzed. No significant differences in unanticipated healthcare utilization or costs were observed. 
The intervention group demonstrated a statistically significant QALY gain at postoperative day 14 (0.002, p = 0.013), but the difference was no longer significant at day 30 (0.001, p = 0.143). However, patient satisfaction was significantly higher in the intervention group at both days 14 (p = 0.018) and 30 (p < 0.001). Conclusions: This trial demonstrates the potential of telemedicine platforms to enhance postoperative care in ambulatory surgery settings. While no significant reductions in healthcare utilization were observed, the intervention improved QALYs and patient satisfaction, suggesting potential cost-utility benefits. Larger trials are needed to confirm these findings and explore the impact on long-term recovery and healthcare savings. Clinical Trial: ClinicalTrials.gov Identifier: NCT04948632

  • Costs and Cost Effectiveness of an Enhanced Web-Based Physical Activity Intervention for Latinas: 12- and 24-Month Findings from Pasos Hacia La Salud II

    Date Submitted: Apr 29, 2025

    Open Peer Review Period: Apr 30, 2025 - Jun 25, 2025

    Background: Increasing adherence to physical activity (PA) guidelines could prevent chronic disease morbidity and mortality, save considerable healthcare costs, and reduce health disparities. We previously established the efficacy and cost-effectiveness of a web-based PA intervention for Latina women, which increased PA, but few participants met PA guidelines, and long-term maintenance was not examined. A new version with enhanced intervention features was found to outperform the original intervention in long-term guideline adherence. Objective: To determine the costs and cost-effectiveness of the enhanced multi-technology PA intervention vs. the original web-based intervention in increasing minutes of activity and adherence to guidelines. Methods: Latina adults (N=195) were randomly assigned to receive a Spanish language individually tailored web-based PA intervention (Original), or the same intervention plus additional phone calls and interactive text messaging (Enhanced). PA was measured at baseline, 12 months (end of active intervention), and 24 months (end of tapered maintenance) using self-report (7-Day Physical Activity Recall Interview) and ActiGraph accelerometers. Costs were estimated from a payer perspective and included all features needed to deliver the intervention, including staff, materials, and technology. Cost-effectiveness was calculated as the cost per additional minute of PA added over the intervention, and as the incremental cost-effectiveness ratio (ICER) per additional person meeting guidelines. Results: At 12 months, the costs of delivering the interventions were $16/person/month and $13/person/month in the Enhanced and Original arms, respectively. These costs fell to $14 and $8 at 24 months. At 12 months, each additional minute of self-reported activity in the Enhanced group cost $0.09 vs. $0.11 in Original ($0.19 vs. $0.16 for ActiGraph), with incremental costs of $0.05 per additional minute in Enhanced beyond Original. 
At the end of maintenance (24 months), costs per additional minute fell to $0.06 and $0.05 ($0.12 vs. $0.10 for ActiGraph), with incremental costs of $0.08 per additional minute in Enhanced ($0.20 for ActiGraph). Costs of meeting PA guidelines at 12 months were $705 vs. $503 in Enhanced vs. Original, and increased to $812 and $601 at 24 months. The ICER for meeting guidelines at 24 months was $1837 (95% CI $730.89-$2673.89) per additional person in the Enhanced vs. Original arm. Conclusions: As expected, the Enhanced intervention was more expensive, but yielded better long-term maintenance of activity. Both conditions were low cost relative to other medical interventions. The Enhanced intervention may be preferable in high-risk populations, where more investment in meeting guidelines could yield more cost savings. Clinical Trial: NCT03491592
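The ICER quoted here is the standard incremental ratio: the difference in total cost between arms divided by the difference in effect (here, additional people meeting PA guidelines). A minimal sketch with hypothetical arm-level totals, since the abstract does not report the underlying cost and effect inputs:

```python
def icer(cost_new: float, cost_ref: float,
         effect_new: float, effect_ref: float) -> float:
    """Incremental cost-effectiveness ratio:
    extra cost per extra unit of effect of the new intervention."""
    return (cost_new - cost_ref) / (effect_new - effect_ref)

# Hypothetical totals, for illustration only (not the trial's data):
# Enhanced arm costs $20,000 and gets 29 people to guidelines;
# Original arm costs $13,000 and gets 25 people to guidelines.
print(icer(20000.0, 13000.0, 29.0, 25.0))  # 1750.0 dollars per additional person
```

An ICER is only meaningful relative to a willingness-to-pay threshold; a lower value means the added effect of the Enhanced arm comes at a lower incremental price.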

  • Leveraging Large Language Models for Enhanced Quality Assessment of Nutrition and Health Dashboards

    Date Submitted: Apr 29, 2025

    Open Peer Review Period: Apr 30, 2025 - Jun 25, 2025

    Background: Data dashboards have become an essential tool in food and nutrition surveillance, enabling integration, visualization, and dissemination of multi-faceted data. While dashboards enhance data accessibility and decision-making, inconsistencies in data quality, standardization, and responsible conduct, along with misleading visualizations, limit their reliability and usefulness. Large Language Models (LLMs) could offer opportunities to assist in dashboard evaluation, yet their effectiveness compared to human expert assessments remains uncertain. Objective: In this study, we evaluate the potential of LLMs in assessing nutrition and health dashboards by comparing ChatGPT-generated evaluations with expert reviews. We examine the alignment and discrepancies in scoring across dashboards and key dashboard quality indicators. Methods: We developed a structured evaluation framework based on the 4E principles—Evidence, Efficiency, Emphasis, and Ethics—comprising 45 criteria. Seven publicly available nutrition and health dashboards were selected for evaluation. ChatGPT-4o was prompted to assess dashboards using extracted textual and visual content, generating scores and justifications for each criterion. Results were compared to previously published expert evaluations, analyzing ranking consistency and differences in dashboards and indicator-specific criteria. Results: ChatGPT-4o successfully generated scores and justifications in accordance with the instructions provided in the prompt we designed. ChatGPT-4o rankings were well aligned with expert evaluations for dashboards included in our study (Spearman correlation = 0.79). When comparing average scores across all dashboards for specific evaluation criteria, granularity and completeness had high consistency, with both AI and human experts assigning relatively lower scores. However, ChatGPT assigned lower scores for standardization and higher scores for responsible conduct compared to human experts. 
Conclusions: ChatGPT-4o could offer structured, scalable dashboard evaluations, showing reasonable alignment with expert assessments, especially in objective criteria like readability and accessibility. However, inconsistencies in assessing standardization, platform capability, and ethical considerations reveal limitations in contextual reasoning. While LLMs can enhance evaluation efficiency, expert oversight remains essential for accuracy and depth. Future research should explore diverse dashboards, compare multiple LLM models, and integrate multimodal capabilities to better assess interactivity and visualization integrity.
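The Spearman correlation used to compare the LLM and expert rankings has a simple closed form when there are no tied ranks: ρ = 1 − 6·Σd² / (n(n² − 1)), where d is the rank difference for each dashboard. A minimal sketch with hypothetical ranks for 7 dashboards (not the study's actual rankings):

```python
def spearman_rho(ranks_a, ranks_b):
    """Spearman rank correlation for two rankings without ties."""
    n = len(ranks_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical rankings of 7 dashboards by the LLM and by experts:
llm_ranks    = [1, 2, 3, 4, 5, 6, 7]
expert_ranks = [2, 1, 3, 4, 6, 5, 7]
print(round(spearman_rho(llm_ranks, expert_ranks), 3))  # 0.929
```

With only 7 items, a single swapped pair of adjacent ranks moves ρ noticeably, which is worth keeping in mind when interpreting the reported 0.79 agreement.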

  • Addressing the implementation gap in digital health adoption: A systems engineering perspective

    Date Submitted: Apr 29, 2025

    Open Peer Review Period: Apr 30, 2025 - Jun 25, 2025

    In the NHS, as in other health systems, it is generally agreed that difficulties in achieving digital transformation lie less in problems with the technical (hardware and software) aspects of digital solutions than in the “soft” system issues relating to institutional context, organisational complexity and what are broadly described as “human factors”. A range of approaches have been explored within digital health research to better understand and address the complex series of factors that have given rise to the implementation gap. Focusing on the need to deploy digital health technologies to support the “shift left” (from hospital to community, sickness to prevention, analogue to digital) agenda, this paper explores how a systems engineering approach could provide the cross-disciplinary, holistic framework that is required to address what could be described as a very messy problem. Our framework combines methods such as Digital Twins to simulate complex care pathways with Living Labs that enable interdisciplinary collaboration, co-design, and iterative pilot testing. When combined, these methods could help align interests, integrate end-user needs, embed design for successful implementation and iteratively adapt and improve digital health technologies, as well as offering an evaluation strategy that emphasizes safety, effectiveness and cost-efficiency.