TY - JOUR AU - Schmieding, L. Malte AU - Kopka, Marvin AU - Bolanaki, Myrto AU - Napierala, Hendrik AU - Altendorf, B. Maria AU - Kuschick, Doreen AU - Piper, K. Sophie AU - Scatturin, Lennart AU - Schmidt, Konrad AU - Schorr, Claudia AU - Thissen, Alica AU - Wäscher, Cornelia AU - Heintze, Christoph AU - Möckel, Martin AU - Balzer, Felix AU - Slagman, Anna PY - 2025/4/2 TI - Impact of a Symptom Checker App on Patient-Physician Interaction Among Self-Referred Walk-In Patients in the Emergency Department: Multicenter, Parallel-Group, Randomized, Controlled Trial JO - J Med Internet Res SP - e64028 VL - 27 KW - digital health KW - triage KW - symptom checker KW - patient-centered care KW - eHealth apps KW - mobile phone KW - decision support systems KW - consumer health information KW - health literacy KW - randomized controlled trials KW - null results KW - emergency care KW - patient-physician-interaction KW - patient satisfaction N2 - Background: Symptom checker apps (SCAs) are layperson-facing tools that advise on whether and where to seek care, or possible diagnoses. Previous research has primarily focused on evaluating the accuracy, safety, and usability of their recommendations. However, studies examining SCAs' impact on clinical care, including the patient-physician interaction and satisfaction with care, remain scarce. Objective: This study aims to evaluate the effects of an SCA on satisfaction with the patient-physician interaction in acute care settings. Additionally, we examined its influence on patients' anxiety and trust in the treating physician. Methods: This parallel-group, randomized controlled trial was conducted at 2 emergency departments of an academic medical center and an emergency practice in Berlin, Germany. Low-acuity patients seeking care at these sites were randomly assigned to either self-assess their health complaints using a widely available commercial SCA (Ada Health) before their first encounter with the treating physician or receive usual care. The primary endpoint was patients' satisfaction with the patient-physician interaction, measured by the Patient Satisfaction Questionnaire (PSQ). The secondary outcomes were patients' satisfaction with care, their anxiety levels, and physicians' satisfaction with the patient-physician interaction. We used linear mixed models to assess the statistical significance of primary and secondary outcomes. Exploratory descriptive analyses examined patients' and physicians' perceptions of the SCA's utility and the frequency of patients questioning their physician's authority. Results: Between April 11, 2022, and January 25, 2023, we approached 665 patients. A total of 363 patients were included in the intention-to-treat analysis of the primary outcome (intervention: n=173, control: n=190). PSQ scores in the intervention group were similar to those in the control group (mean 78.5, SD 20.0 vs mean 80.8, SD 19.6; estimated difference -2.4, 95% CI -6.3 to 1.1, P=.24). Secondary outcomes, including patients' and physicians' satisfaction with care and patient anxiety, showed no significant group differences (all P>.05). Patients in the intervention group were more likely to report that the SCA had a beneficial (66/164, 40.2%) rather than a detrimental (3/164, 1.8%) impact on the patient-physician interaction, with most reporting no effect (95/164, 57.9%). Similar patterns were observed regarding the SCA's perceived effect on care. 
In both groups, physicians rarely reported that their authority had been questioned by a patient (intervention: 2/188, 1.1%; control: 4/184, 2.2%). While physicians more often found the SCA helpful rather than unhelpful, the majority indicated it was neither helpful nor unhelpful for the encounter. Conclusions: We found no evidence that the SCA improved satisfaction with the patient-physician interaction or care in an acute care setting. By contrast, both patients and their treating physicians predominantly described the SCA's impact as beneficial. Our study did not identify negative effects of SCA use commonly reported in the literature, such as increased anxiety or diminished trust in health care professionals. Trial Registration: German Clinical Trial Register DRKS00028598; https://drks.de/search/en/trial/DRKS00028598/entails International Registered Report Identifier (IRRID): RR2-10.1186/s13063-022-06688-w UR - https://www.jmir.org/2025/1/e64028 UR - http://dx.doi.org/10.2196/64028 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/64028 ER - TY - JOUR AU - Koch, Roland AU - Steffen, Marie-Theres AU - Wetzel, Anna-Jasmin AU - Preiser, Christine AU - Klemmt, Malte AU - Ehni, Hans-Jörg AU - Mueller, Regina AU - Joos, Stefanie PY - 2025/3/21 TI - Exploring Laypersons' Experiences With a Mobile Symptom Checker App as an Interface Between eHealth Literacy, Health Literacy, and Health-Related Behavior: Qualitative Interview Study JO - JMIR Form Res SP - e60647 VL - 9 KW - symptom checker apps KW - health literacy KW - eHealth literacy KW - qualitative research KW - interview study KW - artificial intelligence KW - AI N2 - Background: Symptom checkers aim to help users recognize medical symptoms and recommend actions. However, they are not yet reliable for self-triage or diagnostics. Health literacy plays a role in their use, but the process from symptom recognition to health care consultation remains unclear. Objective: This qualitative observational study explored how laypersons use symptom checkers, focusing on the process of use, entry points and outcomes, and the role of health literacy. Laypersons are defined as individuals who are neither medical professionals nor developers of such apps. Three research questions were addressed: (1) How do such users describe the process of using symptom checkers? (2) What are entry points and possible outcomes of symptom checker app use? (3) How are health literacy and eHealth literacy expressed during the use of symptom checker apps? Methods: As part of the Ethical, Legal, and Social Implications of Symptom Checker Apps in Primary Health Care project, 15 laypersons (n=9, 60% female and n=6, 40% male; mean age 30.7, SD 13.6 years) were interviewed about their experiences with the symptom checker Ada. The interviews were analyzed using an integrative approach combining social positioning, agency, and the Rubicon model as a heuristic framework. Results: App use follows a cyclic process comprising 4 steps: motivation (influenced by biography and context), intention formation (assigning a purpose), intention implementation (recruiting resources), and evaluation (transforming interactions into health-related insights). Biographical, social, and contextual factors shape process initiation. Users use symptom checkers for 3 main purposes: understanding their condition, receiving recommendations for action, and documenting or communicating health-related information. 
Each purpose requires specific planning and integration into health-related behaviors drawing on personal, social, and technological resources. Evaluation depends on contextual factors, app outputs, and the outcomes of users' health-related actions. Users assess whether the app aligns with their expectations, condition severity, and previous experiences, with health literacy playing a critical role in validation processes. Conclusions: Symptom checker use is a complex, cyclic process shaped by context, biography, and health literacy. Users are motivated by health concerns influenced by personal, social, and contextual factors, with trust and attitudes impacting initial engagement. Intention formation reflects a balance between user skills and context, where app outputs inform decisions but may not always lead to action, especially in ambiguous situations. Users rely on personal resources and social networks to integrate app use into health-related behaviors, highlighting the limitations of symptom checkers in providing social or empathetic support. Symptom checkers have the potential to serve as an interface between users and health care, but future development must address the complexity of their use to unlock this potential. International Registered Report Identifier (IRRID): RR2-10.2196/34026 UR - https://formative.jmir.org/2025/1/e60647 UR - http://dx.doi.org/10.2196/60647 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/60647 ER - TY - JOUR AU - Wickham, P. Aidan AU - Hewings-Martin, Yella AU - Goddard, GB Frederick AU - Rodgers, K. Allison AU - Cunningham, C. Adam AU - Prentice, Carley AU - Wilks, Octavia AU - Kaplan, C. Yusuf AU - Marhol, Andrei AU - Meczner, András AU - Stsefanovich, Heorhi AU - Klepchukova, Anna AU - Zhaunova, Liudmila PY - 2024/12/12 TI - Exploring Self-Reported Symptoms for Developing and Evaluating Digital Symptom Checkers for Polycystic Ovarian Syndrome, Endometriosis, and Uterine Fibroids: Exploratory Survey Study JO - JMIR Form Res SP - e65469 VL - 8 KW - polycystic ovary syndrome KW - PCOS KW - self-assessment KW - self-reported KW - endometriosis KW - uterine fibroids KW - symptoms KW - digital symptom checker KW - women's health KW - gynecological conditions KW - reproductive health N2 - Background: Reproductive health conditions such as polycystic ovary syndrome (PCOS), endometriosis, and uterine fibroids pose a significant burden to people who menstruate, health care systems, and economies. Despite clinical guidelines for each condition, prolonged delays in diagnosis are commonplace, resulting in an increase in health care costs and risk of health complications. Symptom checker apps have the potential to significantly reduce time to diagnosis by providing users with health information and tools to better understand their symptoms. Objective: This study aims to investigate the prevalence and predictive importance of self-reported symptoms of PCOS, endometriosis, and uterine fibroids, and to explore the efficacy of 3 symptom checkers (developed by Flo Health UK Limited) that use self-reported symptoms when screening for each condition. Methods: Flo's symptom checkers were transcribed into separate web-based surveys for PCOS, endometriosis, and uterine fibroids, asking respondents about their diagnostic history for each condition. Participants were aged 18 years or older, female, and living in the United States. 
Participants either had a confirmed diagnosis (condition-positive) and reported symptoms retrospectively as experienced at the time of diagnosis, or they had not been examined for the condition (condition-negative) and reported their current symptoms as experienced at the time of surveying. Symptom prevalence was calculated for each condition based on the surveys. Least absolute shrinkage and selection operator regression was used to identify key symptoms for predicting each condition. Participants' symptoms were processed by Flo's 3 single-condition symptom checkers, and accuracy was assessed by comparing the symptom checker output with the participant's condition designation. Results: A total of 1317 participants were included, with 418, 476, and 423 in the PCOS, endometriosis, and uterine fibroids groups, respectively. The most prevalent symptoms for PCOS were fatigue (92%), feeling anxious (87%), and BMI over 25 (84%); for endometriosis: very regular lower abdominal pain (89%), fatigue (85%), and referred lower back pain (80%); for uterine fibroids: fatigue (76%), bloating (69%), and changing sanitary protection often (68%). Symptoms of anovulation and amenorrhea (long periods, irregular cycles, and absent periods), and hyperandrogenism (excess hair on chin and abdomen, scalp hair loss, and BMI over 25) were identified as the most predictive symptoms for PCOS, while symptoms related to abdominal pain and the effect pain has on life, bleeding, and fertility complications were among the most predictive symptoms for both endometriosis and uterine fibroids. Symptom checker accuracy was 78%, 73%, and 75% for PCOS, endometriosis, and uterine fibroids, respectively. Conclusions: This exploratory study characterizes self-reported symptomatology and identifies the key predictive symptoms for 3 reproductive conditions. The Flo symptom checkers were evaluated using real, self-reported symptoms and demonstrated high levels of accuracy. UR - https://formative.jmir.org/2024/1/e65469 UR - http://dx.doi.org/10.2196/65469 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65469 ER - TY - JOUR AU - Wetzel, Anna-Jasmin AU - Preiser, Christine AU - Müller, Regina AU - Joos, Stefanie AU - Koch, Roland AU - Henking, Tanja AU - Haumann, Hannah PY - 2024/12/9 TI - Unveiling Usage Patterns and Explaining Usage of Symptom Checker Apps: Explorative Longitudinal Mixed Methods Study JO - J Med Internet Res SP - e55161 VL - 26 KW - self-triage KW - eHealth KW - self-diagnosis KW - mHealth KW - mobile health KW - usage KW - patterns KW - predicts KW - prediction KW - symptoms checker KW - apps KW - applications KW - explorative longitudinal study KW - self care KW - self management KW - self-rated KW - mixed method KW - circumstances KW - General Linear Mixed Models KW - GLMM KW - qualitative data KW - content analysis KW - Kuckartz KW - survey KW - participants KW - users N2 - Background: Symptom checker apps (SCA) aim to enable individuals without medical training to classify perceived symptoms and receive guidance on appropriate actions, such as self-care or seeking professional medical attention. However, there is a lack of detailed understanding regarding the contexts in which individuals use SCA and their opinions on these tools. Objective: This mixed methods study aims to explore the circumstances under which medical laypeople use SCA and to identify which aspects users find noteworthy after using SCA. 
Methods: A total of 48 SCA users documented their medical symptoms, provided open-ended responses, and recorded their SCA use along with other variables over 6 weeks in a longitudinal study. Generalized linear mixed models with and without regularization were applied to consider the hierarchical structure of the data, and the models' outcomes were evaluated for comparison. Qualitative data were analyzed through Kuckartz qualitative content analysis. Results: Significant predictors of SCA use included the initial occurrence of symptoms, day of measurement (odds ratio [OR] 0.97), self-rated health (OR 0.80, P<.001), and the following International Classification in Primary Care-2–classified symptoms: general and unspecified (OR 3.33, P<.001), eye (OR 5.56, P=.001), cardiovascular (OR 8.33, P<.001), musculoskeletal (OR 5.26, P<.001), and skin (OR 4.76, P<.001). The day of measurement and self-rated health showed minor importance due to their small effect sizes. Qualitative analysis highlighted four main themes: (1) reasons for using SCA, (2) diverse affective responses, (3) a broad spectrum of behavioral reactions, and (4) unmet needs including a lack of personalization. Conclusions: The emergence of new and unfamiliar symptoms was a strong determinant for SCA use. Specific International Classification in Primary Care–rated symptom clusters, particularly those related to cardiovascular, eye, skin, general, and unspecified symptoms, were also highly predictive of SCA use. The varied applications of SCA fit into the concept of health literacy as bricolage, where SCA is leveraged as flexible tools by patients based on individual and situational requirements, functioning alongside other health care resources. UR - https://www.jmir.org/2024/1/e55161 UR - http://dx.doi.org/10.2196/55161 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55161 ER - TY - JOUR AU - Preiser, Christine AU - Radionova, Natalia AU - Ög, Eylem AU - Koch, Roland AU - Klemmt, Malte AU - Müller, Regina AU - Ranisch, Robert AU - Joos, Stefanie AU - Rieger, A. Monika PY - 2024/11/18 TI - The Doctors, Their Patients, and the Symptom Checker App: Qualitative Interview Study With General Practitioners in Germany JO - JMIR Hum Factors SP - e57360 VL - 11 KW - symptom checker app KW - qualitative interviews KW - general practice KW - perceived work-related psychosocial stress KW - job satisfaction KW - professional identity KW - medical diagnosis N2 - Background: Symptom checkers are designed for laypeople and promise to provide a preliminary diagnosis, a sense of urgency, and a suggested course of action. Objective: We used the international symptom checker app (SCA) Ada App as an example to answer the following question: How do general practitioners (GPs) experience the SCA in relation to the macro, meso, and micro level of their daily work, and how does this interact with work-related psychosocial resources and demands? Methods: We conducted 8 semistructured interviews with GPs in Germany between December 2020 and February 2022. We analyzed the data using the integrative basic method, an interpretative-reconstructive method, to identify core themes and modes of thematization. Results: Although most GPs in this study were open to digitization in health care and their practice, only one was familiar with the SCA. GPs considered the SCA as part of the "unorganized stage" of patients' searching about their conditions. Some preferred it to popular search engines. 
They considered it relevant to their work as soon as the SCA would influence patients' decisions to see a doctor. Some wanted to see the results of the SCA in advance in order to decide on the patient's next steps. GPs described the diagnostic process as guided by shared decision-making, with the GP taking the lead and the patient deciding. They saw diagnosis as an act of making sense of data, which the SCA would not be able to do, despite the huge amounts of data. Conclusions: GPs took a techno-pragmatic view of the SCA. They operate in a health care system of increasing scarcity. They saw the SCA as a potential work-related resource if it helped them to reduce administrative tasks and unnecessary patient contacts. The SCA was seen as a potential work-related demand if it increased workload, for example, if it increased patients' anxiety, was too risk-averse, or made patients more insistent on their own opinions. UR - https://humanfactors.jmir.org/2024/1/e57360 UR - http://dx.doi.org/10.2196/57360 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/57360 ER - TY - JOUR AU - King, Jean Alicia AU - Bilardi, Elissa Jade AU - Towns, Mary Janet AU - Maddaford, Kate AU - Fairley, Kincaid Christopher AU - Chow, F. Eric P. AU - Phillips, Renee Tiffany PY - 2024/11/4 TI - User Views on Online Sexual Health Symptom Checker Tool: Qualitative Research JO - JMIR Form Res SP - e54565 VL - 8 KW - sexual health KW - sexually transmitted diseases KW - risk assessment KW - risk factors KW - smartphone apps KW - help-seeking behavior KW - health literacy KW - information seeking behavior N2 - Background: Delayed diagnosis and treatment of sexually transmitted infections (STIs) contribute to poorer health outcomes and onward transmission to sexual partners. Access to best-practice sexual health care may be limited by barriers such as cost, distance to care providers, sexual stigma, and trust in health care providers. Online assessments of risk offer a novel means of supporting access to evidence-based sexual health information, testing, and treatment by providing more individualized sexual health information based on user inputs. Objective: This developmental evaluation aims to explore potential users' views and experiences in relation to an online assessment of risk, called iSpySTI (Melbourne Sexual Health Center), including the likely impacts of use. Methods: Individuals presenting with urogenital symptoms to a specialist sexual health clinic were given the opportunity to trial a web-based, Bayesian-powered tool that provides a list of 2 to 4 potential causes of their symptoms based on inputs of known STI risk factors and symptoms. Those who tried the tool were invited to participate in a once-off, semistructured research interview. Descriptive, action, and emotion coding informed the comparative analysis of individual cases. Results: Findings from interviews with 14 people who had used the iSpySTI tool support the superiority of the online assessment of STI risk compared to existing sources of sexual health information (eg, internet search engines) in providing trusted and probabilistic information to users. Additionally, potential users reported benefits to their emotional well-being in the intervening period between noticing symptoms and being able to access care. 
Differences in current and imagined urgency of health care seeking and emotional impacts were found based on clinical diagnosis (eg, non-STI, curable and incurable but treatable STIs) and whether participants were born in Australia or elsewhere. Conclusions: Online assessments of risk provide users experiencing urogenital symptoms with more individualized and evidence-based health information that can improve their health care–seeking and provide reassurance in the period before they can access care. UR - https://formative.jmir.org/2024/1/e54565 UR - http://dx.doi.org/10.2196/54565 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54565 ER - TY - JOUR AU - Zhu, Siying AU - Dong, Yan AU - Li, Yumei AU - Wang, Hong AU - Jiang, Xue AU - Guo, Mingen AU - Fan, Tiantian AU - Song, Yalan AU - Zhou, Ying AU - Han, Yuan PY - 2024/10/28 TI - Experiences of Patients With Cancer Using Electronic Symptom Management Systems: Qualitative Systematic Review and Meta-Synthesis JO - J Med Internet Res SP - e59061 VL - 26 KW - electronic symptom management systems KW - oncology care KW - access to care KW - symptom monitoring KW - self-management KW - patient-reported outcomes KW - health-related outcomes KW - quality of life N2 - Background: There are numerous symptoms related to cancer and its treatments that can affect the psychosomatic health and quality of life of patients with cancer. The use of electronic symptom management systems (ESMSs) can help patients with cancer monitor and manage their symptoms effectively, improving their health-related outcomes. However, patients' adherence to ESMSs decreases over time, and little is known about their real experiences with them. Therefore, it is necessary to gain a deep understanding of patients' experiences with ESMSs. Objective: The purpose of this systematic review was to synthesize qualitative studies on the experiences of patients with cancer using ESMSs. Methods: A total of 12 electronic databases, including PubMed, Web of Science, Cochrane Library, EBSCOhost, Embase, PsycINFO, ProQuest, Scopus, Wanfang database, CNKI, CBM, and VIP, were searched to collect relevant studies from the earliest available record until January 2, 2024. Qualitative and mixed methods studies published in English or Chinese were included. The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement checklist and the ENTREQ (Enhancing Transparency in Reporting the Synthesis of Qualitative Research) statement were used to improve transparency in reporting the synthesis of the qualitative research. The Critical Appraisal Skills Program (CASP) checklist was used to appraise the methodological quality of the included studies, and a meta-synthesis was conducted to interpret and synthesize the findings. Results: A total of 21 studies were included in the meta-synthesis. The experiences of patients with cancer using ESMSs were summarized into three major categories: (1) perceptions and attitudes toward ESMSs; (2) the value of ESMSs; and (3) barriers, requirements, and suggestions for ESMSs. Subsequently, 10 subcategories emerged from the 3 major categories. The meta-synthesis revealed that patients with cancer had both positive and negative experiences with ESMSs. In general, patients recognized the value of ESMSs in symptom assessment and management and were willing to use them, but they still encountered barriers and wanted them to be improved. 
Conclusions: This systematic review provides implications for developing future ESMSs that improve health-related outcomes for patients with cancer. Future research should focus on strengthening electronic equipment and technical support for ESMSs, improving their functional contents and participation forms, and developing personalized applications tailored to the specific needs and characteristics of patients with cancer. Trial Registration: PROSPERO CRD42023421730; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=421730 UR - https://www.jmir.org/2024/1/e59061 UR - http://dx.doi.org/10.2196/59061 UR - http://www.ncbi.nlm.nih.gov/pubmed/39466301 ID - info:doi/10.2196/59061 ER - TY - JOUR AU - Liu, Ville AU - Kaila, Minna AU - Koskela, Tuomas PY - 2024/9/26 TI - Triage Accuracy and the Safety of User-Initiated Symptom Assessment With an Electronic Symptom Checker in a Real-Life Setting: Instrument Validation Study JO - JMIR Hum Factors SP - e55099 VL - 11 KW - nurse triage KW - emergency department triage KW - triage KW - symptom assessment KW - health services accessibility KW - telemedicine KW - eHealth KW - remote consultation KW - primary health care KW - primary care KW - urgent care KW - health services research KW - health services N2 - Background: Previous studies have evaluated the accuracy of the diagnostics of electronic symptom checkers (ESCs) and triage using clinical case vignettes. National Omaolo digital services (Omaolo) in Finland consist of an ESC for various symptoms. Omaolo is a medical device with a Conformité Européenne marking (risk class: IIa), based on Duodecim Clinical Decision Support, EBMEDS. Objective: This study investigates how well triage performed by the ESC matches nurse triage within the chief symptom list available in Omaolo (anal region symptoms, cough, diarrhea, discharge from the eye or watery or reddish eye, headache, heartburn, knee symptom or injury, lower back pain or injury, oral health, painful or blocked ear, respiratory tract infection, sexually transmitted disease, shoulder pain or stiffness or injury, sore throat or throat symptom, and urinary tract infection). In addition, the accuracy, specificity, sensitivity, and safety of the Omaolo ESC were assessed. Methods: This is a clinical validation study in a real-life setting performed at multiple primary health care (PHC) centers across Finland. The included units were of the walk-in model of primary care, where no previous phone call or contact was required. Upon arriving at the PHC center, users (patients) answered the ESC questions and received a triage recommendation; a nurse then assessed their triage. Findings on 877 patients were analyzed by matching the ESC recommendations with triage by the triage nurse. Results: Safe assessments by the ESC accounted for 97.6% (856/877; 95% CI 95.6%-98.0%) of all assessments made. The mean of the exact match for all symptom assessments was 53.7% (471/877; 95% CI 49.2%-55.9%). The mean value of the exact match or overly conservative but suitable for all (ESC's assessment was 1 triage level higher than the nurse's triage) symptom assessments was 66.6% (584/877; 95% CI 63.4%-69.7%). When the nurse concluded that urgent treatment was needed, the ESC's exactly matched accuracy was 70.9% (244/344; 95% CI 65.8%-75.7%). Sensitivity for the Omaolo ESC was 62.6%, and specificity was 69.2%. A total of 21 critical assessments were identified for further analysis: there was no indication of compromised patient safety. 
Conclusions: The primary objectives of this study were to evaluate the safety and to explore the accuracy, specificity, and sensitivity of the Omaolo ESC. The results indicate that the ESC is safe in a real-life setting when appraised with assessments conducted by triage nurses. Furthermore, the Omaolo ESC exhibits the potential to guide patients to appropriate triage destinations effectively, helping them to receive timely and suitable care. International Registered Report Identifier (IRRID): RR2-10.2196/41423 UR - https://humanfactors.jmir.org/2024/1/e55099 UR - http://dx.doi.org/10.2196/55099 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/55099 ER - TY - JOUR AU - Knitza, Johannes AU - Hasanaj, Ragip AU - Beyer, Jonathan AU - Ganzer, Franziska AU - Slagman, Anna AU - Bolanaki, Myrto AU - Napierala, Hendrik AU - Schmieding, L. Malte AU - Al-Zaher, Nizam AU - Orlemann, Till AU - Muehlensiepen, Felix AU - Greenfield, Julia AU - Vuillerme, Nicolas AU - Kuhn, Sebastian AU - Schett, Georg AU - Achenbach, Stephan AU - Dechant, Katharina PY - 2024/8/20 TI - Comparison of Two Symptom Checkers (Ada and Symptoma) in the Emergency Department: Randomized, Crossover, Head-to-Head, Double-Blinded Study JO - J Med Internet Res SP - e56514 VL - 26 KW - symptom checker KW - triage KW - emergency KW - eHealth KW - diagnostic accuracy KW - apps, health service research KW - decision support system N2 - Background: Emergency departments (EDs) are frequently overcrowded and increasingly used by nonurgent patients. Symptom checkers (SCs) offer on-demand access to disease suggestions and recommended actions, potentially improving overall patient flow. Contrary to the increasing use of SCs, there is a lack of supporting evidence based on direct patient use. Objective: This study aimed to compare the diagnostic accuracy, safety, usability, and acceptance of 2 SCs, Ada and Symptoma. Methods: This was a randomized, crossover, head-to-head, double-blinded study including consecutive adult patients presenting to the ED at University Hospital Erlangen. Patients completed both SCs, Ada and Symptoma. The primary outcome was the diagnostic accuracy of SCs. In total, 6 blinded independent expert raters classified diagnostic concordance of SC suggestions with the final discharge diagnosis as (1) identical, (2) plausible, or (3) diagnostically different. SC suggestions per patient were additionally classified as safe or potentially life-threatening, and the concordance of Ada's and physician-based triage category was assessed. Secondary outcomes were SC usability (5-point Likert scale: 1=very easy to use to 5=very difficult to use) and SC acceptance net promoter score (NPS). Results: A total of 450 patients completed the study between April and November 2021. The most common chief complaint was chest pain (160/437, 37%). The identical diagnosis was ranked first (or within the top 5 diagnoses) by Ada and Symptoma in 14% (59/437; 27%, 117/437) and 4% (16/437; 13%, 55/437) of patients, respectively. An identical or plausible diagnosis was ranked first (or within the top 5 diagnoses) by Ada and Symptoma in 58% (253/437; 75%, 329/437) and 38% (164/437; 64%, 281/437) of patients, respectively. Ada and Symptoma did not suggest potentially life-threatening diagnoses in 13% (56/437) and 14% (61/437) of patients, respectively. Ada correctly triaged, undertriaged, and overtriaged 34% (149/437), 13% (58/437), and 53% (230/437) of patients, respectively. 
A total of 88% (385/437) and 78% (342/437) of participants rated Ada and Symptoma as very easy or easy to use, respectively. Ada's NPS was -34 (55% [239/437] detractors; 21% [93/437] promoters) and Symptoma's NPS was -47 (63% [275/437] detractors and 16% [70/437] promoters). Conclusions: Ada demonstrated a higher diagnostic accuracy than Symptoma, and substantially more patients would recommend Ada and assessed Ada as easy to use. The high number of unrecognized potentially life-threatening diagnoses by both SCs and inappropriate triage advice by Ada was alarming. Overall, the trustworthiness of SC recommendations appears questionable. SC authorization should necessitate rigorous clinical evaluation studies to prevent misdiagnoses, fatal triage advice, and misuse of scarce medical resources. Trial Registration: German Register of Clinical Trials DRKS00024830; https://drks.de/search/en/trial/DRKS00024830 UR - https://www.jmir.org/2024/1/e56514 UR - http://dx.doi.org/10.2196/56514 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56514 ER - TY - JOUR AU - Meczner, András AU - Cohen, Nathan AU - Qureshi, Aleem AU - Reza, Maria AU - Sutaria, Shailen AU - Blount, Emily AU - Bagyura, Zsolt AU - Malak, Tamer PY - 2024/5/31 TI - Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated Accuracy Metrics JO - JMIR Form Res SP - e49907 VL - 8 KW - symptom checker KW - accuracy KW - vignette studies KW - variability KW - methods KW - triage KW - evaluation KW - vignette KW - performance KW - metrics KW - mobile phone N2 - Background: The rapid growth of web-based symptom checkers (SCs) is not matched by advances in quality assurance. Currently, there are no widely accepted criteria assessing SCs' performance. Vignette studies are widely used to evaluate SCs, measuring the accuracy of outcome. Accuracy behaves as a composite metric as it is affected by a number of individual SC- and tester-dependent factors. In contrast to clinical studies, vignette studies have a small number of testers. Hence, measuring accuracy alone in vignette studies may not provide a reliable assessment of performance due to tester variability. Objective: This study aims to investigate the impact of tester variability on the accuracy of outcome of SCs, using clinical vignettes. It further aims to investigate the feasibility of measuring isolated aspects of performance. Methods: Healthily's SC was assessed using 114 vignettes by 3 groups of 3 testers who processed vignettes with different instructions: free interpretation of vignettes (free testers), specified chief complaints (partially free testers), and specified chief complaints with strict instruction for answering additional symptoms (restricted testers). κ statistics were calculated to assess agreement of top outcome condition and recommended triage. Crude and adjusted accuracy was measured against a gold standard. Adjusted accuracy was calculated using only results of consultations identical to the vignette, following a review and selection process. A feasibility study for assessing symptom comprehension of SCs was performed using different variations of 51 chief complaints across 3 SCs. Results: Intertester agreement of most likely condition and triage was, respectively, 0.49 and 0.51 for the free tester group, 0.66 and 0.66 for the partially free group, and 0.72 and 0.71 for the restricted group. 
For the restricted group, accuracy ranged from 43.9% to 57% for individual testers, averaging 50.6% (SD 5.35%). Adjusted accuracy was 56.1%. Assessing symptom comprehension was feasible for all 3 SCs. Comprehension scores ranged from 52.9% to 68%. Conclusions: We demonstrated that by improving standardization of the vignette testing process, there was a significant improvement in the agreement of outcome between testers. However, significant variability remained due to uncontrollable tester-dependent factors, reflected by varying outcome accuracy. Tester-dependent factors, combined with a small number of testers, limit the reliability and generalizability of outcome accuracy when used as a composite measure in vignette studies. Measuring and reporting different aspects of SC performance in isolation provides a more reliable assessment of SC performance. We developed an adjusted accuracy measure using a review and selection process to assess data algorithm quality. In addition, we demonstrated that symptom comprehension with different input methods can be feasibly compared. Future studies reporting accuracy need to apply vignette testing standardization and isolated metrics. UR - https://formative.jmir.org/2024/1/e49907 UR - http://dx.doi.org/10.2196/49907 UR - http://www.ncbi.nlm.nih.gov/pubmed/38820578 ID - info:doi/10.2196/49907 ER - TY - JOUR AU - Savolainen, Kaisa AU - Kujala, Sari PY - 2024/3/8 TI - Testing Two Online Symptom Checkers With Vulnerable Groups: Usability Study to Improve Cognitive Accessibility of eHealth Services JO - JMIR Hum Factors SP - e45275 VL - 11 KW - eHealth KW - online symptom checkers KW - usability KW - cognitive accessibility KW - web accessibility KW - qualitative research N2 - Background: The popularity of eHealth services has surged significantly, underscoring the importance of ensuring their usability and accessibility for users with diverse needs, characteristics, and capabilities. These services can pose cognitive demands, especially for individuals who are unwell, fatigued, or experiencing distress. Additionally, numerous potentially vulnerable groups, including older adults, are susceptible to digital exclusion and may encounter cognitive limitations related to perception, attention, memory, and language comprehension. Regrettably, many studies overlook the preferences and needs of user groups likely to encounter challenges associated with these cognitive aspects. Objective: This study primarily aims to gain a deeper understanding of cognitive accessibility in the practical context of eHealth services. Additionally, we aimed to identify the specific challenges that vulnerable groups encounter when using eHealth services and determine key considerations for testing these services with such groups. Methods: As a case study of eHealth services, we conducted qualitative usability testing on 2 online symptom checkers used in Finnish public primary care. A total of 13 participants from 3 distinct groups participated in the study: older adults, individuals with mild intellectual disabilities, and nonnative Finnish speakers. The primary research methods used were the thinking-aloud method, questionnaires, and semistructured interviews. Results: We found that potentially vulnerable groups encountered numerous issues with the tested services, with similar problems observed across all 3 groups. Specifically, clarity and the use of terminology posed significant challenges. 
The services overwhelmed users with excessive information and choices, while the terminology consisted of numerous complex medical terms that were difficult to understand. When conducting tests with vulnerable groups, it is crucial to carefully plan the sessions to avoid being overly lengthy, as these users often require more time to complete tasks. Additionally, testing with vulnerable groups proved to be quite efficient, with results likely to benefit a wider audience as well. Conclusions: Based on the findings of this study, it is evident that older adults, individuals with mild intellectual disability, and nonnative speakers may encounter cognitive challenges when using eHealth services, which can impede or slow down their use and make the services more difficult to navigate. In the worst-case scenario, these challenges may lead to errors in using the services. We recommend expanding the scope of testing to include a broader range of eHealth services with vulnerable groups, incorporating users with diverse characteristics and capabilities who are likely to encounter difficulties in cognitive accessibility. UR - https://humanfactors.jmir.org/2024/1/e45275 UR - http://dx.doi.org/10.2196/45275 UR - http://www.ncbi.nlm.nih.gov/pubmed/38457214 ID - info:doi/10.2196/45275 ER - TY - JOUR AU - Lown, Mark AU - Smith, A. Kirsten AU - Muller, Ingrid AU - Woods, Catherine AU - Maund, Emma AU - Rogers, Kirsty AU - Becque, Taeko AU - Hayward, Gail AU - Moore, Michael AU - Little, Paul AU - Glogowska, Margaret AU - Hay, Alastair AU - Stuart, Beth AU - Mantzourani, Efi AU - Wilcox, R. Christopher AU - Thompson, Natalie AU - Francis, A. Nick PY - 2023/12/8 TI - Internet Tool to Support Self-Assessment and Self-Swabbing of Sore Throat: Development and Feasibility Study JO - J Med Internet Res SP - e39791 VL - 25 KW - sore throat KW - ear, neck, throat KW - pharyngitis KW - self-assessment KW - self-swabbing KW - primary care KW - throat KW - development KW - feasibility KW - web-based tool KW - tool KW - antibiotics KW - develop KW - self-assess KW - symptoms KW - diagnostic testing KW - acceptability KW - adult KW - children KW - social media KW - saliva KW - swab KW - inflammation KW - samples KW - support KW - clinical KW - antibiotic KW - web-based support tool KW - think-aloud KW - neck KW - tonsil KW - tongue KW - teeth KW - dental KW - dentist KW - tooth KW - laboratory KW - lab KW - oral KW - oral health KW - mouth KW - mobile phone N2 - Background: Sore throat is a common problem and a common reason for the overuse of antibiotics. A web-based tool that helps people assess their sore throat, through the use of clinical prediction rules, taking throat swabs or saliva samples, and taking throat photographs, has the potential to improve self-management and help identify those who are the most and least likely to benefit from antibiotics. Objective: We aimed to develop a web-based tool to help patients and parents or carers self-assess sore throat symptoms and take throat photographs, swabs, and saliva samples for diagnostic testing. We then explored the acceptability and feasibility of using the tool in adults and children with sore throats. Methods: We used the Person-Based Approach to develop a web-based tool and then recruited adults and children with sore throats who participated in this study by attending general practices or through social media advertising. 
Participants self-assessed the presence of FeverPAIN and Centor score criteria and attempted to photograph their throat and take throat swabs and saliva tests. Study processes were observed via video call, and participants were interviewed about their views on using the web-based tool. Self-assessed throat inflammation and pus were compared to clinician evaluation of patients' throat photographs. Results: A total of 45 participants (33 adults and 12 children) were recruited. Of these, 35 (78%) and 32 (71%) participants completed all scoring elements for FeverPAIN and Centor scores, respectively, and most (30/45, 67%) of them reported finding self-assessment relatively easy. No valid response was provided for swollen lymph nodes, throat inflammation, and pus on the throat by 11 (24%), 9 (20%), and 13 (29%) participants, respectively. A total of 18 (40%) participants provided a throat photograph of adequate quality for clinical assessment. Patient assessment of inflammation had a sensitivity of 100% (3/3) and specificity of 47% (7/15) compared with the clinician-assessed photographs. For pus on the throat, the sensitivity was 100% (3/3) and the specificity was 71% (10/14). A total of 89% (40/45), 93% (42/45), 89% (40/45), and 80% (30/45) of participants provided analyzable bacterial swabs, viral swabs, saliva sponges, and saliva drool samples, respectively. Participants were generally happy and confident in providing samples, with saliva samples rated as slightly more acceptable than swab samples. Conclusions: Most adult and parent participants were able to use a web-based intervention to assess the clinical features of throat infections and generate scores using clinical prediction rules. However, some had difficulties assessing clinical signs, such as lymph nodes, throat pus, and inflammation, and scores were assessed as sensitive but not specific. Many participants had problems taking photographs of adequate quality, but most were able to take throat swabs and saliva samples. UR - https://www.jmir.org/2023/1/e39791 UR - http://dx.doi.org/10.2196/39791 UR - http://www.ncbi.nlm.nih.gov/pubmed/38064265 ID - info:doi/10.2196/39791 ER - TY - JOUR AU - Kuroiwa, Tomoyuki AU - Sarcon, Aida AU - Ibara, Takuya AU - Yamada, Eriku AU - Yamamoto, Akiko AU - Tsukamoto, Kazuya AU - Fujita, Koji PY - 2023/9/15 TI - The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study JO - J Med Internet Res SP - e47621 VL - 25 KW - ChatGPT KW - generative pretrained transformer KW - natural language processing KW - artificial intelligence KW - chatbot KW - diagnosis KW - self-diagnosis KW - accuracy KW - precision KW - language model KW - orthopedic disease KW - AI model KW - health information N2 - Background: Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. Objective: The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. 
Methods: Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as either correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and reproducibility were calculated. The reproducibility between days and between raters was calculated using the Fleiss κ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. Results: The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, -0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases "essential," "recommended," "best," and "important" were used. Specifically, "essential" occurred in 4 out of 125, "recommended" in 12 out of 125, "best" in 6 out of 125, and "important" in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. Conclusions: The accuracy and reproducibility of ChatGPT to self-diagnose five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study. UR - https://www.jmir.org/2023/1/e47621 UR - http://dx.doi.org/10.2196/47621 UR - http://www.ncbi.nlm.nih.gov/pubmed/37713254 ID - info:doi/10.2196/47621 ER - TY - JOUR AU - Liu, Ville AU - Koskela, H. Tuomas AU - Kaila, Minna PY - 2023/7/19 TI - User-Initiated Symptom Assessment With an Electronic Symptom Checker: Protocol for a Mixed Methods Validation Study JO - JMIR Res Protoc SP - e41423 VL - 12 KW - triage KW - symptom assessment KW - self-care KW - health service accessibility KW - telemedicine KW - health service research KW - internet KW - validation study KW - primary health care KW - clinical studies KW - telehealth N2 - Background: The national Omaolo digital social welfare and health care service of Finland provides a symptom checker, Omaolo, which is a medical device (based on Duodecim Clinical Decision Support EBMEDS software) with a CE marking (risk class IIa), manufactured by the government-owned DigiFinland Oy. Users of this service can perform their triage by using the questions in the symptom checker. By completing the symptom checker, the user receives a recommendation for action and a service assessment with appropriate guidance regarding their health problems on the basis of a selected specific symptom in the symptom checker. 
This allows users to be provided with appropriate health care services, regardless of time and place. Objective: This study describes the protocol for the mixed methods validation process of the symptom checker available in Omaolo digital services. Methods: This is a mixed methods study using quantitative and qualitative methods, which will be part of the clinical validation process that takes place in primary health care centers in Finland. Each organization provides a space where the study and the nurse triage can be done in order to include an unscreened target population of users. The primary health care units provide walk-in model services, where no prior phone call or contact is required. For the validation of the Omaolo symptom checker, case vignettes will be incorporated to supplement the triage accuracy of rare and acute cases that cannot be tested extensively in real-life settings. Vignettes are produced from a variety of clinical sources, and they test the symptom checker in different triage levels by using 1 standardized patient case example. Results: This study plan underwent an ethics review: regional permission was requested from each organization participating in the research, and an ethics committee statement was requested and granted from the Pirkanmaa hospital district's ethics committee, in accordance with the University of Tampere's regulations. Of 964 clinical user-filled symptom checker assessments, 877 cases were fully completed with a triage result, and therefore, they met the requirements for clinical validation studies. The goal for sufficient data has been reached for most of the chief symptoms. Data collection was completed in September 2019, and the first feasibility and patient experience results were published by the end of 2020. Case vignettes have been identified and are to be completed before further testing the symptom checker. The analysis and reporting are estimated to be finalized in 2024. Conclusions: The primary goals of this multimethod electronic symptom checker study are to assess safety and to provide crucial information regarding the accuracy and usability of the Omaolo electronic symptom checker. To our knowledge, this will be the first study to include real-life clinical cases along with case vignettes. International Registered Report Identifier (IRRID): DERR1-10.2196/41423 UR - https://www.researchprotocols.org/2023/1/e41423 UR - http://dx.doi.org/10.2196/41423 UR - http://www.ncbi.nlm.nih.gov/pubmed/37467041 ID - info:doi/10.2196/41423 ER - TY - JOUR AU - Kopka, Marvin AU - Scatturin, Lennart AU - Napierala, Hendrik AU - Fürstenau, Daniel AU - Feufel, A. Markus AU - Balzer, Felix AU - Schmieding, L. Malte PY - 2023/6/20 TI - Characteristics of Users and Nonusers of Symptom Checkers in Germany: Cross-Sectional Survey Study JO - J Med Internet Res SP - e46231 VL - 25 KW - symptom checker KW - cross-sectional study KW - user characteristic KW - digital public health KW - health information seeking KW - decision support KW - eHealth KW - mHealth KW - Germany KW - mobile health KW - health app KW - information seeking KW - technology use KW - usage KW - demographic KW - perception KW - awareness KW - adoption N2 - Background: Previous studies have revealed that users of symptom checkers (SCs, apps that support self-diagnosis and self-triage) are predominantly female, are younger than average, and have higher levels of formal education. 
Little data are available for Germany, and no study has so far compared usage patterns with people's awareness of SCs and the perception of usefulness. Objective: We explored the sociodemographic and individual characteristics that are associated with the awareness, usage, and perceived usefulness of SCs in the German population. Methods: We conducted a cross-sectional online survey among 1084 German residents in July 2022 regarding personal characteristics and people's awareness and usage of SCs. Using random sampling from a commercial panel, we collected participant responses stratified by gender, state of residence, income, and age to reflect the German population. We analyzed the collected data exploratively. Results: Of all respondents, 16.3% (177/1084) were aware of SCs and 6.5% (71/1084) had used them before. Those aware of SCs were younger (mean 38.8, SD 14.6 years, vs mean 48.3, SD 15.7 years), were more often female (107/177, 60.5%, vs 453/907, 49.9%), and had higher formal education levels (eg, 72/177, 40.7%, vs 238/907, 26.2%, with a university/college degree) than those unaware. The same observation applied to users compared to nonusers. It disappeared, however, when comparing users to nonusers who were aware of SCs. Among users, 40.8% (29/71) considered these tools useful. Those considering them useful reported higher self-efficacy (mean 4.21, SD 0.66, vs mean 3.63, SD 0.81, on a scale of 1-5) and a higher net household income (mean EUR 2591.63, SD EUR 1103.96 [mean US $2798.96, SD US $1192.28], vs mean EUR 1626.60, SD EUR 649.05 [mean US $1756.73, SD US $700.97]) than those who considered them not useful. More women considered SCs unhelpful (13/44, 29.5%) compared to men (4/26, 15.4%). Conclusions: Concurring with studies from other countries, our findings show associations between sociodemographic characteristics and SC usage in a German sample: users were on average younger, of higher socioeconomic status, and more commonly female compared to nonusers. However, usage cannot be explained by sociodemographic differences alone. It rather seems that sociodemographics explain who is or is not aware of the technology, but those who are aware of SCs are equally likely to use them, independently of sociodemographic differences. Although in some groups (eg, people with anxiety disorder), more participants reported knowing and using SCs, they tended to perceive them as less useful. In other groups (eg, male participants), fewer respondents were aware of SCs, but those who used them perceived them to be more useful. Thus, SCs should be designed to fit specific user needs, and strategies should be developed to help reach individuals who could benefit but are not aware of SCs yet. 
UR - https://www.jmir.org/2023/1/e46231 UR - http://dx.doi.org/10.2196/46231 UR - http://www.ncbi.nlm.nih.gov/pubmed/37338970 ID - info:doi/10.2196/46231 ER - TY - JOUR AU - Riboli-Sasco, Eva AU - El-Osta, Austen AU - Alaa, Aos AU - Webber, Iman AU - Karki, Manisha AU - El Asmar, Line Marie AU - Purohit, Katie AU - Painter, Annabelle AU - Hayhoe, Benedict PY - 2023/6/2 TI - Triage and Diagnostic Accuracy of Online Symptom Checkers: Systematic Review JO - J Med Internet Res SP - e43803 VL - 25 KW - systematic review KW - digital triage KW - diagnosis KW - online symptom checker KW - safety KW - accuracy KW - mobile phone N2 - Background: In the context of a deepening global shortage of health workers and, in particular, the COVID-19 pandemic, there is growing international interest in, and use of, online symptom checkers (OSCs). However, the evidence surrounding the triage and diagnostic accuracy of these tools remains inconclusive. Objective: This systematic review aimed to summarize the existing peer-reviewed literature evaluating the triage accuracy (directing users to appropriate services based on their presenting symptoms) and diagnostic accuracy of OSCs aimed at lay users for general health concerns. Methods: Searches were conducted in MEDLINE, Embase, CINAHL, Health Management Information Consortium (HMIC), and Web of Science, as well as the citations of the studies selected for full-text screening. We included peer-reviewed studies published in English between January 1, 2010, and February 16, 2022, with a controlled and quantitative assessment of either or both triage and diagnostic accuracy of OSCs directed at lay users. We excluded tools supporting health care professionals, as well as disease- or specialty-specific OSCs. Screening and data extraction were carried out independently by 2 reviewers for each study. We performed a descriptive narrative synthesis. Results: A total of 21,296 studies were identified, of which 14 (0.07%) were included. The included studies used clinical vignettes, medical records, or direct input by patients. Of the 14 studies, 6 (43%) reported on triage and diagnostic accuracy, 7 (50%) focused on triage accuracy, and 1 (7%) focused on diagnostic accuracy. These outcomes were assessed based on the diagnostic and triage recommendations attached to the vignette in the case of vignette studies or on those provided by nurses or general practitioners, including through face-to-face and telephone consultations. Both diagnostic accuracy and triage accuracy varied greatly among OSCs. Overall diagnostic accuracy was deemed to be low and was almost always lower than that of the comparator. Similarly, most of the studies (9/13, 69%) showed suboptimal triage accuracy overall, with a few exceptions (4/13, 31%). The main variables affecting the levels of diagnostic and triage accuracy were the severity and urgency of the condition, the use of artificial intelligence algorithms, and demographic questions. However, the impact of each variable differed across tools and studies, making it difficult to draw any solid conclusions. All included studies had at least one area with unclear risk of bias according to the revised Quality Assessment of Diagnostic Accuracy Studies-2 tool. Conclusions: Although OSCs have the potential to provide accessible and accurate health advice and triage recommendations to users, more research is needed to validate their triage and diagnostic accuracy before widescale adoption in community and health care settings. 
Future studies should aim to use a common methodology and agreed standard for evaluation to facilitate objective benchmarking and validation. Trial Registration: PROSPERO CRD42020215210; https://tinyurl.com/3949zw83 UR - https://www.jmir.org/2023/1/e43803 UR - http://dx.doi.org/10.2196/43803 UR - http://www.ncbi.nlm.nih.gov/pubmed/37266983 ID - info:doi/10.2196/43803 ER - TY - JOUR AU - Radionova, Natalia AU - Ög, Eylem AU - Wetzel, Anna-Jasmin AU - Rieger, A. Monika AU - Preiser, Christine PY - 2023/5/29 TI - Impacts of Symptom Checkers for Laypersons' Self-diagnosis on Physicians in Primary Care: Scoping Review JO - J Med Internet Res SP - e39219 VL - 25 KW - mobile health KW - mHealth KW - symptom checkers KW - artificial intelligence-based technology KW - AI-based technology KW - self-diagnosis KW - general practice KW - scoping review KW - mobile phone N2 - Background: Symptom checkers (SCs) for laypersons' self-assessment and preliminary self-diagnosis are widely used by the public. Little is known about the impact of these tools on health care professionals (HCPs) in primary care and their work. This is relevant to understanding how technological changes might affect the working world and how this is linked to work-related psychosocial demands and resources for HCPs. Objective: This scoping review aimed to systematically explore the existing publications on the impacts of SCs on HCPs in primary care and to identify knowledge gaps. Methods: We used the Arksey and O'Malley framework. We based our search string on the participant, concept, and context scheme and searched PubMed (MEDLINE) and CINAHL in January and June 2021. We performed a reference search in August 2021 and a manual search in November 2021. We included publications of peer-reviewed journals that focused on artificial intelligence- or algorithm-based self-diagnosing apps and tools for laypersons and had primary care or nonclinical settings as a relevant context. The characteristics of these studies were described numerically. We used thematic analysis to identify core themes. We followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist to report the study. Results: Of the 2729 publications identified through initial and follow-up database searches, 43 full texts were screened for eligibility, of which 9 were included. A further 8 publications were included through the manual search. Two publications were excluded after receiving feedback in the peer-review process. Fifteen publications were included in the final sample, which comprised 5 (33%) commentaries or nonresearch publications, 3 (20%) literature reviews, and 7 (47%) research publications. The earliest publications stemmed from 2015. We identified 5 themes. The theme finding prediagnosis comprised the comparison between SCs and physicians. We identified the performance of the diagnosis and the relevance of human factors as topics. In the theme layperson-technology relationship, we identified potentials for laypersons' empowerment and harm through SCs. Our analysis showed potential disruptions of the physician-patient relationship and uncontested roles of HCPs in the theme (impacts on) physician-patient relationship. In the theme impacts on HCPs' tasks, we described the reduction or increase in HCPs' workload. We identified potential transformations of HCPs' work and impacts on the health care system in the theme future role of SCs in health care. 
Conclusions: The scoping review approach was suitable for this new field of research. The heterogeneity of technologies and wordings was challenging. We identified research gaps in the literature regarding the impact of artificial intelligence- or algorithm-based self-diagnosing apps or tools on the work of HCPs in primary care. Further empirical studies on HCPs' lived experiences are needed, as the current literature depicts expectations rather than empirical findings. UR - https://www.jmir.org/2023/1/e39219 UR - http://dx.doi.org/10.2196/39219 UR - http://www.ncbi.nlm.nih.gov/pubmed/37247214 ID - info:doi/10.2196/39219 ER - TY - JOUR AU - Berdahl, T. Carl AU - Henreid, J. Andrew AU - Pevnick, M. Joshua AU - Zheng, Kai AU - Nuckols, K. Teryl PY - 2022/11/17 TI - Digital Tools Designed to Obtain the History of Present Illness From Patients: Scoping Review JO - J Med Internet Res SP - e36074 VL - 24 IS - 11 KW - anamnesis KW - informatics KW - emergency medicine KW - human-computer interaction KW - medical history taking KW - mobile phone N2 - Background: Many medical conditions, perhaps 80% of them, can be diagnosed by taking a thorough history of present illness (HPI). However, in the clinical setting, situational factors such as interruptions and time pressure may cause interactions with patients to be brief and fragmented. One solution for improving clinicians' ability to collect a thorough HPI and maximize efficiency and quality of care could be to use a digital tool to obtain the HPI before face-to-face evaluation by a clinician. Objective: Our objective was to identify and characterize digital tools that have been designed to obtain the HPI directly from patients or caregivers and present this information to clinicians before a face-to-face encounter. We also sought to describe outcomes reported in testing of these tools, especially those related to usability, efficiency, and quality of care. Methods: We conducted a scoping review using predefined search terms in the following databases: MEDLINE, CINAHL, PsycINFO, Web of Science, Embase, IEEE Xplore Digital Library, ACM Digital Library, and ProQuest Dissertations & Theses Global. Two reviewers screened titles and abstracts for relevance, performed full-text reviews of articles meeting the inclusion criteria, and used a pile-sorting procedure to identify distinguishing characteristics of the tools. Information describing the tools was primarily obtained from identified peer-reviewed sources; in addition, supplementary information was obtained from tool websites and through direct communications with tool creators. Results: We identified 18 tools meeting the inclusion criteria. Of these 18 tools, 14 (78%) used primarily closed-ended and multiple-choice questions, 1 (6%) used free-text input, and 3 (17%) used conversational (chatbot) style. More than half (10/18, 56%) of the tools were tailored to specific patient subpopulations; the remaining (8/18, 44%) tools did not specify a target subpopulation. Of the 18 tools, 7 (39%) included multilingual support, and 12 (67%) had the capability to transfer data directly into the electronic health record. Studies of the tools reported on various outcome measures related to usability, efficiency, and quality of care. Conclusions: The HPI tools we identified (N=18) varied greatly in their purpose and functionality. There was no consensus on how patient-generated information should be collected or presented to clinicians. 
Existing tools have undergone inconsistent levels of testing, with a wide variety of outcome measures used in evaluation, including some related to usability, efficiency, and quality of care. There is substantial interest in using digital tools to obtain the HPI from patients, but the outcomes measured have been inconsistent. Future research should focus on whether using HPI tools can lead to improved patient experience and health outcomes, although surrogate end points could instead be used so long as patient safety is monitored. UR - https://www.jmir.org/2022/11/e36074 UR - http://dx.doi.org/10.2196/36074 UR - http://www.ncbi.nlm.nih.gov/pubmed/36394945 ID - info:doi/10.2196/36074 ER - TY - JOUR AU - Painter, Annabelle AU - Hayhoe, Benedict AU - Riboli-Sasco, Eva AU - El-Osta, Austen PY - 2022/10/26 TI - Online Symptom Checkers: Recommendations for a Vignette-Based Clinical Evaluation Standard JO - J Med Internet Res SP - e37408 VL - 24 IS - 10 KW - online symptom checkers KW - clinical evaluation KW - validation KW - assessment KW - standards KW - third-party assessment KW - quality assurance UR - https://www.jmir.org/2022/10/e37408 UR - http://dx.doi.org/10.2196/37408 UR - http://www.ncbi.nlm.nih.gov/pubmed/36287594 ID - info:doi/10.2196/37408 ER - TY - JOUR AU - Liu, W. Andrew AU - Odisho, Y. Anobel AU - Brown III, William AU - Gonzales, Ralph AU - Neinstein, B. Aaron AU - Judson, J. Timothy PY - 2022/9/13 TI - Patient Experience and Feedback After Using an Electronic Health Record-Integrated COVID-19 Symptom Checker: Survey Study JO - JMIR Hum Factors SP - e40064 VL - 9 IS - 3 KW - COVID-19 KW - patient portals KW - digital health KW - diagnostic self evaluation KW - medical informatics KW - internet KW - telemedicine KW - triage KW - feedback KW - medical records systems KW - San Francisco KW - user experience KW - user satisfaction KW - self-triage KW - symptom checker KW - health system KW - workflow KW - integration KW - electronic health record N2 - Background: Symptom checkers have been widely used during the COVID-19 pandemic to alleviate strain on health systems and offer patients a 24-7 self-service triage option. Although studies suggest that users may positively perceive web-based symptom checkers, no studies have quantified user feedback after use of an electronic health record-integrated COVID-19 symptom checker with self-scheduling functionality. Objective: In this paper, we aimed to understand user experience, user satisfaction, and user-reported alternatives to the use of a COVID-19 symptom checker with self-triage and self-scheduling functionality. Methods: We launched a patient-portal-based self-triage and self-scheduling tool in March 2020 for patients with COVID-19 symptoms, exposures, or questions. We made an optional, anonymous Qualtrics survey available to patients immediately after they completed the symptom checker. Results: Between December 16, 2021, and March 28, 2022, there were 395 unique responses to the survey. Overall, the respondents reported high satisfaction across all demographics, with a median rating of 8 out of 10 and 188/395 (47.6%) of the respondents giving a rating of 9 or 10 out of 10. User satisfaction scores were not associated with any demographic factors. The most common user-reported alternatives, had the web-based tool not been available, were calling the COVID-19 telephone hotline and sending a patient-portal message to their physician for advice. 
The ability to schedule a test online was the most important symptom checker feature for the respondents. The most common categories of user feedback concerned other COVID-19 services (eg, telephone hotline), policies, or procedures, and requests for additional features or functionality. Conclusions: This analysis suggests that COVID-19 symptom checkers with self-triage and self-scheduling functionality may have high overall user satisfaction, regardless of user demographics. By allowing users to self-triage and self-schedule tests and visits, tools such as this may prevent unnecessary calls and messages to clinicians. Individual feedback suggested that the user experience for this type of tool is highly dependent on the organization's operational workflows for COVID-19 testing and care. This study provides insight for the implementation and improvement of COVID-19 symptom checkers to ensure high user satisfaction. UR - https://humanfactors.jmir.org/2022/3/e40064 UR - http://dx.doi.org/10.2196/40064 UR - http://www.ncbi.nlm.nih.gov/pubmed/35960593 ID - info:doi/10.2196/40064 ER - TY - JOUR AU - Arellano Carmona, Kimberly AU - Chittamuru, Deepti AU - Kravitz, L. Richard AU - Ramondt, Steven AU - Ramírez, Susana A. PY - 2022/8/19 TI - Health Information Seeking From an Intelligent Web-Based Symptom Checker: Cross-sectional Questionnaire Study JO - J Med Internet Res SP - e36322 VL - 24 IS - 8 KW - health information seeking KW - health information KW - information seeking KW - information seeker KW - information behavior KW - artificial intelligence KW - medical information system KW - digital divide KW - information inequality KW - digital epidemiology KW - symptom checker KW - digital health KW - eHealth KW - online health information KW - user demographic KW - health information resource KW - health information tool KW - digital health assistant N2 - Background: The ever-growing amount of health information available on the web is increasing the demand for tools providing personalized and actionable health information. Such tools include symptom checkers that provide users with a potential diagnosis after responding to a set of probes about their symptoms. Although the potential for their utility is great, little is known about such tools' actual use and effects. Objective: We aimed to understand who uses a web-based artificial intelligence-powered symptom checker and for what purposes, how they evaluate the experience of the web-based interview and quality of the information, what they intend to do with the recommendation, and predictors of future use. Methods: Cross-sectional survey of web-based health information seekers following the completion of a symptom checker visit (N=2437). Measures of comprehensibility, confidence, usefulness, health-related anxiety, empowerment, and intention to use in the future were assessed. ANOVAs and the Wilcoxon rank sum test examined mean outcome differences in racial, ethnic, and sex groups. The relationship between perceptions of the symptom checker and intention to follow recommended actions was assessed using multilevel logistic regression. Results: Buoy users were well-educated (1384/1704, 81.22% college or higher), primarily White (1227/1693, 72.47%), and female (2069/2437, 84.89%). Most had insurance (1449/1630, 88.89%), a regular health care provider (1307/1709, 76.48%), and reported good health (1000/1703, 58.72%). 
Three types of symptoms, namely pain (855/2437, 35.08%), gynecological issues (293/2437, 12.02%), and masses or lumps (204/2437, 8.37%), accounted for more than half (1352/2437, 55.48%) of site visits. Buoy's top three primary recommendations split across less-serious triage categories: primary care physician in 2 weeks (754/2141, 35.22%), self-treatment (452/2141, 21.11%), and primary care in 1 to 2 days (373/2141, 17.42%). Common diagnoses were musculoskeletal (303/2437, 12.43%), gynecological (304/2437, 12.47%) and skin conditions (297/2437, 12.19%), and infectious diseases (300/2437, 12.31%). Users generally reported high confidence in Buoy, found it useful and easy to understand, and said that Buoy made them feel less anxious and more empowered to seek medical help. Users for whom Buoy recommended "Waiting/Watching" or "Self-Treatment" had the strongest intentions to comply, whereas those advised to seek primary care had weaker intentions. Compared with White users, Latino and Black users had significantly more confidence in Buoy (P<.05), and the former also found it significantly more useful (P<.05). Latino (odds ratio 1.96, 95% CI 1.22-3.25) and Black (odds ratio 2.37, 95% CI 1.57-3.66) users also had stronger intentions to discuss recommendations with a provider than White users. Conclusions: Results demonstrate the potential utility of a web-based health information tool to empower people to seek care and reduce health-related anxiety. However, despite encouraging results suggesting the tool may fulfill unmet health information needs among women and Black and Latino adults, analyses of the user base illustrate persistent second-level digital divide effects. UR - https://www.jmir.org/2022/8/e36322 UR - http://dx.doi.org/10.2196/36322 UR - http://www.ncbi.nlm.nih.gov/pubmed/35984690 ID - info:doi/10.2196/36322 ER - TY - JOUR AU - Wetzel, Anna-Jasmin AU - Koch, Roland AU - Preiser, Christine AU - Müller, Regina AU - Klemmt, Malte AU - Ranisch, Robert AU - Ehni, Hans-Jörg AU - Wiesing, Urban AU - Rieger, A. Monika AU - Henking, Tanja AU - Joos, Stefanie PY - 2022/5/16 TI - Ethical, Legal, and Social Implications of Symptom Checker Apps in Primary Health Care (CHECK.APP): Protocol for an Interdisciplinary Mixed Methods Study JO - JMIR Res Protoc SP - e34026 VL - 11 IS - 5 KW - symptom checker apps KW - self-diagnosis, self-triage, digitalization in primary care, general practitioners KW - symptom checker KW - app KW - mobile app KW - primary care N2 - Background: Symptom checker apps (SCAs) are accessible tools that provide early symptom assessment for users. The ethical, legal, and social implications of SCAs and their impact on the patient-physician relationship, the health care providers, and the health care system have been sparsely examined. This study protocol describes an approach to investigate the possible impacts and implications of SCAs on different levels of health care provision. It considers the perspectives of the users, nonusers, general practitioners (GPs), and health care experts. Objective: We aim to provide a comprehensive overview of the use of SCAs and to address problematic issues, if any. The primary outcomes of this study are empirically informed multi-perspective recommendations for different stakeholders on the ethical, legal, and social implications of SCAs. Methods: Quantitative and qualitative methods will be used in several overlapping and interconnected study phases. 
In study phase 1, a comprehensive literature review will be conducted to assess the ethical, legal, social, and systemic impacts of SCAs. Study phase 2 comprises a survey that will be analyzed with a logistic regression. It aims to assess the degree of SCA use in Germany as well as the predictors of SCA usage. Study phase 3 will investigate self-observational diaries and user interviews, which will be analyzed as integrated cases to assess user perspectives, usage patterns, and arising problems. Study phase 4 will comprise GP interviews to assess their experiences, perspectives, self-image, and concepts and will be analyzed with the basic procedure by Kruse. Moreover, interviews with health care experts will be conducted in study phase 3 and will be analyzed by using the reflexive thematic analysis approach of Braun and Clarke. Results: Study phase 1 will be completed in November 2021. We expect the results of study phase 2 in December 2021 and February 2022. In study phase 3, interviews are currently being conducted. The final study endpoint will be in February 2023. Conclusions: The possible ethical, legal, social, and systemic impacts of a widespread use of SCAs that affect stakeholders and stakeholder groups on different levels of health care will be identified. The proposed methodological approach provides a multifaceted and diverse empirical basis for a broad discussion on these implications. Trial Registration: German Clinical Trials Register (DRKS) DRKS00022465; https://tinyurl.com/yx53er67 International Registered Report Identifier (IRRID): DERR1-10.2196/34026 UR - https://www.researchprotocols.org/2022/5/e34026 UR - http://dx.doi.org/10.2196/34026 UR - http://www.ncbi.nlm.nih.gov/pubmed/35576570 ID - info:doi/10.2196/34026 ER - TY - JOUR AU - Schmieding, L. Malte AU - Kopka, Marvin AU - Schmidt, Konrad AU - Schulz-Niethammer, Sven AU - Balzer, Felix AU - Feufel, A. Markus PY - 2022/5/10 TI - Triage Accuracy of Symptom Checker Apps: 5-Year Follow-up Evaluation JO - J Med Internet Res SP - e31810 VL - 24 IS - 5 KW - digital health KW - triage KW - symptom checker KW - patient-centered care KW - eHealth apps KW - mobile phone N2 - Background: Symptom checkers are digital tools assisting laypersons in self-assessing the urgency and potential causes of their medical complaints. They are widely used but face concerns from both patients and health care professionals, especially regarding their accuracy. A 2015 landmark study substantiated these concerns using case vignettes to demonstrate that symptom checkers commonly err in their triage assessment. Objective: This study aims to revisit the landmark index study to investigate whether and how symptom checkers' capabilities have evolved since 2015 and how they currently compare with laypersons' stand-alone triage appraisal. Methods: In early 2020, we searched for smartphone and web-based applications providing triage advice. We evaluated these apps on the same 45 case vignettes as the index study. Using descriptive statistics, we compared our findings with those of the index study and with publicly available data on laypersons' triage capability. Results: We retrieved 22 symptom checkers providing triage advice. The median triage accuracy in 2020 (55.8%, IQR 15.1%) was close to that in 2015 (59.1%, IQR 15.5%). The apps in 2020 were less risk averse (odds 1.11:1, the ratio of overtriage errors to undertriage errors) than those in 2015 (odds 2.82:1), missing >40% of emergencies. 
Few apps outperformed laypersons in deciding either whether emergency care was required or whether self-care was sufficient. No app outperformed laypersons on both decisions. Conclusions: Triage performance of symptom checkers has, on average, not improved over the course of 5 years. It decreased in 2 use cases (advice on when emergency care is required and when no health care is needed for the moment). However, triage capability varies widely within the sample of symptom checkers. Whether it is beneficial to seek advice from symptom checkers depends on the app chosen and on the specific question to be answered. Future research should develop resources (eg, case vignette repositories) to audit the capabilities of symptom checkers continuously and independently and provide guidance on when and to whom they should be recommended. UR - https://www.jmir.org/2022/5/e31810 UR - http://dx.doi.org/10.2196/31810 UR - http://www.ncbi.nlm.nih.gov/pubmed/35536633 ID - info:doi/10.2196/31810 ER - TY - JOUR AU - Kopka, Marvin AU - Schmieding, L. Malte AU - Rieger, Tobias AU - Roesler, Eileen AU - Balzer, Felix AU - Feufel, A. Markus PY - 2022/5/3 TI - Determinants of Laypersons' Trust in Medical Decision Aids: Randomized Controlled Trial JO - JMIR Hum Factors SP - e35219 VL - 9 IS - 2 KW - symptom checkers KW - disposition advice KW - anthropomorphism KW - artificial intelligence KW - urgency assessment KW - patient-centered care KW - human-computer interaction KW - consumer health KW - information technology KW - IT KW - mobile phone N2 - Background: Symptom checker apps are patient-facing decision support systems aimed at providing advice to laypersons on whether, where, and how to seek health care (disposition advice). Such advice can improve laypersons' self-assessment and ultimately improve medical outcomes. Past research has mainly focused on the accuracy of symptom checker apps' suggestions. To support decision-making, such apps need to provide not only accurate but also trustworthy advice. To date, only a few studies have addressed the question of the extent to which laypersons trust symptom checker app advice or the factors that moderate their trust. Studies on general decision support systems have shown that framing automated systems (anthropomorphic or emphasizing expertise), for example, by using icons symbolizing artificial intelligence (AI), affects users' trust. Objective: This study aims to identify the factors influencing laypersons' trust in the advice provided by symptom checker apps. Primarily, we investigated whether designs using anthropomorphic framing or framing the app as an AI increase users' trust compared with no such framing. Methods: Through a web-based survey, we recruited 494 US residents with no professional medical training. The participants had to first appraise the urgency of a fictitious patient description (case vignette). Subsequently, a decision aid (mock symptom checker app) provided disposition advice contradicting the participants' appraisal, and they had to subsequently reappraise the vignette. Participants were randomized into 3 groups: 2 experimental groups using visual framing (anthropomorphic, 160/494, 32.4%, vs AI, 161/494, 32.6%) and a neutral group without such framing (173/494, 35%). Results: Most participants (384/494, 77.7%) followed the decision aid's advice, regardless of its urgency level. 
Neither anthropomorphic framing (odds ratio 1.120, 95% CI 0.664-1.897) nor framing as AI (odds ratio 0.942, 95% CI 0.565-1.570) increased behavioral or subjective trust (P=.99) compared with the no-frame condition. Even participants who were extremely certain in their own decisions (ie, 100% certain) commonly changed their decision in favor of the symptom checker's advice (19/34, 56%). Propensity to trust and eHealth literacy were associated with increased subjective trust in the symptom checker (propensity to trust b=0.25; eHealth literacy b=0.2), whereas sociodemographic variables showed no such link with either subjective or behavioral trust. Conclusions: Contrary to our expectation, neither the anthropomorphic framing nor the emphasis on AI increased trust in symptom checker advice compared with that of a neutral control condition. However, independent of the interface, most participants trusted the mock app's advice, even when they were very certain of their own assessment. Thus, the question arises as to whether laypersons use such symptom checkers as substitutes rather than as aids in their own decision-making. With trust in symptom checkers already high at baseline, the benefit of symptom checkers depends on interface designs that enable users to adequately calibrate their trust levels during usage. Trial Registration: Deutsches Register Klinischer Studien DRKS00028561; https://tinyurl.com/rv4utcfb (retrospectively registered). UR - https://humanfactors.jmir.org/2022/2/e35219 UR - http://dx.doi.org/10.2196/35219 UR - http://www.ncbi.nlm.nih.gov/pubmed/35503248 ID - info:doi/10.2196/35219 ER - TY - JOUR AU - Hennemann, Severin AU - Kuhn, Sebastian AU - Witthöft, Michael AU - Jungmann, M. Stefanie PY - 2022/1/31 TI - Diagnostic Performance of an App-Based Symptom Checker in Mental Disorders: Comparative Study in Psychotherapy Outpatients JO - JMIR Ment Health SP - e32832 VL - 9 IS - 1 KW - mHealth KW - symptom checker KW - diagnostics KW - mental disorders KW - psychotherapy KW - mobile phone N2 - Background: Digital technologies have become a common starting point for health-related information-seeking. Web- or app-based symptom checkers aim to provide rapid and accurate condition suggestions and triage advice but have not yet been investigated for mental disorders in routine health care settings. Objective: This study aims to test the diagnostic performance of a widely available symptom checker in the context of formal diagnosis of mental disorders when compared with therapists' diagnoses based on structured clinical interviews. Methods: Adult patients from an outpatient psychotherapy clinic used the app-based symptom checker Ada – check your health (ADA; Ada Health GmbH) at intake. Accuracy was assessed as the agreement of the first and 1 of the first 5 condition suggestions of ADA with at least one of the interview-based therapist diagnoses. In addition, sensitivity, specificity, and interrater reliabilities (Gwet first-order agreement coefficient [AC1]) were calculated for the 3 most prevalent disorder categories. Self-reported usability (assessed using the System Usability Scale) and acceptance of ADA (assessed using an adapted feedback questionnaire) were evaluated. Results: A total of 49 patients (30/49, 61% women; mean age 33.41, SD 12.79 years) were included in this study. 
Across all patients, the interview-based diagnoses matched ADA's first condition suggestion in 51% (25/49; 95% CI 37.5-64.4) of cases and 1 of the first 5 condition suggestions in 69% (34/49; 95% CI 55.4-80.6) of cases. Within the main disorder categories, the accuracy of ADA's first condition suggestion was 0.82 for somatoform and associated disorders, 0.65 for affective disorders, and 0.53 for anxiety disorders. Interrater reliabilities ranged from low (AC1=0.15 for anxiety disorders) to good (AC1=0.76 for somatoform and associated disorders). The usability of ADA was rated as high on the System Usability Scale (mean 81.51, SD 11.82, score range 0-100). Approximately 71% (35/49) of participants would have preferred a face-to-face diagnostic over an app-based one. Conclusions: Overall, our findings suggest that a widely available symptom checker used in the formal diagnosis of mental disorders could provide clinicians with a list of condition suggestions with moderate-to-good accuracy. However, diagnostic performance was heterogeneous between disorder categories and included low interrater reliability. Although symptom checkers have some potential to complement the diagnostic process as a screening tool, the diagnostic performance should be tested in larger samples and in comparison with further diagnostic instruments. UR - https://mental.jmir.org/2022/1/e32832 UR - http://dx.doi.org/10.2196/32832 UR - http://www.ncbi.nlm.nih.gov/pubmed/35099395 ID - info:doi/10.2196/32832 ER - TY - JOUR AU - Dunn, Taylor AU - Howlett, E. Susan AU - Stanojevic, Sanja AU - Shehzad, Aaqib AU - Stanley, Justin AU - Rockwood, Kenneth PY - 2022/1/27 TI - Patterns of Symptom Tracking by Caregivers and Patients With Dementia and Mild Cognitive Impairment: Cross-sectional Study JO - J Med Internet Res SP - e29219 VL - 24 IS - 1 KW - dementia KW - mild cognitive impairment KW - real-world evidence KW - patient-centric outcomes KW - machine learning KW - dementia stage KW - Alzheimer disease KW - symptom tracking N2 - Background: Individuals with dementia and mild cognitive impairment (MCI) experience a wide variety of symptoms and challenges that trouble them. To address this heterogeneity, numerous standardized tests are used for diagnosis and prognosis. myGoalNav Dementia is a web-based tool that allows individuals with impairments and their caregivers to identify and track outcomes of greatest importance to them, which may be a less arbitrary and more sensitive way of capturing meaningful change. Objective: We aim to explore the most frequent and important symptoms and challenges reported by caregivers and people with dementia and MCI and how these vary according to disease severity. Methods: This cross-sectional study involved 3909 web-based myGoalNav users (mostly caregivers of people with dementia or MCI) who completed symptom profiles between 2006 and 2019. To make a symptom profile, users selected their most personally meaningful or troublesome dementia-related symptoms to track over time. Users were also asked to rank their chosen symptoms from least to most important, which we called the symptom potency. As the stage of disease for these web-based users was unknown, we applied a supervised staging algorithm, previously trained on clinician-derived data, to classify each profile into 1 of 4 stages: MCI and mild, moderate, and severe dementia. Across these stages, we compared symptom tracking frequency, symptom potency, and the relationship between frequency and potency. 
Results: Applying the staging algorithm to the 3909 user profiles resulted in 917 (23.46%) MCI, 1596 (40.83%) mild dementia, 514 (13.15%) moderate dementia, and 882 (22.56%) severe dementia profiles. We found that the most frequent symptoms in MCI and mild dementia profiles were similar and comprised early hallmarks of dementia (eg, recent memory and language difficulty). As the stage increased to moderate and severe, the most frequent symptoms were characteristic of loss of independent function (eg, incontinence) and behavioral problems (eg, aggression). The most potent symptoms were similar between stages and generally reflected disruptions in everyday life (eg, problems with hobbies or games, travel, and looking after grandchildren). Symptom frequency was negatively correlated with potency at all stages, and the strength of this relationship increased with increasing disease severity. Conclusions: Our results emphasize the importance of patient-centricity in MCI and dementia studies and illustrate the valuable real-world evidence that can be collected with digital tools. Here, the most frequent symptoms across the stages reflected our understanding of the typical disease progression. However, the symptoms that were ranked as most personally important by users were generally among the least frequently selected. Through individualization, patient-centered instruments such as myGoalNav can complement standardized measures by capturing these infrequent but potent outcomes. UR - https://www.jmir.org/2022/1/e29219 UR - http://dx.doi.org/10.2196/29219 UR - http://www.ncbi.nlm.nih.gov/pubmed/35084341 ID - info:doi/10.2196/29219 ER - TY - JOUR AU - Janvrin, Lynn Miranda AU - Korona-Bailey, Jessica AU - Koehlmoos, Pérez Tracey PY - 2021/12/6 TI - Re-examining COVID-19 Self-Reported Symptom Tracking Programs in the United States: Updated Framework Synthesis JO - JMIR Form Res SP - e31271 VL - 5 IS - 12 KW - COVID-19 KW - coronavirus KW - framework analysis KW - information resources KW - monitoring KW - patient-reported outcome measures KW - self-reported KW - surveillance KW - symptom tracking KW - synthesis KW - digital health N2 - Background: Early in the pandemic, in 2020, Koehlmoos et al completed a framework synthesis of currently available self-reported symptom tracking programs for COVID-19. This framework described relevant programs, partners and affiliates, funding, responses, platform, and intended audience, among other considerations. Objective: This study seeks to update the existing framework with the aim of identifying developments in the landscape and highlighting how programs have adapted to changes in pandemic response. Methods: Our team developed a framework to collate information on current COVID-19 self-reported symptom tracking programs using the "best-fit" framework synthesis approach. All programs from the previous study were included to document changes. New programs were discovered using a Google search for target keywords. The time frame for the search for programs ranged from March 1, 2021, to May 6, 2021. Results: We screened 33 programs, of which 8 were included in our final framework synthesis. We identified multiple common data elements, including demographic information such as race, age, gender, and affiliation (all were associated with universities, medical schools, or schools of public health). Dissimilarities included questions regarding vaccination status, vaccine hesitancy, adherence to social distancing, COVID-19 testing, and mental health. 
Conclusions: At this time, the future of self-reported symptom tracking for COVID-19 is unclear. Some sources have speculated that COVID-19 may become a yearly occurrence much like the flu, and if so, the data that these programs generate are still valuable. However, it is unclear whether the public will maintain the same level of interest in reporting their symptoms on a regular basis if COVID-19 becomes more commonplace. UR - https://formative.jmir.org/2021/12/e31271 UR - http://dx.doi.org/10.2196/31271 UR - http://www.ncbi.nlm.nih.gov/pubmed/34792469 ID - info:doi/10.2196/31271 ER - TY - JOUR AU - Kummer, Benjamin AU - Shakir, Lubaina AU - Kwon, Rachel AU - Habboushe, Joseph AU - Jetté, Nathalie PY - 2021/8/2 TI - Usage Patterns of Web-Based Stroke Calculators in Clinical Decision Support: Retrospective Analysis JO - JMIR Med Inform SP - e28266 VL - 9 IS - 8 KW - medical informatics KW - clinical informatics KW - mhealth KW - digital health KW - cerebrovascular disease KW - medical calculators KW - health information KW - health information technology KW - information technology KW - economic health KW - clinical health KW - electronic health records N2 - Background: Clinical scores are frequently used in the diagnosis and management of stroke. While medical calculators are increasingly important support tools for clinical decisions, the uptake and use of common medical calculators for stroke remain poorly characterized. Objective: We aimed to describe use patterns in frequently used stroke-related medical calculators for clinical decisions from a web-based support system. Methods: We conducted a retrospective study of calculators from MDCalc, a web-based and mobile app-based medical calculator platform based in the United States. We analyzed metadata tags from MDCalc's calculator use data to identify all calculators related to stroke. Using relative page views as a measure of calculator use, we determined the 5 most frequently used stroke-related calculators between January 2016 and December 2018. For all 5 calculators, we determined cumulative and quarterly use, mode of access (eg, app or web browser), and both US and international distributions of use. We compared cumulative use in the 2016-2018 period with use from January 2011 to December 2015. Results: Over the study period, we identified 454 MDCalc calculators, of which 48 (10.6%) were related to stroke. Of these, the 5 most frequently used calculators were the CHA2DS2-VASc score for atrial fibrillation stroke risk calculator (5.5% of total and 32% of stroke-related page views), the Mean Arterial Pressure calculator (2.4% of total and 14.0% of stroke-related page views), the HAS-BLED score for major bleeding risk (1.9% of total and 11.4% of stroke-related page views), the National Institutes of Health Stroke Scale (NIHSS) score calculator (1.7% of total and 10.1% of stroke-related page views), and the CHADS2 score for atrial fibrillation stroke risk calculator (1.4% of total and 8.1% of stroke-related page views). Web browser was the most common mode of access, accounting for 82.7%-91.2% of individual stroke calculator page views. Access originated most frequently from the most populated regions within the United States. Internationally, use originated mostly from English-language countries. The NIHSS score calculator demonstrated the greatest increase in page views (238.1%) between the first and last quarters of the study period. 
Conclusions: The most frequently used stroke calculators were the CHA2DS2-VASc, Mean Arterial Pressure, HAS-BLED, NIHSS, and CHADS2. These were mainly accessed by web browser, from English-speaking countries, and from highly populated areas. Further studies should investigate barriers to stroke calculator adoption and the effect of calculator use on the application of best practices in cerebrovascular disease. UR - https://medinform.jmir.org/2021/8/e28266 UR - http://dx.doi.org/10.2196/28266 UR - http://www.ncbi.nlm.nih.gov/pubmed/34338647 ID - info:doi/10.2196/28266 ER - TY - JOUR AU - Montazeri, Maryam AU - Multmeier, Jan AU - Novorol, Claire AU - Upadhyay, Shubhanan AU - Wicks, Paul AU - Gilbert, Stephen PY - 2021/5/21 TI - Optimization of Patient Flow in Urgent Care Centers Using a Digital Tool for Recording Patient Symptoms and History: Simulation Study JO - JMIR Form Res SP - e26402 VL - 5 IS - 5 KW - symptom assessment app KW - discrete event simulation KW - health care system KW - patient flow modeling KW - patient flow KW - simulation KW - urgent care KW - waiting times N2 - Background: Crowding can negatively affect patient and staff experience, and consequently the performance of health care facilities. Crowding can potentially be eased through streamlining and the reduction of duplication in patient history-taking through the use of a digital symptom-taking app. Objective: We simulated the effect of introducing a digital symptom-taking app on patient flow. We hypothesized that waiting times and crowding in an urgent care center (UCC) could be reduced, and that this would be more efficient than simply adding more staff. Methods: A discrete-event approach was used to simulate patient flow in a UCC during a 4-hour time frame. The baseline scenario was a small UCC with 2 triage nurses, 2 doctors, 1 treatment/examination nurse, and 1 discharge administrator in service. We simulated 33 scenarios with different staff numbers or different potential time savings through the app. We explored average queue length, waiting time, idle time, and staff utilization for each scenario. Results: Discrete-event simulation showed that even a few minutes saved through patient app-based self-history recording during triage could result in significantly increased efficiency. A modest estimated time saving per patient of 2.5 minutes decreased the average patient wait time for triage by 26.17%, whereas a time saving of 5 minutes led to a 54.88% reduction in patient wait times. Alternatively, adding an additional triage nurse was less efficient, as the additional staff were only required at the busiest times. Conclusions: Small time savings in the history-taking process have potential to result in substantial reductions in total patient waiting time for triage nurses, with likely effects of reduced patient anxiety, staff anxiety, and improved patient care. Patient self-history recording could be carried out at home or in the waiting room via a check-in kiosk or a portable tablet computer. This formative simulation study has potential to impact service provision and approaches to digitalization at scale. UR - https://formative.jmir.org/2021/5/e26402 UR - http://dx.doi.org/10.2196/26402 UR - http://www.ncbi.nlm.nih.gov/pubmed/34018963 ID - info:doi/10.2196/26402 ER - TY - JOUR AU - Munsch, Nicolas AU - Martin, Alistair AU - Gruarin, Stefanie AU - Nateqi, Jama AU - Abdarahmane, Isselmou AU - Weingartner-Ortner, Rafael AU - Knapp, Bernhard PY - 2021/5/21 TI - Authors' 
Reply to: Screening Tools: Their Intended Audiences and Purposes. Comment on "Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study" JO - J Med Internet Res SP - e26543 VL - 23 IS - 5 KW - COVID-19 KW - symptom checkers KW - benchmark KW - digital health KW - symptom KW - chatbot KW - accuracy UR - https://www.jmir.org/2021/5/e26543 UR - http://dx.doi.org/10.2196/26543 UR - http://www.ncbi.nlm.nih.gov/pubmed/33989162 ID - info:doi/10.2196/26543 ER - TY - JOUR AU - Millen, Elizabeth AU - Gilsdorf, Andreas AU - Fenech, Matthew AU - Gilbert, Stephen PY - 2021/5/21 TI - Screening Tools: Their Intended Audiences and Purposes. Comment on "Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study" JO - J Med Internet Res SP - e26148 VL - 23 IS - 5 KW - COVID-19 KW - symptom checkers KW - benchmark KW - digital health KW - symptom KW - chatbot KW - accuracy UR - https://www.jmir.org/2021/5/e26148 UR - http://dx.doi.org/10.2196/26148 UR - http://www.ncbi.nlm.nih.gov/pubmed/33989169 ID - info:doi/10.2196/26148 ER - TY - JOUR AU - Brandberg, Helge AU - Sundberg, Johan Carl AU - Spaak, Jonas AU - Koch, Sabine AU - Zakim, David AU - Kahan, Thomas PY - 2021/4/27 TI - Use of Self-Reported Computerized Medical History Taking for Acute Chest Pain in the Emergency Department – the Clinical Expert Operating System Chest Pain Danderyd Study (CLEOS-CPDS): Prospective Cohort Study JO - J Med Internet Res SP - e25493 VL - 23 IS - 4 KW - chest pain KW - computerized history taking KW - coronary artery disease KW - eHealth KW - emergency department KW - health informatics KW - medical history KW - risk management N2 - Background: Chest pain is one of the most common chief complaints in emergency departments (EDs). Collecting an adequate medical history is challenging but essential in order to use recommended risk scores such as the HEART score (based on history, electrocardiogram, age, risk factors, and troponin). Self-reported computerized history taking (CHT) is a novel method to collect structured medical history data directly from the patient through a digital device. CHT is rarely used in clinical practice, and there is a lack of evidence for utility in an acute setting. Objective: This substudy of the Clinical Expert Operating System Chest Pain Danderyd Study (CLEOS-CPDS) aimed to evaluate whether patients with acute chest pain can interact effectively with CHT in the ED. Methods: Prospective cohort study on self-reported medical histories collected from acute chest pain patients using a CHT program on a tablet. Clinically stable patients aged 18 years and older with a chief complaint of chest pain, fluency in Swedish, and a nondiagnostic electrocardiogram or serum markers for acute coronary syndrome were eligible for inclusion. Patients unable to carry out an interview with CHT (eg, inadequate eyesight, confusion, or agitation) were excluded. Effectiveness was assessed as the proportion of patients completing the interview and the time required in order to collect a medical history sufficient for cardiovascular risk stratification according to HEART score. Results: During 2017-2018, 500 participants were consecutively enrolled. The age and sex distribution (mean 54.3, SD 17.0 years; 213/500, 42.6% women) was similar to that of the general chest pain population (mean 57.5, SD 19.2 years; 49.6% women). Common reasons for noninclusion were language issues (182/1000, 18.2%), fatigue (158/1000, 15.8%), and inability to use a tablet (152/1000, 15.2%). 
Sufficient data to calculate HEART score were collected in 70.4% (352/500) of the patients. Key modules for chief complaint, cardiovascular history, and respiratory history were completed by 408 (81.6%), 339 (67.8%), and 291 (58.2%) of the 500 participants, respectively, while 148 (29.6%) completed the entire interview (in all 14 modules). Factors associated with completeness were age 18-69 years (all key modules: Ps<.001), male sex (cardiovascular: P=.04), active workers (all key modules: Ps<.005), not arriving by ambulance (chief complaint: P=.03; cardiovascular: P=.045), and ongoing chest pain (complete interview: P=.002). The median time to collect HEART score data was 23 (IQR 18-31) minutes and to complete an interview was 64 (IQR 53-77) minutes. The main reasons for discontinuing the interview prior to completion (n=352) were discharge from the ED (101, 28.7%) and tiredness (95, 27.0%). Conclusions: A majority of patients with acute chest pain can interact effectively with CHT on a tablet in the ED to provide sufficient data for risk stratification with a well-established risk score. The utility was somewhat lower in patients 70 years and older, in patients arriving by ambulance, and in patients without ongoing chest pain. Further studies are warranted to assess whether CHT can contribute to improved management and prognosis in this large patient group. Trial Registration: ClinicalTrials.gov NCT03439449; https://clinicaltrials.gov/ct2/show/NCT03439449 International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2019-031871 UR - https://www.jmir.org/2021/4/e25493 UR - http://dx.doi.org/10.2196/25493 UR - http://www.ncbi.nlm.nih.gov/pubmed/33904821 ID - info:doi/10.2196/25493 ER - TY - JOUR AU - Schmieding, L. Malte AU - Mörgeli, Rudolf AU - Schmieding, L. Maike A. AU - Feufel, A. Markus AU - Balzer, Felix PY - 2021/3/10 TI - Benchmarking Triage Capability of Symptom Checkers Against That of Medical Laypersons: Survey Study JO - J Med Internet Res SP - e24475 VL - 23 IS - 3 KW - digital health KW - triage KW - symptom checker KW - patient-centered care KW - eHealth apps KW - mobile phone KW - decision support systems KW - clinical KW - consumer health information KW - health literacy N2 - Background: Symptom checkers (SCs) are tools developed to provide clinical decision support to laypersons. Apart from suggesting probable diagnoses, they commonly advise when users should seek care (triage advice). SCs have become increasingly popular despite prior studies rating their performance as mediocre. To date, it is unclear whether SCs can triage better than those who might choose to use them. Objective: This study aims to compare triage accuracy between SCs and their potential users (ie, laypersons). Methods: On Amazon Mechanical Turk, we recruited 91 adults from the United States who had no professional medical background. In a web-based survey, the participants evaluated 45 fictitious clinical case vignettes. Data for 15 SCs that had processed the same vignettes were obtained from a previous study. As main outcome measures, we assessed the accuracy of the triage assessments made by participants and SCs for each of the three triage levels (ie, emergency care, nonemergency care, self-care) and overall, the proportion of participants outperforming each SC in terms of accuracy, and the risk aversion of participants and SCs by comparing the proportion of cases that were overtriaged. 
Results: The mean overall triage accuracy was similar for participants (60.9%, SD 6.8%; 95% CI 59.5%-62.3%) and SCs (58%, SD 12.8%). Most participants outperformed all but 5 SCs. On average, SCs more reliably detected emergencies (80.6%, SD 17.9%) than laypersons did (67.5%, SD 16.4%; 95% CI 64.1%-70.8%). Although both SCs and participants struggled with cases requiring self-care (the least urgent triage category), SCs more often wrongly classified these cases as emergencies (43/174, 24.7%) compared with laypersons (56/1365, 4.10%). Conclusions: Most SCs had no greater triage capability than an average layperson, although the triage accuracy of the 5 best SCs was superior to the accuracy of most participants. SCs might improve early detection of emergencies but might also needlessly increase resource utilization in health care. Laypersons sometimes require support in deciding when to rely on self-care, but it is precisely in that situation that SCs perform worst. Further research is needed to determine how to best combine the strengths of humans and SCs. UR - https://www.jmir.org/2021/3/e24475 UR - http://dx.doi.org/10.2196/24475 UR - http://www.ncbi.nlm.nih.gov/pubmed/33688845 ID - info:doi/10.2196/24475 ER - TY - JOUR AU - Shehzad, Aaqib AU - Rockwood, Kenneth AU - Stanley, Justin AU - Dunn, Taylor AU - Howlett, E. Susan PY - 2020/11/11 TI - Use of Patient-Reported Symptoms from an Online Symptom Tracking Tool for Dementia Severity Staging: Development and Validation of a Machine Learning Approach JO - J Med Internet Res SP - e20840 VL - 22 IS - 11 KW - dementia stage KW - Alzheimer disease KW - mild cognitive impairment KW - machine learning N2 - Background: SymptomGuide Dementia (DGI Clinical Inc) is a publicly available online symptom tracking tool to support caregivers of persons living with dementia. The value of such data is enhanced when the specific dementia stage is identified. Objective: We aimed to develop a supervised machine learning algorithm to classify dementia stages based on tracked symptoms. Methods: We employed clinical data from 717 people from 3 sources: (1) a memory clinic; (2) long-term care; and (3) an open-label trial of donepezil in vascular and mixed dementia (VASPECT). Symptoms were captured with SymptomGuide Dementia. A clinician classified participants into 4 groups (mild cognitive impairment, mild dementia, moderate dementia, or severe dementia) using either the Functional Assessment Staging Test or the Global Deterioration Scale. Individualized symptom profiles from the pooled data were used to train machine learning models to predict dementia severity. Models trained with 6 different machine learning algorithms were compared using nested cross-validation to identify the best performing model. Model performance was assessed using measures of balanced accuracy, precision, recall, Cohen κ, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). The best performing algorithm was used to train a model optimized for balanced accuracy. Results: The study population was mostly female (424/717, 59.1%), older adults (mean 77.3 years, SD 10.6, range 40-100) with mild to moderate dementia (332/717, 46.3%). Age, duration of symptoms, 37 unique dementia symptoms, and 10 symptom-derived variables were used to distinguish dementia stages. A model trained with a support vector machine learning algorithm using a one-versus-rest approach showed the best performance. 
The correct dementia stage was identified with 83% balanced accuracy (Cohen κ=0.81, AUPRC 0.91, AUROC 0.96). The best performance was seen when classifying severe dementia (AUROC 0.99). Conclusions: A supervised machine learning algorithm exhibited excellent performance in identifying dementia stages based on dementia symptoms reported in an online environment. This novel dementia staging algorithm can be used to describe dementia stage based on user-reported symptoms. This type of symptom recording offers real-world data that reflect important symptoms in people with dementia. UR - http://www.jmir.org/2020/11/e20840/ UR - http://dx.doi.org/10.2196/20840 UR - http://www.ncbi.nlm.nih.gov/pubmed/33174853 ID - info:doi/10.2196/20840 ER - TY - JOUR AU - Munsch, Nicolas AU - Martin, Alistair AU - Gruarin, Stefanie AU - Nateqi, Jama AU - Abdarahmane, Isselmou AU - Weingartner-Ortner, Rafael AU - Knapp, Bernhard PY - 2020/10/6 TI - Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study JO - J Med Internet Res SP - e21299 VL - 22 IS - 10 KW - COVID-19 KW - symptom checkers KW - benchmark KW - digital health KW - symptom KW - chatbot KW - accuracy N2 - Background: A large number of web-based COVID-19 symptom checkers and chatbots have been developed; however, anecdotal evidence suggests that their conclusions are highly variable. To our knowledge, no study has evaluated the accuracy of COVID-19 symptom checkers in a statistically rigorous manner. Objective: The aim of this study is to evaluate and compare the diagnostic accuracies of web-based COVID-19 symptom checkers. Methods: We identified 10 web-based COVID-19 symptom checkers, all of which were included in the study. We evaluated the COVID-19 symptom checkers by assessing 50 COVID-19 case reports alongside 410 non-COVID-19 control cases. A bootstrapping method was used to counter the unbalanced sample sizes and obtain confidence intervals (CIs). Results are reported as sensitivity, specificity, F1 score, and Matthews correlation coefficient (MCC). Results: The classification task between COVID-19-positive and COVID-19-negative for "high risk" cases among the 460 test cases yielded (sorted by F1 score): Symptoma (F1=0.92, MCC=0.85), Infermedica (F1=0.80, MCC=0.61), US Centers for Disease Control and Prevention (CDC) (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Cleveland Clinic (F1=0.40, MCC=0.07), Providence (F1=0.40, MCC=0.05), Apple (F1=0.29, MCC=-0.10), Docyet (F1=0.27, MCC=0.29), Ada (F1=0.24, MCC=0.27), and Your.MD (F1=0.24, MCC=0.27). For "high risk" and "medium risk" combined, the performance was: Symptoma (F1=0.91, MCC=0.83), Infermedica (F1=0.80, MCC=0.61), Cleveland Clinic (F1=0.76, MCC=0.47), Providence (F1=0.75, MCC=0.45), Your.MD (F1=0.72, MCC=0.33), CDC (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Apple (F1=0.70, MCC=0.25), Ada (F1=0.42, MCC=0.03), and Docyet (F1=0.27, MCC=0.29). Conclusions: We found that the number of correctly assessed COVID-19 and control cases varies considerably between symptom checkers, with different symptom checkers showing different strengths with respect to sensitivity and specificity. A good balance between sensitivity and specificity was achieved by only 2 symptom checkers. UR - http://www.jmir.org/2020/10/e21299/ UR - http://dx.doi.org/10.2196/21299 UR - http://www.ncbi.nlm.nih.gov/pubmed/33001828 ID - info:doi/10.2196/21299 ER -
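The final record above (Munsch et al) reports symptom checker performance as sensitivity, specificity, F1 score, and Matthews correlation coefficient (MCC), with bootstrapping used to obtain confidence intervals for an unbalanced sample of 50 cases and 410 controls. The sketch below is an illustrative Python example of how such binary classification metrics and percentile bootstrap CIs can be computed; it is not the authors' code, and the predictions it uses are entirely hypothetical.

```python
# Illustrative sketch only: binary classification metrics (sensitivity, specificity,
# F1, MCC) plus percentile bootstrap confidence intervals, for a hypothetical
# symptom checker evaluated on 50 positive cases and 410 controls.
import numpy as np

def binary_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, F1, MCC) from 0/1 label arrays."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    sens = tp / (tp + fn) if (tp + fn) else np.nan
    spec = tn / (tn + fp) if (tn + fp) else np.nan
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else np.nan
    denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = ((tp * tn) - (fp * fn)) / denom if denom else np.nan
    return sens, spec, f1, mcc

def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CIs for the four metrics, resampling cases with replacement."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))  # resample indices with replacement
        stats.append(binary_metrics(y_true[idx], y_pred[idx]))
    lo, hi = np.nanpercentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)], axis=0)
    return list(zip(lo, hi))  # one (lower, upper) pair per metric

# Hypothetical example: 50 positives and 410 controls, with ~10% of labels flipped
# to stand in for a symptom checker's predictions.
rng = np.random.default_rng(1)
y_true = np.array([1] * 50 + [0] * 410)
y_pred = np.where(rng.random(y_true.size) < 0.1, 1 - y_true, y_true)
print(binary_metrics(y_true, y_pred))
print(bootstrap_ci(y_true, y_pred))
```

Resampling whole cases with replacement, as sketched here, keeps each bootstrap replicate at the original sample size while letting the minority class fluctuate, which is one common way to express uncertainty when the case and control groups are of very different sizes.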