@Article{info:doi/10.2196/64028, author="Schmieding, L. Malte and Kopka, Marvin and Bolanaki, Myrto and Napierala, Hendrik and Altendorf, B. Maria and Kuschick, Doreen and Piper, K. Sophie and Scatturin, Lennart and Schmidt, Konrad and Schorr, Claudia and Thissen, Alica and W{\"a}scher, Cornelia and Heintze, Christoph and M{\"o}ckel, Martin and Balzer, Felix and Slagman, Anna", title="Impact of a Symptom Checker App on Patient-Physician Interaction Among Self-Referred Walk-In Patients in the Emergency Department: Multicenter, Parallel-Group, Randomized, Controlled Trial", journal="J Med Internet Res", year="2025", month="Apr", day="2", volume="27", pages="e64028", keywords="digital health", keywords="triage", keywords="symptom checker", keywords="patient-centered care", keywords="eHealth apps", keywords="mobile phone", keywords="decision support systems", keywords="consumer health information", keywords="health literacy", keywords="randomized controlled trials", keywords="null results", keywords="emergency care", keywords="patient-physician-interaction", keywords="patient satisfaction", abstract="Background: Symptom checker apps (SCAs) are layperson-facing tools that advise on whether and where to seek care, or possible diagnoses. Previous research has primarily focused on evaluating the accuracy, safety, and usability of their recommendations. However, studies examining SCAs' impact on clinical care, including the patient-physician interaction and satisfaction with care, remain scarce. Objective: This study aims to evaluate the effects of an SCA on satisfaction with the patient-physician interaction in acute care settings. Additionally, we examined its influence on patients' anxiety and trust in the treating physician. Methods: This parallel-group, randomized controlled trial was conducted at 2 emergency departments of an academic medical center and an emergency practice in Berlin, Germany. 
Low-acuity patients seeking care at these sites were randomly assigned to either self-assess their health complaints using a widely available commercial SCA (Ada Health) before their first encounter with the treating physician or receive usual care. The primary endpoint was patients' satisfaction with the patient-physician interaction, measured by the Patient Satisfaction Questionnaire (PSQ). The secondary outcomes were patients' satisfaction with care, their anxiety levels, and physicians' satisfaction with the patient-physician interaction. We used linear mixed models to assess the statistical significance of primary and secondary outcomes. Exploratory descriptive analyses examined patients' and physicians' perceptions of the SCA's utility and the frequency of patients questioning their physician's authority. Results: Between April 11, 2022, and January 25, 2023, we approached 665 patients. A total of 363 patients were included in the intention-to-treat analysis of the primary outcome (intervention: n=173, control: n=190). PSQ scores in the intervention group were similar to those in the control group (mean 78.5, SD 20.0 vs mean 80.8, SD 19.6; estimated difference --2.4, 95\% CI --6.3 to 1.1, P=.24). Secondary outcomes, including patients' and physicians' satisfaction with care and patient anxiety, showed no significant group differences (all P>.05). Patients in the intervention group were more likely to report that the SCA had a beneficial (66/164, 40.2\%) rather than a detrimental (3/164, 1.8\%) impact on the patient-physician interaction, with most reporting no effect (95/164, 57.9\%). Similar patterns were observed regarding the SCA's perceived effect on care. In both groups, physicians rarely reported that their authority had been questioned by a patient (intervention: 2/188, 1.1\%; control: 4/184, 2.2\%). While physicians more often found the SCA helpful rather than unhelpful, the majority indicated it was neither helpful nor unhelpful for the encounter. 
Conclusions: We found no evidence that the SCA improved satisfaction with the patient-physician interaction or care in an acute care setting. By contrast, both patients and their treating physicians predominantly described the SCA's impact as beneficial. Our study did not identify negative effects of SCA use commonly reported in the literature, such as increased anxiety or diminished trust in health care professionals. Trial Registration: German Clinical Trial Register DRKS00028598; https://drks.de/search/en/trial/DRKS00028598 International Registered Report Identifier (IRRID): RR2-10.1186/s13063-022-06688-w ", doi="10.2196/64028", url="https://www.jmir.org/2025/1/e64028" } @Article{info:doi/10.2196/60647, author="Koch, Roland and Steffen, Marie-Theres and Wetzel, Anna-Jasmin and Preiser, Christine and Klemmt, Malte and Ehni, Hans-J{\"o}rg and Mueller, Regina and Joos, Stefanie", title="Exploring Laypersons' Experiences With a Mobile Symptom Checker App as an Interface Between eHealth Literacy, Health Literacy, and Health-Related Behavior: Qualitative Interview Study", journal="JMIR Form Res", year="2025", month="Mar", day="21", volume="9", pages="e60647", keywords="symptom checker apps", keywords="health literacy", keywords="eHealth literacy", keywords="qualitative research", keywords="interview study", keywords="artificial intelligence", keywords="AI", abstract="Background: Symptom checkers aim to help users recognize medical symptoms and recommend actions. However, they are not yet reliable for self-triage or diagnostics. Health literacy plays a role in their use, but the process from symptom recognition to health care consultation remains unclear. Objective: This qualitative observational study explored how laypersons use symptom checkers, focusing on the process of use, entry points and outcomes, and the role of health literacy. Laypersons are defined as individuals who are neither medical professionals nor developers of such apps. 
Three research questions were addressed: (1) How do users describe the process of using symptom checkers? (2) What are entry points and possible outcomes of symptom checker app use? (3) How are health literacy and eHealth literacy expressed during the use of symptom checker apps? Methods: As part of the Ethical, Legal, and Social Implications of Symptom Checker Apps in Primary Health Care project, 15 laypersons (n=9, 60\% female and n=6, 40\% male; mean age 30.7, SD 13.6 years) were interviewed about their experiences with the symptom checker Ada. The interviews were analyzed using an integrative approach combining social positioning, agency, and the Rubicon model as a heuristic framework. Results: App use follows a cyclic process comprising 4 steps: motivation (influenced by biography and context), intention formation (assigning a purpose), intention implementation (recruiting resources), and evaluation (transforming interactions into health-related insights). Biographical, social, and contextual factors shape process initiation. Users turn to symptom checkers for 3 main purposes: understanding their condition, receiving recommendations for action, and documenting or communicating health-related information. Each purpose requires specific planning and integration into health-related behaviors drawing on personal, social, and technological resources. Evaluation depends on contextual factors, app outputs, and the outcomes of users' health-related actions. Users assess whether the app aligns with their expectations, condition severity, and previous experiences, with health literacy playing a critical role in validation processes. Conclusions: Symptom checker use is a complex, cyclic process shaped by context, biography, and health literacy. Users are motivated by health concerns influenced by personal, social, and contextual factors, with trust and attitudes impacting initial engagement. 
Intention formation reflects a balance between user skills and context, where app outputs inform decisions but may not always lead to action, especially in ambiguous situations. Users rely on personal resources and social networks to integrate app use into health-related behaviors, highlighting the limitations of symptom checkers in providing social or empathetic support. Symptom checkers have the potential to serve as an interface between users and health care, but future development must address the complexity of their use to unlock this potential. International Registered Report Identifier (IRRID): RR2-10.2196/34026 ", doi="10.2196/60647", url="https://formative.jmir.org/2025/1/e60647" } @Article{info:doi/10.2196/65469, author="Wickham, P. Aidan and Hewings-Martin, Yella and Goddard, GB Frederick and Rodgers, K. Allison and Cunningham, C. Adam and Prentice, Carley and Wilks, Octavia and Kaplan, C. Yusuf and Marhol, Andrei and Meczner, Andr{\'a}s and Stsefanovich, Heorhi and Klepchukova, Anna and Zhaunova, Liudmila", title="Exploring Self-Reported Symptoms for Developing and Evaluating Digital Symptom Checkers for Polycystic Ovarian Syndrome, Endometriosis, and Uterine Fibroids: Exploratory Survey Study", journal="JMIR Form Res", year="2024", month="Dec", day="12", volume="8", pages="e65469", keywords="polycystic ovary syndrome", keywords="PCOS", keywords="self-assessment", keywords="self-reported", keywords="endometriosis", keywords="uterine fibroids", keywords="symptoms", keywords="digital symptom checker", keywords="women's health", keywords="gynecological conditions", keywords="reproductive health", abstract="Background: Reproductive health conditions such as polycystic ovary syndrome (PCOS), endometriosis, and uterine fibroids pose a significant burden to people who menstruate, health care systems, and economies. 
Despite clinical guidelines for each condition, prolonged delays in diagnosis are commonplace, resulting in increased health care costs and risk of health complications. Symptom checker apps have the potential to significantly reduce time to diagnosis by providing users with health information and tools to better understand their symptoms. Objective: This study aims to examine the prevalence and predictive importance of self-reported symptoms of PCOS, endometriosis, and uterine fibroids, and to explore the efficacy of 3 symptom checkers (developed by Flo Health UK Limited) that use self-reported symptoms when screening for each condition. Methods: Flo's symptom checkers were transcribed into separate web-based surveys for PCOS, endometriosis, and uterine fibroids, asking respondents their diagnostic history for each condition. Participants were aged 18 years or older, female, and living in the United States. Participants either had a confirmed diagnosis (condition-positive) and reported symptoms retrospectively as experienced at the time of diagnosis, or they had not been examined for the condition (condition-negative) and reported their current symptoms as experienced at the time of surveying. Symptom prevalence was calculated for each condition based on the surveys. Least absolute shrinkage and selection operator regression was used to identify key symptoms for predicting each condition. Participants' symptoms were processed by Flo's 3 single-condition symptom checkers, and accuracy was assessed by comparing the symptom checker output with the participant's condition designation. Results: A total of 1317 participants were included with 418, 476, and 423 in the PCOS, endometriosis, and uterine fibroids groups, respectively. 
The most prevalent symptoms for PCOS were fatigue (92\%), feeling anxious (87\%), BMI over 25 (84\%); for endometriosis: very regular lower abdominal pain (89\%), fatigue (85\%), and referred lower back pain (80\%); for uterine fibroids: fatigue (76\%), bloating (69\%), and changing sanitary protection often (68\%). Symptoms of anovulation and amenorrhea (long periods, irregular cycles, and absent periods), and hyperandrogenism (excess hair on chin and abdomen, scalp hair loss, and BMI over 25) were identified as the most predictive symptoms for PCOS, while symptoms related to abdominal pain and the effect pain has on life, bleeding, and fertility complications were among the most predictive symptoms for both endometriosis and uterine fibroids. Symptom checker accuracy was 78\%, 73\%, and 75\% for PCOS, endometriosis, and uterine fibroids, respectively. Conclusions: This exploratory study characterizes self-reported symptomatology and identifies the key predictive symptoms for 3 reproductive conditions. The Flo symptom checkers were evaluated using real, self-reported symptoms and demonstrated high levels of accuracy. 
", doi="10.2196/65469", url="https://formative.jmir.org/2024/1/e65469" } @Article{info:doi/10.2196/55161, author="Wetzel, Anna-Jasmin and Preiser, Christine and M{\"u}ller, Regina and Joos, Stefanie and Koch, Roland and Henking, Tanja and Haumann, Hannah", title="Unveiling Usage Patterns and Explaining Usage of Symptom Checker Apps: Explorative Longitudinal Mixed Methods Study", journal="J Med Internet Res", year="2024", month="Dec", day="9", volume="26", pages="e55161", keywords="self-triage", keywords="eHealth", keywords="self-diagnosis", keywords="mHealth", keywords="mobile health", keywords="usage", keywords="patterns", keywords="predicts", keywords="prediction", keywords="symptoms checker", keywords="apps", keywords="applications", keywords="explorative longitudinal study", keywords="self care", keywords="self management", keywords="self-rated", keywords="mixed method", keywords="circumstances", keywords="General Linear Mixed Models", keywords="GLMM", keywords="qualitative data", keywords="content analysis", keywords="Kuckartz", keywords="survey", keywords="participants", keywords="users", abstract="Background: Symptom checker apps (SCA) aim to enable individuals without medical training to classify perceived symptoms and receive guidance on appropriate actions, such as self-care or seeking professional medical attention. However, there is a lack of detailed understanding regarding the contexts in which individuals use SCA and their opinions on these tools. Objective: This mixed methods study aims to explore the circumstances under which medical laypeople use SCA and to identify which aspects users find noteworthy after using SCA. Methods: A total of 48 SCA users documented their medical symptoms, provided open-ended responses, and recorded their SCA use along with other variables over 6 weeks in a longitudinal study. 
Generalized linear mixed models with and without regularization were applied to consider the hierarchical structure of the data, and the models' outcomes were evaluated for comparison. Qualitative data were analyzed through Kuckartz qualitative content analysis. Results: Significant predictors of SCA use included the initial occurrence of symptoms, day of measurement (odds ratio [OR] 0.97), self-rated health (OR 0.80, P<.001), and the following International Classification of Primary Care-2--classified symptoms: general and unspecified (OR 3.33, P<.001), eye (OR 5.56, P=.001), cardiovascular (OR 8.33, P<.001), musculoskeletal (OR 5.26, P<.001), and skin (OR 4.76, P<.001). The day of measurement and self-rated health showed minor importance due to their small effect sizes. Qualitative analysis highlighted 4 main themes: (1) reasons for using SCA, (2) diverse affective responses, (3) a broad spectrum of behavioral reactions, and (4) unmet needs including a lack of personalization. Conclusions: The emergence of new and unfamiliar symptoms was a strong determinant for SCA use. Specific International Classification of Primary Care--rated symptom clusters, particularly those related to cardiovascular, eye, skin, general, and unspecified symptoms, were also highly predictive of SCA use. The varied applications of SCA fit into the concept of health literacy as bricolage, where SCAs are leveraged as flexible tools by patients based on individual and situational requirements, functioning alongside other health care resources. ", doi="10.2196/55161", url="https://www.jmir.org/2024/1/e55161" } @Article{info:doi/10.2196/57360, author="Preiser, Christine and Radionova, Natalia and {\"O}g, Eylem and Koch, Roland and Klemmt, Malte and M{\"u}ller, Regina and Ranisch, Robert and Joos, Stefanie and Rieger, A. 
Monika", title="The Doctors, Their Patients, and the Symptom Checker App: Qualitative Interview Study With General Practitioners in Germany", journal="JMIR Hum Factors", year="2024", month="Nov", day="18", volume="11", pages="e57360", keywords="symptom checker app", keywords="qualitative interviews", keywords="general practice", keywords="perceived work-related psychosocial stress", keywords="job satisfaction", keywords="professional identity", keywords="medical diagnosis", abstract="Background: Symptom checkers are designed for laypeople and promise to provide a preliminary diagnosis, a sense of urgency, and a suggested course of action. Objective: We used the international symptom checker app (SCA) Ada App as an example to answer the following question: How do general practitioners (GPs) experience the SCA in relation to the macro, meso, and micro level of their daily work, and how does this interact with work-related psychosocial resources and demands? Methods: We conducted 8 semistructured interviews with GPs in Germany between December 2020 and February 2022. We analyzed the data using the integrative basic method, an interpretative-reconstructive method, to identify core themes and modes of thematization. Results: Although most GPs in this study were open to digitization in health care and their practice, only one was familiar with the SCA. GPs considered the SCA as part of the ``unorganized stage'' of patients' searching about their conditions. Some preferred it to popular search engines. They considered it relevant to their work as soon as the SCA would influence patients' decisions to see a doctor. Some wanted to see the results of the SCA in advance in order to decide on the patient's next steps. GPs described the diagnostic process as guided by shared decision-making, with the GP taking the lead and the patient deciding. They saw diagnosis as an act of making sense of data, which the SCA would not be able to do, despite the huge amounts of data. 
Conclusions: GPs took a techno-pragmatic view of SCA. They operate in a health care system of increasing scarcity. They saw the SCA as a potential work-related resource if it helped them to reduce administrative tasks and unnecessary patient contacts. The SCA was seen as a potential work-related demand if it increased workload, for example, if it increased patients' anxiety, was too risk-averse, or made patients more insistent on their own opinions. ", doi="10.2196/57360", url="https://humanfactors.jmir.org/2024/1/e57360" } @Article{info:doi/10.2196/54565, author="King, Jean Alicia and Bilardi, Elissa Jade and Towns, Mary Janet and Maddaford, Kate and Fairley, Kincaid Christopher and Chow, F. Eric P. and Phillips, Renee Tiffany", title="User Views on Online Sexual Health Symptom Checker Tool: Qualitative Research", journal="JMIR Form Res", year="2024", month="Nov", day="4", volume="8", pages="e54565", keywords="sexual health", keywords="sexually transmitted diseases", keywords="risk assessment", keywords="risk factors", keywords="smartphone apps", keywords="help-seeking behavior", keywords="health literacy", keywords="information seeking behavior", abstract="Background: Delayed diagnosis and treatment of sexually transmitted infections (STIs) contributes to poorer health outcomes and onward transmission to sexual partners. Access to best-practice sexual health care may be limited by barriers such as cost, distance to care providers, sexual stigma, and trust in health care providers. Online assessments of risk offer a novel means of supporting access to evidence-based sexual health information, testing, and treatment by providing more individualized sexual health information based on user inputs. Objective: This developmental evaluation aims to find potential users' views and experiences in relation to an online assessment of risk, called iSpySTI (Melbourne Sexual Health Center), including the likely impacts of use. 
Methods: Individuals presenting with urogenital symptoms to a specialist sexual health clinic were given the opportunity to trial a web-based, Bayesian-powered tool that provides a list of 2 to 4 potential causes of their symptoms based on inputs of known STI risk factors and symptoms. Those who tried the tool were invited to participate in a once-off, semistructured research interview. Descriptive, action, and emotion coding informed the comparative analysis of individual cases. Results: Findings from interviews with 14 people who had used the iSpySTI tool support the superiority of the online assessment of STI risk compared to existing sources of sexual health information (eg, internet search engines) in providing trusted and probabilistic information to users. Additionally, potential users reported benefits to their emotional well-being in the intervening period between noticing symptoms and being able to access care. Differences in current and imagined urgency of health care seeking and emotional impacts were found based on clinical diagnosis (eg, non-STI, curable and incurable but treatable STIs) and whether participants were born in Australia or elsewhere. Conclusions: Online assessments of risk provide users experiencing urogenital symptoms with more individualized and evidence-based health information that can improve their health care--seeking and provide reassurance in the period before they can access care. 
", doi="10.2196/54565", url="https://formative.jmir.org/2024/1/e54565" } @Article{info:doi/10.2196/59061, author="Zhu, Siying and Dong, Yan and Li, Yumei and Wang, Hong and Jiang, Xue and Guo, Mingen and Fan, Tiantian and Song, Yalan and Zhou, Ying and Han, Yuan", title="Experiences of Patients With Cancer Using Electronic Symptom Management Systems: Qualitative Systematic Review and Meta-Synthesis", journal="J Med Internet Res", year="2024", month="Oct", day="28", volume="26", pages="e59061", keywords="electronic symptom management systems", keywords="oncology care", keywords="access to care", keywords="symptom monitoring", keywords="self-management", keywords="patient-reported outcomes", keywords="health-related outcomes", keywords="quality of life", abstract="Background: There are numerous symptoms related to cancer and its treatments that can affect the psychosomatic health and quality of life of patients with cancer. The use of electronic symptom management systems (ESMSs) can help patients with cancer monitor and manage their symptoms effectively, improving their health-related outcomes. However, patients' adherence to ESMSs decreases over time, and little is known about their real experiences with them. Therefore, it is necessary to gain a deep understanding of patients' experiences with ESMSs. Objective: The purpose of this systematic review was to synthesize qualitative studies on the experiences of patients with cancer using ESMSs. Methods: A total of 12 electronic databases, including PubMed, Web of Science, Cochrane Library, EBSCOhost, Embase, PsycINFO, ProQuest, Scopus, Wanfang database, CNKI, CBM, and VIP, were searched to collect relevant studies from the earliest available record until January 2, 2024. Qualitative and mixed methods studies published in English or Chinese were included. 
The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement checklist and the ENTREQ (Enhancing Transparency in Reporting the Synthesis of Qualitative Research) statement were used to improve transparency in reporting the synthesis of the qualitative research. The Critical Appraisal Skills Programme (CASP) checklist was used to appraise the methodological quality of the included studies, and a meta-synthesis was conducted to interpret and synthesize the findings. Results: A total of 21 studies were included in the meta-synthesis. The experiences of patients with cancer using ESMSs were summarized into 3 major categories: (1) perceptions and attitudes toward ESMSs; (2) the value of ESMSs; and (3) barriers, requirements, and suggestions for ESMSs. Subsequently, 10 subcategories emerged from the 3 major categories. The meta-synthesis revealed that patients with cancer had both positive and negative experiences with ESMSs. In general, patients recognized the value of ESMSs in symptom assessment and management and were willing to use them, but they still encountered barriers and wanted them to be improved. Conclusions: This systematic review provides implications for developing future ESMSs that improve health-related outcomes for patients with cancer. Future research should focus on strengthening electronic equipment and technical support for ESMSs, improving their functional contents and participation forms, and developing personalized applications tailored to the specific needs and characteristics of patients with cancer. 
Trial Registration: PROSPERO CRD42023421730; https://www.crd.york.ac.uk/prospero/display\_record.php?RecordID=421730 ", doi="10.2196/59061", url="https://www.jmir.org/2024/1/e59061", url="http://www.ncbi.nlm.nih.gov/pubmed/39466301" } @Article{info:doi/10.2196/55099, author="Liu, Ville and Kaila, Minna and Koskela, Tuomas", title="Triage Accuracy and the Safety of User-Initiated Symptom Assessment With an Electronic Symptom Checker in a Real-Life Setting: Instrument Validation Study", journal="JMIR Hum Factors", year="2024", month="Sep", day="26", volume="11", pages="e55099", keywords="nurse triage", keywords="emergency department triage", keywords="triage", keywords="symptom assessment", keywords="health services accessibility", keywords="telemedicine", keywords="eHealth", keywords="remote consultation", keywords="primary health care", keywords="primary care", keywords="urgent care", keywords="health services research", keywords="health services", abstract="Background: Previous studies have evaluated the accuracy of the diagnostics of electronic symptom checkers (ESCs) and triage using clinical case vignettes. National Omaolo digital services (Omaolo) in Finland consist of an ESC for various symptoms. Omaolo is a medical device with a Conformit{\'e} Europ{\'e}enne marking (risk class: IIa), based on Duodecim Clinical Decision Support, EBMEDS. Objective: This study investigates how well triage performed by the ESC corresponds with nurse triage within the chief symptom list available in Omaolo (anal region symptoms, cough, diarrhea, discharge from the eye or watery or reddish eye, headache, heartburn, knee symptom or injury, lower back pain or injury, oral health, painful or blocked ear, respiratory tract infection, sexually transmitted disease, shoulder pain or stiffness or injury, sore throat or throat symptom, and urinary tract infection). In addition, the accuracy, specificity, sensitivity, and safety of the Omaolo ESC were assessed. 
Methods: This is a clinical validation study in a real-life setting performed at multiple primary health care (PHC) centers across Finland. The included units were of the walk-in model of primary care, where no previous phone call or contact was required. Upon arriving at the PHC center, users (patients) answered the ESC questions and received a triage recommendation; a nurse then assessed their triage. Findings on 877 patients were analyzed by matching the ESC recommendations with triage by the triage nurse. Results: Safe assessments by the ESC accounted for 97.6\% (856/877; 95\% CI 95.6\%-98.0\%) of all assessments made. The mean of the exact match for all symptom assessments was 53.7\% (471/877; 95\% CI 49.2\%-55.9\%). The mean value of the exact match or overly conservative but suitable for all (ESC's assessment was 1 triage level higher than the nurse's triage) symptom assessments was 66.6\% (584/877; 95\% CI 63.4\%-69.7\%). When the nurse concluded that urgent treatment was needed, the ESC's exactly matched accuracy was 70.9\% (244/344; 95\% CI 65.8\%-75.7\%). Sensitivity for the Omaolo ESC was 62.6\%, and specificity was 69.2\%. A total of 21 critical assessments were identified for further analysis: there was no indication of compromised patient safety. Conclusions: The primary objectives of this study were to evaluate the safety and to explore the accuracy, specificity, and sensitivity of the Omaolo ESC. The results indicate that the ESC is safe in a real-life setting when appraised with assessments conducted by triage nurses. Furthermore, the Omaolo ESC exhibits the potential to guide patients to appropriate triage destinations effectively, helping them to receive timely and suitable care. 
International Registered Report Identifier (IRRID): RR2-10.2196/41423 ", doi="10.2196/55099", url="https://humanfactors.jmir.org/2024/1/e55099" } @Article{info:doi/10.2196/56514, author="Knitza, Johannes and Hasanaj, Ragip and Beyer, Jonathan and Ganzer, Franziska and Slagman, Anna and Bolanaki, Myrto and Napierala, Hendrik and Schmieding, L. Malte and Al-Zaher, Nizam and Orlemann, Till and Muehlensiepen, Felix and Greenfield, Julia and Vuillerme, Nicolas and Kuhn, Sebastian and Schett, Georg and Achenbach, Stephan and Dechant, Katharina", title="Comparison of Two Symptom Checkers (Ada and Symptoma) in the Emergency Department: Randomized, Crossover, Head-to-Head, Double-Blinded Study", journal="J Med Internet Res", year="2024", month="Aug", day="20", volume="26", pages="e56514", keywords="symptom checker", keywords="triage", keywords="emergency", keywords="eHealth", keywords="diagnostic accuracy", keywords="apps", keywords="health service research", keywords="decision support system", abstract="Background: Emergency departments (EDs) are frequently overcrowded and increasingly used by nonurgent patients. Symptom checkers (SCs) offer on-demand access to disease suggestions and recommended actions, potentially improving overall patient flow. Contrary to the increasing use of SCs, there is a lack of supporting evidence based on direct patient use. Objective: This study aimed to compare the diagnostic accuracy, safety, usability, and acceptance of 2 SCs, Ada and Symptoma. Methods: This was a randomized, crossover, head-to-head, double-blinded study including consecutive adult patients presenting to the ED at University Hospital Erlangen. Patients completed both SCs, Ada and Symptoma. The primary outcome was the diagnostic accuracy of SCs. In total, 6 blinded independent expert raters classified diagnostic concordance of SC suggestions with the final discharge diagnosis as (1) identical, (2) plausible, or (3) diagnostically different. 
SC suggestions per patient were additionally classified as safe or potentially life-threatening, and the concordance of Ada's and physician-based triage category was assessed. Secondary outcomes were SC usability (5-point Likert scale: 1=very easy to use to 5=very difficult to use) and SC acceptance net promoter score (NPS). Results: A total of 450 patients completed the study between April and November 2021. The most common chief complaint was chest pain (160/437, 37\%). The identical diagnosis was ranked first (or within the top 5 diagnoses) by Ada and Symptoma in 14\% (59/437; 27\%, 117/437) and 4\% (16/437; 13\%, 55/437) of patients, respectively. An identical or plausible diagnosis was ranked first (or within the top 5 diagnoses) by Ada and Symptoma in 58\% (253/437; 75\%, 329/437) and 38\% (164/437; 64\%, 281/437) of patients, respectively. Ada and Symptoma did not suggest potentially life-threatening diagnoses in 13\% (56/437) and 14\% (61/437) of patients, respectively. Ada correctly triaged, undertriaged, and overtriaged 34\% (149/437), 13\% (58/437), and 53\% (230/437) of patients, respectively. A total of 88\% (385/437) and 78\% (342/437) of participants rated Ada and Symptoma as very easy or easy to use, respectively. Ada's NPS was --34 (55\% [239/437] detractors; 21\% [93/437] promoters) and Symptoma's NPS was --47 (63\% [275/437] detractors and 16\% [70/437] promoters). Conclusions: Ada demonstrated a higher diagnostic accuracy than Symptoma, and substantially more patients would recommend Ada and assessed Ada as easy to use. The high number of unrecognized potentially life-threatening diagnoses by both SCs and inappropriate triage advice by Ada was alarming. Overall, the trustworthiness of SC recommendations appears questionable. SC authorization should necessitate rigorous clinical evaluation studies to prevent misdiagnoses, fatal triage advice, and misuse of scarce medical resources. 
Trial Registration: German Register of Clinical Trials DRKS00024830; https://drks.de/search/en/trial/DRKS00024830 ", doi="10.2196/56514", url="https://www.jmir.org/2024/1/e56514" } @Article{info:doi/10.2196/49907, author="Meczner, Andr{\'a}s and Cohen, Nathan and Qureshi, Aleem and Reza, Maria and Sutaria, Shailen and Blount, Emily and Bagyura, Zsolt and Malak, Tamer", title="Controlling Inputter Variability in Vignette Studies Assessing Web-Based Symptom Checkers: Evaluation of Current Practice and Recommendations for Isolated Accuracy Metrics", journal="JMIR Form Res", year="2024", month="May", day="31", volume="8", pages="e49907", keywords="symptom checker", keywords="accuracy", keywords="vignette studies", keywords="variability", keywords="methods", keywords="triage", keywords="evaluation", keywords="vignette", keywords="performance", keywords="metrics", keywords="mobile phone", abstract="Background: The rapid growth of web-based symptom checkers (SCs) is not matched by advances in quality assurance. Currently, there are no widely accepted criteria assessing SCs' performance. Vignette studies are widely used to evaluate SCs, measuring the accuracy of outcome. Accuracy behaves as a composite metric as it is affected by a number of individual SC- and tester-dependent factors. In contrast to clinical studies, vignette studies have a small number of testers. Hence, measuring accuracy alone in vignette studies may not provide a reliable assessment of performance due to tester variability. Objective: This study aims to investigate the impact of tester variability on the accuracy of outcome of SCs, using clinical vignettes. It further aims to investigate the feasibility of measuring isolated aspects of performance. 
Methods: Healthily's SC was assessed using 114 vignettes by 3 groups of 3 testers who processed vignettes with different instructions: free interpretation of vignettes (free testers), specified chief complaints (partially free testers), and specified chief complaints with strict instruction for answering additional symptoms (restricted testers). $\kappa$ statistics were calculated to assess agreement of top outcome condition and recommended triage. Crude and adjusted accuracy was measured against a gold standard. Adjusted accuracy was calculated using only results of consultations identical to the vignette, following a review and selection process. A feasibility study for assessing symptom comprehension of SCs was performed using different variations of 51 chief complaints across 3 SCs. Results: Intertester agreement of most likely condition and triage was, respectively, 0.49 and 0.51 for the free tester group, 0.66 and 0.66 for the partially free group, and 0.72 and 0.71 for the restricted group. For the restricted group, accuracy ranged from 43.9\% to 57\% for individual testers, averaging 50.6\% (SD 5.35\%). Adjusted accuracy was 56.1\%. Assessing symptom comprehension was feasible for all 3 SCs. Comprehension scores ranged from 52.9\% to 68\%. Conclusions: We demonstrated that by improving standardization of the vignette testing process, there is a significant improvement in the agreement of outcome between testers. However, significant variability remained due to uncontrollable tester-dependent factors, reflected by varying outcome accuracy. Tester-dependent factors, combined with a small number of testers, limit the reliability and generalizability of outcome accuracy when used as a composite measure in vignette studies. Measuring and reporting different aspects of SC performance in isolation provides a more reliable assessment of SC performance. We developed an adjusted accuracy measure using a review and selection process to assess data algorithm quality. 
In addition, we demonstrated that symptom comprehension with different input methods can be feasibly compared. Future studies reporting accuracy need to apply vignette testing standardization and isolated metrics. ", doi="10.2196/49907", url="https://formative.jmir.org/2024/1/e49907", url="http://www.ncbi.nlm.nih.gov/pubmed/38820578" } @Article{info:doi/10.2196/45275, author="Savolainen, Kaisa and Kujala, Sari", title="Testing Two Online Symptom Checkers With Vulnerable Groups: Usability Study to Improve Cognitive Accessibility of eHealth Services", journal="JMIR Hum Factors", year="2024", month="Mar", day="8", volume="11", pages="e45275", keywords="eHealth", keywords="online symptom checkers", keywords="usability", keywords="cognitive accessibility", keywords="web accessibility", keywords="qualitative research", abstract="Background: The popularity of eHealth services has surged significantly, underscoring the importance of ensuring their usability and accessibility for users with diverse needs, characteristics, and capabilities. These services can pose cognitive demands, especially for individuals who are unwell, fatigued, or experiencing distress. Additionally, numerous potentially vulnerable groups, including older adults, are susceptible to digital exclusion and may encounter cognitive limitations related to perception, attention, memory, and language comprehension. Regrettably, many studies overlook the preferences and needs of user groups likely to encounter challenges associated with these cognitive aspects. Objective: This study primarily aims to gain a deeper understanding of cognitive accessibility in the practical context of eHealth services. Additionally, we aimed to identify the specific challenges that vulnerable groups encounter when using eHealth services and determine key considerations for testing these services with such groups. 
Methods: As a case study of eHealth services, we conducted qualitative usability testing on 2 online symptom checkers used in Finnish public primary care. A total of 13 participants from 3 distinct groups participated in the study: older adults, individuals with mild intellectual disabilities, and nonnative Finnish speakers. The primary research methods used were the thinking-aloud method, questionnaires, and semistructured interviews. Results: We found that potentially vulnerable groups encountered numerous issues with the tested services, with similar problems observed across all 3 groups. Specifically, clarity and the use of terminology posed significant challenges. The services overwhelmed users with excessive information and choices, while the terminology consisted of numerous complex medical terms that were difficult to understand. When conducting tests with vulnerable groups, it is crucial to carefully plan the sessions to avoid being overly lengthy, as these users often require more time to complete tasks. Additionally, testing with vulnerable groups proved to be quite efficient, with results likely to benefit a wider audience as well. Conclusions: Based on the findings of this study, it is evident that older adults, individuals with mild intellectual disability, and nonnative speakers may encounter cognitive challenges when using eHealth services, which can impede or slow down their use and make the services more difficult to navigate. In the worst-case scenario, these challenges may lead to errors in using the services. We recommend expanding the scope of testing to include a broader range of eHealth services with vulnerable groups, incorporating users with diverse characteristics and capabilities who are likely to encounter difficulties in cognitive accessibility. ", doi="10.2196/45275", url="https://humanfactors.jmir.org/2024/1/e45275", url="http://www.ncbi.nlm.nih.gov/pubmed/38457214" } @Article{info:doi/10.2196/39791, author="Lown, Mark and Smith, A. 
Kirsten and Muller, Ingrid and Woods, Catherine and Maund, Emma and Rogers, Kirsty and Becque, Taeko and Hayward, Gail and Moore, Michael and Little, Paul and Glogowska, Margaret and Hay, Alastair and Stuart, Beth and Mantzourani, Efi and Wilcox, R. Christopher and Thompson, Natalie and Francis, A. Nick", title="Internet Tool to Support Self-Assessment and Self-Swabbing of Sore Throat: Development and Feasibility Study", journal="J Med Internet Res", year="2023", month="Dec", day="8", volume="25", pages="e39791", keywords="sore throat", keywords="ear, neck, throat", keywords="pharyngitis", keywords="self-assessment", keywords="self-swabbing", keywords="primary care", keywords="throat", keywords="development", keywords="feasibility", keywords="web-based tool", keywords="tool", keywords="antibiotics", keywords="develop", keywords="self-assess", keywords="symptoms", keywords="diagnostic testing", keywords="acceptability", keywords="adult", keywords="children", keywords="social media", keywords="saliva", keywords="swab", keywords="inflammation", keywords="samples", keywords="support", keywords="clinical", keywords="antibiotic", keywords="web-based support tool", keywords="think-aloud", keywords="neck", keywords="tonsil", keywords="tongue", keywords="teeth", keywords="dental", keywords="dentist", keywords="tooth", keywords="laboratory", keywords="lab", keywords="oral", keywords="oral health", keywords="mouth", keywords="mobile phone", abstract="Background: Sore throat is a common problem and a common reason for the overuse of antibiotics. A web-based tool that helps people assess their sore throat, through the use of clinical prediction rules, taking throat swabs or saliva samples, and taking throat photographs, has the potential to improve self-management and help identify those who are the most and least likely to benefit from antibiotics. 
Objective: We aimed to develop a web-based tool to help patients and parents or carers self-assess sore throat symptoms and take throat photographs, swabs, and saliva samples for diagnostic testing. We then explored the acceptability and feasibility of using the tool in adults and children with sore throats. Methods: We used the Person-Based Approach to develop a web-based tool and then recruited adults and children with sore throats who participated in this study by attending general practices or through social media advertising. Participants self-assessed the presence of FeverPAIN and Centor score criteria and attempted to photograph their throat and take throat swabs and saliva tests. Study processes were observed via video call, and participants were interviewed about their views on using the web-based tool. Self-assessed throat inflammation and pus were compared to clinician evaluation of patients' throat photographs. Results: A total of 45 participants (33 adults and 12 children) were recruited. Of these, 35 (78\%) and 32 (71\%) participants completed all scoring elements for FeverPAIN and Centor scores, respectively, and most (30/45, 67\%) of them reported finding self-assessment relatively easy. No valid response was provided for swollen lymph nodes, throat inflammation, and pus on the throat by 11 (24\%), 9 (20\%), and 13 (29\%) participants, respectively. A total of 18 (40\%) participants provided a throat photograph of adequate quality for clinical assessment. Patient assessment of inflammation had a sensitivity of 100\% (3/3) and specificity of 47\% (7/15) compared with the clinician-assessed photographs. For pus on the throat, the sensitivity was 100\% (3/3) and the specificity was 71\% (10/14). A total of 89\% (40/45), 93\% (42/45), 89\% (40/45), and 80\% (30/45) of participants provided analyzable bacterial swabs, viral swabs, saliva sponges, and saliva drool samples, respectively. 
Participants were generally happy and confident in providing samples, with saliva samples rated as slightly more acceptable than swab samples. Conclusions: Most adult and parent participants were able to use a web-based intervention to assess the clinical features of throat infections and generate scores using clinical prediction rules. However, some had difficulties assessing clinical signs, such as lymph nodes, throat pus, and inflammation, and scores were assessed as sensitive but not specific. Many participants had problems taking photographs of adequate quality, but most were able to take throat swabs and saliva samples. ", doi="10.2196/39791", url="https://www.jmir.org/2023/1/e39791", url="http://www.ncbi.nlm.nih.gov/pubmed/38064265" } @Article{info:doi/10.2196/47621, author="Kuroiwa, Tomoyuki and Sarcon, Aida and Ibara, Takuya and Yamada, Eriku and Yamamoto, Akiko and Tsukamoto, Kazuya and Fujita, Koji", title="The Potential of ChatGPT as a Self-Diagnostic Tool in Common Orthopedic Diseases: Exploratory Study", journal="J Med Internet Res", year="2023", month="Sep", day="15", volume="25", pages="e47621", keywords="ChatGPT", keywords="generative pretrained transformer", keywords="natural language processing", keywords="artificial intelligence", keywords="chatbot", keywords="diagnosis", keywords="self-diagnosis", keywords="accuracy", keywords="precision", keywords="language model", keywords="orthopedic disease", keywords="AI model", keywords="health information", abstract="Background: Artificial intelligence (AI) has gained tremendous popularity recently, especially the use of natural language processing (NLP). ChatGPT is a state-of-the-art chatbot capable of creating natural conversations using NLP. The use of AI in medicine can have a tremendous impact on health care delivery. Although some studies have evaluated ChatGPT's accuracy in self-diagnosis, there is no research regarding its precision and the degree to which it recommends medical consultations. 
Objective: The aim of this study was to evaluate ChatGPT's ability to accurately and precisely self-diagnose common orthopedic diseases, as well as the degree of recommendation it provides for medical consultations. Methods: Over a 5-day course, each of the study authors submitted the same questions to ChatGPT. The conditions evaluated were carpal tunnel syndrome (CTS), cervical myelopathy (CM), lumbar spinal stenosis (LSS), knee osteoarthritis (KOA), and hip osteoarthritis (HOA). Answers were categorized as either correct, partially correct, incorrect, or a differential diagnosis. The percentage of correct answers and reproducibility were calculated. The reproducibility between days and raters was calculated using the Fleiss $\kappa$ coefficient. Answers that recommended that the patient seek medical attention were recategorized according to the strength of the recommendation as defined by the study. Results: The ratios of correct answers were 25/25, 1/25, 24/25, 16/25, and 17/25 for CTS, CM, LSS, KOA, and HOA, respectively. The ratios of incorrect answers were 23/25 for CM and 0/25 for all other conditions. The reproducibility between days was 1.0, 0.15, 0.7, 0.6, and 0.6 for CTS, CM, LSS, KOA, and HOA, respectively. The reproducibility between raters was 1.0, 0.1, 0.64, --0.12, and 0.04 for CTS, CM, LSS, KOA, and HOA, respectively. Among the answers recommending medical attention, the phrases ``essential,'' ``recommended,'' ``best,'' and ``important'' were used. Specifically, ``essential'' occurred in 4 out of 125, ``recommended'' in 12 out of 125, ``best'' in 6 out of 125, and ``important'' in 94 out of 125 answers. Additionally, 7 out of the 125 answers did not include a recommendation to seek medical attention. Conclusions: The accuracy and reproducibility of ChatGPT to self-diagnose five common orthopedic conditions were inconsistent. The accuracy could potentially be improved by adding symptoms that could easily identify a specific location. 
Only a few answers were accompanied by a strong recommendation to seek medical attention according to our study standards. Although ChatGPT could serve as a potential first step in accessing care, we found variability in accurate self-diagnosis. Given the risk of harm with self-diagnosis without medical follow-up, it would be prudent for an NLP to include clear language alerting patients to seek expert medical opinions. We hope to shed further light on the use of AI in a future clinical study. ", doi="10.2196/47621", url="https://www.jmir.org/2023/1/e47621", url="http://www.ncbi.nlm.nih.gov/pubmed/37713254" } @Article{info:doi/10.2196/41423, author="Liu, Ville and Koskela, H. Tuomas and Kaila, Minna", title="User-Initiated Symptom Assessment With an Electronic Symptom Checker: Protocol for a Mixed Methods Validation Study", journal="JMIR Res Protoc", year="2023", month="Jul", day="19", volume="12", pages="e41423", keywords="triage", keywords="symptom assessment", keywords="self-care", keywords="health service accessibility", keywords="telemedicine", keywords="health service research", keywords="internet", keywords="validation study", keywords="primary health care", keywords="clinical studies", keywords="telehealth", abstract="Background: The national Omaolo digital social welfare and health care service of Finland provides a symptom checker, Omaolo, which is a medical device (based on Duodecim Clinical Decision Support EBMEDS software) with a CE marking (risk class IIa), manufactured by the government-owned DigiFinland Oy. Users of this service can perform their triage by using the questions in the symptom checker. By completing the symptom checker, the user receives a recommendation for action and a service assessment with appropriate guidance regarding their health problems on the basis of a selected specific symptom in the symptom checker. This allows users to be provided with appropriate health care services, regardless of time and place. 
Objective: This study describes the protocol for the mixed methods validation process of the symptom checker available in Omaolo digital services. Methods: This is a mixed methods study using quantitative and qualitative methods, which will be part of the clinical validation process that takes place in primary health care centers in Finland. Each organization provides a space where the study and the nurse triage can be done in order to include an unscreened target population of users. The primary health care units provide walk-in model services, where no prior phone call or contact is required. For the validation of the Omaolo symptom checker, case vignettes will be incorporated to supplement the triage accuracy of rare and acute cases that cannot be tested extensively in real-life settings. Vignettes are produced from a variety of clinical sources, and they test the symptom checker in different triage levels by using 1 standardized patient case example. Results: This study plan underwent an ethics review: regional permission was requested from each organization participating in the research, and an ethics committee statement was requested and granted from the Pirkanmaa hospital district's ethics committee, in accordance with the University of Tampere's regulations. Of 964 clinical user--filled symptom checker assessments, 877 cases were fully completed with a triage result, and therefore, they met the requirements for clinical validation studies. The goal for sufficient data has been reached for most of the chief symptoms. Data collection was completed in September 2019, and the first feasibility and patient experience results were published by the end of 2020. Case vignettes have been identified and are to be completed before further testing the symptom checker. The analysis and reporting are estimated to be finalized in 2024. 
Conclusions: The primary goals of this multimethod electronic symptom checker study are to assess safety and to provide crucial information regarding the accuracy and usability of the Omaolo electronic symptom checker. To our knowledge, this will be the first study to include real-life clinical cases along with case vignettes. International Registered Report Identifier (IRRID): DERR1-10.2196/41423 ", doi="10.2196/41423", url="https://www.researchprotocols.org/2023/1/e41423", url="http://www.ncbi.nlm.nih.gov/pubmed/37467041" } @Article{info:doi/10.2196/46231, author="Kopka, Marvin and Scatturin, Lennart and Napierala, Hendrik and F{\"u}rstenau, Daniel and Feufel, A. Markus and Balzer, Felix and Schmieding, L. Malte", title="Characteristics of Users and Nonusers of Symptom Checkers in Germany: Cross-Sectional Survey Study", journal="J Med Internet Res", year="2023", month="Jun", day="20", volume="25", pages="e46231", keywords="symptom checker", keywords="cross-sectional study", keywords="user characteristic", keywords="digital public health", keywords="health information seeking", keywords="decision support", keywords="eHealth", keywords="mHealth", keywords="Germany", keywords="mobile health", keywords="health app", keywords="information seeking", keywords="technology use", keywords="usage", keywords="demographic", keywords="perception", keywords="awareness", keywords="adoption", abstract="Background: Previous studies have revealed that users of symptom checkers (SCs, apps that support self-diagnosis and self-triage) are predominantly female, are younger than average, and have higher levels of formal education. Little data are available for Germany, and no study has so far compared usage patterns with people's awareness of SCs and the perception of usefulness. Objective: We explored the sociodemographic and individual characteristics that are associated with the awareness, usage, and perceived usefulness of SCs in the German population. 
Methods: We conducted a cross-sectional online survey among 1084 German residents in July 2022 regarding personal characteristics and people's awareness and usage of SCs. Using random sampling from a commercial panel, we collected participant responses stratified by gender, state of residence, income, and age to reflect the German population. We analyzed the collected data exploratively. Results: Of all respondents, 16.3\% (177/1084) were aware of SCs and 6.5\% (71/1084) had used them before. Those aware of SCs were younger (mean 38.8, SD 14.6 years, vs mean 48.3, SD 15.7 years), were more often female (107/177, 60.5\%, vs 453/907, 49.9\%), and had higher formal education levels (eg, 72/177, 40.7\%, vs 238/907, 26.2\%, with a university/college degree) than those unaware. The same observation applied to users compared to nonusers. It disappeared, however, when comparing users to nonusers who were aware of SCs. Among users, 40.8\% (29/71) considered these tools useful. Those considering them useful reported higher self-efficacy (mean 4.21, SD 0.66, vs mean 3.63, SD 0.81, on a scale of 1-5) and a higher net household income (mean EUR 2591.63, SD EUR 1103.96 [mean US \$2798.96, SD US \$1192.28], vs mean EUR 1626.60, SD EUR 649.05 [mean US \$1756.73, SD US \$700.97]) than those who considered them not useful. More women considered SCs unhelpful (13/44, 29.5\%) compared to men (4/26, 15.4\%). Conclusions: Concurring with studies from other countries, our findings show associations between sociodemographic characteristics and SC usage in a German sample: users were on average younger, of higher socioeconomic status, and more commonly female compared to nonusers. However, usage cannot be explained by sociodemographic differences alone. It rather seems that sociodemographics explain who is or is not aware of the technology, but those who are aware of SCs are equally likely to use them, independently of sociodemographic differences. 
Although in some groups (eg, people with anxiety disorder), more participants reported to know and use SCs, they tended to perceive them as less useful. In other groups (eg, male participants), fewer respondents were aware of SCs, but those who used them perceived them to be more useful. Thus, SCs should be designed to fit specific user needs, and strategies should be developed to help reach individuals who could benefit but are not aware of SCs yet. ", doi="10.2196/46231", url="https://www.jmir.org/2023/1/e46231", url="http://www.ncbi.nlm.nih.gov/pubmed/37338970" } @Article{info:doi/10.2196/43803, author="Riboli-Sasco, Eva and El-Osta, Austen and Alaa, Aos and Webber, Iman and Karki, Manisha and El Asmar, Line Marie and Purohit, Katie and Painter, Annabelle and Hayhoe, Benedict", title="Triage and Diagnostic Accuracy of Online Symptom Checkers: Systematic Review", journal="J Med Internet Res", year="2023", month="Jun", day="2", volume="25", pages="e43803", keywords="systematic review", keywords="digital triage", keywords="diagnosis", keywords="online symptom checker", keywords="safety", keywords="accuracy", keywords="mobile phone", abstract="Background: In the context of a deepening global shortage of health workers and, in particular, the COVID-19 pandemic, there is growing international interest in, and use of, online symptom checkers (OSCs). However, the evidence surrounding the triage and diagnostic accuracy of these tools remains inconclusive. Objective: This systematic review aimed to summarize the existing peer-reviewed literature evaluating the triage accuracy (directing users to appropriate services based on their presenting symptoms) and diagnostic accuracy of OSCs aimed at lay users for general health concerns. Methods: Searches were conducted in MEDLINE, Embase, CINAHL, Health Management Information Consortium (HMIC), and Web of Science, as well as the citations of the studies selected for full-text screening. 
We included peer-reviewed studies published in English between January 1, 2010, and February 16, 2022, with a controlled and quantitative assessment of either or both triage and diagnostic accuracy of OSCs directed at lay users. We excluded tools supporting health care professionals, as well as disease- or specialty-specific OSCs. Screening and data extraction were carried out independently by 2 reviewers for each study. We performed a descriptive narrative synthesis. Results: A total of 21,296 studies were identified, of which 14 (0.07\%) were included. The included studies used clinical vignettes, medical records, or direct input by patients. Of the 14 studies, 6 (43\%) reported on triage and diagnostic accuracy, 7 (50\%) focused on triage accuracy, and 1 (7\%) focused on diagnostic accuracy. These outcomes were assessed based on the diagnostic and triage recommendations attached to the vignette in the case of vignette studies or on those provided by nurses or general practitioners, including through face-to-face and telephone consultations. Both diagnostic accuracy and triage accuracy varied greatly among OSCs. Overall diagnostic accuracy was deemed to be low and was almost always lower than that of the comparator. Similarly, most of the studies (9/13, 69\%) showed suboptimal triage accuracy overall, with a few exceptions (4/13, 31\%). The main variables affecting the levels of diagnostic and triage accuracy were the severity and urgency of the condition, the use of artificial intelligence algorithms, and demographic questions. However, the impact of each variable differed across tools and studies, making it difficult to draw any solid conclusions. All included studies had at least one area with unclear risk of bias according to the revised Quality Assessment of Diagnostic Accuracy Studies-2 tool. 
Conclusions: Although OSCs have potential to provide accessible and accurate health advice and triage recommendations to users, more research is needed to validate their triage and diagnostic accuracy before widescale adoption in community and health care settings. Future studies should aim to use a common methodology and agreed standard for evaluation to facilitate objective benchmarking and validation. Trial Registration: PROSPERO CRD42020215210; https://tinyurl.com/3949zw83 ", doi="10.2196/43803", url="https://www.jmir.org/2023/1/e43803", url="http://www.ncbi.nlm.nih.gov/pubmed/37266983" } @Article{info:doi/10.2196/39219, author="Radionova, Natalia and {\"O}g, Eylem and Wetzel, Anna-Jasmin and Rieger, A. Monika and Preiser, Christine", title="Impacts of Symptom Checkers for Laypersons' Self-diagnosis on Physicians in Primary Care: Scoping Review", journal="J Med Internet Res", year="2023", month="May", day="29", volume="25", pages="e39219", keywords="mobile health", keywords="mHealth", keywords="symptom checkers", keywords="artificial intelligence--based technology", keywords="AI-based technology", keywords="self-diagnosis", keywords="general practice", keywords="scoping review", keywords="mobile phone", abstract="Background: Symptom checkers (SCs) for laypersons' self-assessment and preliminary self-diagnosis are widely used by the public. Little is known about the impact of these tools on health care professionals (HCPs) in primary care and their work. This is relevant to understanding how technological changes might affect the working world and how this is linked to work-related psychosocial demands and resources for HCPs. Objective: This scoping review aimed to systematically explore the existing publications on the impacts of SCs on HCPs in primary care and to identify knowledge gaps. Methods: We used the Arksey and O'Malley framework. 
We based our search string on the participant, concept, and context scheme and searched PubMed (MEDLINE) and CINAHL in January and June 2021. We performed a reference search in August 2021 and a manual search in November 2021. We included publications of peer-reviewed journals that focused on artificial intelligence- or algorithm-based self-diagnosing apps and tools for laypersons and had primary care or nonclinical settings as a relevant context. The characteristics of these studies were described numerically. We used thematic analysis to identify core themes. We followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) checklist to report the study. Results: Of the 2729 publications identified through initial and follow-up database searches, 43 full texts were screened for eligibility, of which 9 were included. A further 8 publications were included through the manual search. Two publications were excluded after receiving feedback in the peer-review process. Fifteen publications were included in the final sample, which comprised 5 (33\%) commentaries or nonresearch publications, 3 (20\%) literature reviews, and 7 (47\%) research publications. The earliest publications stemmed from 2015. We identified 5 themes. The theme finding prediagnosis comprised the comparison between SCs and physicians. We identified the performance of the diagnosis and the relevance of human factors as topics. In the theme layperson-technology relationship, we identified potentials for laypersons' empowerment and harm through SCs. Our analysis showed potential disruptions of the physician-patient relationship and uncontested roles of HCPs in the theme (impacts on) physician-patient relationship. In the theme impacts on HCPs' tasks, we described the reduction or increase in HCPs' workload. We identified potential transformations of HCPs' work and impacts on the health care system in the theme future role of SCs in health care. 
Conclusions: The scoping review approach was suitable for this new field of research. The heterogeneity of technologies and wordings was challenging. We identified research gaps in the literature regarding the impact of artificial intelligence-- or algorithm-based self-diagnosing apps or tools on the work of HCPs in primary care. Further empirical studies on HCPs' lived experiences are needed, as the current literature depicts expectations rather than empirical findings. ", doi="10.2196/39219", url="https://www.jmir.org/2023/1/e39219", url="http://www.ncbi.nlm.nih.gov/pubmed/37247214" } @Article{info:doi/10.2196/36074, author="Berdahl, T. Carl and Henreid, J. Andrew and Pevnick, M. Joshua and Zheng, Kai and Nuckols, K. Teryl", title="Digital Tools Designed to Obtain the History of Present Illness From Patients: Scoping Review", journal="J Med Internet Res", year="2022", month="Nov", day="17", volume="24", number="11", pages="e36074", keywords="anamnesis", keywords="informatics", keywords="emergency medicine", keywords="human-computer interaction", keywords="medical history taking", keywords="mobile phone", abstract="Background: Many medical conditions, perhaps 80\% of them, can be diagnosed by taking a thorough history of present illness (HPI). However, in the clinical setting, situational factors such as interruptions and time pressure may cause interactions with patients to be brief and fragmented. One solution for improving clinicians' ability to collect a thorough HPI and maximize efficiency and quality of care could be to use a digital tool to obtain the HPI before face-to-face evaluation by a clinician. Objective: Our objective was to identify and characterize digital tools that have been designed to obtain the HPI directly from patients or caregivers and present this information to clinicians before a face-to-face encounter. 
We also sought to describe outcomes reported in testing of these tools, especially those related to usability, efficiency, and quality of care. Methods: We conducted a scoping review using predefined search terms in the following databases: MEDLINE, CINAHL, PsycINFO, Web of Science, Embase, IEEE Xplore Digital Library, ACM Digital Library, and ProQuest Dissertations \& Theses Global. Two reviewers screened titles and abstracts for relevance, performed full-text reviews of articles meeting the inclusion criteria, and used a pile-sorting procedure to identify distinguishing characteristics of the tools. Information describing the tools was primarily obtained from identified peer-reviewed sources; in addition, supplementary information was obtained from tool websites and through direct communications with tool creators. Results: We identified 18 tools meeting the inclusion criteria. Of these 18 tools, 14 (78\%) used primarily closed-ended and multiple-choice questions, 1 (6\%) used free-text input, and 3 (17\%) used conversational (chatbot) style. More than half (10/18, 56\%) of the tools were tailored to specific patient subpopulations; the remaining (8/18, 44\%) tools did not specify a target subpopulation. Of the 18 tools, 7 (39\%) included multilingual support, and 12 (67\%) had the capability to transfer data directly into the electronic health record. Studies of the tools reported on various outcome measures related to usability, efficiency, and quality of care. Conclusions: The HPI tools we identified (N=18) varied greatly in their purpose and functionality. There was no consensus on how patient-generated information should be collected or presented to clinicians. Existing tools have undergone inconsistent levels of testing, with a wide variety of different outcome measures used in evaluation, including some related to usability, efficiency, and quality of care. 
There is substantial interest in using digital tools to obtain the HPI from patients, but the outcomes measured have been inconsistent. Future research should focus on whether using HPI tools can lead to improved patient experience and health outcomes, although surrogate end points could instead be used so long as patient safety is monitored. ", doi="10.2196/36074", url="https://www.jmir.org/2022/11/e36074", url="http://www.ncbi.nlm.nih.gov/pubmed/36394945" } @Article{info:doi/10.2196/37408, author="Painter, Annabelle and Hayhoe, Benedict and Riboli-Sasco, Eva and El-Osta, Austen", title="Online Symptom Checkers: Recommendations for a Vignette-Based Clinical Evaluation Standard", journal="J Med Internet Res", year="2022", month="Oct", day="26", volume="24", number="10", pages="e37408", keywords="online symptom checkers", keywords="clinical evaluation", keywords="validation", keywords="assessment", keywords="standards", keywords="third-party assessment", keywords="quality assurance", doi="10.2196/37408", url="https://www.jmir.org/2022/10/e37408", url="http://www.ncbi.nlm.nih.gov/pubmed/36287594" } @Article{info:doi/10.2196/40064, author="Liu, W. Andrew and Odisho, Y. Anobel and Brown III, William and Gonzales, Ralph and Neinstein, B. Aaron and Judson, J. 
Timothy", title="Patient Experience and Feedback After Using an Electronic Health Record--Integrated COVID-19 Symptom Checker: Survey Study", journal="JMIR Hum Factors", year="2022", month="Sep", day="13", volume="9", number="3", pages="e40064", keywords="COVID-19", keywords="patient portals", keywords="digital health", keywords="diagnostic self evaluation", keywords="medical informatics", keywords="internet", keywords="telemedicine", keywords="triage", keywords="feedback", keywords="medical records systems", keywords="San Francisco", keywords="user experience", keywords="user satisfaction", keywords="self-triage", keywords="symptom checker", keywords="health system", keywords="workflow", keywords="integration", keywords="electronic health record", abstract="Background: Symptom checkers have been widely used during the COVID-19 pandemic to alleviate strain on health systems and offer patients a 24-7 self-service triage option. Although studies suggest that users may positively perceive web-based symptom checkers, no studies have quantified user feedback after use of an electronic health record--integrated COVID-19 symptom checker with self-scheduling functionality. Objective: In this paper, we aimed to understand user experience, user satisfaction, and user-reported alternatives to the use of a COVID-19 symptom checker with self-triage and self-scheduling functionality. Methods: We launched a patient-portal--based self-triage and self-scheduling tool in March 2020 for patients with COVID-19 symptoms, exposures, or questions. We made an optional, anonymous Qualtrics survey available to patients immediately after they completed the symptom checker. Results: Between December 16, 2021, and March 28, 2022, there were 395 unique responses to the survey. Overall, the respondents reported high satisfaction across all demographics, with a median rating of 8 out of 10 and 188/395 (47.6\%) of the respondents giving a rating of 9 or 10 out of 10. 
User satisfaction scores were not associated with any demographic factors. The most common user-reported alternatives had the web-based tool not been available were calling the COVID-19 telephone hotline and sending a patient-portal message to their physician for advice. The ability to schedule a test online was the most important symptom checker feature for the respondents. The most common categories of user feedback were regarding other COVID-19 services (eg, telephone hotline), policies, or procedures, and requesting additional features or functionality. Conclusions: This analysis suggests that COVID-19 symptom checkers with self-triage and self-scheduling functionality may have high overall user satisfaction, regardless of user demographics. By allowing users to self-triage and self-schedule tests and visits, tools such as this may prevent unnecessary calls and messages to clinicians. Individual feedback suggested that the user experience for this type of tool is highly dependent on the organization's operational workflows for COVID-19 testing and care. This study provides insight for the implementation and improvement of COVID-19 symptom checkers to ensure high user satisfaction. ", doi="10.2196/40064", url="https://humanfactors.jmir.org/2022/3/e40064", url="http://www.ncbi.nlm.nih.gov/pubmed/35960593" } @Article{info:doi/10.2196/36322, author="Arellano Carmona, Kimberly and Chittamuru, Deepti and Kravitz, L. 
Richard and Ramondt, Steven and Ram{\'i}rez, Susana A.", title="Health Information Seeking From an Intelligent Web-Based Symptom Checker: Cross-sectional Questionnaire Study", journal="J Med Internet Res", year="2022", month="Aug", day="19", volume="24", number="8", pages="e36322", keywords="health information seeking", keywords="health information", keywords="information seeking", keywords="information seeker", keywords="information behavior", keywords="artificial intelligence", keywords="medical information system", keywords="digital divide", keywords="information inequality", keywords="digital epidemiology", keywords="symptom checker", keywords="digital health", keywords="eHealth", keywords="online health information", keywords="user demographic", keywords="health information resource", keywords="health information tool", keywords="digital health assistant", abstract="Background: The ever-growing amount of health information available on the web is increasing the demand for tools providing personalized and actionable health information. Such tools include symptom checkers that provide users with a potential diagnosis after responding to a set of probes about their symptoms. Although the potential for their utility is great, little is known about such tools' actual use and effects. Objective: We aimed to understand who uses a web-based artificial intelligence--powered symptom checker and its purposes, how they evaluate the experience of the web-based interview and quality of the information, what they intend to do with the recommendation, and predictors of future use. Methods: Cross-sectional survey of web-based health information seekers following the completion of a symptom checker visit (N=2437). Measures of comprehensibility, confidence, usefulness, health-related anxiety, empowerment, and intention to use in the future were assessed. ANOVAs and the Wilcoxon rank sum test examined mean outcome differences in racial, ethnic, and sex groups. 
The relationship between perceptions of the symptom checker and intention to follow recommended actions was assessed using multilevel logistic regression. Results: Buoy users were well-educated (1384/1704, 81.22\% college or higher), primarily White (1227/1693, 72.47\%), and female (2069/2437, 84.89\%). Most had insurance (1449/1630, 88.89\%), a regular health care provider (1307/1709, 76.48\%), and reported good health (1000/1703, 58.72\%). Three types of symptoms---pain (855/2437, 35.08\%), gynecological issues (293/2437, 12.02\%), and masses or lumps (204/2437, 8.37\%)---accounted for almost half (1352/2437, 55.48\%) of site visits. Buoy's top three primary recommendations split across less-serious triage categories: primary care physician in 2 weeks (754/2141, 35.22\%), self-treatment (452/2141, 21.11\%), and primary care in 1 to 2 days (373/2141, 17.42\%). Common diagnoses were musculoskeletal (303/2437, 12.43\%), gynecological (304/2437, 12.47\%) and skin conditions (297/2437, 12.19\%), and infectious diseases (300/2437, 12.31\%). Users generally reported high confidence in Buoy, found it useful and easy to understand, and said that Buoy made them feel less anxious and more empowered to seek medical help. Users for whom Buoy recommended ``Waiting/Watching'' or ``Self-Treatment'' had strongest intentions to comply, whereas those advised to seek primary care had weaker intentions. Compared with White users, Latino and Black users had significantly more confidence in Buoy (P<.05), and the former also found it significantly more useful (P<.05). Latino (odds ratio 1.96, 95\% CI 1.22-3.25) and Black (odds ratio 2.37, 95\% CI 1.57-3.66) users also had stronger intentions to discuss recommendations with a provider than White users. Conclusions: Results demonstrate the potential utility of a web-based health information tool to empower people to seek care and reduce health-related anxiety. 
However, despite encouraging results suggesting the tool may fulfill unmet health information needs among women and Black and Latino adults, analyses of the user base illustrate persistent second-level digital divide effects. ", doi="10.2196/36322", url="https://www.jmir.org/2022/8/e36322", url="http://www.ncbi.nlm.nih.gov/pubmed/35984690" } @Article{info:doi/10.2196/34026, author="Wetzel, Anna-Jasmin and Koch, Roland and Preiser, Christine and M{\"u}ller, Regina and Klemmt, Malte and Ranisch, Robert and Ehni, Hans-J{\"o}rg and Wiesing, Urban and Rieger, A. Monika and Henking, Tanja and Joos, Stefanie", title="Ethical, Legal, and Social Implications of Symptom Checker Apps in Primary Health Care (CHECK.APP): Protocol for an Interdisciplinary Mixed Methods Study", journal="JMIR Res Protoc", year="2022", month="May", day="16", volume="11", number="5", pages="e34026", keywords="symptom checker apps", keywords="self-diagnosis, self-triage, digitalization in primary care, general practitioners", keywords="symptom checker", keywords="app", keywords="mobile app", keywords="primary care", abstract="Background: Symptom checker apps (SCAs) are accessible tools that provide early symptom assessment for users. The ethical, legal, and social implications of SCAs and their impact on the patient-physician relationship, the health care providers, and the health care system have sparsely been examined. This study protocol describes an approach to investigate the possible impacts and implications of SCAs on different levels of health care provision. It considers the perspectives of the users, nonusers, general practitioners (GPs), and health care experts. Objective: We aim to assess a comprehensive overview of the use of SCAs and address problematic issues, if any. The primary outcomes of this study are empirically informed multi-perspective recommendations for different stakeholders on the ethical, legal, and social implications of SCAs. 
Methods: Quantitative and qualitative methods will be used in several overlapping and interconnected study phases. In study phase 1, a comprehensive literature review will be conducted to assess the ethical, legal, social, and systemic impacts of SCAs. Study phase 2 comprises a survey that will be analyzed with a logistic regression. It aims to assess the degree of SCA usage in Germany as well as the predictors of SCA usage. Study phase 3 will investigate self-observational diaries and user interviews, which will be analyzed as integrated cases to assess user perspectives, usage patterns, and arising problems. Study phase 4 will comprise GP interviews to assess their experiences, perspectives, self-image, and concepts and will be analyzed with the basic procedure by Kruse. Moreover, interviews with health care experts will be conducted in study phase 3 and will be analyzed by using the reflexive thematic analysis approach of Braun and Clarke. Results: Study phase 1 will be completed in November 2021. We expect the results of study phase 2 between December 2021 and February 2022. In study phase 3, interviews are currently being conducted. The final study endpoint will be in February 2023. Conclusions: The possible ethical, legal, social, and systemic impacts of a widespread use of SCAs that affect stakeholders and stakeholder groups on different levels of health care will be identified. The proposed methodological approach provides a multifaceted and diverse empirical basis for a broad discussion on these implications. Trial Registration: German Clinical Trials Register (DRKS) DRKS00022465; https://tinyurl.com/yx53er67 International Registered Report Identifier (IRRID): DERR1-10.2196/34026 ", doi="10.2196/34026", url="https://www.researchprotocols.org/2022/5/e34026", url="http://www.ncbi.nlm.nih.gov/pubmed/35576570" } @Article{info:doi/10.2196/31810, author="Schmieding, L. 
Malte and Kopka, Marvin and Schmidt, Konrad and Schulz-Niethammer, Sven and Balzer, Felix and Feufel, A. Markus", title="Triage Accuracy of Symptom Checker Apps: 5-Year Follow-up Evaluation", journal="J Med Internet Res", year="2022", month="May", day="10", volume="24", number="5", pages="e31810", keywords="digital health", keywords="triage", keywords="symptom checker", keywords="patient-centered care", keywords="eHealth apps", keywords="mobile phone", abstract="Background: Symptom checkers are digital tools assisting laypersons in self-assessing the urgency and potential causes of their medical complaints. They are widely used but face concerns from both patients and health care professionals, especially regarding their accuracy. A 2015 landmark study substantiated these concerns using case vignettes to demonstrate that symptom checkers commonly err in their triage assessment. Objective: This study aims to revisit the landmark index study to investigate whether and how symptom checkers' capabilities have evolved since 2015 and how they currently compare with laypersons' stand-alone triage appraisal. Methods: In early 2020, we searched for smartphone and web-based applications providing triage advice. We evaluated these apps on the same 45 case vignettes as the index study. Using descriptive statistics, we compared our findings with those of the index study and with publicly available data on laypersons' triage capability. Results: We retrieved 22 symptom checkers providing triage advice. The median triage accuracy in 2020 (55.8\%, IQR 15.1\%) was close to that in 2015 (59.1\%, IQR 15.5\%). The apps in 2020 were less risk averse (odds 1.11:1, the ratio of overtriage errors to undertriage errors) than those in 2015 (odds 2.82:1), missing >40\% of emergencies. Few apps outperformed laypersons in either deciding whether emergency care was required or whether self-care was sufficient. No apps outperformed the laypersons on both decisions. 
Conclusions: Triage performance of symptom checkers has, on average, not improved over the course of 5 years. It decreased in 2 use cases (advice on when emergency care is required and when no health care is needed for the moment). However, triage capability varies widely within the sample of symptom checkers. Whether it is beneficial to seek advice from symptom checkers depends on the app chosen and on the specific question to be answered. Future research should develop resources (eg, case vignette repositories) to audit the capabilities of symptom checkers continuously and independently and provide guidance on when and to whom they should be recommended. ", doi="10.2196/31810", url="https://www.jmir.org/2022/5/e31810", url="http://www.ncbi.nlm.nih.gov/pubmed/35536633" } @Article{info:doi/10.2196/35219, author="Kopka, Marvin and Schmieding, L. Malte and Rieger, Tobias and Roesler, Eileen and Balzer, Felix and Feufel, A. Markus", title="Determinants of Laypersons' Trust in Medical Decision Aids: Randomized Controlled Trial", journal="JMIR Hum Factors", year="2022", month="May", day="3", volume="9", number="2", pages="e35219", keywords="symptom checkers", keywords="disposition advice", keywords="anthropomorphism", keywords="artificial intelligence", keywords="urgency assessment", keywords="patient-centered care", keywords="human-computer interaction", keywords="consumer health", keywords="information technology", keywords="IT", keywords="mobile phone", abstract="Background: Symptom checker apps are patient-facing decision support systems aimed at providing advice to laypersons on whether, where, and how to seek health care (disposition advice). Such advice can improve laypersons' self-assessment and ultimately improve medical outcomes. Past research has mainly focused on the accuracy of symptom checker apps' suggestions. To support decision-making, such apps need to provide not only accurate but also trustworthy advice. 
To date, only a few studies have addressed the question of the extent to which laypersons trust symptom checker app advice or the factors that moderate their trust. Studies on general decision support systems have shown that framing automated systems (anthropomorphic or emphasizing expertise), for example, by using icons symbolizing artificial intelligence (AI), affects users' trust. Objective: This study aims to identify the factors influencing laypersons' trust in the advice provided by symptom checker apps. Primarily, we investigated whether designs using anthropomorphic framing or framing the app as an AI increase users' trust compared with no such framing. Methods: Through a web-based survey, we recruited 494 US residents with no professional medical training. The participants had to first appraise the urgency of a fictitious patient description (case vignette). Subsequently, a decision aid (mock symptom checker app) provided disposition advice contradicting the participants' appraisal, and they had to subsequently reappraise the vignette. Participants were randomized into 3 groups: 2 experimental groups using visual framing (anthropomorphic, 160/494, 32.4\%, vs AI, 161/494, 32.6\%) and a neutral group without such framing (173/494, 35\%). Results: Most participants (384/494, 77.7\%) followed the decision aid's advice, regardless of its urgency level. Neither anthropomorphic framing (odds ratio 1.120, 95\% CI 0.664-1.897) nor framing as AI (odds ratio 0.942, 95\% CI 0.565-1.570) increased behavioral or subjective trust (P=.99) compared with the no-frame condition. Even participants who were extremely certain in their own decisions (ie, 100\% certain) commonly changed them in favor of the symptom checker's advice (19/34, 56\%). 
Propensity to trust and eHealth literacy were associated with increased subjective trust in the symptom checker (propensity to trust b=0.25; eHealth literacy b=0.2), whereas sociodemographic variables showed no such link with either subjective or behavioral trust. Conclusions: Contrary to our expectation, neither the anthropomorphic framing nor the emphasis on AI increased trust in symptom checker advice compared with that of a neutral control condition. However, independent of the interface, most participants trusted the mock app's advice, even when they were very certain of their own assessment. Thus, the question arises as to whether laypersons use such symptom checkers as substitutes rather than as aids in their own decision-making. With trust in symptom checkers already high at baseline, the benefit of symptom checkers depends on interface designs that enable users to adequately calibrate their trust levels during usage. Trial Registration: Deutsches Register Klinischer Studien DRKS00028561; https://tinyurl.com/rv4utcfb (retrospectively registered). ", doi="10.2196/35219", url="https://humanfactors.jmir.org/2022/2/e35219", url="http://www.ncbi.nlm.nih.gov/pubmed/35503248" } @Article{info:doi/10.2196/32832, author="Hennemann, Severin and Kuhn, Sebastian and Witth{\"o}ft, Michael and Jungmann, M. Stefanie", title="Diagnostic Performance of an App-Based Symptom Checker in Mental Disorders: Comparative Study in Psychotherapy Outpatients", journal="JMIR Ment Health", year="2022", month="Jan", day="31", volume="9", number="1", pages="e32832", keywords="mHealth", keywords="symptom checker", keywords="diagnostics", keywords="mental disorders", keywords="psychotherapy", keywords="mobile phone", abstract="Background: Digital technologies have become a common starting point for health-related information-seeking. 
Web- or app-based symptom checkers aim to provide rapid and accurate condition suggestions and triage advice but have not yet been investigated for mental disorders in routine health care settings. Objective: This study aims to test the diagnostic performance of a widely available symptom checker in the context of formal diagnosis of mental disorders when compared with therapists' diagnoses based on structured clinical interviews. Methods: Adult patients from an outpatient psychotherapy clinic used the app-based symptom checker Ada--check your health (ADA; Ada Health GmbH) at intake. Accuracy was assessed as the agreement of the first and 1 of the first 5 condition suggestions of ADA with at least one of the interview-based therapist diagnoses. In addition, sensitivity, specificity, and interrater reliabilities (Gwet first-order agreement coefficient [AC1]) were calculated for the 3 most prevalent disorder categories. Self-reported usability (assessed using the System Usability Scale) and acceptance of ADA (assessed using an adapted feedback questionnaire) were evaluated. Results: A total of 49 patients (30/49, 61\% women; mean age 33.41, SD 12.79 years) were included in this study. Across all patients, the interview-based diagnoses matched ADA's first condition suggestion in 51\% (25/49; 95\% CI 37.5-64.4) of cases and 1 of the first 5 condition suggestions in 69\% (34/49; 95\% CI 55.4-80.6) of cases. Within the main disorder categories, the accuracy of ADA's first condition suggestion was 0.82 for somatoform and associated disorders, 0.65 for affective disorders, and 0.53 for anxiety disorders. Interrater reliabilities ranged from low (AC1=0.15 for anxiety disorders) to good (AC1=0.76 for somatoform and associated disorders). The usability of ADA was rated as high in the System Usability Scale (mean 81.51, SD 11.82, score range 0-100). Approximately 71\% (35/49) of participants would have preferred a face-to-face over an app-based diagnostic. 
Conclusions: Overall, our findings suggest that a widely available symptom checker used in the formal diagnosis of mental disorders could provide clinicians with a list of condition suggestions with moderate-to-good accuracy. However, diagnostic performance was heterogeneous between disorder categories and included low interrater reliability. Although symptom checkers have some potential to complement the diagnostic process as a screening tool, the diagnostic performance should be tested in larger samples and in comparison with further diagnostic instruments. ", doi="10.2196/32832", url="https://mental.jmir.org/2022/1/e32832", url="http://www.ncbi.nlm.nih.gov/pubmed/35099395" } @Article{info:doi/10.2196/29219, author="Dunn, Taylor and Howlett, E. Susan and Stanojevic, Sanja and Shehzad, Aaqib and Stanley, Justin and Rockwood, Kenneth", title="Patterns of Symptom Tracking by Caregivers and Patients With Dementia and Mild Cognitive Impairment: Cross-sectional Study", journal="J Med Internet Res", year="2022", month="Jan", day="27", volume="24", number="1", pages="e29219", keywords="dementia", keywords="mild cognitive impairment", keywords="real-world evidence", keywords="patient-centric outcomes", keywords="machine learning", keywords="dementia stage", keywords="Alzheimer disease", keywords="symptom tracking", abstract="Background: Individuals with dementia and mild cognitive impairment (MCI) experience a wide variety of symptoms and challenges that trouble them. To address this heterogeneity, numerous standardized tests are used for diagnosis and prognosis. myGoalNav Dementia is a web-based tool that allows individuals with impairments and their caregivers to identify and track outcomes of greatest importance to them, which may be a less arbitrary and more sensitive way of capturing meaningful change. 
Objective: We aim to explore the most frequent and important symptoms and challenges reported by caregivers and people with dementia and MCI and how this varies according to disease severity. Methods: This cross-sectional study involved 3909 web-based myGoalNav users (mostly caregivers of people with dementia or MCI) who completed symptom profiles between 2006 and 2019. To make a symptom profile, users selected their most personally meaningful or troublesome dementia-related symptoms to track over time. Users were also asked to rank their chosen symptoms from least to most important, which we called the symptom potency. As the stage of disease for these web-based users was unknown, we applied a supervised staging algorithm, previously trained on clinician-derived data, to classify each profile into 1 of 4 stages: MCI and mild, moderate, and severe dementia. Across these stages, we compared symptom tracking frequency, symptom potency, and the relationship between frequency and potency. Results: Applying the staging algorithm to the 3909 user profiles resulted in 917 (23.46\%) MCI, 1596 (40.83\%) mild dementia, 514 (13.15\%) moderate dementia, and 882 (22.56\%) severe dementia profiles. We found that the most frequent symptoms in MCI and mild dementia profiles were similar and comprised early hallmarks of dementia (eg, recent memory and language difficulty). As the stage increased to moderate and severe, the most frequent symptoms were characteristic of loss of independent function (eg, incontinence) and behavioral problems (eg, aggression). The most potent symptoms were similar between stages and generally reflected disruptions in everyday life (eg, problems with hobbies or games, travel, and looking after grandchildren). Symptom frequency was negatively correlated with potency at all stages, and the strength of this relationship increased with increasing disease severity. 
Conclusions: Our results emphasize the importance of patient-centricity in MCI and dementia studies and illustrate the valuable real-world evidence that can be collected with digital tools. Here, the most frequent symptoms across the stages reflected our understanding of the typical disease progression. However, the symptoms that were ranked as most personally important by users were generally among the least frequently selected. Through individualization, patient-centered instruments such as myGoalNav can complement standardized measures by capturing these infrequent but potent outcomes. ", doi="10.2196/29219", url="https://www.jmir.org/2022/1/e29219", url="http://www.ncbi.nlm.nih.gov/pubmed/35084341" } @Article{info:doi/10.2196/31271, author="Janvrin, Lynn Miranda and Korona-Bailey, Jessica and Koehlmoos, P{\'e}rez Tracey", title="Re-examining COVID-19 Self-Reported Symptom Tracking Programs in the United States: Updated Framework Synthesis", journal="JMIR Form Res", year="2021", month="Dec", day="6", volume="5", number="12", pages="e31271", keywords="COVID-19", keywords="coronavirus", keywords="framework analysis", keywords="information resources", keywords="monitoring", keywords="patient-reported outcome measures", keywords="self-reported", keywords="surveillance", keywords="symptom tracking", keywords="synthesis", keywords="digital health", abstract="Background: Early in the pandemic, in 2020, Koehlmoos et al completed a framework synthesis of currently available self-reported symptom tracking programs for COVID-19. This framework described relevant programs, partners and affiliates, funding, responses, platform, and intended audience, among other considerations. Objective: This study seeks to update the existing framework with the aim of identifying developments in the landscape and highlighting how programs have adapted to changes in pandemic response. 
Methods: Our team developed a framework to collate information on current COVID-19 self-reported symptom tracking programs using the ``best-fit'' framework synthesis approach. All programs from the previous study were included to document changes. New programs were discovered using a Google search for target keywords. The time frame for the search for programs ranged from March 1, 2021, to May 6, 2021. Results: We screened 33 programs, of which 8 were included in our final framework synthesis. We identified multiple common data elements, including demographic information such as race, age, gender, and affiliation (all were associated with universities, medical schools, or schools of public health). Dissimilarities included questions regarding vaccination status, vaccine hesitancy, adherence to social distancing, COVID-19 testing, and mental health. Conclusions: At this time, the future of self-reported symptom tracking for COVID-19 is unclear. Some sources have speculated that COVID-19 may become a yearly occurrence much like the flu, and if so, the data that these programs generate is still valuable. However, it is unclear whether the public will maintain the same level of interest in reporting their symptoms on a regular basis if the prevalence of COVID-19 becomes more common. 
", doi="10.2196/31271", url="https://formative.jmir.org/2021/12/e31271", url="http://www.ncbi.nlm.nih.gov/pubmed/34792469" } @Article{info:doi/10.2196/28266, author="Kummer, Benjamin and Shakir, Lubaina and Kwon, Rachel and Habboushe, Joseph and Jett{\'e}, Nathalie", title="Usage Patterns of Web-Based Stroke Calculators in Clinical Decision Support: Retrospective Analysis", journal="JMIR Med Inform", year="2021", month="Aug", day="2", volume="9", number="8", pages="e28266", keywords="medical informatics", keywords="clinical informatics", keywords="mhealth", keywords="digital health", keywords="cerebrovascular disease", keywords="medical calculators", keywords="health information", keywords="health information technology", keywords="information technology", keywords="economic health", keywords="clinical health", keywords="electronic health records", abstract="Background: Clinical scores are frequently used in the diagnosis and management of stroke. While medical calculators are increasingly important support tools for clinical decisions, the uptake and use of common medical calculators for stroke remain poorly characterized. Objective: We aimed to describe use patterns in frequently used stroke-related medical calculators for clinical decisions from a web-based support system. Methods: We conducted a retrospective study of calculators from MDCalc, a web-based and mobile app--based medical calculator platform based in the United States. We analyzed metadata tags from MDCalc's calculator use data to identify all calculators related to stroke. Using relative page views as a measure of calculator use, we determined the 5 most frequently used stroke-related calculators between January 2016 and December 2018. For all 5 calculators, we determined cumulative and quarterly use, mode of access (eg, app or web browser), and both US and international distributions of use. We compared cumulative use in the 2016-2018 period with use from January 2011 to December 2015. 
Results: Over the study period, we identified 454 MDCalc calculators, of which 48 (10.6\%) were related to stroke. Of these, the 5 most frequently used calculators were the CHA2DS2-VASc score for atrial fibrillation stroke risk calculator (5.5\% of total and 32\% of stroke-related page views), the Mean Arterial Pressure calculator (2.4\% of total and 14.0\% of stroke-related page views), the HAS-BLED score for major bleeding risk (1.9\% of total and 11.4\% of stroke-related page views), the National Institutes of Health Stroke Scale (NIHSS) score calculator (1.7\% of total and 10.1\% of stroke-related page views), and the CHADS2 score for atrial fibrillation stroke risk calculator (1.4\% of total and 8.1\% of stroke-related page views). Web browser was the most common mode of access, accounting for 82.7\%-91.2\% of individual stroke calculator page views. Access originated most frequently from the most populated regions within the United States. Internationally, use originated mostly from English-language countries. The NIHSS score calculator demonstrated the greatest increase in page views (238.1\% increase) between the first and last quarters of the study period. Conclusions: The most frequently used stroke calculators were the CHA2DS2-VASc, Mean Arterial Pressure, HAS-BLED, NIHSS, and CHADS2. These were mainly accessed by web browser, from English-speaking countries, and from highly populated areas. Further studies should investigate barriers to stroke calculator adoption and the effect of calculator use on the application of best practices in cerebrovascular disease. 
", doi="10.2196/28266", url="https://medinform.jmir.org/2021/8/e28266", url="http://www.ncbi.nlm.nih.gov/pubmed/34338647" } @Article{info:doi/10.2196/26402, author="Montazeri, Maryam and Multmeier, Jan and Novorol, Claire and Upadhyay, Shubhanan and Wicks, Paul and Gilbert, Stephen", title="Optimization of Patient Flow in Urgent Care Centers Using a Digital Tool for Recording Patient Symptoms and History: Simulation Study", journal="JMIR Form Res", year="2021", month="May", day="21", volume="5", number="5", pages="e26402", keywords="symptom assessment app", keywords="discrete event simulation", keywords="health care system", keywords="patient flow modeling", keywords="patient flow", keywords="simulation", keywords="urgent care", keywords="waiting times", abstract="Background: Crowding can negatively affect patient and staff experience, and consequently the performance of health care facilities. Crowding could potentially be eased by streamlining patient history-taking and reducing duplication through the use of a digital symptom-taking app. Objective: We simulated the effect of introducing a digital symptom-taking app on patient flow. We hypothesized that waiting times and crowding in an urgent care center (UCC) could be reduced, and that this would be more efficient than simply adding more staff. Methods: A discrete-event approach was used to simulate patient flow in a UCC during a 4-hour time frame. The baseline scenario was a small UCC with 2 triage nurses, 2 doctors, 1 treatment/examination nurse, and 1 discharge administrator in service. We simulated 33 scenarios with different staff numbers or different potential time savings through the app. We explored average queue length, waiting time, idle time, and staff utilization for each scenario. Results: Discrete-event simulation showed that even a few minutes saved through patient app-based self-history recording during triage could result in significantly increased efficiency. 
A modest estimated time saving per patient of 2.5 minutes decreased the average patient wait time for triage by 26.17\%, whereas a time saving of 5 minutes led to a 54.88\% reduction in patient wait times. In contrast, adding an extra triage nurse was less efficient, as the additional staff were only required at the busiest times. Conclusions: Small time savings in the history-taking process have the potential to result in substantial reductions in total patient waiting time for triage nurses, with likely reductions in patient and staff anxiety and improvements in patient care. Patient self-history recording could be carried out at home or in the waiting room via a check-in kiosk or a portable tablet computer. This formative simulation study has the potential to impact service provision and approaches to digitalization at scale. ", doi="10.2196/26402", url="https://formative.jmir.org/2021/5/e26402", url="http://www.ncbi.nlm.nih.gov/pubmed/34018963" } @Article{info:doi/10.2196/26543, author="Munsch, Nicolas and Martin, Alistair and Gruarin, Stefanie and Nateqi, Jama and Abdarahmane, Isselmou and Weingartner-Ortner, Rafael and Knapp, Bernhard", title="Authors' Reply to: Screening Tools: Their Intended Audiences and Purposes. Comment on ``Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study''", journal="J Med Internet Res", year="2021", month="May", day="21", volume="23", number="5", pages="e26543", keywords="COVID-19", keywords="symptom checkers", keywords="benchmark", keywords="digital health", keywords="symptom", keywords="chatbot", keywords="accuracy", doi="10.2196/26543", url="https://www.jmir.org/2021/5/e26543", url="http://www.ncbi.nlm.nih.gov/pubmed/33989162" } @Article{info:doi/10.2196/26148, author="Millen, Elizabeth and Gilsdorf, Andreas and Fenech, Matthew and Gilbert, Stephen", title="Screening Tools: Their Intended Audiences and Purposes. 
Comment on ``Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study''", journal="J Med Internet Res", year="2021", month="May", day="21", volume="23", number="5", pages="e26148", keywords="COVID-19", keywords="symptom checkers", keywords="benchmark", keywords="digital health", keywords="symptom", keywords="chatbot", keywords="accuracy", doi="10.2196/26148", url="https://www.jmir.org/2021/5/e26148", url="http://www.ncbi.nlm.nih.gov/pubmed/33989169" } @Article{info:doi/10.2196/25493, author="Brandberg, Helge and Sundberg, Johan Carl and Spaak, Jonas and Koch, Sabine and Zakim, David and Kahan, Thomas", title="Use of Self-Reported Computerized Medical History Taking for Acute Chest Pain in the Emergency Department -- the Clinical Expert Operating System Chest Pain Danderyd Study (CLEOS-CPDS): Prospective Cohort Study", journal="J Med Internet Res", year="2021", month="Apr", day="27", volume="23", number="4", pages="e25493", keywords="chest pain", keywords="computerized history taking", keywords="coronary artery disease", keywords="eHealth", keywords="emergency department", keywords="health informatics", keywords="medical history", keywords="risk management", abstract="Background: Chest pain is one of the most common chief complaints in emergency departments (EDs). Collecting an adequate medical history is challenging but essential in order to use recommended risk scores such as the HEART score (based on history, electrocardiogram, age, risk factors, and troponin). Self-reported computerized history taking (CHT) is a novel method to collect structured medical history data directly from the patient through a digital device. CHT is rarely used in clinical practice, and there is a lack of evidence for utility in an acute setting. Objective: This substudy of the Clinical Expert Operating System Chest Pain Danderyd Study (CLEOS-CPDS) aimed to evaluate whether patients with acute chest pain can interact effectively with CHT in the ED. 
Methods: Prospective cohort study on self-reported medical histories collected from acute chest pain patients using a CHT program on a tablet. Clinically stable patients aged 18 years and older with a chief complaint of chest pain, fluency in Swedish, and a nondiagnostic electrocardiogram or serum markers for acute coronary syndrome were eligible for inclusion. Patients unable to carry out an interview with CHT (eg, inadequate eyesight, confusion or agitation) were excluded. Effectiveness was assessed as the proportion of patients completing the interview and the time required in order to collect a medical history sufficient for cardiovascular risk stratification according to HEART score. Results: During 2017-2018, 500 participants were consecutively enrolled. The age and sex distribution (mean 54.3, SD 17.0 years; 213/500, 42.6\% women) was similar to that of the general chest pain population (mean 57.5, SD 19.2 years; 49.6\% women). Common reasons for noninclusion were language issues (182/1000, 18.2\%), fatigue (158/1000, 15.8\%), and inability to use a tablet (152/1000, 15.2\%). Sufficient data to calculate HEART score were collected in 70.4\% (352/500) of the patients. Key modules for chief complaint, cardiovascular history, and respiratory history were completed by 408 (81.6\%), 339 (67.8\%), and 291 (58.2\%) of the 500 participants, respectively, while 148 (29.6\%) completed the entire interview (in all 14 modules). Factors associated with completeness were age 18-69 years (all key modules: Ps<.001), male sex (cardiovascular: P=.04), active workers (all key modules: Ps<.005), not arriving by ambulance (chief complaint: P=.03; cardiovascular: P=.045), and ongoing chest pain (complete interview: P=.002). The median time to collect HEART score data was 23 (IQR 18-31) minutes and to complete an interview was 64 (IQR 53-77) minutes. 
The main reasons for discontinuing the interview prior to completion (n=352) were discharge from the ED (101, 28.7\%) and tiredness (95, 27.0\%). Conclusions: A majority of patients with acute chest pain can interact effectively with CHT on a tablet in the ED to provide sufficient data for risk stratification with a well-established risk score. The utility was somewhat lower in patients 70 years and older, in patients arriving by ambulance, and in patients without ongoing chest pain. Further studies are warranted to assess whether CHT can contribute to improved management and prognosis in this large patient group. Trial Registration: ClinicalTrials.gov NCT03439449; https://clinicaltrials.gov/ct2/show/NCT03439449 International Registered Report Identifier (IRRID): RR2-10.1136/bmjopen-2019-031871 ", doi="10.2196/25493", url="https://www.jmir.org/2021/4/e25493", url="http://www.ncbi.nlm.nih.gov/pubmed/33904821" } @Article{info:doi/10.2196/24475, author="Schmieding, L. Malte and M{\"o}rgeli, Rudolf and Schmieding, L. Maike A. and Feufel, A. Markus and Balzer, Felix", title="Benchmarking Triage Capability of Symptom Checkers Against That of Medical Laypersons: Survey Study", journal="J Med Internet Res", year="2021", month="Mar", day="10", volume="23", number="3", pages="e24475", keywords="digital health", keywords="triage", keywords="symptom checker", keywords="patient-centered care", keywords="eHealth apps", keywords="mobile phone", keywords="decision support systems", keywords="clinical", keywords="consumer health information", keywords="health literacy", abstract="Background: Symptom checkers (SCs) are tools developed to provide clinical decision support to laypersons. Apart from suggesting probable diagnoses, they commonly advise when users should seek care (triage advice). SCs have become increasingly popular despite prior studies rating their performance as mediocre. To date, it is unclear whether SCs can triage better than those who might choose to use them. 
Objective: This study aims to compare triage accuracy between SCs and their potential users (ie, laypersons). Methods: On Amazon Mechanical Turk, we recruited 91 adults from the United States who had no professional medical background. In a web-based survey, the participants evaluated 45 fictitious clinical case vignettes. Data for 15 SCs that had processed the same vignettes were obtained from a previous study. As main outcome measures, we assessed the accuracy of the triage assessments made by participants and SCs for each of the three triage levels (ie, emergency care, nonemergency care, self-care) and overall, the proportion of participants outperforming each SC in terms of accuracy, and the risk aversion of participants and SCs by comparing the proportion of cases that were overtriaged. Results: The mean overall triage accuracy was similar for participants (60.9\%, SD 6.8\%; 95\% CI 59.5\%-62.3\%) and SCs (58\%, SD 12.8\%). Most participants outperformed all but 5 SCs. On average, SCs more reliably detected emergencies (80.6\%, SD 17.9\%) than laypersons did (67.5\%, SD 16.4\%; 95\% CI 64.1\%-70.8\%). Although both SCs and participants struggled with cases requiring self-care (the least urgent triage category), SCs more often wrongly classified these cases as emergencies (43/174, 24.7\%) compared with laypersons (56/1365, 4.10\%). Conclusions: Most SCs had no greater triage capability than an average layperson, although the triage accuracy of the five best SCs was superior to the accuracy of most participants. SCs might improve early detection of emergencies but might also needlessly increase resource utilization in health care. Laypersons sometimes require support in deciding when to rely on self-care but it is in that very situation where SCs perform the worst. Further research is needed to determine how to best combine the strengths of humans and SCs. 
", doi="10.2196/24475", url="https://www.jmir.org/2021/3/e24475", url="http://www.ncbi.nlm.nih.gov/pubmed/33688845" } @Article{info:doi/10.2196/20840, author="Shehzad, Aaqib and Rockwood, Kenneth and Stanley, Justin and Dunn, Taylor and Howlett, E. Susan", title="Use of Patient-Reported Symptoms from an Online Symptom Tracking Tool for Dementia Severity Staging: Development and Validation of a Machine Learning Approach", journal="J Med Internet Res", year="2020", month="Nov", day="11", volume="22", number="11", pages="e20840", keywords="dementia stage", keywords="Alzheimer disease", keywords="mild cognitive impairment", keywords="machine learning", abstract="Background: SymptomGuide Dementia (DGI Clinical Inc) is a publicly available online symptom tracking tool to support caregivers of persons living with dementia. The value of such data is enhanced when the specific dementia stage is identified. Objective: We aimed to develop a supervised machine learning algorithm to classify dementia stages based on tracked symptoms. Methods: We employed clinical data from 717 people from 3 sources: (1) a memory clinic; (2) long-term care; and (3) an open-label trial of donepezil in vascular and mixed dementia (VASPECT). Symptoms were captured with SymptomGuide Dementia. A clinician classified participants into 4 groups using either the Functional Assessment Staging Test or the Global Deterioration Scale as mild cognitive impairment, mild dementia, moderate dementia, or severe dementia. Individualized symptom profiles from the pooled data were used to train machine learning models to predict dementia severity. Models trained with 6 different machine learning algorithms were compared using nested cross-validation to identify the best performing model. Model performance was assessed using measures of balanced accuracy, precision, recall, Cohen $\kappa$, area under the receiver operating characteristic curve (AUROC), and area under the precision-recall curve (AUPRC). 
The best performing algorithm was used to train a model optimized for balanced accuracy. Results: The study population was mostly female (424/717, 59.1\%), older adults (mean 77.3 years, SD 10.6, range 40-100) with mild to moderate dementia (332/717, 46.3\%). Age, duration of symptoms, 37 unique dementia symptoms, and 10 symptom-derived variables were used to distinguish dementia stages. A model trained with a support vector machine learning algorithm using a one-versus-rest approach showed the best performance. The correct dementia stage was identified with 83\% balanced accuracy (Cohen $\kappa$=0.81, AUPRC 0.91, AUROC 0.96). The best performance was seen when classifying severe dementia (AUROC 0.99). Conclusions: A supervised machine learning algorithm exhibited excellent performance in identifying dementia stages based on dementia symptoms reported in an online environment. This novel dementia staging algorithm can be used to describe dementia stage based on user-reported symptoms. This type of symptom recording offers real-world data that reflect important symptoms in people with dementia. ", doi="10.2196/20840", url="http://www.jmir.org/2020/11/e20840/", url="http://www.ncbi.nlm.nih.gov/pubmed/33174853" } @Article{info:doi/10.2196/21299, author="Munsch, Nicolas and Martin, Alistair and Gruarin, Stefanie and Nateqi, Jama and Abdarahmane, Isselmou and Weingartner-Ortner, Rafael and Knapp, Bernhard", title="Diagnostic Accuracy of Web-Based COVID-19 Symptom Checkers: Comparison Study", journal="J Med Internet Res", year="2020", month="Oct", day="6", volume="22", number="10", pages="e21299", keywords="COVID-19", keywords="symptom checkers", keywords="benchmark", keywords="digital health", keywords="symptom", keywords="chatbot", keywords="accuracy", abstract="Background: A large number of web-based COVID-19 symptom checkers and chatbots have been developed; however, anecdotal evidence suggests that their conclusions are highly variable. 
To our knowledge, no study has evaluated the accuracy of COVID-19 symptom checkers in a statistically rigorous manner. Objective: The aim of this study is to evaluate and compare the diagnostic accuracies of web-based COVID-19 symptom checkers. Methods: We identified 10 web-based COVID-19 symptom checkers, all of which were included in the study. We evaluated the COVID-19 symptom checkers by assessing 50 COVID-19 case reports alongside 410 non--COVID-19 control cases. A bootstrapping method was used to counter the unbalanced sample sizes and obtain confidence intervals (CIs). Results are reported as sensitivity, specificity, F1 score, and Matthews correlation coefficient (MCC). Results: The classification task between COVID-19--positive and COVID-19--negative for ``high risk'' cases among the 460 test cases yielded (sorted by F1 score): Symptoma (F1=0.92, MCC=0.85), Infermedica (F1=0.80, MCC=0.61), US Centers for Disease Control and Prevention (CDC) (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Cleveland Clinic (F1=0.40, MCC=0.07), Providence (F1=0.40, MCC=0.05), Apple (F1=0.29, MCC=-0.10), Docyet (F1=0.27, MCC=0.29), Ada (F1=0.24, MCC=0.27) and Your.MD (F1=0.24, MCC=0.27). For ``high risk'' and ``medium risk'' combined the performance was: Symptoma (F1=0.91, MCC=0.83) Infermedica (F1=0.80, MCC=0.61), Cleveland Clinic (F1=0.76, MCC=0.47), Providence (F1=0.75, MCC=0.45), Your.MD (F1=0.72, MCC=0.33), CDC (F1=0.71, MCC=0.30), Babylon (F1=0.70, MCC=0.29), Apple (F1=0.70, MCC=0.25), Ada (F1=0.42, MCC=0.03), and Docyet (F1=0.27, MCC=0.29). Conclusions: We found that the number of correctly assessed COVID-19 and control cases varies considerably between symptom checkers, with different symptom checkers showing different strengths with respect to sensitivity and specificity. A good balance between sensitivity and specificity was only achieved by two symptom checkers. 
", doi="10.2196/21299", url="http://www.jmir.org/2020/10/e21299/", url="http://www.ncbi.nlm.nih.gov/pubmed/33001828" }