%0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66616 %T Data Interoperability in Context: The Importance of Open-Source Implementations When Choosing Open Standards %A Kapitan,Daniel %A Heddema,Femke %A Dekker,André %A Sieswerda,Melle %A Verhoeff,Bart-Jan %A Berg,Matt %+ Eindhoven AI Systems Institute (EAISI), Eindhoven University of Technology, PO Box 513, Eindhoven, 5600 MB, The Netherlands, 31 624097295, daniel@kapitan.net %K FHIR %K OMOP %K openEHR %K health care informatics %K information standards %K secondary use %K digital platform %K data sharing %K data interoperability %K open source implementations %K open standards %K Fast Health Interoperability Resources %K Observational Medical Outcomes Partnership %K clinical care %K data exchange %K longitudinal analysis %K low income %K middle-income %K LMIC %K low and middle-income countries %K developing countries %K developing nations %K health information exchange %D 2025 %7 15.4.2025 %9 Viewpoint %J J Med Internet Res %G English %X Following the proposal by Tsafnat et al (2024) to converge on three open health data standards, this viewpoint offers a critical reflection on their proposed alignment of openEHR, Fast Health Interoperability Resources (FHIR), and Observational Medical Outcomes Partnership (OMOP) as default data standards for clinical care and administration, data exchange, and longitudinal analysis, respectively. We argue that open standards are a necessary but not sufficient condition to achieve health data interoperability. The ecosystem of open-source software needs to be considered when choosing an appropriate standard for a given context. We discuss two specific contexts, namely standardization of (1) health data for federated learning, and (2) health data sharing in low- and middle-income countries. Specific design principles, practical considerations, and implementation choices for these two contexts are described, based on ongoing work in both areas. 
In the case of federated learning, we observe convergence toward OMOP and FHIR, where the two standards can effectively be used side by side given the availability of mediators between the two. In the case of health information exchanges in low- and middle-income countries, we see a strong convergence toward FHIR as the primary standard. We propose practical guidelines for context-specific adaptation of open health data standards. %M 40232773 %R 10.2196/66616 %U https://www.jmir.org/2025/1/e66616 %U https://doi.org/10.2196/66616 %U http://www.ncbi.nlm.nih.gov/pubmed/40232773 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e70983 %T Dying in Darkness: Deviations From Data Sharing Ethics in the US Public Health System and the Data Genocide of American Indian and Alaska Native Communities %A Schmit,Cason D %A O’Connell,Meghan Curry %A Shewbrooks,Sarah %A Abourezk,Charles %A Cochlin,Fallon J %A Doerr,Megan %A Kum,Hye-Chung %+ Department of Health Policy and Management, School of Public Health, Texas A&M University, 212 Adriance Lab Rd, College Station, TX, 77843, United States, 1 9794360277, schmit@tamu.edu %K ethics %K information dissemination %K indigenous peoples %K public health surveillance %K privacy %K data sharing %K deidentification %K data anonymization %K public health ethics %K data governance %D 2025 %7 26.3.2025 %9 Viewpoint %J J Med Internet Res %G English %X Tribal governments and Tribal Epidemiology Centers face persistent challenges in obtaining the public health data that are essential to accomplishing their legal and ethical duties to promote health in American Indian and Alaska Native communities. We assessed the ethical implications of current impediments to data sharing among federal, state, and Tribal public health partners. Public health ethics obligates public health data sharing and opposes data collection without dissemination to affected communities. 
Privacy practices, like deidentification and data suppression, often obstruct data access, disproportionately affect American Indian and Alaska Native populations, and exacerbate health disparities. The 2020-2024 syphilis outbreak illustrates how restricted data access impedes effective public health responses. These practices represent a source of structural violence throughout the US public health system that contributes to the data genocide of American Indian and Alaska Native populations. Good governance practices like transparent data practices and the establishment of a social license (ie, the informal permission of a community to collect and use data) are essential to ethically balancing collective well-being with individual privacy in public health. %M 40138677 %R 10.2196/70983 %U https://www.jmir.org/2025/1/e70983 %U https://doi.org/10.2196/70983 %U http://www.ncbi.nlm.nih.gov/pubmed/40138677 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 6 %N %P e65001 %T A Hybrid Deep Learning–Based Feature Selection Approach for Supporting Early Detection of Long-Term Behavioral Outcomes in Survivors of Cancer: Cross-Sectional Study %A Huang,Tracy %A Ngan,Chun-Kit %A Cheung,Yin Ting %A Marcotte,Madelyn %A Cabrera,Benjamin %+ Worcester Polytechnic Institute, 100 Institute Rd, Worcester, MA, 01609, United States, 1 (508) 831 5000, cngan@wpi.edu %K machine learning %K data driven %K clinical domain–guided framework %K survivors of cancer %K cancer %K oncology %K behavioral outcome predictions %K behavioral study %K behavioral outcomes %K feature selection %K deep learning %K neural network %K hybrid %K prediction %K predictive modeling %K patients with cancer %K deep learning models %K leukemia %K computational study %K computational biology %D 2025 %7 13.3.2025 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: The number of survivors of cancer is growing, and they often experience negative long-term behavioral outcomes due to cancer 
treatments. There is a need for better computational methods to handle and predict these outcomes so that physicians and health care providers can implement preventive treatments. Objective: This study aimed to create a new feature selection algorithm to improve the performance of machine learning classifiers to predict negative long-term behavioral outcomes in survivors of cancer. Methods: We devised a hybrid deep learning–based feature selection approach to support early detection of negative long-term behavioral outcomes in survivors of cancer. Within a data-driven, clinical domain–guided framework to select the best set of features among cancer treatments, chronic health conditions, and socioenvironmental factors, we developed a 2-stage feature selection algorithm, that is, a multimetric, majority-voting filter and a deep dropout neural network, to dynamically and automatically select the best set of features for each behavioral outcome. We also conducted an experimental case study on existing study data with 102 survivors of acute lymphoblastic leukemia (aged 15-39 years at evaluation and >5 years postcancer diagnosis) who were treated in a public hospital in Hong Kong. Finally, we designed and implemented radial charts to illustrate the significance of the selected features on each behavioral outcome to support clinical professionals’ future treatment and diagnoses. Results: In this pilot study, we demonstrated that our approach outperforms the traditional statistical and computational methods, including linear and nonlinear feature selectors, for the addressed top-priority behavioral outcomes. Overall, our approach achieves higher F1, precision, and recall scores than existing feature selection methods. The models in this study select several significant clinical and socioenvironmental variables as risk factors associated with the development of behavioral problems in young survivors of acute lymphoblastic leukemia. 
Conclusions: Our novel feature selection algorithm has the potential to improve machine learning classifiers’ capability to predict adverse long-term behavioral outcomes in survivors of cancer. %M 40080820 %R 10.2196/65001 %U https://bioinform.jmir.org/2025/1/e65001 %U https://doi.org/10.2196/65001 %U http://www.ncbi.nlm.nih.gov/pubmed/40080820 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e64354 %T Imputation and Missing Indicators for Handling Missing Longitudinal Data: Data Simulation Analysis Based on Electronic Health Record Data %A Ehrig,Molly %A Bullock,Garrett S %A Leng,Xiaoyan Iris %A Pajewski,Nicholas M %A Speiser,Jaime Lynn %K missing indicator method %K missing data %K imputation %K longitudinal data %K electronic health record data %K electronic health records %K EHR %K simulation study %K clinical prediction model %K prediction model %K older adults %K falls %K logistic regression %K prediction modeling %D 2025 %7 13.3.2025 %9 %J JMIR Med Inform %G English %X Background: Missing data in electronic health records are highly prevalent and result in analytical concerns such as heterogeneous sources of bias and loss of statistical power. One simple analytic method for addressing missing or unknown covariate values is to treat missingness for a particular variable as a category unto itself, which we refer to as the missing indicator method. For cross-sectional analyses, recent work suggested that there was minimal benefit to the missing indicator method; however, it is unclear how this approach performs in the setting of longitudinal data, in which correlation among clustered repeated measures may be leveraged for potentially improved model performance. 
Objectives: This study aims to conduct a simulation study to evaluate whether the missing indicator method improves model performance and imputation accuracy for longitudinal data mimicking an application of developing a clinical prediction model for falls in older adults based on electronic health record data. Methods: We simulated a longitudinal binary outcome using mixed effects logistic regression that emulated a falls assessment at annual follow-up visits. Using multivariate imputation by chained equations, we simulated time-invariant predictors such as sex and medical history, as well as dynamic predictors such as physical function, BMI, and medication use. We induced missing data in predictors under scenarios that had both random (missing at random) and dependent missingness (missing not at random). We evaluated aggregate performance using the area under the receiver operating characteristic curve (AUROC) for models with and without missing indicators as predictors, as well as complete case analysis, across simulation replicates. We evaluated imputation quality using normalized root-mean-square error for continuous variables and percent falsely classified for categorical variables. Results: Independent of the mechanism used to simulate missing data (missing at random or missing not at random), overall model performance via AUROC was similar regardless of whether missing indicators were included in the model. The root-mean-square error and percent falsely classified measures were similar for models including missing indicators versus those without missing indicators. Model performance and imputation quality were similar regardless of whether the outcome was related to missingness. Imputation with or without missing indicators had similar mean values of AUROC compared with complete case analysis, although complete case analysis had the largest range of values. 
Conclusions: The results of this study suggest that the inclusion of missing indicators in longitudinal data modeling neither improves nor worsens overall performance or imputation accuracy. Future research is needed to address whether the inclusion of missing indicators is useful in prediction modeling with longitudinal data in different settings, such as high-dimensional data analysis. %R 10.2196/64354 %U https://medinform.jmir.org/2025/1/e64354 %U https://doi.org/10.2196/64354 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e63216 %T Large Language Model–Based Critical Care Big Data Deployment and Extraction: Descriptive Analysis %A Yang,Zhongbao %A Xu,Shan-Shan %A Liu,Xiaozhu %A Xu,Ningyuan %A Chen,Yuqing %A Wang,Shuya %A Miao,Ming-Yue %A Hou,Mengxue %A Liu,Shuai %A Zhou,Yi-Min %A Zhou,Jian-Xin %A Zhang,Linlin %K big data %K critical care–related databases %K database deployment %K large language model %K database extraction %K intensive care unit %K ICU %K GPT %K artificial intelligence %K AI %K LLM %D 2025 %7 12.3.2025 %9 %J JMIR Med Inform %G English %X Background: Publicly accessible critical care–related databases contain enormous amounts of clinical data, but their utilization often requires advanced programming skills. The growing complexity of large databases and unstructured data presents challenges for clinicians who need programming or data analysis expertise to utilize these systems directly. Objective: This study aims to simplify critical care–related database deployment and extraction via large language models. Methods: The development of this platform was a 2-step process. First, we enabled automated database deployment using Docker container technology, incorporating the web-based analytics interfaces Metabase and Superset. Second, we developed the intensive care unit–generative pretrained transformer (ICU-GPT), a large language model fine-tuned on intensive care unit (ICU) data that integrated LangChain and Microsoft AutoGen. 
Results: The automated deployment platform was designed with user-friendliness in mind, enabling clinicians to deploy 1 or multiple databases in local, cloud, or remote environments without the need for manual setup. After successfully overcoming GPT’s token limit and supporting multischema data, ICU-GPT could generate Structured Query Language (SQL) queries and extract insights from ICU datasets based on request input. A front-end user interface was developed for clinicians to achieve code-free SQL generation on the web-based client. Conclusions: By harnessing the power of our automated deployment platform and ICU-GPT model, clinicians are empowered to easily visualize, extract, and arrange critical care–related databases more efficiently and flexibly than manual methods. Our research could decrease the time and effort spent on complex bioinformatics methods and advance clinical research. %R 10.2196/63216 %U https://medinform.jmir.org/2025/1/e63216 %U https://doi.org/10.2196/63216 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e51804 %T Impact of Demographic and Clinical Subgroups in Google Trends Data: Infodemiology Case Study on Asthma Hospitalizations %A Portela,Diana %A Freitas,Alberto %A Costa,Elísio %A Giovannini,Mattia %A Bousquet,Jean %A Almeida Fonseca,João %A Sousa-Pinto,Bernardo %+ Department of Community Medicine, Information and Health Decision Sciences, Faculty of Medicine, University of Porto, R. Dr. 
Plácido da Costa, Porto, 4200-450, Portugal, 351 22 551 3622, bernardosousapinto@protonmail.com %K infodemiology %K asthma %K administrative databases %K multimorbidity %K co-morbidity %K respiratory %K pulmonary %K Google Trends %K correlation %K hospitalization %K admissions %K autoregressive %K information seeking %K searching %K searches %K forecasting %D 2025 %7 10.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Google Trends (GT) data have shown promising results as a complementary tool to classical surveillance approaches. However, GT data are not necessarily provided by a representative sample of patients and may be skewed toward demographic and clinical groups that are more likely to use the internet to search for health information. Objective: In this study, we aimed to assess whether GT-based models perform differently in distinct population subgroups. To assess that, we analyzed a case study on asthma hospitalizations. Methods: We analyzed all hospitalizations with a main diagnosis of asthma occurring in 3 different countries (Portugal, Spain, and Brazil) for a period of approximately 5 years (January 1, 2012-December 17, 2016). Data on web-based searches on the common cold for the same countries and time period were retrieved from GT. We estimated the correlation between GT data and the weekly occurrence of asthma hospitalizations (considering separate asthma admissions data according to patients’ age, sex, ethnicity, and presence of comorbidities). In addition, we built autoregressive models to forecast the weekly number of asthma hospitalizations (for the different aforementioned subgroups) for a period of 1 year (June 2015-June 2016) based on admissions and GT data from the 3 previous years. 
Results: Overall, correlation coefficients between GT on the pseudo-influenza syndrome topic and asthma hospitalizations ranged between 0.33 (in Portugal for admissions with at least one Charlson comorbidity group) and 0.86 (for admissions in women and in White people in Brazil). In the 3 assessed countries, forecasted hospitalizations for 2015-2016 correlated more strongly with observed admissions of older versus younger individuals (Portugal: Spearman ρ=0.70 vs ρ=0.56; Spain: ρ=0.88 vs ρ=0.76; Brazil: ρ=0.83 vs ρ=0.82). In Portugal and Spain, forecasted hospitalizations had a stronger correlation with admissions occurring for women than men (Portugal: ρ=0.75 vs ρ=0.52; Spain: ρ=0.83 vs ρ=0.51). In Brazil, stronger correlations were observed for admissions of White than of Black or Brown individuals (ρ=0.92 vs ρ=0.87). In Portugal, stronger correlations were observed for admissions of individuals without any comorbidity compared with admissions of individuals with comorbidities (ρ=0.68 vs ρ=0.66). Conclusions: We observed that the models based on GT data may perform differently in demographic and clinical subgroups of participants, possibly reflecting differences in the composition of internet users’ health-seeking behaviors. 
%M 40063932 %R 10.2196/51804 %U https://www.jmir.org/2025/1/e51804 %U https://doi.org/10.2196/51804 %U http://www.ncbi.nlm.nih.gov/pubmed/40063932 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e64721 %T Emerging Domains for Measuring Health Care Delivery With Electronic Health Record Metadata %A Tawfik,Daniel %A Rule,Adam %A Alexanian,Aram %A Cross,Dori %A Holmgren,A Jay %A Lou,Sunny S %A McPeek Hinz,Eugenia %A Rose,Christian %A Viswanadham,Ratnalekha V N %A Mishuris,Rebecca G %A Rodríguez-Fernández,Jorge M %A Ford,Eric W %A Florig,Sarah T %A Sinsky,Christine A %A Apathy,Nate C %+ Department of Pediatrics, Stanford University School of Medicine, 770 Welch Road, Suite 435, Palo Alto, CA, 94304, United States, 1 6507239902, dtawfik@stanford.edu %K metadata %K health services research %K audit logs %K event logs %K electronic health record data %K health care delivery %K patient care %K healthcare teams %K clinician-patient relationship %K cognitive environment %D 2025 %7 6.3.2025 %9 Viewpoint %J J Med Internet Res %G English %X This article aims to introduce emerging measurement domains made feasible through electronic health record (EHR) use metadata, to inform the changing landscape of health care delivery. We reviewed emerging domains in which EHR metadata may be used to measure health care delivery, outlining a framework for evaluating measures based on desirability, feasibility, and viability. We argue that EHR use metadata may be leveraged to develop and operationalize novel measures in the domains of team structure and dynamics, workflows, and cognitive environment to provide a clearer understanding of modern health care delivery. Examples of measures feasible using metadata include quantification of teamwork and collaboration, patient continuity measures, workflow conformity measures, and attention switching. 
By enabling measures that can be used to inform the next generation of health care delivery, EHR metadata may be used to improve the quality of patient care and support clinician well-being. Careful attention is needed to ensure that these measures are desirable, feasible, and viable. %M 40053814 %R 10.2196/64721 %U https://www.jmir.org/2025/1/e64721 %U https://doi.org/10.2196/64721 %U http://www.ncbi.nlm.nih.gov/pubmed/40053814 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e54543 %T Exploring Psychological Trends in Populations With Chronic Obstructive Pulmonary Disease During COVID-19 and Beyond: Large-Scale Longitudinal Twitter Mining Study %A Zhang,Chunyan %A Wang,Ting %A Dong,Caixia %A Dai,Duwei %A Zhou,Linyun %A Li,Zongfang %A Xu,Songhua %+ School of Electrical Engineering, Xi'an Jiaotong University, No. 28, Xianning West Road, Xi'an, 710049, China, 86 13289346632, wang.ting@xjtu.edu.cn %K COVID-19 %K chronic obstructive pulmonary disease (COPD) %K psychological trends %K Twitter %K data mining %K deep learning %D 2025 %7 5.3.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Chronic obstructive pulmonary disease (COPD) ranks among the leading causes of global mortality, and COVID-19 has intensified its challenges. Beyond the evident physical effects, the long-term psychological effects of COVID-19 are not fully understood. Objective: This study aims to unveil the long-term psychological trends and patterns in populations with COPD throughout the COVID-19 pandemic and beyond via large-scale Twitter mining. Methods: A 2-stage deep learning framework was designed in this study. The first stage involved a data retrieval procedure to identify COPD and non-COPD users and to collect their daily tweets. In the second stage, a data mining procedure leveraged various deep learning algorithms to extract demographic characteristics, hashtags, topics, and sentiments from the collected tweets. 
Based on these data, multiple analytical methods, namely, odds ratio (OR), difference-in-differences, and emotion pattern methods, were used to examine the psychological effects. Results: A cohort of 15,347 COPD users was identified from over 2.5 billion tweets that we collected from Twitter, spanning January 2020 to June 2023. The attentiveness toward COPD was significantly affected by gender, age, and occupation; it was lower in females (OR 0.91, 95% CI 0.87-0.94; P<.001) than in males, higher in adults aged 40 years and older (OR 7.23, 95% CI 6.95-7.52; P<.001) than in those younger than 40 years, and higher in individuals with lower socioeconomic status (OR 1.66, 95% CI 1.60-1.72; P<.001) than in those with higher socioeconomic status. Across the study duration, COPD users showed decreasing concerns for COVID-19 and increasing health-related concerns. After the middle phase of COVID-19 (July 2021), a distinct decrease in sentiments among COPD users contrasted sharply with the upward trend among non-COPD users. Notably, in the post-COVID era (June 2023), COPD users showed reduced levels of joy and trust and increased levels of fear compared with their levels in the middle phase of COVID-19. Moreover, males, older adults, and individuals with lower socioeconomic status showed heightened fear compared to their counterparts. Conclusions: Our data analysis results suggest that populations with COPD experienced heightened mental stress in the post-COVID era. This underscores the importance of developing tailored interventions and support systems that account for diverse population characteristics. 
%M 40053739 %R 10.2196/54543 %U https://www.jmir.org/2025/1/e54543 %U https://doi.org/10.2196/54543 %U http://www.ncbi.nlm.nih.gov/pubmed/40053739 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e68083 %T Empowering Health Care Actors to Contribute to the Implementation of Health Data Integration Platforms: Retrospective of the medEmotion Project %A Parciak,Marcel %A Pierlet,Noëlla %A Peeters,Liesbet M %+ Biomedical Research Institute, UHasselt - Hasselt University, Agoralaan, Diepenbeek, 3590, Belgium, 32 11269288, marcel.parciak@uhasselt.be %K data science %K health data integration %K health data platform %K real-world evidence %K health care %K health data %K data %K integration platforms %K collaborative %K platform %K Belgium %K Europe %K personas %K communication %K health care providers %K hospital-specific requirements %K digital health %D 2025 %7 4.3.2025 %9 Viewpoint %J J Med Internet Res %G English %X Health data integration platforms are vital to drive collaborative, interdisciplinary medical research projects. Developing such a platform requires input from different stakeholders. Managing these stakeholders and steering platform development is challenging, and misalignment between the platform and the partners’ strategies might lead to low acceptance of the final platform. We present the medEmotion project, a collaborative effort among 7 partners from health care, academia, and industry to develop a health data integration platform for the region of Limburg in Belgium. We focus on the development process and stakeholder engagement, aiming to give practical advice for similar future efforts based on our reflections on medEmotion. We introduce Personas to characterize the different roles that stakeholders take and Demonstrators that summarize personas’ requirements with respect to the platform. Both the personas and the demonstrators serve 2 purposes. First, they are used to define technical requirements for the medEmotion platform. 
Second, they represent a communication vehicle that simplifies discussions among all stakeholders. Based on the personas and demonstrators, we present the medEmotion platform, built on components from the Microsoft Azure cloud. The demonstrators are based on real-world use cases and showcase the utility of the platform. We reflect on the development process of medEmotion and distill takeaway messages that will be helpful for future projects. Investing in community building, stakeholder engagement, and education is vital to building an ecosystem for a health data integration platform. Rather than academics leading such projects, the health care providers themselves should ideally drive the collaboration. The providers are best positioned to address hospital-specific requirements, while academics take a neutral mediator role. This also includes the ideation phase, where it is vital to ensure the involvement of all stakeholders. Finally, balancing innovation with implementation is key to developing an innovative yet sustainable health data integration platform. %M 40053761 %R 10.2196/68083 %U https://www.jmir.org/2025/1/e68083 %U https://doi.org/10.2196/68083 %U http://www.ncbi.nlm.nih.gov/pubmed/40053761 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e64422 %T Analysis of Retinal Thickness in Patients With Chronic Diseases Using Standardized Optical Coherence Tomography Data: Database Study Based on the Radiology Common Data Model %A Park,ChulHyoung %A Lee,So Hee %A Lee,Da Yun %A Choi,Seoyoon %A You,Seng Chan %A Jeon,Ja Young %A Park,Sang Jun %A Park,Rae Woong %K data standardization %K ophthalmology %K radiology %K optical coherence tomography %K retinal thickness %D 2025 %7 21.2.2025 %9 %J JMIR Med Inform %G English %X Background: The Observational Medical Outcomes Partnership-Common Data Model (OMOP-CDM) is an international standard for harmonizing electronic medical record (EMR) data. 
However, since it does not standardize unstructured data, such as medical imaging, using this data in multi-institutional collaborative research becomes challenging. To overcome this limitation, extensions such as the Radiology Common Data Model (R-CDM) have emerged to include and standardize these data types. Objective: This work aims to demonstrate that by standardizing optical coherence tomography (OCT) data into an R-CDM format, multi-institutional collaborative studies analyzing changes in retinal thickness in patients with long-standing chronic diseases can be performed efficiently. Methods: We standardized OCT images collected from two tertiary hospitals for research purposes using the R-CDM. As a proof of concept, we conducted a comparative analysis of retinal thickness between patients who have chronic diseases and those who do not. Patients diagnosed or treated for retinal and choroidal diseases, which could affect retinal thickness, were excluded from the analysis. Using the existing OMOP-CDM at each institution, we extracted cohorts of patients with chronic diseases and control groups, performing large-scale 1:2 propensity score matching (PSM). Subsequently, we linked the OMOP-CDM and R-CDM to extract the OCT image data of these cohorts and analyzed central macular thickness (CMT) and retinal nerve fiber layer (RNFL) thickness using a linear mixed model. Results: OCT data of 261,874 images from Ajou University Medical Center (AUMC) and 475,626 images from Seoul National University Bundang Hospital (SNUBH) were standardized in the R-CDM format. The R-CDM databases established at each institution were linked with the OMOP-CDM database. Following 1:2 PSM, the type 2 diabetes mellitus (T2DM) cohort included 957 patients, and the control cohort had 1603 patients. 
During the follow-up period, significant reductions in CMT were observed in the T2DM cohorts at AUMC (P=.04) and SNUBH (P=.007), without significant changes in RNFL thickness (AUMC: P=.56; SNUBH: P=.39). Notably, a significant reduction in CMT during the follow-up was observed only at AUMC in the hypertension cohort, compared to the control group (P=.04); no other significant differences in retinal thickness were found in the remaining analyses. Conclusions: The significance of our study lies in demonstrating the efficiency of multi-institutional collaborative research that simultaneously uses clinical data and medical imaging data by leveraging the OMOP-CDM for standardizing EMR data and the R-CDM for standardizing medical imaging data. %R 10.2196/64422 %U https://medinform.jmir.org/2025/1/e64422 %U https://doi.org/10.2196/64422 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e66910 %T Using Structured Codes and Free-Text Notes to Measure Information Complementarity in Electronic Health Records: Feasibility and Validation Study %A Seinen,Tom M %A Kors,Jan A %A van Mulligen,Erik M %A Rijnbeek,Peter R %+ Department of Medical Informatics, Erasmus University Medical Center, Dr. Molewaterplein 40, Rotterdam, 3015 GD, Netherlands, 31 010 7044122, t.seinen@erasmusmc.nl %K natural language processing %K named entity recognition %K clinical concept extraction %K machine learning %K electronic health records %K EHR %K word embeddings %K clinical concept similarity %K text mining %K code %K free-text %K information %K electronic record %K data %K patient records %K framework %K structured data %K unstructured data %D 2025 %7 13.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Electronic health records (EHRs) consist of both structured data (eg, diagnostic codes) and unstructured data (eg, clinical notes). It is commonly believed that unstructured clinical narratives provide more comprehensive information. 
However, this assumption has not been validated at scale, and direct validation methods have been lacking. Objective: This study aims to quantitatively compare the information in structured and unstructured EHR data and directly validate whether unstructured data offers more extensive information across a patient population. Methods: We analyzed both structured and unstructured data from patient records and visits in a large Dutch primary care EHR database between January 2021 and January 2024. Clinical concepts were identified from free-text notes using an extraction framework tailored for Dutch and compared with concepts from structured data. Concept embeddings were generated to measure semantic similarity between structured and extracted concepts through cosine similarity. A similarity threshold was systematically determined from annotated matches by minimizing weighted Gini impurity. We then quantified the concept overlap between structured and unstructured data across various concept domains and patient populations. Results: In a population of 1.8 million patients, only 13% of extracted concepts from patient records and 7% from individual visits had similar structured counterparts. Conversely, 42% of structured concepts in records and 25% in visits had similar matches in unstructured data. Condition concepts had the highest overlap, followed by measurements and drug concepts. Subpopulation visits, such as those with chronic conditions or psychological disorders, showed different proportions of data overlap, indicating varied reliance on structured versus unstructured data across clinical contexts. Conclusions: Our study demonstrates the feasibility of quantifying the information difference between structured and unstructured data, showing that the unstructured data provides important additional information in the studied database and populations. The annotated concept matches are made publicly available for the clinical natural language processing community. 
Despite some limitations, our proposed methodology proves versatile, and its application can lead to more robust and insightful observational clinical research. %M 39946687 %R 10.2196/66910 %U https://www.jmir.org/2025/1/e66910 %U https://doi.org/10.2196/66910 %U http://www.ncbi.nlm.nih.gov/pubmed/39946687 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e48775 %T Subtyping Social Determinants of Health in the "All of Us" Program: Network Analysis and Visualization Study %A Bhavnani,Suresh K %A Zhang,Weibin %A Bao,Daniel %A Raji,Mukaila %A Ajewole,Veronica %A Hunter,Rodney %A Kuo,Yong-Fang %A Schmidt,Susanne %A Pappadis,Monique R %A Smith,Elise %A Bokov,Alex %A Reistetter,Timothy %A Visweswaran,Shyam %A Downer,Brian %+ School of Public and Population Health, Department of Biostatistics & Data Science, University of Texas Medical Branch, 301 University Boulevard, Galveston, TX, 77555, United States, 1 (734) 772 1929, subhavna@utmb.edu %K social determinants of health %K All of Us %K bipartite networks %K financial resources %K health care %K health outcomes %K precision medicine %K decision support %K health industry %K clinical implications %K machine learning methods %D 2025 %7 11.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Social determinants of health (SDoH), such as financial resources and housing stability, account for between 30% and 55% of people’s health outcomes. While many studies have identified strong associations between specific SDoH and health outcomes, little is known about how SDoH co-occur to form subtypes critical for designing targeted interventions. Such analysis has only now become possible through the All of Us program. Objective: This study aims to analyze the All of Us dataset for addressing two research questions: (1) What are the range of and responses to survey questions related to SDoH? and (2) How do SDoH co-occur to form subtypes, and what are their risks for adverse health outcomes? 
Methods: For question 1, an expert panel analyzed the range of and responses to SDoH questions across 6 surveys in the full All of Us dataset (N=372,397; version 6). For question 2, due to systematic missingness and uneven granularity of questions across the surveys, we selected all participants with valid and complete SDoH data and used inverse probability weighting to adjust their imbalance in demographics. Next, an expert panel grouped the SDoH questions into SDoH factors to enable more consistent granularity. To identify the subtypes, we used bipartite modularity maximization for identifying SDoH biclusters and measured their significance and replicability. Next, we measured their association with 3 outcomes (depression, delayed medical care, and emergency room visits in the last year). Finally, the expert panel inferred the subtype labels, potential mechanisms, and targeted interventions. Results: The question 1 analysis identified 110 SDoH questions across 4 surveys covering all 5 domains in Healthy People 2030. As the SDoH questions varied in granularity, they were categorized by an expert panel into 18 SDoH factors. The question 2 analysis (n=12,913; d=18) identified 4 biclusters with significant biclusteredness (Q=0.13; random-Q=0.11; z=7.5; P<.001) and significant replication (real Rand index=0.88; random Rand index=0.62; P<.001). Each subtype had significant associations with specific outcomes and had meaningful interpretations and potential targeted interventions. For example, the Socioeconomic barriers subtype included 6 SDoH factors (eg, not employed and food insecurity) and had a significantly higher odds ratio (4.2, 95% CI 3.5-5.1; P<.001) for depression when compared to other subtypes. The expert panel inferred implications of the results for designing interventions and health care policies based on SDoH subtypes. 
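The Methods above adjust demographic imbalance in the complete-case subsample with inverse probability weighting. A minimal sketch, assuming a per-participant propensity (the estimated probability of having valid, complete SDoH data) has already been fit, for example with a logistic regression on demographics:

```python
def ipw_weights(propensities, stabilized=False, p_marginal=None):
    """Inverse probability weights: w_i = 1/p_i, or p_marginal/p_i if stabilized."""
    if stabilized:
        return [p_marginal / p for p in propensities]
    return [1.0 / p for p in propensities]

def weighted_mean(values, weights):
    # Weighted estimate over the reweighted (pseudo-)population
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)
```

Participants who are underrepresented among complete cases (low propensity) receive larger weights, so weighted summaries better reflect the full cohort.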
Conclusions: This study identified SDoH subtypes that had statistically significant biclusteredness and replicability, each of which had significant associations with specific adverse health outcomes and with translational implications for targeted SDoH interventions and health care policies. However, the high degree of systematic missingness requires repeating the analysis as the data become more complete by using our generalizable and scalable machine learning code available on the All of Us workbench. %M 39932771 %R 10.2196/48775 %U https://www.jmir.org/2025/1/e48775 %U https://doi.org/10.2196/48775 %U http://www.ncbi.nlm.nih.gov/pubmed/39932771 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 12 %N %P e64445 %T Use of Digital Health Technologies for Dementia Care: Bibliometric Analysis and Report %A Abdulazeem,Hebatullah %A Borges do Nascimento,Israel Júnior %A Weerasekara,Ishanka %A Sharifan,Amin %A Grandi Bianco,Victor %A Cunningham,Ciara %A Kularathne,Indunil %A Deeken,Genevieve %A de Barros,Jerome %A Sathian,Brijesh %A Østengaard,Lasse %A Lamontagne-Godwin,Frederique %A van Hoof,Joost %A Lazeri,Ledia %A Redlich,Cassie %A Marston,Hannah R %A Dos Santos,Ryan Alistair %A Azzopardi-Muscat,Natasha %A Yon,Yongjie %A Novillo-Ortiz,David %+ Division of Country Health Policies and Systems, World Health Organization Regional Office for Europe, Marmorvej, 51, Copenhagen, 2100, Denmark, 45 45 33 7198, dnovillo@who.int %K people living with dementia %K digital health technologies %K bibliometric analysis %K evidence-based medicine %D 2025 %7 10.2.2025 %9 Review %J JMIR Ment Health %G English %X Background: Dementia is a syndrome that compromises neurocognitive functions of the individual and that is affecting 55 million individuals globally, as well as global health care systems, national economic systems, and family members. 
Objective: This study aimed to determine the status quo of scientific production on the use of digital health technologies (DHTs) to support (older) people living with dementia, their families, and care partners. In addition, our study aimed to map the current landscape of global research initiatives on DHTs for the prevention, diagnosis, treatment, and support of people living with dementia and their caregivers. Methods: A bibliometric analysis was performed as part of a systematic review protocol using MEDLINE, Embase, Scopus, Epistemonikos, the Cochrane Database of Systematic Reviews, and Google Scholar for systematic and scoping reviews on DHTs and dementia up to February 21, 2024. Search terms included various forms of dementia and DHTs. Two independent reviewers conducted a 2-stage screening process with disagreements resolved by a third reviewer. Eligible reviews were then subjected to a bibliometric analysis using VOSviewer to evaluate document types, authorship, countries, institutions, journal sources, references, and keywords, creating social network maps to visualize emergent research trends. Results: A total of 704 records met the inclusion criteria for bibliometric analysis. Most reviews were systematic, with a substantial number covering mobile health, telehealth, and computer-based cognitive interventions. Bibliometric analysis revealed that the Journal of Medical Internet Research had the highest number of reviews and citations. Researchers from 66 countries contributed, with the United Kingdom and the United States as the most prolific. Overall, the number of publications covering the intersection of DHTs and dementia has increased steadily over time. However, the diversity of reviews conducted on a single topic has resulted in duplicated scientific efforts. 
Our assessment of contributions from countries, institutions, and key stakeholders reveals significant trends and knowledge gaps, particularly highlighting the dominance of high-income countries in this research domain. Furthermore, our findings emphasize the critical importance of interdisciplinary, collaborative teams and offer clear directions for future research, especially in underrepresented regions. Conclusions: Our study shows a steady increase in dementia- and DHT-related publications, particularly in areas such as mobile health, virtual reality, artificial intelligence, and sensor-based technology interventions. This increase underscores the importance of systematic approaches and interdisciplinary collaborations, while identifying knowledge gaps, especially in lower-income regions. It is crucial that researchers worldwide adhere to evidence-based medicine principles to avoid duplication of effort. This analysis offers a valuable foundation for policy makers and academics, emphasizing the need for an international collaborative task force to address knowledge gaps and advance dementia care globally. 
Trial Registration: PROSPERO CRD42024511241; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=511241 %M 39928936 %R 10.2196/64445 %U https://mental.jmir.org/2025/1/e64445 %U https://doi.org/10.2196/64445 %U http://www.ncbi.nlm.nih.gov/pubmed/39928936 %0 Journal Article %@ 2564-1891 %I JMIR Publications %V 5 %N %P e53434 %T Unveiling Topics and Emotions in Arabic Tweets Surrounding the COVID-19 Pandemic: Topic Modeling and Sentiment Analysis Approach %A Alshanik,Farah %A Khasawneh,Rawand %A Dalky,Alaa %A Qawasmeh,Ethar %+ Department of Computer Science, Faculty of Computer and Information Technology, Jordan University of Science and Technology, Alhusun St, Irbid, 22110, Jordan, 962 2 7201000 ext 23130, fmalshanik@just.edu.jo %K topic modeling %K sentiment analysis %K COVID-19 %K social media %K Twitter %K public discussion %D 2025 %7 10.2.2025 %9 Original Paper %J JMIR Infodemiology %G English %X Background: The worldwide effects of the COVID-19 pandemic have been profound, and the Arab world has not been exempt from its wide-ranging consequences. Within this context, social media platforms such as Twitter have become essential for sharing information and expressing public opinions during this global crisis. Careful investigation of Arabic tweets related to COVID-19 can provide invaluable insights into the common topics and underlying sentiments that shape discussions about the COVID-19 pandemic. Objective: This study aimed to understand the concerns and feelings of Twitter users in Arabic-speaking countries about the COVID-19 pandemic. This was accomplished through analyzing the themes and sentiments that were expressed in Arabic tweets about the COVID-19 pandemic. Methods: In this study, 1 million Arabic tweets about COVID-19 posted between March 1 and March 31, 2020, were analyzed. Machine learning techniques, such as topic modeling and sentiment analysis, were applied to understand the main topics and emotions that were expressed in these tweets. 
Results: The analysis of Arabic tweets revealed several prominent topics related to COVID-19. The analysis identified and grouped 16 different conversation topics that were organized into eight themes: (1) preventive measures and safety, (2) medical and health care aspects, (3) government and social measures, (4) impact and numbers, (5) vaccine development and research, (6) COVID-19 and religious practices, (7) global impact of COVID-19 on sports and countries, and (8) COVID-19 and national efforts. Across all the topics identified, the prevailing sentiments regarding the spread of COVID-19 were primarily centered around anger, followed by disgust, joy, and anticipation. Notably, when conversations revolved around new COVID-19 cases and fatalities, public tweets revealed a notably heightened sense of anger in comparison to other subjects. Conclusions: The study offers valuable insights into the topics and emotions expressed in Arabic tweets related to COVID-19. It demonstrates the significance of social media platforms, particularly Twitter, in capturing the Arabic-speaking community’s concerns and sentiments during the COVID-19 pandemic. The findings contribute to a deeper understanding of the prevailing discourse, enabling stakeholders to tailor effective communication strategies and address specific public concerns. This study underscores the importance of monitoring social media conversations in Arabic to support public health efforts and crisis management during the COVID-19 pandemic. 
%M 39928401 %R 10.2196/53434 %U https://infodemiology.jmir.org/2025/1/e53434 %U https://doi.org/10.2196/53434 %U http://www.ncbi.nlm.nih.gov/pubmed/39928401 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e63550 %T ChatGPT for Univariate Statistics: Validation of AI-Assisted Data Analysis in Healthcare Research %A Ruta,Michael R %A Gaidici,Tony %A Irwin,Chase %A Lifshitz,Jonathan %+ University of Arizona College of Medicine – Phoenix, 475 N 5th St, Phoenix, AZ, 85004, United States, 1 602 827 2002, mruta@arizona.edu %K ChatGPT %K data analysis %K statistics %K chatbot %K artificial intelligence %K biomedical research %K programmers %K bioinformatics %K data processing %D 2025 %7 7.2.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: ChatGPT, a conversational artificial intelligence developed by OpenAI, has rapidly become an invaluable tool for researchers. With the recent integration of Python code interpretation into the ChatGPT environment, there has been a significant increase in the potential utility of ChatGPT as a research tool, particularly in terms of data analysis applications. Objective: This study aimed to assess ChatGPT as a data analysis tool and provide researchers with a framework for applying ChatGPT to data management tasks, descriptive statistics, and inferential statistics. Methods: A subset of the National Inpatient Sample was extracted. Data analysis trials were divided into data processing, categorization, and tabulation, as well as descriptive and inferential statistics. For data processing, categorization, and tabulation assessments, ChatGPT was prompted to reclassify variables, subset variables, and present data, respectively. Descriptive statistics assessments included mean, SD, median, and IQR calculations. Inferential statistics assessments were conducted at varying levels of prompt specificity (“Basic,” “Intermediate,” and “Advanced”). 
Specific tests included chi-square, Pearson correlation, independent 2-sample t test, 1-way ANOVA, Fisher exact, Spearman correlation, Mann-Whitney U test, and Kruskal-Wallis H test. Outcomes from consecutive prompt-based trials were assessed against expected statistical values calculated in Python (Python Software Foundation), SAS (SAS Institute), and RStudio (Posit PBC). Results: ChatGPT accurately performed data processing, categorization, and tabulation across all trials. For descriptive statistics, it provided accurate means, SDs, medians, and IQRs across all trials. Inferential statistics accuracy against expected statistical values varied with prompt specificity: 32.5% accuracy for “Basic” prompts, 81.3% for “Intermediate” prompts, and 92.5% for “Advanced” prompts. Conclusions: ChatGPT shows promise as a tool for exploratory data analysis, particularly for researchers with some statistical knowledge and limited programming expertise. However, its application requires careful prompt construction and human oversight to ensure accuracy. As a supplementary tool, ChatGPT can enhance data analysis efficiency and broaden research accessibility. 
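The validation step above checks ChatGPT's answers against independently computed reference values. A minimal sketch of one such reference computation — the two-sample t statistic under Welch's unequal-variance form, which is one common variant of the independent 2-sample t test the study lists (the abstract does not specify which variant was used); in practice the p-value would come from a statistics package such as scipy.stats:

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's independent two-sample t statistic (unequal variances assumed)."""
    ma, mb = statistics.mean(sample_a), statistics.mean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)  # sample variances (n-1)
    na, nb = len(sample_a), len(sample_b)
    return (ma - mb) / math.sqrt(va / na + vb / nb)
```

Computing such values in Python, SAS, and RStudio, as the authors did, gives an expected answer to score each ChatGPT trial against.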
%M 39919289 %R 10.2196/63550 %U https://www.jmir.org/2025/1/e63550 %U https://doi.org/10.2196/63550 %U http://www.ncbi.nlm.nih.gov/pubmed/39919289 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e64479 %T Identification of Clusters in a Population With Obesity Using Machine Learning: Secondary Analysis of The Maastricht Study %A Beuken,Maik JM %A Kleynen,Melanie %A Braun,Susy %A Van Berkel,Kees %A van der Kallen,Carla %A Koster,Annemarie %A Bosma,Hans %A Berendschot,Tos TJM %A Houben,Alfons JHM %A Dukers-Muijrers,Nicole %A van den Bergh,Joop P %A Kroon,Abraham A %A , %A Kanera,Iris M %+ Faculty of Financial Management, Research Center for Statistics & Data Science, Zuyd University of Applied Sciences, Ligne 1, Sittard, 6131 MT, Netherlands, 31 682243809, maik.beuken@zuyd.nl %K Maastricht Study %K participant clusters %K cluster analysis %K factor probabilistic distance clustering %K FPDC algorithm %K statistically equivalent signature %K SES feature selection %K unsupervised machine learning %K obesity %K hypothesis free %K risk factor %K physical inactivity %K poor nutrition %K physical activity %K chronic disease %K type 2 diabetes %K diabetes %K heart disease %K long-term behavior change %D 2025 %7 5.2.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: Modern lifestyle risk factors, like physical inactivity and poor nutrition, contribute to rising rates of obesity and chronic diseases like type 2 diabetes and heart disease. Particularly personalized interventions have been shown to be effective for long-term behavior change. Machine learning can be used to uncover insights without predefined hypotheses, revealing complex relationships and distinct population clusters. New data-driven approaches, such as the factor probabilistic distance clustering algorithm, provide opportunities to identify potentially meaningful clusters within large and complex datasets. 
Objective: This study aimed to identify potential clusters and relevant variables among individuals with obesity using a data-driven and hypothesis-free machine learning approach. Methods: We used cross-sectional data from individuals with abdominal obesity from The Maastricht Study. Data (2971 variables) included demographics, lifestyle, biomedical aspects, advanced phenotyping, and social factors (cohort 2010). The factor probabilistic distance clustering algorithm was applied to detect clusters within this high-dimensional data. To identify a subset of distinct, minimally redundant, predictive variables, we used the statistically equivalent signature algorithm. To describe the clusters, we applied measures of central tendency and variability, and we assessed the distinctiveness of the clusters, based on the variables that emerged, using the F test for continuous variables and the chi-square test for categorical variables at a confidence level of α=.001. Results: We identified 3 distinct clusters (including 4128/9188, 44.93% of all data points) among individuals with obesity (n=4128). The most significant continuous variable for distinguishing cluster 1 (n=1458) from clusters 2 and 3 combined (n=2670) was the lower energy intake (mean 1684, SD 393 kcal/day vs mean 2358, SD 635 kcal/day; P<.001). The most significant categorical variable was occupation (P<.001). A significantly higher proportion (1236/1458, 84.77%) in cluster 1 did not work compared to clusters 2 and 3 combined (1486/2670, 55.66%; P<.001). For cluster 2 (n=1521), the most significant continuous variable was a higher energy intake (mean 2755, SD 506.2 kcal/day vs mean 1749, SD 375 kcal/day; P<.001). The most significant categorical variable was sex (P<.001). A significantly higher proportion (997/1521, 65.55%) in cluster 2 were male compared to the other 2 clusters (885/2607, 33.95%; P<.001). 
For cluster 3 (n=1149), the most significant continuous variable was overall higher cognitive functioning (mean 0.2349, SD 0.5702 vs mean –0.3088, SD 0.7212; P<.001), and educational level was the most significant categorical variable (P<.001). A significantly higher proportion (475/1149, 41.34%) in cluster 3 received higher vocational or university education in comparison to clusters 1 and 2 combined (729/2979, 24.47%; P<.001). Conclusions: This study demonstrates that a hypothesis-free and fully data-driven approach can be used to identify distinguishable participant clusters in large and complex datasets and find relevant variables that differ within populations with obesity. %M 39908080 %R 10.2196/64479 %U https://medinform.jmir.org/2025/1/e64479 %U https://doi.org/10.2196/64479 %U http://www.ncbi.nlm.nih.gov/pubmed/39908080 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e59452 %T The Social Construction of Categorical Data: Mixed Methods Approach to Assessing Data Features in Publicly Available Datasets %A Willem,Theresa %A Wollek,Alessandro %A Cheslerean-Boghiu,Theodor %A Kenney,Martha %A Buyx,Alena %+ Institute of History and Ethics in Medicine, School of Medicine and Health, Technical University of Munich, Ismaningerstraße 22, Munich, 81675, Germany, 49 89 4140 4041, theresa.willem@tum.de %K machine learning %K categorical data %K social context dependency %K mixed methods %K dermatology %K dataset analysis %D 2025 %7 28.1.2025 %9 Original Paper %J JMIR Med Inform %G English %X Background: In data-sparse areas such as health care, computer scientists aim to leverage as much available information as possible to increase the accuracy of their machine learning models’ outputs. As a standard, categorical data, such as patients’ gender, socioeconomic status, or skin color, are used to train models in fusion with other data types, such as medical images and text-based medical information. 
However, the effects of including categorical data features for model training in such data-scarce areas are underexamined, particularly regarding models intended to serve individuals equitably in a diverse population. Objective: This study aimed to explore categorical data’s effects on machine learning model outputs, to root these effects in the data collection and dataset publication processes, and to propose a mixed methods approach to examining datasets’ data categories before using them for machine learning training. Methods: Against the theoretical background of the social construction of categories, we suggest a mixed methods approach to assess categorical data’s utility for machine learning model training. As an example, we applied our approach to a Brazilian dermatological dataset (Dermatological and Surgical Assistance Program at the Federal University of Espírito Santo [PAD-UFES] 20). We first present an exploratory, quantitative study that assesses the effects when including or excluding each of the unique categorical data features of the PAD-UFES 20 dataset for training a transformer-based model using a data fusion algorithm. We then pair our quantitative analysis with a qualitative examination of the data categories based on interviews with the dataset authors. Results: Our quantitative study suggests scattered effects of including categorical data for machine learning model training across predictive classes. Our qualitative analysis gives insights into how the categorical data were collected and why they were published, explaining some of the quantitative effects that we observed. Our findings highlight the social constructedness of categorical data in publicly available datasets, meaning that the data in a category heavily depend on both how these categories are defined by the dataset creators and the sociomedical context in which the data are collected. 
This reveals relevant limitations of using publicly available datasets in contexts different from those of the collection of their data. Conclusions: We caution against using data features of publicly available datasets without reflection on the social construction and context dependency of their categorical data features, particularly in data-sparse areas. We conclude that social scientific, context-dependent analysis of available data features using both quantitative and qualitative methods is helpful in judging the utility of categorical data for the population for which a model is intended. %M 39874567 %R 10.2196/59452 %U https://medinform.jmir.org/2025/1/e59452 %U https://doi.org/10.2196/59452 %U http://www.ncbi.nlm.nih.gov/pubmed/39874567 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 13 %N %P e54133 %T Robust Automated Harmonization of Heterogeneous Data Through Ensemble Machine Learning: Algorithm Development and Validation Study %A Yang,Doris %A Zhou,Doudou %A Cai,Steven %A Gan,Ziming %A Pencina,Michael %A Avillach,Paul %A Cai,Tianxi %A Hong,Chuan %K ensemble learning %K semantic learning %K distribution learning %K variable harmonization %K machine learning %K cardiovascular health study %K intracohort comparison %K intercohort comparison %K gold standard labels %D 2025 %7 22.1.2025 %9 %J JMIR Med Inform %G English %X Background: Cohort studies contain rich clinical data across large and diverse patient populations and are a common source of observational data for clinical research. Because large scale cohort studies are both time and resource intensive, one alternative is to harmonize data from existing cohorts through multicohort studies. However, given differences in variable encoding, accurate variable harmonization is difficult. Objective: We propose SONAR (Semantic and Distribution-Based Harmonization) as a method for harmonizing variables across cohort studies to facilitate multicohort studies. 
Methods: SONAR used semantic learning from variable descriptions and distribution learning from study participant data. Our method learned an embedding vector for each variable and used pairwise cosine similarity to score the similarity between variables. This approach was built off 3 National Institutes of Health cohorts, including the Cardiovascular Health Study, the Multi-Ethnic Study of Atherosclerosis, and the Women’s Health Initiative. We also used gold standard labels to further refine the embeddings in a supervised manner. Results: The method was evaluated using manually curated gold standard labels from the 3 National Institutes of Health cohorts. We evaluated both the intracohort and intercohort variable harmonization performance. The supervised SONAR method outperformed existing benchmark methods for almost all intracohort and intercohort comparisons using area under the curve and top-k accuracy metrics. Notably, SONAR was able to significantly improve harmonization of concepts that were difficult for existing semantic methods to harmonize. Conclusions: SONAR achieves accurate variable harmonization within and between cohort studies by harnessing the complementary strengths of semantic learning and variable distribution learning. 
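The scoring and evaluation described here — pairwise cosine similarity between learned variable embeddings, assessed with top-k accuracy against gold standard matches — can be sketched as follows. This is an illustrative toy version (embeddings and names are assumptions), not the SONAR implementation:

```python
import math

def cosine(u, v):
    # Cosine similarity between two variable embeddings
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k_accuracy(query_embs, cand_embs, true_idx, k=1):
    """Fraction of query variables whose gold-standard match ranks in the top k
    candidate variables by cosine similarity."""
    hits = 0
    for q, t in zip(query_embs, true_idx):
        ranked = sorted(range(len(cand_embs)), key=lambda j: -cosine(q, cand_embs[j]))
        if t in ranked[:k]:
            hits += 1
    return hits / len(query_embs)
```

For intercohort harmonization, queries would be one study's variables and candidates another's; intracohort evaluation uses variables from the same study.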
%R 10.2196/54133 %U https://medinform.jmir.org/2025/1/e54133 %U https://doi.org/10.2196/54133 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e69742 %T Advantages and Inconveniences of a Multi-Agent Large Language Model System to Mitigate Cognitive Biases in Diagnostic Challenges %A Bousquet,Cedric %A Beltramin,Divà %+ Laboratory of Medical Informatics and Knowledge Engineering in e-Health, Inserm, Sorbonne University, 15 rue de l'école de Médecine, Paris, F-75006, France, 33 0477127974, cedric.bousquet@chu-st-etienne.fr %K large language model %K multi-agent system %K diagnostic errors %K cognition %K clinical decision-making %K cognitive bias %K generative artificial intelligence %D 2025 %7 20.1.2025 %9 Letter to the Editor %J J Med Internet Res %G English %X %M 39832364 %R 10.2196/69742 %U https://www.jmir.org/2025/1/e69742 %U https://doi.org/10.2196/69742 %U http://www.ncbi.nlm.nih.gov/pubmed/39832364 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 27 %N %P e52385 %T A Digital Tool for Clinical Evidence–Driven Guideline Development by Studying Properties of Trial Eligible and Ineligible Populations: Development and Usability Study %A Mumtaz,Shahzad %A McMinn,Megan %A Cole,Christian %A Gao,Chuang %A Hall,Christopher %A Guignard-Duff,Magalie %A Huang,Huayi %A McAllister,David A %A Morales,Daniel R %A Jefferson,Emily %A Guthrie,Bruce %+ Division of Population Health and Genomics, School of Medicine, University of Dundee, The Health Informatics Centre, Ninewells Hospital and Medical School, Dundee, DD2 1FD, United Kingdom, 44 01382383943, e.r.jefferson@dundee.ac.uk %K multimorbidity %K clinical practice guideline %K gout %K Trusted Research Environment %K National Institute for Health and Care Excellence %K Scottish Intercollegiate Guidelines Network %K clinical practice %K development %K efficacy %K validity %K epidemiological data %K epidemiology %K epidemiological %K digital tool %K tool %K age %K gender %K ethnicity %K mortality %K 
feedback %K availability %D 2025 %7 16.1.2025 %9 Original Paper %J J Med Internet Res %G English %X Background: Clinical guideline development preferentially relies on evidence from randomized controlled trials (RCTs). RCTs are gold-standard methods to evaluate the efficacy of treatments with the highest internal validity but limited external validity, in the sense that their findings may not always be applicable to or generalizable to clinical populations or population characteristics. The external validity of RCTs for the clinical population is constrained by the lack of tailored epidemiological data analysis designed for this purpose due to data governance, consistency of disease or condition definitions, and reduplicated effort in analysis code. Objective: This study aims to develop a digital tool that characterizes the overall population and differences between clinical trial eligible and ineligible populations from the clinical populations of a disease or condition regarding demography (eg, age, gender, ethnicity), comorbidity, coprescription, hospitalization, and mortality. Currently, the process is complex, onerous, and time-consuming, whereas a real-time tool may be used to rapidly inform a guideline developer’s judgment about the applicability of evidence. Methods: The National Institute for Health and Care Excellence—particularly the gout guideline development group—and the Scottish Intercollegiate Guidelines Network guideline developers were consulted to gather their requirements and evidential data needs when developing guidelines. An R Shiny (R Foundation for Statistical Computing) tool was designed and developed using electronic primary health care data linked with hospitalization and mortality data built upon an optimized data architecture. Disclosure control mechanisms were built into the tool to ensure data confidentiality. 
The tool was deployed within a Trusted Research Environment, allowing only trusted preapproved researchers to conduct analysis. Results: The tool supports 128 chronic health conditions as index conditions and 161 conditions as comorbidities (33 in addition to the 128 index conditions). It enables 2 types of analyses via the graphic interface: overall population and stratified by user-defined eligibility criteria. The analyses produce an overview of statistical tables (eg, age, gender) of the index condition population and, within the overview groupings, produce details on, for example, electronic frailty index, comorbidities, and coprescriptions. The disclosure control mechanism is integral to the tool, limiting tabular counts to meet local governance needs. An exemplary result for gout as an index condition is presented to demonstrate the tool’s functionality. Guideline developers from the National Institute for Health and Care Excellence and the Scottish Intercollegiate Guidelines Network provided positive feedback on the tool. Conclusions: The tool is a proof-of-concept, and the user feedback has demonstrated that this is a step toward computer-interpretable guideline development. Using the digital tool can potentially improve evidence-driven guideline development through the availability of real-world data in real time. 
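The disclosure control described above limits tabular counts before release. A common mechanism for this, and a plausible reading of what the tool does (the abstract does not state the exact rule or threshold, so both are assumptions here), is small-cell suppression:

```python
def suppress_small_counts(table, min_count=5, marker="<5"):
    """Replace any tabular cell count below min_count with a suppression marker,
    so small cells cannot identify individuals in released outputs."""
    return {cell: (count if count >= min_count else marker)
            for cell, count in table.items()}
```

Local governance rules determine the threshold; Trusted Research Environments typically also check for secondary disclosure across related tables before outputs leave the environment.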
%M 39819848 %R 10.2196/52385 %U https://www.jmir.org/2025/1/e52385 %U https://doi.org/10.2196/52385 %U http://www.ncbi.nlm.nih.gov/pubmed/39819848 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e59113 %T Implementing Findable, Accessible, Interoperable, Reusable (FAIR) Principles in Child and Adolescent Mental Health Research: Mixed Methods Approach %A de Groot,Rowdy %A van der Graaff,Frank %A van der Doelen,Daniël %A Luijten,Michiel %A De Meyer,Ronald %A Alrouh,Hekmat %A van Oers,Hedy %A Tieskens,Jacintha %A Zijlmans,Josjan %A Bartels,Meike %A Popma,Arne %A de Keizer,Nicolette %A Cornet,Ronald %A Polderman,Tinca J C %K FAIR data %K research data management %K data interoperability %K data standardization %K OMOP CDM %K implementation %K health data %K data quality %K FAIR principles %D 2024 %7 19.12.2024 %9 %J JMIR Ment Health %G English %X Background: The FAIR (Findable, Accessible, Interoperable, Reusable) data principles are a guideline to improve the reusability of data. However, properly implementing these principles is challenging due to a wide range of barriers. Objectives: To further the field of FAIR data, this study aimed to systematically identify barriers regarding implementing the FAIR principles in the area of child and adolescent mental health research, define the most challenging barriers, and provide recommendations for these barriers. Methods: Three sources were used as input to identify barriers: (1) evaluation of the implementation process of the Observational Medical Outcomes Partnership Common Data Model by 3 data managers; (2) interviews with experts on mental health research, reusable health data, and data quality; and (3) a rapid literature review. All barriers were categorized according to type as described previously, the affected FAIR principle, a category to add detail about the origin of the barrier, and whether a barrier was mental health specific. 
The barriers were assessed and ranked on impact with the data managers using the Delphi method. Results: Thirteen barriers were identified by the data managers, 7 were identified by the experts, and 30 barriers were extracted from the literature. This resulted in 45 unique barriers. The characteristics that were most assigned to the barriers were, respectively, external type (n=32/45; eg, organizational policy preventing the use of required software), tooling category (n=19/45; ie, software and databases), all FAIR principles (n=15/45), and not mental health specific (n=43/45). Consensus on ranking the scores of the barriers was reached after 2 rounds of the Delphi method. The most important recommendations to overcome the barriers are adding a FAIR data steward to the research team, accessible step-by-step guides, and ensuring sustainable funding for the implementation and long-term use of FAIR data. Conclusions: By systematically listing these barriers and providing recommendations, we intend to enhance the awareness of researchers and grant providers that making data FAIR demands specific expertise, available tooling, and proper investments. %R 10.2196/59113 %U https://mental.jmir.org/2024/1/e59113 %U https://doi.org/10.2196/59113 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e60665 %T An Automatic and End-to-End System for Rare Disease Knowledge Graph Construction Based on Ontology-Enhanced Large Language Models: Development Study %A Cao,Lang %A Sun,Jimeng %A Cross,Adam %K rare disease %K clinical informatics %K LLM %K natural language processing %K machine learning %K artificial intelligence %K large language models %K data extraction %K ontologies %K knowledge graphs %K text mining %D 2024 %7 18.12.2024 %9 %J JMIR Med Inform %G English %X Background: Rare diseases affect millions worldwide but sometimes face limited research focus individually due to low prevalence. 
Many rare diseases do not have specific International Classification of Diseases, Ninth Edition (ICD-9) and Tenth Edition (ICD-10) codes and therefore cannot be reliably extracted from granular fields like “Diagnosis” and “Problem List” entries, which complicates tasks that require identification of patients with these conditions, including clinical trial recruitment and research efforts. Recent advancements in large language models (LLMs) have shown promise in automating the extraction of medical information, offering the potential to improve medical research, diagnosis, and management. However, most LLMs lack professional medical knowledge, especially concerning specific rare diseases, and cannot effectively manage rare disease data in its various ontological forms, making them unsuitable for these tasks. Objective: Our aim is to create an end-to-end system called automated rare disease mining (AutoRD), which automates the extraction of rare disease–related information from medical text, focusing on entities and their relations to other medical concepts, such as signs and symptoms. AutoRD integrates up-to-date ontologies with other structured knowledge and demonstrates superior performance in rare disease extraction tasks. We conducted various experiments to evaluate AutoRD’s performance, aiming to surpass common LLMs and traditional methods. Methods: AutoRD is a pipeline system that involves data preprocessing, entity extraction, relation extraction, entity calibration, and knowledge graph construction. We implemented this system using GPT-4 and medical knowledge graphs developed from the open-source Human Phenotype and Orphanet ontologies, using techniques such as chain-of-thought reasoning and prompt engineering. We quantitatively evaluated our system’s performance in entity extraction, relation extraction, and knowledge graph construction. 
The experiment used the well-curated dataset RareDis2023, which contains medical literature focused on rare disease entities and their relations, making it an ideal dataset for training and testing our methodology. Results: On the RareDis2023 dataset, AutoRD achieved an overall entity extraction F1-score of 56.1% and a relation extraction F1-score of 38.6%, marking a 14.4% improvement over the baseline LLM. Notably, the F1-score for rare disease entity extraction reached 83.5%, indicating high precision and recall in identifying rare disease mentions. These results demonstrate the effectiveness of integrating LLMs with medical ontologies in extracting complex rare disease information. Conclusions: AutoRD is an automated end-to-end system for extracting rare disease information from text to build knowledge graphs, addressing critical limitations of existing LLMs by improving identification of these diseases and connecting them to related clinical features. This work underscores the significant potential of LLMs in transforming health care, particularly in the rare disease domain. By leveraging ontology-enhanced LLMs, AutoRD constructs a robust medical knowledge base that incorporates up-to-date rare disease information, facilitating improved identification of patients and resulting in more inclusive research and trial candidacy efforts. 
%R 10.2196/60665 %U https://medinform.jmir.org/2024/1/e60665 %U https://doi.org/10.2196/60665 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e59844 %T The University of California Study of Outcomes in Mothers and Infants (a Population-Based Research Resource): Retrospective Cohort Study %A Baer,Rebecca J %A Bandoli,Gretchen %A Jelliffe-Pawlowski,Laura %A Chambers,Christina D %+ Department of Pediatrics, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093, United States, 1 2063510850, rjbaer@ucsd.edu %K birth certificate %K vital statistics %K hospital discharge %K administrative data %K linkage %K pregnancy outcome %K birth outcome %K infant outcome %K adverse outcome %K preterm birth %K birth defects %K pregnancy %K prenatal %K California %K policy %K disparities %K children %K data collection %D 2024 %7 3.12.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Population-based databases are valuable for perinatal research. The California Department of Health Care Access and Information (HCAI) created a linked birth file covering the years 1991 through 2012. This file includes birth and fetal death certificate records linked to the hospital discharge records of the birthing person and infant. In 2019, the University of California Study of Outcomes in Mothers and Infants received approval to create similar linked birth files for births from 2011 onward, with 2 years of overlapping birth files to allow for linkage comparison. Objective: This paper aims to describe the University of California Study of Outcomes in Mothers and Infants linkage methodology, examine the linkage quality, and discuss the benefits and limitations of the approach. Methods: Live birth and fetal death certificates were linked to hospital discharge records for California infants between 2005 and 2020. 
The linkage algorithm includes variables such as birth hospital and date of birth, and linked record selection is made based on a “link score.” The complete file includes California Vital Statistics and HCAI hospital discharge records for the birthing person (1 y before delivery and 1 y after delivery) and infant (1 y after delivery). Linkage quality was assessed through a comparison of linked files and California Vital Statistics only. Comparisons were made to previous linked birth files created by the HCAI for 2011 and 2012. Results: Of the 8,040,000 live births, 7,427,738 (92.38%) California Vital Statistics live birth records were linked to HCAI records for birthing people, 7,680,597 (95.53%) birth records were linked to HCAI records for the infant, and 7,285,346 (90.61%) California Vital Statistics birth records were linked to HCAI records for both the birthing person and the infant. The linkage rates were 92.44% (976,526/1,056,358) for Asian and 86.27% (28,601/33,151) for Hawaiian or Pacific Islander birthing people. Of the 44,212 fetal deaths, 33,355 (75.44%) had HCAI records linked to the birthing person. When assessing variables in both California Vital Statistics and hospital records, the percentage was greatest when using both sources: the rates of gestational diabetes were 4.52% (329,128/7,285,345) in the California Vital Statistics records, 8.2% (597,534/7,285,345) in the HCAI records, and 9.34% (680,757/7,285,345) when using both data sources. Conclusions: We demonstrate that the linkage strategy used for this data platform is similar in linkage rate and linkage quality to the previous linked birth files created by the HCAI. The linkage provides higher rates of crucial variables, such as diabetes, compared to birth certificate records alone, although selection bias from the linkage must be considered. 
This platform has been used independently to examine health outcomes, has been linked to environmental datasets and residential data, and has been used to obtain and examine maternal serum and newborn blood spots. %M 39625748 %R 10.2196/59844 %U https://publichealth.jmir.org/2024/1/e59844 %U https://doi.org/10.2196/59844 %U http://www.ncbi.nlm.nih.gov/pubmed/39625748 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e65784 %T The University Medicine Greifswald’s Trusted Third Party Dispatcher: State-of-the-Art Perspective Into Comprehensive Architectures and Complex Research Workflows %A Bialke,Martin %A Stahl,Dana %A Leddig,Torsten %A Hoffmann,Wolfgang %K architecture %K scalability %K trusted third party %K application %K security %K consent %K identifying data %K infrastructure %K modular %K software %K implementation %K user interface %K health platform %K data management %K data privacy %K health record %K electronic health record %K EHR %K pseudonymization %D 2024 %7 29.11.2024 %9 %J JMIR Med Inform %G English %X %R 10.2196/65784 %U https://medinform.jmir.org/2024/1/e65784 %U https://doi.org/10.2196/65784 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e67429 %T Authors’ Reply: The University Medicine Greifswald’s Trusted Third Party Dispatcher: State-of-the-Art Perspective Into Comprehensive Architectures and Complex Research Workflows %A Wündisch,Eric %A Hufnagl,Peter %A Brunecker,Peter %A Meier zu Ummeln,Sophie %A Träger,Sarah %A Prasser,Fabian %A Weber,Joachim %K architecture %K scalability %K trusted third party %K application %K security %K consent %K identifying data %K infrastructure %K modular %K software %K implementation %K user interface %K health platform %K data management %K data privacy %K health record %K electronic health record %K EHR %K pseudonymization %D 2024 %7 29.11.2024 %9 %J JMIR Med Inform %G English %X %R 10.2196/67429 %U https://medinform.jmir.org/2024/1/e67429 %U https://doi.org/10.2196/67429 %0 
Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e64726 %T Strengthening the Backbone: Government-Academic Data Collaborations for Crisis Response %A Yang,Rick %A Yang,Alina %K data infrastructure %K data sharing %K cross-sector collaboration %K government-academic partnerships %K public health %K crisis response %D 2024 %7 28.11.2024 %9 %J JMIR Public Health Surveill %G English %X %R 10.2196/64726 %U https://publichealth.jmir.org/2024/1/e64726 %U https://doi.org/10.2196/64726 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e66479 %T Authors’ Reply to: Strengthening the Backbone: Government-Academic Data Collaborations for Crisis Response %A Lee,Jian-Sin %A Tyler,Allison R B %A Veinot,Tiffany Christine %A Yakel,Elizabeth %K COVID-19 %K crisis response %K cross-sector collaboration %K data infrastructures %K data science %K data sharing %K pandemic %K public health informatics %D 2024 %7 28.11.2024 %9 %J JMIR Public Health Surveill %G English %X %R 10.2196/66479 %U https://publichealth.jmir.org/2024/1/e66479 %U https://doi.org/10.2196/66479 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e60878 %T Population Characteristics in Justice Health Research Based on PubMed Abstracts From 1963 to 2023: Text Mining Study %A Lukmanjaya,Wilson %A Butler,Tony %A Taflan,Patricia %A Simpson,Paul %A Ginnivan,Natasha %A Buchan,Iain %A Nenadic,Goran %A Karystianis,George %+ School of Population Health, University of New South Wales, Samuels Building, F25, Samuel Terry Ave, Kensington NSW, Sydney, 2052, Australia, 61 2 9385 3136, w.lukmanjaya@unsw.edu.au %K epidemiology %K PubMed %K criminology %K text mining %K justice health %K offending and incarcerated populations %K population characteristics %K open research %K health research %K text mining study %K epidemiological criminology %K public health %K justice systems %K bias %K population %K men %K women %K prison %K prisoner %K researcher %D 2024 %7 22.11.2024 %9 Original Paper %J JMIR 
Form Res %G English %X Background: The field of epidemiological criminology (or justice health research) has emerged in the past decade, studying the intersection between the public health and justice systems. To ensure research efforts are focused and equitable, it is important to reflect on the outputs in this area and address knowledge gaps. Objective: This study aimed to examine the characteristics of populations researched in a large sample of published outputs and identify research gaps and biases. Methods: A rule-based, text mining method was applied to 34,481 PubMed abstracts published from 1963 to 2023 to identify 4 population characteristics (sex, age, offender type, and nationality). Results: We evaluated our method in a random sample of 100 PubMed abstracts. Microprecision was 94.3%, with microrecall at 85.9% and micro–F1-score at 89.9% across the 4 characteristics. Half (n=17,039, 49.4%) of the 34,481 abstracts did not have any characteristic mentions and only 1.3% (n=443) reported sex, age, offender type, and nationality. From the 5170 (14.9%) abstracts that reported age, 3581 (69.3%) mentioned young people (younger than 18 years) and 3037 (58.7%) mentioned adults. Since 1990, studies reporting female-only populations increased, and in 2023, these accounted for almost half (105/216, 48.6%) of the research outputs, as opposed to 33.3% (72/216) for male-only populations. Nordic countries (Sweden, Norway, Finland, and Denmark) had the highest number of abstracts proportional to their incarcerated populations. Offenders with mental illness were the most common group of interest (840/4814, 17.4%), with an increase from 1990 onward. Conclusions: Research reporting on female populations increased, surpassing that involving male individuals, despite female individuals representing 5% of the incarcerated population; this suggests that male prisoners are underresearched. 
Although calls have been made for the justice health area to focus more on young people, our results showed that among the abstracts reporting age, most mentioned a population aged <18 years, reflecting a rise in youth involvement in the youth justice system. Those convicted of sex offenses and crimes relating to children were not as researched as the existing literature suggests, with a focus instead on populations with mental illness, whose rates rose steadily in the last 30 years. After adjusting for the size of the incarcerated population, Nordic countries have conducted proportionately the most research. Our findings highlight that despite the presence of several research reporting guidelines, justice health abstracts still do not adequately describe the investigated populations. Our study offers new insights in the field of justice health with implications for promoting diversity in the selection of research participants. %M 39576975 %R 10.2196/60878 %U https://formative.jmir.org/2024/1/e60878 %U https://doi.org/10.2196/60878 %U http://www.ncbi.nlm.nih.gov/pubmed/39576975 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e63031 %T Assessing the Digital Advancement of Public Health Systems Using Indicators Published in Gray Literature: Narrative Review %A Maaß,Laura %A Badino,Manuel %A Iyamu,Ihoghosa %A Holl,Felix %+ SOCIUM Research Center on Inequality and Social Policy, University of Bremen, Mary-Somerville-Straße 3, Bremen, 28359, Germany, 49 421 218 58610, laura.maass@uni-bremen.de %K digital public health %K health system %K indicator %K interdisciplinary %K information and communications technology %K maturity assessment %K readiness assessment %K narrative review %K gray literature %K digital health %K mobile phone %D 2024 %7 20.11.2024 %9 Review %J JMIR Public Health Surveill %G English %X Background: Revealing the full potential of digital public health (DiPH) systems requires a wide-ranging tool to assess their maturity and readiness 
for emerging technologies. Although a variety of indices exist to assess digital health systems, questions arise about the inclusion of indicators of information and communications technology maturity and readiness, digital (health) literacy, and interest in DiPH tools by the society and workforce, as well as the maturity of the legal framework and the readiness of digitalized health systems. Existing tools frequently target one of these domains while overlooking the others. In addition, no review has yet holistically investigated the available national DiPH system maturity and readiness indicators using a multidisciplinary lens. Objective: We used a narrative review to map the landscape of DiPH system maturity and readiness indicators published in the gray literature. Methods: As original indicators were not published in scientific databases, we applied predefined search strings to the DuckDuckGo and Google search engines for 11 countries from all continents that had reached level 4 of 5 in the latest Global Digital Health Monitor evaluation. In addition, we searched the literature published by 19 international organizations for maturity and readiness indicators concerning DiPH. Results: Of the 1484 identified references, 137 were included, and they yielded 15,806 indicators. We deemed 286 indicators from 90 references relevant for DiPH system maturity and readiness assessments. The majority of these indicators (133/286, 46.5%) had legal relevance (targeting big data and artificial intelligence regulation, cybersecurity, national DiPH strategies, or health data governance), and the smallest number of indicators (37/286, 12.9%) were related to social domains (focusing on internet use and access, digital literacy and digital health literacy, or the use of DiPH tools, smartphones, and computers). 
Another 14.3% (41/286) of indicators analyzed the information and communications technology infrastructure (such as workforce, electricity, internet, and smartphone availability or interoperability standards). The remaining 26.2% (75/286) of indicators described the degree to which DiPH was applied (including health data architecture, storage, and access; the implementation of DiPH interventions; or the existence of interventions promoting health literacy and digital inclusion). Conclusions: Our work is the first to conduct a multidisciplinary analysis of the gray literature on DiPH maturity and readiness assessments. Although new methods for systematically researching gray literature are needed, our study holds the potential to develop more comprehensive tools for DiPH system assessments. We contributed toward a more holistic understanding of DiPH. Further examination is required to analyze the suitability and applicability of all identified indicators in diverse health care settings. By developing a standardized method to assess DiPH system maturity and readiness, we aim to foster informed decision-making among health care planners and practitioners to improve resource distribution and continue to drive innovation in health care delivery. 
%M 39566910 %R 10.2196/63031 %U https://publichealth.jmir.org/2024/1/e63031 %U https://doi.org/10.2196/63031 %U http://www.ncbi.nlm.nih.gov/pubmed/39566910 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50235 %T The Challenges and Lessons Learned Building a New UK Infrastructure for Finding and Accessing Population-Wide COVID-19 Data for Research and Public Health Analysis: The CO-CONNECT Project %A Jefferson,Emily %A Milligan,Gordon %A Johnston,Jenny %A Mumtaz,Shahzad %A Cole,Christian %A Best,Joseph %A Giles,Thomas Charles %A Cox,Samuel %A Masood,Erum %A Horban,Scott %A Urwin,Esmond %A Beggs,Jillian %A Chuter,Antony %A Reilly,Gerry %A Morris,Andrew %A Seymour,David %A Hopkins,Susan %A Sheikh,Aziz %A Quinlan,Philip %+ Population Health and Genomics, School of Medicine, University of Dundee, The Health Informatics Centre, Ninewells Hospital and Medical School, Dundee, DD2 1FD, United Kingdom, 44 01382383943, e.r.jefferson@dundee.ac.uk %K COVID-19 %K infrastructure %K trusted research environments %K safe havens %K feasibility analysis %K cohort discovery %K federated analytics %K federated discovery %K lessons learned %K population wide %K data %K public health %K analysis %K CO-CONNECT %K challenges %K data transformation %D 2024 %7 20.11.2024 %9 Viewpoint %J J Med Internet Res %G English %X The COVID-19-Curated and Open Analysis and Research Platform (CO-CONNECT) project worked with 22 organizations across the United Kingdom to build a federated platform, enabling researchers to instantaneously and dynamically query federated datasets to find relevant data for their study. Finding relevant data takes time and effort, reducing the efficiency of research. Although data controllers could understand the value of such a system, there were significant challenges and delays in setting up the platform in response to COVID-19. 
This paper aims to present the challenges and lessons learned from the CO-CONNECT project to support other similar initiatives in the future. The project encountered many challenges, including the impacts of lockdowns on collaboration, understanding the new architecture, competing demands on people’s time during a pandemic, data governance approvals, different levels of technical capabilities, data transformation to a common data model, access to granular-level laboratory data, and how to engage public and patient representatives meaningfully on a highly technical project. To overcome these challenges, we developed a range of methods to support data partners such as explainer videos; regular, short, “touch base” videoconference calls; drop-in workshops; live demos; and a standardized technical onboarding documentation pack. A 4-stage data governance process emerged. The patient and public representatives were fully integrated team members. Persistence, patience, and understanding were key. We make 8 recommendations to change the landscape for future similar initiatives. The new architecture and processes developed are being built upon for non–COVID-19–related data, providing an infrastructural legacy. 
%M 39566065 %R 10.2196/50235 %U https://www.jmir.org/2024/1/e50235 %U https://doi.org/10.2196/50235 %U http://www.ncbi.nlm.nih.gov/pubmed/39566065 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59439 %T Mitigating Cognitive Biases in Clinical Decision-Making Through Multi-Agent Conversations Using Large Language Models: Simulation Study %A Ke,Yuhe %A Yang,Rui %A Lie,Sui An %A Lim,Taylor Xin Yi %A Ning,Yilin %A Li,Irene %A Abdullah,Hairil Rizal %A Ting,Daniel Shu Wei %A Liu,Nan %+ Centre for Quantitative Medicine, Duke-NUS Medical School, 8 College Road, Singapore, 169857, Singapore, 65 66016503, liu.nan@duke-nus.edu.sg %K clinical decision-making %K cognitive bias %K generative artificial intelligence %K large language model %K multi-agent %D 2024 %7 19.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Cognitive biases in clinical decision-making significantly contribute to errors in diagnosis and suboptimal patient outcomes. Addressing these biases presents a formidable challenge in the medical field. Objective: This study aimed to explore the role of large language models (LLMs) in mitigating these biases through the use of the multi-agent framework. We simulate the clinical decision-making processes through multi-agent conversation and evaluate its efficacy in improving diagnostic accuracy compared with humans. Methods: A total of 16 published and unpublished case reports where cognitive biases have resulted in misdiagnoses were identified from the literature. In the multi-agent framework, we leveraged GPT-4 (OpenAI) to facilitate interactions among different simulated agents to replicate clinical team dynamics. 
Each agent was assigned a distinct role: (1) making the final diagnosis after considering the discussions, (2) acting as a devil’s advocate to correct confirmation and anchoring biases, (3) serving as a field expert in the required medical subspecialty, (4) facilitating discussions to mitigate premature closure bias, and (5) recording and summarizing findings. We tested varying combinations of these agents within the framework to determine which configuration yielded the highest rate of correct final diagnoses. Each scenario was repeated 5 times for consistency. The accuracy of the initial diagnoses and the final differential diagnoses were evaluated, and comparisons with human-generated answers were made using the Fisher exact test. Results: A total of 240 responses were evaluated (3 different multi-agent frameworks). The initial diagnosis had an accuracy of 0% (0/80). However, following multi-agent discussions, the accuracy for the top 2 differential diagnoses increased to 76% (61/80) for the best-performing multi-agent framework (Framework 4-C). This was significantly higher compared with the accuracy achieved by human evaluators (odds ratio 3.49; P=.002). Conclusions: The multi-agent framework demonstrated an ability to re-evaluate and correct misconceptions, even in scenarios with misleading initial investigations. In addition, the LLM-driven, multi-agent conversation framework shows promise in enhancing diagnostic accuracy in diagnostically challenging medical scenarios. 
%M 39561363 %R 10.2196/59439 %U https://www.jmir.org/2024/1/e59439 %U https://doi.org/10.2196/59439 %U http://www.ncbi.nlm.nih.gov/pubmed/39561363 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57754 %T Data Ownership in the AI-Powered Integrative Health Care Landscape %A Liu,Shuimei %A Guo,L Raymond %+ School of Juris Master, China University of Political Science and Law, 25 Xitucheng Rd, Hai Dian Qu, Beijing, 100088, China, 1 (734) 358 3970, shuiliu0802@alumni.iu.edu %K data ownership %K integrative healthcare %K artificial intelligence %K AI %K ownership %K data science %K governance %K consent %K privacy %K security %K access %K model %K framework %K transparency %D 2024 %7 19.11.2024 %9 Viewpoint %J JMIR Med Inform %G English %X In the rapidly advancing landscape of artificial intelligence (AI) within integrative health care (IHC), the issue of data ownership has become pivotal. This study explores the intricate dynamics of data ownership in the context of IHC and the AI era, presenting the novel Collaborative Healthcare Data Ownership (CHDO) framework. The analysis delves into the multifaceted nature of data ownership, involving patients, providers, researchers, and AI developers, and addresses challenges such as ambiguous consent, attribution of insights, and international inconsistencies. Examining various ownership models, including privatization and communization postulates, as well as distributed access control, data trusts, and blockchain technology, the study assesses their potential and limitations. The proposed CHDO framework emphasizes shared ownership, defined access and control, and transparent governance, providing a promising avenue for responsible and collaborative AI integration in IHC. This comprehensive analysis offers valuable insights into the complex landscape of data ownership in IHC and the AI era, potentially paving the way for ethical and sustainable advancements in data-driven health care. 
%M 39560980 %R 10.2196/57754 %U https://medinform.jmir.org/2024/1/e57754 %U https://doi.org/10.2196/57754 %U http://www.ncbi.nlm.nih.gov/pubmed/39560980 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e53622 %T Distributed Statistical Analyses: A Scoping Review and Examples of Operational Frameworks Adapted to Health Analytics %A Camirand Lemyre,Félix %A Lévesque,Simon %A Domingue,Marie-Pier %A Herrmann,Klaus %A Ethier,Jean-François %K distributed algorithms %K generalized linear models %K horizontally partitioned data %K GLMs %K learning health systems %K distributed analysis %K federated analysis %K data science %K data custodians %K algorithms %K statistics %K synthesis %K review methods %K searches %K scoping %D 2024 %7 14.11.2024 %9 %J JMIR Med Inform %G English %X Background: Data from multiple organizations are crucial for advancing learning health systems. However, ethical, legal, and social concerns may restrict the use of standard statistical methods that rely on pooling data. Although distributed algorithms offer alternatives, they may not always be suitable for health frameworks. Objective: This study aims to support researchers and data custodians in three ways: (1) providing a concise overview of the literature on statistical inference methods for horizontally partitioned data, (2) describing the methods applicable to generalized linear models (GLMs) and assessing their underlying distributional assumptions, and (3) adapting existing methods to make them fully usable in health settings. Methods: A scoping review methodology was used for the literature mapping, from which methods presenting a methodological framework for GLM analyses with horizontally partitioned data were identified and assessed from the perspective of applicability in health settings. Statistical theory was used to adapt methods and derive the properties of the resulting estimators. 
Results: From the review, 41 articles were selected and 6 approaches were extracted to conduct standard GLM-based statistical analysis. However, these approaches assumed evenly and identically distributed data across nodes. Consequently, statistical procedures were derived to accommodate uneven node sample sizes and heterogeneous data distributions across nodes. Workflows and detailed algorithms were developed to highlight information sharing requirements and operational complexity. Conclusions: This study contributes to the field of health analytics by providing an overview of the methods that can be used with horizontally partitioned data by adapting these methods to the context of heterogeneous health data and clarifying the workflows and quantities exchanged by the methods discussed. Further analysis of the confidentiality preserved by these methods is needed to fully understand the risk associated with the sharing of summary statistics. %R 10.2196/53622 %U https://medinform.jmir.org/2024/1/e53622 %U https://doi.org/10.2196/53622 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59634 %T An Electronic Medical Record–Based Prognostic Model for Inpatient Falls: Development and Internal-External Cross-Validation %A Parsons,Rex %A Blythe,Robin %A Cramb,Susanna %A Abdel-Hafez,Ahmad %A McPhail,Steven %+ Australian Centre for Health Services Innovation and Centre for Healthcare Transformation, School of Public Health and Social Work, Queensland University of Technology, 60 Musk Ave, Kelvin Grove, 4059, Australia, 61 31380905, rex.parsons@hdr.qut.edu.au %K clinical prediction model %K falls %K patient safety %K prognostic %K electronic medical record %K EMR %K intervention %K hospital %K risk assessment %K clinical decision %K support system %K in-hospital fall %K survival model %K inpatient falls %D 2024 %7 13.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Effective fall prevention interventions in hospitals require 
appropriate allocation of resources early in admission. To address this, fall risk prediction tools and models have been developed with the aim of providing fall prevention strategies to patients at high risk. However, fall risk assessment tools have typically been inaccurate for prediction, ineffective in prevention, and time-consuming to complete. Accurate, dynamic, individualized estimates of fall risk for admitted patients using routinely recorded data may assist in prioritizing fall prevention efforts. Objective: The objective of this study was to develop and validate an accurate and dynamic prognostic model for inpatient falls among a cohort of patients using routinely recorded electronic medical record data. Methods: We used routinely recorded data from 5 Australian hospitals to develop and internally-externally validate a prediction model for inpatient falls using a Cox proportional hazards model with time-varying covariates. The study cohort included patients admitted during 2018-2021 to any ward, with no age restriction. Predictors used in the model included admission-related administrative data, length of stay, and number of previous falls during the admission (updated every 12 hours up to 14 days after admission). Model calibration was assessed using Poisson regression and discrimination using the area under the time-dependent receiver operating characteristic curve. Results: There were 1,107,556 inpatient admissions, 6004 falls, and 5341 unique fallers. The area under the time-dependent receiver operating characteristic curve was 0.899 (95% CI 0.88-0.91) at 24 hours after admission and declined throughout admission (eg, 0.765, 95% CI 0.75-0.78 on the seventh day after admission). Site-dependent overestimation and underestimation of risk were observed on the calibration plots. Conclusions: Using a large dataset from multiple hospitals and robust methods for model development and validation, we developed a prognostic model for inpatient falls. 
It had high discrimination, suggesting the model has the potential for operationalization in clinical decision support for prioritizing inpatients for fall prevention. Performance was site dependent, and model recalibration may lead to improved performance. %M 39536309 %R 10.2196/59634 %U https://www.jmir.org/2024/1/e59634 %U https://doi.org/10.2196/59634 %U http://www.ncbi.nlm.nih.gov/pubmed/39536309 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e54335 %T Early Identification of Cognitive Impairment in Community Environments Through Modeling Subtle Inconsistencies in Questionnaire Responses: Machine Learning Model Development and Validation %A Gao,Hongxin %A Schneider,Stefan %A Hernandez,Raymond %A Harris,Jenny %A Maupin,Danny %A Junghaenel,Doerte U %A Kapteyn,Arie %A Stone,Arthur %A Zelinski,Elizabeth %A Meijer,Erik %A Lee,Pey-Jiuan %A Orriens,Bart %A Jin,Haomiao %+ School of Health Sciences, University of Surrey, Kate Granger Building, 30 Priestley Road, Guildford, GU2 7YH, United Kingdom, 44 7438534086, h.jin@surrey.ac.uk %K machine learning %K artificial intelligence %K cognitive impairments %K surveys and questionnaires %K community health services %K public health %K early identification %K elder care %K dementia %D 2024 %7 13.11.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The underdiagnosis of cognitive impairment hinders timely intervention of dementia. Health professionals working in the community play a critical role in the early detection of cognitive impairment, yet still face several challenges such as a lack of suitable tools, necessary training, and potential stigmatization. Objective: This study explored a novel application integrating psychometric methods with data science techniques to model subtle inconsistencies in questionnaire response data for early identification of cognitive impairment in community environments. 
Methods: This study analyzed questionnaire response data from participants aged 50 years and older in the Health and Retirement Study (waves 8-9, n=12,942). Predictors included low-quality response indices generated using the graded response model from four brief questionnaires (optimism, hopelessness, purpose in life, and life satisfaction) assessing aspects of overall well-being, a focus of health professionals in communities. The primary and supplemental predicted outcomes were current cognitive impairment derived from a validated criterion and dementia or mortality in the next 10 years. Seven predictive models were trained, and the performance of these models was evaluated and compared. Results: The multilayer perceptron exhibited the best performance in predicting current cognitive impairment. For the four selected questionnaires, the area under the curve values for identifying current cognitive impairment ranged from 0.63 to 0.66 and improved to 0.71 to 0.74 when combining the low-quality response indices with age and gender for prediction. We set the threshold for assessing cognitive impairment risk in the tool based on the ratio of underdiagnosis costs to overdiagnosis costs, and a ratio of 4 was used as the default choice. Furthermore, the tool outperformed the efficiency of age- or health-based screening strategies for identifying individuals at high risk for cognitive impairment, particularly in the 50- to 59-year and 60- to 69-year age groups. The tool is available on a portal website for the public to access freely. Conclusions: We developed a novel prediction tool that integrates psychometric methods with data science to facilitate “passive or backend” cognitive impairment assessments in community settings, aiming to promote early cognitive impairment detection. This tool simplifies the cognitive impairment assessment process, making it more adaptable and reducing burdens. 
Our approach also presents a new perspective for using questionnaire data: leveraging, rather than dismissing, low-quality data. %M 39536306 %R 10.2196/54335 %U https://formative.jmir.org/2024/1/e54335 %U https://doi.org/10.2196/54335 %U http://www.ncbi.nlm.nih.gov/pubmed/39536306 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e58116 %T Combating Antimicrobial Resistance Through a Data-Driven Approach to Optimize Antibiotic Use and Improve Patient Outcomes: Protocol for a Mixed Methods Study %A Mayito,Jonathan %A Tumwine,Conrad %A Galiwango,Ronald %A Nuwamanya,Elly %A Nakasendwa,Suzan %A Hope,Mackline %A Kiggundu,Reuben %A Byonanebye,Dathan M %A Dhikusooka,Flavia %A Twemanye,Vivian %A Kambugu,Andrew %A Kakooza,Francis %+ Infectious Diseases Institute, College of Health Sciences, Makerere University, IDI-McKinnell Knowledge Centre, P.O. Box 22418, Kampala, 10208, Uganda, 256 0704976874, tconrad@idi.co.ug %K antimicrobial resistance %K AMR database %K AMR %K machine learning %K antimicrobial use %K artificial intelligence %K antimicrobial %K data-driven %K mixed-method %K patient outcome %K drug-resistant infections %K drug resistant %K surveillance data %K economic %K antibiotic %D 2024 %7 8.11.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: It is projected that drug-resistant infections will lead to 10 million deaths annually by 2050 if left unabated. Despite this threat, surveillance data from resource-limited settings are scarce and often lack antimicrobial resistance (AMR)–related clinical outcomes and economic burden. We aim to build an AMR and antimicrobial use (AMU) data warehouse, describe the trends of resistance and antibiotic use, determine the economic burden of AMR in Uganda, and develop a machine learning algorithm to predict AMR-related clinical outcomes. Objective: The overall objective of the study is to use data-driven approaches to optimize antibiotic use and combat antimicrobial-resistant infections in Uganda. 
We aim to (1) build a dynamic AMR and antimicrobial use and consumption (AMUC) data warehouse to support research in AMR and AMUC to inform AMR-related interventions and public health policy, (2) evaluate the trends in AMR and antibiotic use based on annual antibiotic and point prevalence survey data collected at 9 regional referral hospitals over a 5-year period, (3) develop a machine learning model to predict the clinical outcomes of patients with bacterial infectious syndromes due to drug-resistant pathogens, and (4) estimate the annual economic burden of AMR in Uganda using the cost-of-illness approach. Methods: We will conduct a study involving data curation, machine learning–based modeling, and cost-of-illness analysis using AMR and AMU data abstracted from procurement, human resources, and clinical records of patients with bacterial infectious syndromes at 9 regional referral hospitals in Uganda collected between 2018 and 2026. We will use data curation procedures, FLAIR (Findable, Linkable, Accessible, Interactable and Repeatable) principles, and role-based access control to build a robust and dynamic AMR and AMU data warehouse. We will also apply machine learning algorithms to model AMR-related clinical outcomes, advanced statistical analysis to study AMR and AMU trends, and cost-of-illness analysis to determine the AMR-related economic burden. Results: The study received funding from the Wellcome Trust through the Centers for Antimicrobial Optimisation Network (CAMO-Net) in April 2023. As of October 28, 2024, we completed data warehouse development, which is now under testing; completed data curation of the historical Fleming Fund surveillance data (2020-2023); and collected retrospective AMR records for 599 patients that contained clinical outcomes and cost-of-illness economic burden data across 9 surveillance sites for objectives 3 and 4, respectively. 
Conclusions: The data warehouse will promote access to rich and interlinked AMR and AMU data sets to answer AMR program and research questions using a wide evidence base. The AMR-related clinical outcomes model and cost data will facilitate improvement in the clinical management of AMR patients and guide resource allocation to support AMR surveillance and interventions. International Registered Report Identifier (IRRID): PRR1-10.2196/58116 %M 39514268 %R 10.2196/58116 %U https://www.researchprotocols.org/2024/1/e58116 %U https://doi.org/10.2196/58116 %U http://www.ncbi.nlm.nih.gov/pubmed/39514268 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e53337 %T Leveraging the Electronic Health Record to Measure Resident Clinical Experiences and Identify Training Gaps: Development and Usability Study %A Bhavaraju,Vasudha L %A Panchanathan,Sarada %A Willis,Brigham C %A Garcia-Filion,Pamela %K clinical informatics %K electronic health record %K pediatric resident %K COVID-19 %K competence-based medical education %K pediatric %K children %K SARS-CoV-2 %K clinic %K urban %K diagnosis %K health informatics %K EHR %K individualized learning plan %D 2024 %7 6.11.2024 %9 %J JMIR Med Educ %G English %X Background: Competence-based medical education requires robust data to link competence with clinical experiences. The SARS-CoV-2 (COVID-19) pandemic abruptly altered the standard trajectory of clinical exposure in medical training programs. Residency program directors were tasked with identifying and addressing the resultant gaps in each trainee’s experiences using existing tools. Objective: This study aims to demonstrate a feasible and efficient method to capture electronic health record (EHR) data that measure the volume and variety of pediatric resident clinical experiences from a continuity clinic; generate individual-, class-, and graduate-level benchmark data; and create a visualization for learners to quickly identify gaps in clinical experiences. 
Methods: This pilot was conducted in a large, urban pediatric residency program from 2016 to 2022. Through consensus, 5 pediatric faculty identified diagnostic groups that pediatric residents should see to be competent in outpatient pediatrics. Information technology consultants used International Classification of Diseases, Tenth Revision (ICD-10) codes corresponding with each diagnostic group to extract EHR patient encounter data as an indicator of exposure to the specific diagnosis. The frequency (volume) and diagnosis types (variety) seen by active residents (classes of 2020‐2022) were compared with class and graduated resident (classes of 2016‐2019) averages. These data were converted to percentages and translated to a radar chart visualization for residents to quickly compare their current clinical experiences with peers and graduates. Residents were surveyed on the use of these data and the visualization to identify training gaps. Results: Patient encounter data about clinical experiences for 102 residents (N=52 graduates) were extracted. Active residents (n=50) received data reports with radar graphs biannually: 3 for the classes of 2020 and 2021 and 2 for the class of 2022. Radar charts distinctly demonstrated gaps in diagnoses exposure compared with classmates and graduates. Residents found the visualization useful in setting clinical and learning goals. Conclusions: This pilot describes an innovative method of capturing and presenting data about resident clinical experiences, compared with peer and graduate benchmarks, to identify learning gaps that may result from disruptions or modifications in medical training. This methodology can be aggregated across specialties and institutions and potentially inform competence-based medical education. 
%R 10.2196/53337 %U https://mededu.jmir.org/2024/1/e53337 %U https://doi.org/10.2196/53337 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58130 %T Electronic Health Record Data Quality and Performance Assessments: Scoping Review %A Penev,Yordan P %A Buchanan,Timothy R %A Ruppert,Matthew M %A Liu,Michelle %A Shekouhi,Ramin %A Guan,Ziyuan %A Balch,Jeremy %A Ozrazgat-Baslanti,Tezcan %A Shickel,Benjamin %A Loftus,Tyler J %A Bihorac,Azra %K electronic health record %K EHR %K record %K data quality %K data performance %K clinical informatics %K performance %K data science %K synthesis %K review methods %K review methodology %K search %K scoping %D 2024 %7 6.11.2024 %9 %J JMIR Med Inform %G English %X Background: Electronic health records (EHRs) have an enormous potential to advance medical research and practice through easily accessible and interpretable EHR-derived databases. Attainability of this potential is limited by issues with data quality (DQ) and performance assessment. Objective: This review aims to streamline the current best practices on EHR DQ and performance assessments as a replicable standard for researchers in the field. Methods: PubMed was systematically searched for original research articles assessing EHR DQ and performance from inception until May 7, 2023. Results: Our search yielded 26 original research articles. Most articles had 1 or more significant limitations, including incomplete or inconsistent reporting (n=6, 30%), poor replicability (n=5, 25%), and limited generalizability of results (n=5, 25%). Completeness (n=21, 81%), conformance (n=18, 69%), and plausibility (n=16, 62%) were the most cited indicators of DQ, while correctness or accuracy (n=14, 54%) was most cited for data performance, with context-specific supplementation by recency (n=7, 27%), fairness (n=6, 23%), stability (n=4, 15%), and shareability (n=2, 8%) assessments. 
Artificial intelligence–based techniques, including natural language data extraction, data imputation, and fairness algorithms, were demonstrated to play a rising role in improving both dataset quality and performance. Conclusions: This review highlights the need for incentivizing DQ and performance assessments and their standardization. The results suggest the usefulness of artificial intelligence–based techniques for enhancing DQ and performance to unlock the full potential of EHRs to improve medical research and practice. %R 10.2196/58130 %U https://medinform.jmir.org/2024/1/e58130 %U https://doi.org/10.2196/58130 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59674 %T Patient Health Record Protection Beyond the Health Insurance Portability and Accountability Act: Mixed Methods Study %A Subramanian,Hemang %A Sengupta,Arijit %A Xu,Yilin %+ Florida International University, 11200 SW 8th Street, Miami, FL, 33199-2156, United States, 1 3053488446, hsubrama@fiu.edu %K security %K privacy %K security breach %K breach report %K health care %K health care infrastructure %K regulatory %K law enforcement %K Omnibus Rule %K qualitative analysis %K AI-generated data %K artificial intelligence %K difference-in-differences %K best practice %K data privacy %K safe practice %D 2024 %7 6.11.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The security and privacy of health care information are crucial for maintaining the societal value of health care as a public good. However, governance over electronic health care data has proven inefficient, despite robust enforcement efforts. Both federal (HIPAA [Health Insurance Portability and Accountability Act]) and state regulations, along with the ombudsman rule, have not effectively reduced the frequency or impact of data breaches in the US health care system. While legal frameworks have bolstered data security, recent years have seen a concerning increase in breach incidents. 
This paper investigates common breach types and proposes best practices derived from the data as potential solutions. Objective: The primary aim of this study is to analyze health care and hospital breach data, comparing it against HIPAA compliance levels across states (spatial analysis) and the impact of the Omnibus Rule over time (temporal analysis). The goal is to establish guidelines for best practices in handling sensitive information within hospitals and clinical environments. Methods: The study used data from the Department of Health and Human Services on reported breaches, assessing the severity and impact of each breach type. We then analyzed secondary data to examine whether HIPAA’s storage and retention rule amendments have influenced security and privacy incidents across all 50 states. Finally, we conducted a qualitative analysis of textual data from vulnerability and breach reports to identify actionable best practices for health care settings. Results: Our findings indicate that hacking or IT incidents have the most significant impact on the number of individuals affected, highlighting this as a primary breach category. The overall difference-in-differences trend reveals no significant reduction in breach rates (P=.50), despite state-level regulations exceeding HIPAA requirements and the introduction of the ombudsman rule. This persistence in breach trends implies that even strengthened protections and additional guidelines have not effectively curbed the rising number of affected individuals. Through qualitative analysis, we identified 15 unique values and associated best practices from industry standards. Conclusions: Combining quantitative and qualitative insights, we propose the “SecureSphere framework” to enhance data security in health care institutions. This framework presents key security values structured in concentric circles: core values at the center and peripheral values around them. 
The core values include employee management, policy, procedures, and IT management. Peripheral values encompass the remaining security attributes that support these core elements. This structured approach provides a comprehensive security strategy for protecting patient health information and is designed to help health care organizations develop sustainable practices for data security. %M 39504550 %R 10.2196/59674 %U https://www.jmir.org/2024/1/e59674 %U https://doi.org/10.2196/59674 %U http://www.ncbi.nlm.nih.gov/pubmed/39504550 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e54246 %T A New Natural Language Processing–Inspired Methodology (Detection, Initial Characterization, and Semantic Characterization) to Investigate Temporal Shifts (Drifts) in Health Care Data: Quantitative Study %A Paiva,Bruno %A Gonçalves,Marcos André %A da Rocha,Leonardo Chaves Dutra %A Marcolino,Milena Soriano %A Lana,Fernanda Cristina Barbosa %A Souza-Silva,Maira Viana Rego %A Almeida,Jussara M %A Pereira,Polianna Delfino %A de Andrade,Claudio Moisés Valiense %A Gomes,Angélica Gomides dos Reis %A Ferreira,Maria Angélica Pires %A Bartolazzi,Frederico %A Sacioto,Manuela Furtado %A Boscato,Ana Paula %A Guimarães-Júnior,Milton Henriques %A dos Reis,Priscilla Pereira %A Costa,Felício Roberto %A Jorge,Alzira de Oliveira %A Coelho,Laryssa Reis %A Carneiro,Marcelo %A Sales,Thaís Lorenna Souza %A Araújo,Silvia Ferreira %A Silveira,Daniel Vitório %A Ruschel,Karen Brasil %A Santos,Fernanda Caldeira Veloso %A Cenci,Evelin Paola de Almeida %A Menezes,Luanna Silva Monteiro %A Anschau,Fernando %A Bicalho,Maria Aparecida Camargos %A Manenti,Euler Roberto Fernandes %A Finger,Renan Goulart %A Ponce,Daniela %A de Aguiar,Filipe Carrilho %A Marques,Luiza Margoto %A de Castro,Luís César %A Vietta,Giovanna Grünewald %A Godoy,Mariana Frizzo de %A Vilaça,Mariana do Nascimento %A Morais,Vivian Costa %+ Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, 
Street Daniel de Carvalho, 1846, apto 201, Belo Horizonte, 30431310, Brazil, 55 31999710134, angelfire7@gmail.com %K health care %K machine learning %K data drifts %K temporal drifts %D 2024 %7 28.10.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Proper analysis and interpretation of health care data can significantly improve patient outcomes by enhancing services and revealing the impacts of new technologies and treatments. Understanding the substantial impact of temporal shifts in these data is crucial. For example, COVID-19 vaccination initially lowered the mean age of at-risk patients and later changed the characteristics of those who died. This highlights the importance of understanding these shifts for assessing factors that affect patient outcomes. Objective: This study aims to propose detection, initial characterization, and semantic characterization (DIS), a new methodology for analyzing changes in health outcomes and variables over time while discovering contextual changes for outcomes in large volumes of data. Methods: The DIS methodology involves 3 steps: detection, initial characterization, and semantic characterization. Detection uses metrics such as Jensen-Shannon divergence to identify significant data drifts. Initial characterization offers a global analysis of changes in data distribution and predictive feature significance over time. Semantic characterization uses natural language processing–inspired techniques to understand the local context of these changes, helping identify factors driving changes in patient outcomes. By integrating the outcomes from these 3 steps, our results can identify specific factors (eg, interventions and modifications in health care practices) that drive changes in patient outcomes. DIS was applied to the Brazilian COVID-19 Registry and the Medical Information Mart for Intensive Care, version IV (MIMIC-IV) data sets. 
Results: Our approach allowed us to (1) identify drifts effectively, especially using metrics such as the Jensen-Shannon divergence, and (2) uncover reasons for the decline in overall mortality in both the COVID-19 and MIMIC-IV data sets, as well as changes in the cooccurrence between different diseases and this particular outcome. Factors such as vaccination during the COVID-19 pandemic and reduced iatrogenic events and cancer-related deaths in MIMIC-IV were highlighted. The methodology also pinpointed shifts in patient demographics and disease patterns, providing insights into the evolving health care landscape during the study period. Conclusions: We developed a novel methodology combining machine learning and natural language processing techniques to detect, characterize, and understand temporal shifts in health care data. This understanding can enhance predictive algorithms, improve patient outcomes, and optimize health care resource allocation, ultimately improving the effectiveness of machine learning predictive algorithms applied to health care data. Our methodology can be applied to a variety of scenarios beyond those discussed in this paper. 
%M 39467275 %R 10.2196/54246 %U https://medinform.jmir.org/2024/1/e54246 %U https://doi.org/10.2196/54246 %U http://www.ncbi.nlm.nih.gov/pubmed/39467275 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55531 %T Internet-Based Social Connections of Black American College Students in Pre–COVID-19 and Peri–COVID-19 Pandemic Periods: Network Analysis %A Lee,Eun %A Kim,Heejun %A Esener,Yildiz %A McCall,Terika %+ Department of Scientific Computing, Pukyong National University, 45, Yongso-ro, Nam-gu, Busan, 48513, Republic of Korea, 82 10 7356 7890, eunlee@pknu.ac.kr %K COVID-19 pandemic %K college students %K Black American %K African American %K social network analysis %K social media %K mental health %K depression %D 2024 %7 28.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: A global-scale pandemic, such as the COVID-19 pandemic, greatly impacted communities of color. Moreover, physical distancing recommendations during the height of the COVID-19 pandemic negatively affected people’s sense of social connection, especially among young individuals. More research is needed on the use of social media and communication about depression, with a specific focus on young Black Americans. Objective: This paper aims to examine whether there are any differences in social-networking characteristics before and during the pandemic periods (ie, pre–COVID-19 pandemic vs peri–COVID-19 pandemic) among the students of historically black colleges and universities (HBCUs). For the study, the researchers focus on the students who have posted a depression-related tweet or have retweeted such posts on their timeline and also those who have not made such tweets. This is done to understand the collective patterns of both groups. Methods: This paper analyzed the social networks on Twitter (currently known as X; X Corp) of HBCU students through comparing pre–COVID-19 and peri–COVID-19 pandemic data. 
The researchers quantified the structural properties, such as reciprocity, homophily, and communities, to test the differences in internet-based socializing patterns between the depression-related and non–depression related groups for the 2 periods. Results: During the COVID-19 pandemic period, the group with depression-related tweets saw an increase in internet-based friendships, with the average number of friends rising from 1194 (SD 528.14) to 1371 (SD 824.61; P<.001). Their mutual relationships strengthened (reciprocity: 0.78-0.8; P=.01), and they showed higher assortativity with other depression-related group members (0.6-0.7; P<.001). In a network with only HBCU students, internet-based and physical affiliation memberships aligned closely during the peri–COVID-19 pandemic period, with membership entropy decreasing from 1.0 to 0.5. While users without depression-related tweets engaged more on the internet with other users who shared physical affiliations, those who posted depression-related tweets maintained consistent entropy levels (modularity: 0.75-0.76). Compared with randomized networks before and during the COVID-19 pandemic (P<.001), the users also exhibited high homophily with other members who posted depression-related tweets. Conclusions: The findings of this study provided insight into the social media activities of HBCU students’ social networks and communication about depression on social media. Future social media interventions focused on the mental health of Black college students may focus on providing resources to students who communicate about depression. Efforts aimed at providing relevant resources and information to internet-based communities that share institutional affiliation may enhance access to social support, particularly for those who may not proactively seek assistance. This approach may contribute to increased social support for individuals within these communities, especially those with a limited social capacity. 
%M 39467280 %R 10.2196/55531 %U https://www.jmir.org/2024/1/e55531 %U https://doi.org/10.2196/55531 %U http://www.ncbi.nlm.nih.gov/pubmed/39467280 %0 Journal Article %@ 2562-7600 %I JMIR Publications %V 7 %N %P e62678 %T Advancing AI Data Ethics in Nursing: Future Directions for Nursing Practice, Research, and Education %A Ball Dunlap,Patricia A %A Michalowski,Martin %K artificial intelligence %K AI data ethics %K data-centric AI %K nurses %K nursing informatics %K machine learning %K data literacy %K health care AI %K responsible AI %D 2024 %7 25.10.2024 %9 %J JMIR Nursing %G English %X The ethics of artificial intelligence (AI) are increasingly recognized due to concerns such as algorithmic bias, opacity, trust issues, data security, and fairness. Specifically, machine learning algorithms, central to AI technologies, are essential in striving for ethically sound systems that mimic human intelligence. These technologies rely heavily on data, which often remain obscured within complex systems and must be prioritized for ethical collection, processing, and usage. The significance of data ethics in achieving responsible AI was first highlighted in the broader context of health care and subsequently in nursing. This viewpoint explores the principles of data ethics, drawing on relevant frameworks and strategies identified through a formal literature review. These principles apply to real-world and synthetic data in AI and machine-learning contexts. Additionally, the data-centric AI paradigm is briefly examined, emphasizing its focus on data quality and the ethical development of AI solutions that integrate human-centered domain expertise. The ethical considerations specific to nursing are addressed, including 4 recommendations for future directions in nursing practice, research, and education and 2 hypothetical nurse-focused ethical case studies. 
The primary objectives are to position nurses to actively participate in AI and data ethics, thereby contributing to creating high-quality and relevant data for machine learning applications. %R 10.2196/62678 %U https://nursing.jmir.org/2024/1/e62678 %U https://doi.org/10.2196/62678 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e54653 %T Accelerating Evidence Synthesis in Observational Studies: Development of a Living Natural Language Processing–Assisted Intelligent Systematic Literature Review System %A Manion,Frank J %A Du,Jingcheng %A Wang,Dong %A He,Long %A Lin,Bin %A Wang,Jingqi %A Wang,Siwei %A Eckels,David %A Cervenka,Jan %A Fiduccia,Peter C %A Cossrow,Nicole %A Yao,Lixia %K machine learning %K deep learning %K natural language processing %K systematic literature review %K artificial intelligence %K software development %K data extraction %K epidemiology %D 2024 %7 23.10.2024 %9 %J JMIR Med Inform %G English %X Background: Systematic literature review (SLR), a robust method to identify and summarize evidence from published sources, is considered to be a complex, time-consuming, labor-intensive, and expensive task. Objective: This study aimed to present a solution based on natural language processing (NLP) that accelerates and streamlines the SLR process for observational studies using real-world data. Methods: We followed an agile software development and iterative software engineering methodology to build a customized intelligent end-to-end living NLP-assisted solution for observational SLR tasks. Multiple machine learning–based NLP algorithms were adopted to automate article screening and data element extraction processes. The NLP prediction results can be further reviewed and verified by domain experts, following the human-in-the-loop design. The system integrates explainable artificial intelligence to provide evidence for NLP algorithms and add transparency to extracted literature data elements. 
The system was developed based on 3 existing SLR projects of observational studies, including the epidemiology studies of human papillomavirus–associated diseases, the disease burden of pneumococcal diseases, and cost-effectiveness studies on pneumococcal vaccines. Results: Our Intelligent SLR Platform covers major SLR steps, including study protocol setting, literature retrieval, abstract screening, full-text screening, data element extraction from full-text articles, results summary, and data visualization. The NLP algorithms achieved accuracy scores of 0.86-0.90 on article screening tasks (framed as text classification tasks) and macroaverage F1 scores of 0.57-0.89 on data element extraction tasks (framed as named entity recognition tasks). Conclusions: Cutting-edge NLP algorithms expedite SLR for observational studies, thus allowing scientists to have more time to focus on the quality of data and the synthesis of evidence in observational studies. Aligning with the living SLR concept, the system has the potential to update literature data and enable scientists to easily stay current with the literature related to observational studies prospectively and continuously. 
%R 10.2196/54653 %U https://medinform.jmir.org/2024/1/e54653 %U https://doi.org/10.2196/54653 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57569 %T A Generic Transformation Approach for Complex Laboratory Data Using the Fast Healthcare Interoperability Resources Mapping Language: Method Development and Implementation %A Kruse,Jesse %A Wiedekopf,Joshua %A Kock-Schoppenhauer,Ann-Kristin %A Essenwanger,Andrea %A Ingenerf,Josef %A Ulrich,Hannes %K FHIR %K StructureMaps %K FHIR mapping language %K laboratory data %K mapping %K standardization %K data science %K healthcare system %K HIS %K information system %K electronic healthcare record %K health care system %K electronic health record %K health information system %D 2024 %7 18.10.2024 %9 %J JMIR Med Inform %G English %X Background: Reaching meaningful interoperability between proprietary health care systems is a ubiquitous task in medical informatics, where communication servers are traditionally used for referring and transforming data from the source to target systems. The Mirth Connect Server, an open-source communication server, offers, in addition to the exchange functionality, functions for simultaneous manipulation of data. The standard Fast Healthcare Interoperability Resources (FHIR) has recently become increasingly prevalent in national health care systems. FHIR specifies its own standardized mechanisms for transforming data structures using StructureMaps and the FHIR mapping language (FML). Objective: In this study, a generic approach is developed, which allows for the application of declarative mapping rules defined using FML in an exchangeable manner. A transformation engine is required to execute the mapping rules. Methods: FHIR natively defines resources to support the conversion of instance data, such as an FHIR StructureMap. This resource encodes all information required to transform data from a source system to a target system. 
In our approach, this information is defined in an implementation-independent manner using FML. Once the mapping has been defined, executable Mirth channels are automatically generated from the resources containing the mapping in JavaScript format. These channels can then be deployed to the Mirth Connect Server. Results: The resulting tool is called FML2Mirth, a Java-based transformer that derives Mirth channels from detailed declarative mapping rules based on the underlying StructureMaps. Implementation of the translate functionality is provided by the integration of a terminology server, and to achieve conformity with existing profiles, validation via the FHIR validator is built in. The system was evaluated for its practical use by transforming Labordatenträger version 2 (LDTv.2) laboratory results into Medical Information Object (Medizinisches Informationsobjekt) laboratory reports in accordance with the National Association of Statutory Health Insurance Physicians’ specifications and into the HL7 (Health Level Seven) Europe Laboratory Report. The system could generate complex structures, but LDTv.2 lacks some information to fully comply with the specification. Conclusions: The tool for the auto-generation of Mirth channels was successfully presented. Our tests reveal the feasibility of using the complex structures of the mapping language in combination with a terminology server to transform instance data. Although the Mirth Connect Server and FHIR are well established in medical informatics, the combination offers space for more research, especially with regard to FML. Simultaneously, it can be stated that the mapping language still has implementation-related shortcomings that can be compensated by Mirth Connect as a base technology. 
%R 10.2196/57569 %U https://medinform.jmir.org/2024/1/e57569 %U https://doi.org/10.2196/57569 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e43954 %T Data Visualization Preferences in Remote Measurement Technology for Individuals Living With Depression, Epilepsy, and Multiple Sclerosis: Qualitative Study %A Simblett,Sara %A Dawe-Lane,Erin %A Gilpin,Gina %A Morris,Daniel %A White,Katie %A Erturk,Sinan %A Devonshire,Julie %A Lees,Simon %A Zormpas,Spyridon %A Polhemus,Ashley %A Temesi,Gergely %A Cummins,Nicholas %A Hotopf,Matthew %A Wykes,Til %A , %+ Institute of Psychiatry, Psychology and Neuroscience, King's College London, 16 De Crespigny Park, London, SE5 8AF, United Kingdom, 44 02078480762, sara.simblett@kcl.ac.uk %K mHealth %K qualitative %K technology %K depression %K epilepsy %K multiple sclerosis %K wearables %K devices %K smartphone apps %K application %K feedback %K users %K data %K data visualization %K mobile phone %D 2024 %7 18.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Remote measurement technology (RMT) involves the use of wearable devices and smartphone apps to measure health outcomes in everyday life. RMT with feedback in the form of data visual representations can facilitate self-management of chronic health conditions, promote health care engagement, and present opportunities for intervention. Studies to date focus broadly on multiple dimensions of service users’ design preferences and RMT user experiences (eg, health variables of perceived importance and perceived quality of medical advice provided) as opposed to data visualization preferences. Objective: This study aims to explore data visualization preferences and priorities in RMT, with individuals living with depression, those with epilepsy, and those with multiple sclerosis (MS). 
Methods: A triangulated qualitative study comparing and thematically synthesizing focus group discussions with user reviews of existing self-management apps and a systematic review of RMT data visualization preferences. A total of 45 people participated in 6 focus groups across the 3 health conditions (depression, n=17; epilepsy, n=11; and MS, n=17). Results: Thematic analysis validated a major theme around design preferences and recommendations and identified a further four minor themes: (1) data reporting, (2) impact of visualization, (3) moderators of visualization preferences, and (4) system-related factors and features. Conclusions: When used effectively, data visualizations are valuable, engaging components of RMT. Easy-to-use and intuitive data visualization design was lauded by individuals with neurological and psychiatric conditions. App design needs to consider the unique requirements of service users. Overall, this study offers RMT developers a comprehensive outline of the data visualization preferences of individuals living with depression, epilepsy, and MS. 
%M 39423366 %R 10.2196/43954 %U https://www.jmir.org/2024/1/e43954 %U https://doi.org/10.2196/43954 %U http://www.ncbi.nlm.nih.gov/pubmed/39423366 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50730 %T Automatic Recommender System of Development Platforms for Smart Contract–Based Health Care Insurance Fraud Detection Solutions: Taxonomy and Performance Evaluation %A Kaafarani,Rima %A Ismail,Leila %A Zahwe,Oussama %+ Intelligent Distributed Computing and Systems Laboratory, Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, 15551 Al Maqam Campus, Al Ain, Abu Dhabi, 15551, United Arab Emirates, 971 37673333 ext 5530, leila@uaeu.ac.ae %K blockchain %K blockchain development platform %K eHealth %K fraud detection %K fraud scenarios %K health care %K health care insurance %K health insurance %K machine learning %K medical informatics %K recommender system %K smart contract %K taxonomy %D 2024 %7 18.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Health care insurance fraud is on the rise in many ways, such as falsifying information and hiding third-party liability. This can result in significant losses for the medical health insurance industry. Consequently, fraud detection is crucial. Currently, companies employ auditors who manually evaluate records and pinpoint fraud. However, an automated and effective method is needed to detect fraud with the continually increasing number of patients seeking health insurance. Blockchain is an emerging technology and is constantly evolving to meet business needs. With its characteristics of immutability, transparency, traceability, and smart contracts, it demonstrates its potential in the health care domain. 
In particular, self-executable smart contracts are essential to reduce the costs associated with traditional paradigms, which are mostly manual, while preserving privacy and building trust among health care stakeholders, including the patient and the health insurance networks. However, with the proliferation of blockchain development platform options, selecting the right one for health care insurance can be difficult. This study addressed this void and developed an automated decision map recommender system to select the most effective blockchain platform for insurance fraud detection. Objective: This study aims to develop smart contracts for detecting health care insurance fraud efficiently. Therefore, we provided a taxonomy of fraud scenarios and implemented their detection using a blockchain platform that was suitable for health care insurance fraud detection. To automatically and efficiently select the best platform, we proposed and implemented a decision map–based recommender system. For developing the decision-map, we proposed a taxonomy of 102 blockchain platforms. Methods: We developed smart contracts for 12 fraud scenarios that we identified in the literature. We used the top 2 blockchain platforms selected by our proposed decision-making map–based recommender system, which is tailored for health care insurance fraud. The map used our taxonomy of 102 blockchain platforms classified according to their application domains. Results: The recommender system demonstrated that Hyperledger Fabric was the best blockchain platform for identifying health care insurance fraud. We validated our recommender system by comparing the performance of the top 2 platforms selected by our system. The blockchain platform taxonomy that we created revealed that 59 blockchain platforms are suitable for all application domains, 25 are suitable for financial services, and 18 are suitable for various application domains. We implemented fraud detection based on smart contracts. 
Conclusions: Our decision map recommender system, which was based on our proposed taxonomy of 102 platforms, automatically selected the top 2 platforms, which were Hyperledger Fabric and Neo, for the implementation of health care insurance fraud detection. Our performance evaluation of the 2 platforms indicated that Fabric surpassed Neo in all performance metrics, as depicted by our recommender system. We provided an implementation of fraud detection based on smart contracts. %M 39423005 %R 10.2196/50730 %U https://www.jmir.org/2024/1/e50730 %U https://doi.org/10.2196/50730 %U http://www.ncbi.nlm.nih.gov/pubmed/39423005 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53024 %T Building and Sustaining Public Trust in Health Data Sharing for Musculoskeletal Research: Semistructured Interview and Focus Group Study %A Yusuf,Zainab K %A Dixon,William G %A Sharp,Charlotte %A Cook,Louise %A Holm,Søren %A Sanders,Caroline %+ Centre for Primary Care and Health Services Research, NIHR Greater Manchester Patient Safety Research Collaboration, University of Manchester, Williamson Building, Oxford Road, Manchester, M13 9PL, United Kingdom, 44 01612757619, caroline.sanders@manchester.ac.uk %K data sharing %K public trust %K musculoskeletal %K marginalized communities %K underserved communities %D 2024 %7 15.10.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Although many people are supportive of their deidentified health care data being used for research, concerns about privacy, safety, and security of health care data remain. There is low awareness about how data are used for research and related governance. Transparency about how health data are used for research is crucial for building public trust. One proposed solution is to ensure that affected communities are notified, particularly marginalized communities where there has previously been a lack of engagement and mistrust. 
Objective: This study aims to explore patient and public perspectives on the use of deidentified data from electronic health records for musculoskeletal research and to explore ways to build and sustain public trust in health data sharing for a research program (known as “the Data Jigsaw”) piloting new ways of using and analyzing electronic health data. Views and perspectives about how best to engage with local communities informed the development of a public notification campaign about the research. Methods: Qualitative methods data were generated from 20 semistructured interviews and 8 focus groups, comprising 48 participants in total with musculoskeletal conditions or symptoms, including 3 carers. A presentation about the use of health data for research and examples from the specific research projects within the program were used to trigger discussion. We worked in partnership with a patient and public involvement group throughout the research and cofacilitated wider community engagement. Results: Respondents were supportive of their health care data being shared for research purposes, but there was low awareness about how electronic health records are used for research. Security and governance concerns about data sharing were noted, including collaborations with external companies and accessing social care records. Project examples from the Data Jigsaw program were viewed positively after respondents knew more about how their data were being used to improve patient care. A range of different methods to build and sustain trust were deemed necessary by participants. Information was requested about: data management; individuals with access to the data (including any collaboration with external companies); the National Health Service’s national data opt-out; and research outcomes. It was considered important to enable in-person dialogue with affected communities in addition to other forms of information. 
Conclusions: The findings have emphasized the need for transparency and awareness about health data sharing for research, and the value of tailoring this to reflect current and local research where residents might feel more invested in the focus of research and the use of local records. Thus, the provision for targeted information within affected communities with accessible messages and community-based dialogue could help to build and sustain public trust. These findings can also be extrapolated to other conditions beyond musculoskeletal conditions, making the findings relevant to a much wider community. %M 39405526 %R 10.2196/53024 %U https://www.jmir.org/2024/1/e53024 %U https://doi.org/10.2196/53024 %U http://www.ncbi.nlm.nih.gov/pubmed/39405526 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e49781 %T Case Identification of Depression in Inpatient Electronic Medical Records: Scoping Review %A Grothman,Allison %A Ma,William J %A Tickner,Kendra G %A Martin,Elliot A %A Southern,Danielle A %A Quan,Hude %K electronic medical records %K EMR phenotyping %K depression %K algorithms %K health services research %K precision public health %K inpatient %K clinical information %K phenotyping %K data accessibility %K scoping review %K disparity %K development %K phenotype %K PRISMA-ScR %K Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews %D 2024 %7 14.10.2024 %9 %J JMIR Med Inform %G English %X Background: Electronic medical records (EMRs) contain large amounts of detailed clinical information. Using medical record review to identify conditions within large quantities of EMRs can be time-consuming and inefficient. EMR-based phenotyping using machine learning and natural language processing algorithms is a continually developing area of study that holds potential for numerous mental health disorders. 
Objective: This review evaluates the current state of EMR-based case identification for depression and provides guidance on using current algorithms and constructing new ones. Methods: A scoping review of EMR-based algorithms for phenotyping depression was completed. This research encompassed studies published from January 2000 to May 2023. The search involved 3 databases: Embase, MEDLINE, and APA PsycInfo. This was carried out using selected keywords that fell into 3 categories: terms connected with EMRs, terms connected to case identification, and terms pertaining to depression. This study adhered to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Results: A total of 20 papers were assessed and summarized in the review. Most of these studies were undertaken in the United States, accounting for 75% (15/20). The United Kingdom and Spain followed this, accounting for 15% (3/20) and 10% (2/20) of the studies, respectively. Both data-driven and clinical rule-based methodologies were identified. The development of EMR-based phenotypes and algorithms indicates the data accessibility permitted by each health system, which led to varying performance levels among different algorithms. Conclusions: Better use of structured and unstructured EMR components through techniques such as machine learning and natural language processing has the potential to improve depression phenotyping. However, more validation must be carried out to have confidence in depression case identification algorithms in general. 
%R 10.2196/49781 %U https://medinform.jmir.org/2024/1/e49781 %U https://doi.org/10.2196/49781 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e62924 %T Use of SNOMED CT in Large Language Models: Scoping Review %A Chang,Eunsuk %A Sung,Sumi %+ Department of Nursing Science, Research Institute of Nursing Science, Chungbuk National University, 1 Chungdae-ro, Seowon-gu, Cheongju, 28644, Republic of Korea, 82 43 249 1731, sumisung@cbnu.ac.kr %K SNOMED CT %K ontology %K knowledge graph %K large language models %K natural language processing %K language models %D 2024 %7 7.10.2024 %9 Review %J JMIR Med Inform %G English %X Background: Large language models (LLMs) have substantially advanced natural language processing (NLP) capabilities but often struggle with knowledge-driven tasks in specialized domains such as biomedicine. Integrating biomedical knowledge sources such as SNOMED CT into LLMs may enhance their performance on biomedical tasks. However, the methodologies and effectiveness of incorporating SNOMED CT into LLMs have not been systematically reviewed. Objective: This scoping review aims to examine how SNOMED CT is integrated into LLMs, focusing on (1) the types and components of LLMs being integrated with SNOMED CT, (2) which contents of SNOMED CT are being integrated, and (3) whether this integration improves LLM performance on NLP tasks. Methods: Following the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines, we searched ACM Digital Library, ACL Anthology, IEEE Xplore, PubMed, and Embase for relevant studies published from 2018 to 2023. Studies were included if they incorporated SNOMED CT into LLM pipelines for natural language understanding or generation tasks. Data on LLM types, SNOMED CT integration methods, end tasks, and performance metrics were extracted and synthesized. Results: The review included 37 studies. 
Bidirectional Encoder Representations from Transformers and its biomedical variants were the most commonly used LLMs. Three main approaches for integrating SNOMED CT were identified: (1) incorporating SNOMED CT into LLM inputs (28/37, 76%), primarily using concept descriptions to expand training corpora; (2) integrating SNOMED CT into additional fusion modules (5/37, 14%); and (3) using SNOMED CT as an external knowledge retriever during inference (5/37, 14%). The most frequent end task was medical concept normalization (15/37, 41%), followed by entity extraction or typing and classification. Only about half of the studies (19/37, 51%) provided direct comparisons, and of these, most (17/19, 89%) reported performance improvements after SNOMED CT integration. The reported gains varied widely across different metrics and tasks, ranging from 0.87% to 131.66%. However, some studies showed either no improvement or a decline in certain performance metrics. Conclusions: This review demonstrates diverse approaches for integrating SNOMED CT into LLMs, with a focus on using concept descriptions to enhance biomedical language understanding and generation. While the results suggest potential benefits of SNOMED CT integration, the lack of standardized evaluation methods and comprehensive performance reporting hinders definitive conclusions about its effectiveness. Future research should prioritize consistent reporting of performance comparisons and explore more sophisticated methods for incorporating SNOMED CT’s relational structure into LLMs. In addition, the biomedical NLP community should develop standardized evaluation frameworks to better assess the impact of ontology integration on LLM performance. 
%M 39374057 %R 10.2196/62924 %U https://medinform.jmir.org/2024/1/e62924 %U https://doi.org/10.2196/62924 %U http://www.ncbi.nlm.nih.gov/pubmed/39374057 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58085 %T Addressing Information Biases Within Electronic Health Record Data to Improve the Examination of Epidemiologic Associations With Diabetes Prevalence Among Young Adults: Cross-Sectional Study %A Conderino,Sarah %A Anthopolos,Rebecca %A Albrecht,Sandra S %A Farley,Shannon M %A Divers,Jasmin %A Titus,Andrea R %A Thorpe,Lorna E %K information bias %K electronic health record %K EHR %K epidemiologic method %K confounding factor %K diabetes %K epidemiology %K young adult %K cross-sectional study %K risk factor %K asthma %K race %K ethnicity %K diabetic %K diabetic adult %D 2024 %7 1.10.2024 %9 %J JMIR Med Inform %G English %X Background: Electronic health records (EHRs) are increasingly used for epidemiologic research to advance public health practice. However, key variables are susceptible to missing data or misclassification within EHRs, including demographic information or disease status, which could affect the estimation of disease prevalence or risk factor associations. Objective: In this paper, we applied methods from the literature on missing data and causal inference to assess whether we could mitigate information biases when estimating measures of association between potential risk factors and diabetes among a patient population of New York City young adults. Methods: We estimated the odds ratio (OR) for diabetes by race or ethnicity and asthma status using EHR data from NYU Langone Health. Methods from the missing data and causal inference literature were then applied to assess the ability to control for misclassification of health outcomes in the EHR data. 
We compared EHR-based associations with associations observed from 2 national health surveys, the Behavioral Risk Factor Surveillance System (BRFSS) and the National Health and Nutrition Examination Survey, representing traditional public health surveillance systems. Results: Observed EHR-based associations between race or ethnicity and diabetes were comparable to health survey-based estimates, but the association between asthma and diabetes was significantly overestimated (OR_EHR 3.01, 95% CI 2.86-3.18 vs OR_BRFSS 1.23, 95% CI 1.09-1.40). Missing data and causal inference methods reduced information biases in these estimates, yielding relative differences from traditional estimates below 50% (OR_MissingData 1.79, 95% CI 1.67-1.92 and OR_Causal 1.42, 95% CI 1.34-1.51). Conclusions: Findings suggest that without bias adjustment, EHR analyses may yield biased measures of association, driven in part by subgroup differences in health care use. However, applying missing data or causal inference frameworks can help control for and, importantly, characterize residual information biases in these estimates. 
%R 10.2196/58085 %U https://medinform.jmir.org/2024/1/e58085 %U https://doi.org/10.2196/58085 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53711 %T An Ontology to Bridge the Clinical Management of Patients and Public Health Responses for Strengthening Infectious Disease Surveillance: Design Science Study %A Lim,Sachiko %A Johannesson,Paul %+ Department of Computer and Systems Sciences, Stockholm University, Nodhuset, Borgarfjordsgatan 12, Kista, SE-164 07, Sweden, 46 0760968462, sachiko@dsv.su.se %K infectious disease %K ontology %K IoT %K infectious disease surveillance %K patient monitoring %K infectious disease management %K risk analysis %K early warning %K data integration %K semantic interoperability %K public health %D 2024 %7 26.9.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Novel surveillance approaches using digital technologies, including the Internet of Things (IoT), have evolved, enhancing traditional infectious disease surveillance systems by enabling real-time detection of outbreaks and reaching a wider population. However, disparate, heterogenous infectious disease surveillance systems often operate in silos due to a lack of interoperability. As a life-changing clinical use case, the COVID-19 pandemic has manifested that a lack of interoperability can severely inhibit public health responses to emerging infectious diseases. Interoperability is thus critical for building a robust ecosystem of infectious disease surveillance and enhancing preparedness for future outbreaks. The primary enabler for semantic interoperability is ontology. Objective: This study aims to design the IoT-based management of infectious disease ontology (IoT-MIDO) to enhance data sharing and integration of data collected from IoT-driven patient health monitoring, clinical management of individual patients, and disparate heterogeneous infectious disease surveillance. 
Methods: The ontology modeling approach was chosen for its semantic richness in knowledge representation, flexibility, ease of extensibility, and capability for knowledge inference and reasoning. The IoT-MIDO was developed using the basic formal ontology (BFO) as the top-level ontology. We reused the classes from existing BFO-based ontologies as much as possible to maximize the interoperability with other BFO-based ontologies and databases that rely on them. We formulated the competency questions as requirements for the ontology to achieve the intended goals. Results: We designed an ontology to integrate data from heterogeneous sources, including IoT-driven patient monitoring, clinical management of individual patients, and infectious disease surveillance systems. This integration aims to facilitate the collaboration between clinical care and public health domains. We also demonstrate five use cases using the simplified ontological models to show the potential applications of IoT-MIDO: (1) IoT-driven patient monitoring, risk assessment, early warning, and risk management; (2) clinical management of patients with infectious diseases; (3) epidemic risk analysis for timely response at the public health level; (4) infectious disease surveillance; and (5) transforming patient information into surveillance information. Conclusions: The development of the IoT-MIDO was driven by competency questions. Being able to answer all the formulated competency questions, we successfully demonstrated that our ontology has the potential to facilitate data sharing and integration for orchestrating IoT-driven patient health monitoring in the context of an infectious disease epidemic, clinical patient management, infectious disease surveillance, and epidemic risk analysis. 
The novelty and uniqueness of the ontology lie in building a bridge to link IoT-based individual patient monitoring and early warning based on patient risk assessment to infectious disease epidemic surveillance at the public health level. The ontology can also serve as a starting point to enable potential decision support systems, providing actionable insights to support public health organizations and practitioners in making informed decisions in a timely manner. %M 39325530 %R 10.2196/53711 %U https://formative.jmir.org/2024/1/e53711 %U https://doi.org/10.2196/53711 %U http://www.ncbi.nlm.nih.gov/pubmed/39325530 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e59505 %T Multimodal Large Language Models in Health Care: Applications, Challenges, and Future Outlook %A AlSaad,Rawan %A Abd-alrazaq,Alaa %A Boughorbel,Sabri %A Ahmed,Arfan %A Renault,Max-Antoine %A Damseh,Rafat %A Sheikh,Javaid %+ Weill Cornell Medicine-Qatar, Education City, Street 2700, Doha, Qatar, 974 44928830, rta4003@qatar-med.cornell.edu %K artificial intelligence %K large language models %K multimodal large language models %K multimodality %K multimodal generative artificial intelligence %K multimodal generative AI %K generative artificial intelligence %K generative AI %K health care %D 2024 %7 25.9.2024 %9 Viewpoint %J J Med Internet Res %G English %X In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types, including medical images (eg, MRI and CT scans), time-series data (eg, sensor data from wearable devices and electronic health records), audio recordings (eg, heart and respiratory sounds and patient interviews), text (eg, clinical notes and research articles), videos (eg, surgical procedures), and omics data (eg, genomics and proteomics). 
While advancements in large language models (LLMs) have enabled new applications for knowledge retrieval and processing in the medical field, most LLMs remain limited to processing unimodal data, typically text-based content, and often overlook the importance of integrating the diverse data modalities encountered in clinical practice. This paper aims to present a detailed, practical, and solution-oriented perspective on the use of multimodal LLMs (M-LLMs) in the medical field. Our investigation spanned M-LLM foundational principles, current and potential applications, technical and ethical challenges, and future research directions. By connecting these elements, we aimed to provide a comprehensive framework that links diverse aspects of M-LLMs, offering a unified vision for their future in health care. This approach aims to guide both future research and practical implementations of M-LLMs in health care, positioning them as a paradigm shift toward integrated, multimodal data–driven medical practice. We anticipate that this work will spark further discussion and inspire the development of innovative approaches in the next generation of medical M-LLM systems. %M 39321458 %R 10.2196/59505 %U https://www.jmir.org/2024/1/e59505 %U https://doi.org/10.2196/59505 %U http://www.ncbi.nlm.nih.gov/pubmed/39321458 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56804 %T Public Maternal Health Dashboards in the United States: Descriptive Assessment %A Callaghan-Koru,Jennifer A %A Newman Chargois,Paige %A Tiwari,Tanvangi %A Brown,Clare C %A Greenfield,William %A Koru,Güneş %+ Fay W Boozman College of Public Health, University of Arkansas for Medical Sciences, 2708 S. 
48th St., Springdale, AR, 72762, United States, 1 479 713 8102, jck@uams.edu %K dashboard %K maternal health %K data visualization %K data communication %K perinatal health %D 2024 %7 17.9.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Data dashboards have become more widely used for the public communication of health-related data, including in maternal health. Objective: We aimed to evaluate the content and features of existing publicly available maternal health dashboards in the United States. Methods: Through systematic searches, we identified 80 publicly available, interactive dashboards presenting US maternal health data. We abstracted and descriptively analyzed the technical features and content of identified dashboards across four areas: (1) scope and origins, (2) technical capabilities, (3) data sources and indicators, and (4) disaggregation capabilities. Where present, we abstracted and qualitatively analyzed dashboard text describing the purpose and intended audience. Results: Most reviewed dashboards reported state-level data (58/80, 72%) and were hosted on a state health department website (48/80, 60%). Most dashboards reported data from only 1 (33/80, 41%) or 2 (23/80, 29%) data sources. Key indicators, such as the maternal mortality rate (10/80, 12%) and severe maternal morbidity rate (12/80, 15%), were absent from most dashboards. Included dashboards used a range of data visualizations, and most allowed some disaggregation by time (65/80, 81%), geography (65/80, 81%), and race or ethnicity (55/80, 69%). Among dashboards that identified their audience (30/80, 38%), legislators or policy makers and public health agencies or organizations were the most common audiences. Conclusions: While maternal health dashboards have proliferated, their designs and features are not standard. 
This assessment of maternal health dashboards in the United States found substantial variation among dashboards, including inconsistent data sources, health indicators, and disaggregation capabilities. Opportunities to strengthen dashboards include integrating a greater number of data sources, increasing disaggregation capabilities, and considering end-user needs in dashboard design. %M 39288409 %R 10.2196/56804 %U https://www.jmir.org/2024/1/e56804 %U https://doi.org/10.2196/56804 %U http://www.ncbi.nlm.nih.gov/pubmed/39288409 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57853 %T PCEtoFHIR: Decomposition of Postcoordinated SNOMED CT Expressions for Storage as HL7 FHIR Resources %A Ohlsen,Tessa %A Ingenerf,Josef %A Essenwanger,Andrea %A Drenkhahn,Cora %+ IT Center for Clinical Research, University of Luebeck, Ratzeburger Allee 160, Luebeck, 23562, Germany, 49 45131015623, t.ohlsen@uni-luebeck.de %K SNOMED CT %K HL7 FHIR %K TermInfo %K postcoordination %K semantic interoperability %K terminology %K OWL %K semantic similarity %D 2024 %7 17.9.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: To ensure interoperability, both structural and semantic standards must be followed. For exchanging medical data between information systems, the structural standard FHIR (Fast Healthcare Interoperability Resources) has recently gained popularity. Regarding semantic interoperability, the reference terminology SNOMED Clinical Terms (SNOMED CT), as a semantic standard, allows for postcoordination, offering advantages over many other vocabularies. These postcoordinated expressions (PCEs) make SNOMED CT an expressive and flexible interlingua, allowing for precise coding of medical facts. However, this comes at the cost of increased complexity, as well as challenges in storage and processing. 
Additionally, the boundary between semantic (terminology) and structural (information model) standards becomes blurred, leading to what is known as the TermInfo problem. Although often viewed critically, the TermInfo overlap can also be explored for its potential benefits, such as enabling flexible transformation of parts of PCEs. Objective: In this paper, an alternative solution for storing PCEs is presented, which involves combining them with the FHIR data model. Ultimately, all components of a PCE should be expressible solely through precoordinated concepts that are linked to the appropriate elements of the information model. Methods: The approach involves storing PCEs decomposed into their components in alignment with FHIR resources. By utilizing the Web Ontology Language (OWL) to generate an OWL ClassExpression, and combining it with an external reasoner and semantic similarity measures, a precoordinated SNOMED CT concept that most accurately describes the PCE is identified as a Superconcept. In addition, the nonmatching attribute relationships between the Superconcept and the PCE are identified as the “Delta.” Once SNOMED CT attributes are manually mapped to FHIR elements, FHIRPath expressions can be defined for both the Superconcept and the Delta, allowing the identified precoordinated codes to be stored within FHIR resources. Results: A web application called PCEtoFHIR was developed to implement this approach. In a validation process with 600 randomly selected precoordinated concepts, the formal correctness of the generated OWL ClassExpressions was verified. Additionally, 33 PCEs were used for two separate validation tests. Based on these validations, it was demonstrated that a previously proposed semantic similarity calculation is suitable for determining the Superconcept. Additionally, the 33 PCEs were used to confirm the correct functioning of the entire approach. Furthermore, the FHIR StructureMaps were reviewed and deemed meaningful by FHIR experts. 
Conclusions: PCEtoFHIR offers services to decompose PCEs for storage within FHIR resources. When creating structure mappings for specific subdomains of SNOMED CT concepts (eg, allergies) to desired FHIR profiles, the use of SNOMED CT Expression Templates has proven highly effective. Domain experts can create templates with appropriate mappings, which can then be easily reused in a constrained manner by end users. %M 39287966 %R 10.2196/57853 %U https://medinform.jmir.org/2024/1/e57853 %U https://doi.org/10.2196/57853 %U http://www.ncbi.nlm.nih.gov/pubmed/39287966 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e55182 %T Use of Creative Frameworks in Health Care to Solve Data and Information Problems: Scoping Review %A Mess,Elisabeth Veronica %A Kramer,Frank %A Krumme,Julia %A Kanelakis,Nico %A Teynor,Alexandra %+ Institute for Agile Software Development, Technical University of Applied Sciences Augsburg, An der Hochschule 1, Augsburg, 86161, Germany, 49 +49 821 5586 36, elisabethveronica.mess@hs-augsburg.de %K creative frameworks %K data and information problems %K data collection %K data processing %K data provision %K health care %K information visualization %K interdisciplinary teams %K user-centered design %K user-centered data design %K user-centric development %D 2024 %7 13.9.2024 %9 Review %J JMIR Hum Factors %G English %X Background: Digitization is vital for data management, especially in health care. However, problems still hinder health care stakeholders in their daily work while collecting, processing, and providing health data or information. Data are missing, incorrect, cannot be collected, or information is inadequately presented. These problems can be seen as data or information problems. 
A proven way to elicit requirements for (software) systems is by using creative frameworks (eg, user-centered design, design thinking, lean UX [user experience], or service design) or creative methods (eg, mind mapping, storyboarding, 6 thinking hats, or interaction room). However, to what extent they are used to solve data or information-related problems in health care is unclear. Objective: The primary objective of this scoping review is to investigate the use of creative frameworks in addressing data and information problems in health care. Methods: Following JBI guidelines and the PRISMA-ScR framework, this paper analyzes selected papers, answering whether creative frameworks addressed health care data or information problems. Focusing on data problems (elicitation or collection, processing) and information problems (provision or visualization), the review examined German and English papers published between 2018 and 2022 using keywords related to “data,” “design,” and “user-centered.” The database SCOPUS was used. Results: Of the 898 query results, only 23 papers described a data or information problem and a creative method to solve it. These were included in the follow-up analysis and divided into different problem categories: data collection (n=7), data processing (n=1), information visualization (n=11), and mixed problems meaning data and information problem present (n=4). The analysis showed that most identified problems fall into the information visualization category. This could indicate that creative frameworks are particularly suitable for solving information or visualization problems and less for other, more abstract areas such as data problems. The results also showed that most researchers applied a creative framework after they knew what specific (data or information) problem they had (n=21). Only a minority chose a creative framework to identify a problem and realize it was a data or information problem (n=2). 
In response to these findings, the paper discusses the need for a new approach that addresses health care data and information challenges by promoting collaboration, iterative feedback, and user-centered development. Conclusions: Although the potential of creative frameworks is undisputed, they are applied to data and information problems in only a minority of cases. To harness this potential, a suitable method needs to be developed to support health care system stakeholders. This method could be the User-Centered Data Approach. %M 39269739 %R 10.2196/55182 %U https://humanfactors.jmir.org/2024/1/e55182 %U https://doi.org/10.2196/55182 %U http://www.ncbi.nlm.nih.gov/pubmed/39269739 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e58705 %T Identifications of Similarity Metrics for Patients With Cancer: Protocol for a Scoping Review %A Manuilova,Iryna %A Bossenz,Jan %A Weise,Annemarie Bianka %A Boehm,Dominik %A Strantz,Cosima %A Unberath,Philipp %A Reimer,Niklas %A Metzger,Patrick %A Pauli,Thomas %A Werle,Silke D %A Schulze,Susann %A Hiemer,Sonja %A Ustjanzew,Arsenij %A Kestler,Hans A %A Busch,Hauke %A Brors,Benedikt %A Christoph,Jan %+ Junior Research Group (Bio-) Medical Data Science, Faculty of Medicine, Martin Luther University Halle-Wittenberg, Magdeburger Str 8, Halle (Saale), 06112, Germany, 49 345 557 2651, Iryna.Manuilova@uk-halle.de %K patient similarity %K cancer research %K patient similarity applications %K precision medicine %K cancer similarity metrics %K scoping review protocol %D 2024 %7 4.9.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Understanding the similarities of patients with cancer is essential to advancing personalized medicine, improving patient outcomes, and developing more effective and individualized treatments. It enables researchers to discover important patterns, biomarkers, and treatment strategies that can have a significant impact on cancer research and oncology. 
In addition, the identification of previously successfully treated patients supports oncologists in making treatment decisions for a new patient who is clinically or molecularly similar to the previous patient. Objective: The planned review aims to systematically summarize, map, and describe existing evidence to understand how patient similarity is defined and used in cancer research and clinical care. Methods: To systematically identify relevant studies and to ensure reproducibility and transparency of the review process, a comprehensive literature search will be conducted in several bibliographic databases, including Web of Science, PubMed, LIVIVO, and MEDLINE, covering the period from 1998 to February 2024. After the initial duplicate deletion phase, a study selection phase will be applied using Rayyan, which consists of 3 distinct steps: title and abstract screening, disagreement resolution, and full-text screening. To ensure the integrity and quality of the selection process, each of these steps is preceded by a pilot testing phase. This methodological process will culminate in the presentation of the final research results in a structured form according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) flowchart. The protocol has been registered in the Journal of Medical Internet Research. Results: This protocol outlines the methodologies used in conducting the scoping review. A search of the specified electronic databases, followed by duplicate removal, resulted in 1183 unique records. As of March 2024, the review process has moved to the full-text evaluation phase. At this stage, data extraction will be conducted using a pretested chart template. Conclusions: The scoping review protocol, centered on these main concepts, aims to systematically map the available evidence on patient similarity among patients with cancer. 
By defining the types of data sources, approaches, and methods used in the field, and aligning these with the research questions, the review will provide a foundation for future research and clinical application in personalized cancer care. This protocol will guide the literature search, data extraction, and synthesis of findings to achieve the review’s objectives. International Registered Report Identifier (IRRID): DERR1-10.2196/58705 %M 39230952 %R 10.2196/58705 %U https://www.researchprotocols.org/2024/1/e58705 %U https://doi.org/10.2196/58705 %U http://www.ncbi.nlm.nih.gov/pubmed/39230952 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51297 %T Provenance Information for Biomedical Data and Workflows: Scoping Review %A Gierend,Kerstin %A Krüger,Frank %A Genehr,Sascha %A Hartmann,Francisca %A Siegel,Fabian %A Waltemath,Dagmar %A Ganslandt,Thomas %A Zeleke,Atinkut Alamirrew %+ Department of Biomedical Informatics, Mannheim Institute for intelligent Systems in Medicine, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, Mannheim, 68167, Germany, 49 621383 ext 8087, kerstin.gierend@medma.uni-heidelberg.de %K provenance %K biomedical research %K data management %K scoping review %K health care data %K software life cycle %D 2024 %7 23.8.2024 %9 Review %J J Med Internet Res %G English %X Background: The record of the origin and the history of data, known as provenance, holds importance. Provenance information leads to higher interpretability of scientific results and enables reliable collaboration and data sharing. However, the lack of comprehensive evidence on provenance approaches hinders the uptake of good scientific practice in clinical research. Objective: This scoping review aims to identify approaches and criteria for provenance tracking in the biomedical domain. We reviewed the state-of-the-art frameworks, associated artifacts, and methodologies for provenance tracking. 
Methods: This scoping review followed the methodological framework developed by Arksey and O’Malley. We searched the PubMed and Web of Science databases for English-language articles published from 2006 to 2022. Title and abstract screening were carried out by 4 independent reviewers using the Rayyan screening tool. A majority vote was required for consent on the eligibility of papers based on the defined inclusion and exclusion criteria. Full-text reading and screening were performed independently by 2 reviewers, and information was extracted into a pretested template for the 5 research questions. Disagreements were resolved by a domain expert. The study protocol has previously been published. Results: The search resulted in a total of 764 papers. Of 624 identified, deduplicated papers, 66 (10.6%) studies fulfilled the inclusion criteria. We identified diverse provenance-tracking approaches ranging from practical provenance processing and managing to theoretical frameworks distinguishing diverse concepts and details of data and metadata models, provenance components, and notations. A substantial majority investigated underlying requirements to varying extents and validation intensities but lacked completeness in provenance coverage. Mostly, cited requirements concerned the knowledge about data integrity and reproducibility. Moreover, these revolved around robust data quality assessments, consistent policies for sensitive data protection, improved user interfaces, and automated ontology development. We found that different stakeholder groups benefit from the availability of provenance information. Thereby, we recognized that the term provenance is subjected to an evolutionary and technical process with multifaceted meanings and roles. Challenges included organizational and technical issues linked to data annotation, provenance modeling, and performance, amplified by subsequent matters such as enhanced provenance information and quality principles. 
Conclusions: As data volumes grow and computing power increases, the challenge of scaling provenance systems to handle data efficiently and assist complex queries intensifies, necessitating automated and scalable solutions. With rising legal and scientific demands, there is an urgent need for greater transparency in implementing provenance systems in research projects, despite the challenges of unresolved granularity and knowledge bottlenecks. We believe that our recommendations enable quality and guide the implementation of auditable and measurable provenance approaches as well as solutions in the daily tasks of biomedical scientists. International Registered Report Identifier (IRRID): RR2-10.2196/31750 %M 39178413 %R 10.2196/51297 %U https://www.jmir.org/2024/1/e51297 %U https://doi.org/10.2196/51297 %U http://www.ncbi.nlm.nih.gov/pubmed/39178413 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e58502 %T Transforming Digital Phenotyping Raw Data Into Actionable Biomarkers, Quality Metrics, and Data Visualizations Using Cortex Software Package: Tutorial %A Burns,James %A Chen,Kelly %A Flathers,Matthew %A Currey,Danielle %A Macrynikola,Natalia %A Vaidyam,Aditya %A Langholm,Carsten %A Barnett,Ian %A Byun,Andrew (Jin Soo) %A Lane,Erlend %A Torous,John %+ Division of Digital Psychiatry, Beth Israel Deaconess Medical Center, Harvard Medical School, 330 Brookline Ave, Boston, MA, 02215, United States, 1 6176676700, jtorous@bidmc.harvard.edu %K digital phenotyping %K mental health %K data visualization %K data analysis %K smartphones %K smartphone %K Cortex %K open-source %K data processing %K mindLAMP %K app %K apps %K data set %K clinical %K real world %K methodology %K mobile phone %D 2024 %7 23.8.2024 %9 Tutorial %J J Med Internet Res %G English %X As digital phenotyping, the capture of active and passive data from consumer devices such as smartphones, becomes more common, the need to properly process the data and derive replicable features from it 
has become paramount. Cortex is an open-source data processing pipeline for digital phenotyping data, optimized for use with the mindLAMP apps, which is used by nearly 100 research teams across the world. Cortex is designed to help teams (1) assess digital phenotyping data quality in real time, (2) derive replicable clinical features from the data, and (3) enable easy-to-share data visualizations. Cortex offers many options to work with digital phenotyping data, although some common approaches are likely of value to all teams using it. This paper highlights the reasoning, code, and example steps necessary to fully work with digital phenotyping data in a streamlined manner. Covering how to work with the data, assess its quality, derive features, and visualize findings, this paper is designed to offer the reader the knowledge and skills to apply toward analyzing any digital phenotyping data set. More specifically, the paper will teach the reader the ins and outs of the Cortex Python package. This includes background information on its interaction with the mindLAMP platform, some basic commands to learn what data can be pulled and how, and more advanced use of the package mixed with basic Python with the goal of creating a correlation matrix. After the tutorial, different use cases of Cortex are discussed, along with limitations. Toward highlighting clinical applications, this paper also provides 3 easy ways to implement examples of Cortex use in real-world settings. By understanding how to work with digital phenotyping data and providing ready-to-deploy code with Cortex, the paper aims to show how the new field of digital phenotyping can be both accessible to all and rigorous in methodology. 
%M 39178032 %R 10.2196/58502 %U https://www.jmir.org/2024/1/e58502 %U https://doi.org/10.2196/58502 %U http://www.ncbi.nlm.nih.gov/pubmed/39178032 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e53821 %T Emerging Indications for Hyperbaric Oxygen Treatment: Registry Cohort Study %A Tanaka,Hideaki L %A Rees,Judy R %A Zhang,Ziyin %A Ptak,Judy A %A Hannigan,Pamela M %A Silverman,Elaine M %A Peacock,Janet L %A Buckey,Jay C %A , %+ Geisel School of Medicine, Dartmouth College, 1 Medical Center Drive, Lebanon, NH, 03756, United States, 1 603 646 5328, jay.buckey@dartmouth.edu %K hyperbaric oxygen %K inflammatory bowel disease %K calciphylaxis %K post–COVID-19 condition %K PCC %K postacute sequelae of COVID-19 %K PASC %K infected implanted hardware %K hypospadias %K frostbite %K facial filler %K pyoderma gangrenosum %D 2024 %7 20.8.2024 %9 Original Paper %J Interact J Med Res %G English %X Background: Hyperbaric oxygen (HBO2) treatment is used across a range of medical specialties for a variety of applications, particularly where hypoxia and inflammation are important contributors. Because of its hypoxia-relieving and anti-inflammatory effects, HBO2 may be useful for new indications not currently approved by the Undersea and Hyperbaric Medical Society. Identifying these new applications for HBO2 is difficult because individual centers may only treat a few cases and not track the outcomes consistently. The web-based International Multicenter Registry for Hyperbaric Oxygen Therapy captures prospective outcome data for patients treated with HBO2 therapy. These data can then be used to identify new potential applications for HBO2, which has relevance for a range of medical specialties. Objective: Although hyperbaric medicine has established indications, new ones continue to emerge. 
One objective of this registry study was to identify cases where HBO2 has been used for conditions falling outside of current Undersea and Hyperbaric Medical Society–approved indications and present outcome data for them. Methods: This descriptive study used data from a web-based, multicenter, international registry of patients treated with HBO2. Participating centers agree to collect data on all patients treated using standard outcome measures, and individual centers send deidentified data to the central registry. HBO2 treatment programs in the United States, the United Kingdom, and Australia participate. Demographic, outcome, complication, and treatment data, including pre- and posttreatment quality of life questionnaires (EQ-5D-5L) were collected for individuals referred for HBO2 treatment. Results: Out of 9726 patient entries, 378 (3.89%) individuals were treated for 45 emerging indications. Post–COVID-19 condition (PCC; also known as postacute sequelae of COVID-19; 149/378, 39.4%), ulcerative colitis (47/378, 12.4%), and Crohn disease (40/378, 10.6%) accounted for 62.4% (n=236) of the total cases. Calciphylaxis (20/378, 5.3%), frostbite (18/378, 4.8%), and peripheral vascular disease–related wounds (12/378, 3.2%) accounted for a further 13.2% (n=50). Patients with PCC reported significant improvement on the Neurobehavioral Symptom Inventory (NSI score: pretreatment=30.6; posttreatment=14.4; P<.001). Patients with Crohn disease reported significantly improved quality of life (EQ-5D score: pretreatment=53.8; posttreatment=68.8), and 5 (13%) reported closing a fistula. Patients with ulcerative colitis and complete pre- and post-HBO2 data reported improved quality of life and lower scores on a bowel questionnaire examining frequency, blood, pain, and urgency. A subset of patients with calciphylaxis and arterial ulcers also reported improvement. 
Conclusions: HBO2 is being used for a wide range of possible applications across various medical specialties for its hypoxia-relieving and anti-inflammatory effects. Results show statistically significant improvements in patient-reported outcomes for inflammatory bowel disease and PCC. HBO2 is also being used for frostbite, pyoderma gangrenosum, pterygium, hypospadias repair, and facial filler procedures. Other indications show evidence for improvement, and the case series for all indications is growing in the registry. International Registered Report Identifier (IRRID): RR2-10.2196/18857 %M 39078624 %R 10.2196/53821 %U https://www.i-jmr.org/2024/1/e53821 %U https://doi.org/10.2196/53821 %U http://www.ncbi.nlm.nih.gov/pubmed/39078624 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e58548 %T Bridging Real-World Data Gaps: Connecting Dots Across 10 Asian Countries %A Julian,Guilherme Silva %A Shau,Wen-Yi %A Chou,Hsu-Wen %A Setia,Sajita %+ Executive Office, Transform Medical Communications Limited, 184 Glasgow Street, Wanganui, 4500, New Zealand, 64 276175433, sajita.setia@transform-medcomms.com %K Asia %K electronic medical records %K EMR %K health care databases %K health technology assessment %K HTA %K real-world data %K real-world evidence %D 2024 %7 15.8.2024 %9 Viewpoint %J JMIR Med Inform %G English %X The economic trend and the health care landscape are rapidly evolving across Asia. Effective real-world data (RWD) for regulatory and clinical decision-making is a crucial milestone associated with this evolution. This necessitates a critical evaluation of RWD generation within distinct nations for the use of various RWD warehouses in the generation of real-world evidence (RWE). 
In this article, we outline the RWD generation trends for 2 contrasting nation archetypes: “Solo Scholars”—nations with relatively self-sufficient RWD research systems—and “Global Collaborators”—countries largely reliant on international infrastructures for RWD generation. The key trends and patterns in RWD generation, country-specific insights into the predominant databases used in each country to produce RWE, and insights into the broader landscape of RWD database use across these countries are discussed. Conclusively, the data point out the heterogeneous nature of RWD generation practices across 10 different Asian nations and advocate for strategic enhancements in data harmonization. The evidence highlights the imperative for improved database integration and the establishment of standardized protocols and infrastructure for leveraging electronic medical records (EMR) in streamlining RWD acquisition. The clinical data analysis and reporting system of Hong Kong is an excellent example of a successful EMR system that showcases the capacity of integrated robust EMR platforms to consolidate and produce diverse RWE. This, in turn, can potentially reduce the necessity for reliance on numerous condition-specific local and global registries or limited and largely unavailable medical insurance or claims databases in most Asian nations. Linking health technology assessment processes with open data initiatives such as the Observational Medical Outcomes Partnership Common Data Model and the Observational Health Data Sciences and Informatics could enable the leveraging of global data resources to inform local decision-making. Advancing such initiatives is crucial for reinforcing health care frameworks in resource-limited settings and advancing toward cohesive, evidence-driven health care policy and improved patient outcomes in the region. 
%M 39026427 %R 10.2196/58548 %U https://medinform.jmir.org/2024/1/e58548 %U https://doi.org/10.2196/58548 %U http://www.ncbi.nlm.nih.gov/pubmed/39026427 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e49542 %T Transforming Primary Care Data Into the Observational Medical Outcomes Partnership Common Data Model: Development and Usability Study %A Fruchart,Mathilde %A Quindroit,Paul %A Jacquemont,Chloé %A Beuscart,Jean-Baptiste %A Calafiore,Matthieu %A Lamer,Antoine %K data reuse %K Observational Medical Outcomes Partnership %K common data model %K data warehouse %K reproducible research %K primary care %K dashboard %K electronic health record %K patient tracking system %K patient monitoring %K EHR %K primary care data %D 2024 %7 13.8.2024 %9 %J JMIR Med Inform %G English %X Background: Patient-monitoring software generates a large amount of data that can be reused for clinical audits and scientific research. The Observational Health Data Sciences and Informatics (OHDSI) consortium developed the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) to standardize electronic health record data and promote large-scale observational and longitudinal research. Objective: This study aimed to transform primary care data into the OMOP CDM format. Methods: We extracted primary care data from electronic health records at a multidisciplinary health center in Wattrelos, France. We performed structural mapping between the design of our local primary care database and the OMOP CDM tables and fields. Local French vocabulary concepts were mapped to OHDSI standard vocabularies. To validate the implementation of primary care data into the OMOP CDM format, we applied a set of queries. A practical application was achieved through the development of a dashboard. Results: Data from 18,395 patients were implemented into the OMOP CDM, corresponding to 592,226 consultations over a period of 20 years. A total of 18 OMOP CDM tables were implemented. 
A total of 17 local vocabularies were identified as being related to primary care and corresponded to patient characteristics (sex, location, year of birth, and race), units of measurement, biometric measures, laboratory test results, medical histories, and drug prescriptions. During semantic mapping, 10,221 primary care concepts were mapped to standard OHDSI concepts. Five queries were used to validate the OMOP CDM by comparing the results obtained after the completion of the transformations with the results obtained in the source software. Lastly, a prototype dashboard was developed to visualize the activity of the health center, the laboratory test results, and the drug prescription data. Conclusions: Primary care data from a French health care facility have been implemented into the OMOP CDM format. Data concerning demographics, units, measurements, and primary care consultation steps were already available in OHDSI vocabularies. Laboratory test results and drug prescription data were mapped to available vocabularies and structured in the final model. A dashboard application provided health care professionals with feedback on their practice. 
%R 10.2196/49542 %U https://medinform.jmir.org/2024/1/e49542 %U https://doi.org/10.2196/49542 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e59924 %T Developing the DIGIFOOD Dashboard to Monitor the Digitalization of Local Food Environments: Interdisciplinary Approach %A Jia,Si Si %A Luo,Xinwei %A Gibson,Alice Anne %A Partridge,Stephanie Ruth %+ Susan Wakil School of Nursing and Midwifery, Faculty of Medicine and Health, University of Sydney, Level 8, Susan Wakil Health Building, Camperdown, Sydney, 2006, Australia, 61 2 8627 1697, sisi.jia@sydney.edu.au %K online food delivery %K food environment %K dashboard %K web scraping %K big data %K surveillance %K monitoring %K prevention %K food %K food delivery %K development study %K development %K accessibility %K Australia %K monitoring tool %K tool %K tools %D 2024 %7 13.8.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Online food delivery services (OFDS) enable individuals to conveniently access foods from any deliverable location. The increased accessibility to foods may have implications on the consumption of healthful or unhealthful foods. Concerningly, previous research suggests that OFDS offer an abundance of energy-dense and nutrient-poor foods, which are heavily promoted through deals or discounts. Objective: In this paper, we describe the development of the DIGIFOOD dashboard to monitor the digitalization of local food environments in New South Wales, Australia, resulting from the proliferation of OFDS. Methods: Together with a team of data scientists, we designed a purpose-built dashboard using Microsoft Power BI. The development process involved three main stages: (1) data acquisition of food outlets via web scraping, (2) data cleaning and processing, and (3) visualization of food outlets on the dashboard. We also describe the categorization process of food outlets to characterize the healthfulness of local, online, and hybrid food environments. 
These categories included takeaway franchises, independent takeaways, independent restaurants and cafes, supermarkets or groceries, bakeries, alcohol retailers, convenience stores, and sandwich or salad shops. Results: To date, the DIGIFOOD dashboard has mapped 36,967 unique local food outlets (locally accessible and scraped from Google Maps) and 16,158 unique online food outlets (accessible online and scraped from Uber Eats) across New South Wales, Australia. In 2023, the market-leading OFDS operated in 1061 unique suburbs or localities in New South Wales. The Sydney-Parramatta region, a major urban area in New South Wales accounting for 28 postcodes, recorded the highest number of online food outlets (n=4221). In contrast, the Far West and Orana region, a rural area in New South Wales with only 2 postcodes, recorded the lowest number of food outlets accessible online (n=7). Urban areas appeared to have the greatest increase in total food outlets accessible via online food delivery. In both local and online food environments, it was evident that independent restaurants and cafes comprised the largest proportion of food outlets at 47.2% (17,437/36,967) and 51.8% (8369/16,158), respectively. However, compared to local food environments, the online food environment has relatively more takeaway franchises (2734/16,158, 16.9% compared to 3273/36,967, 8.9%) and independent takeaway outlets (2416/16,158, 14.9% compared to 4026/36,967, 10.9%). Conclusions: The DIGIFOOD dashboard leverages the current rich data landscape to display and contrast the availability and healthfulness of food outlets that are locally accessible versus accessible online. The DIGIFOOD dashboard can be a useful monitoring tool for the evolving digital food environment at a regional scale and has the potential to be scaled up at a national level. 
Future iterations of the dashboard, including data from additional prominent OFDS, can be used by policy makers to identify high-priority areas with limited access to healthful foods both online and locally. %M 39137032 %R 10.2196/59924 %U https://publichealth.jmir.org/2024/1/e59924 %U https://doi.org/10.2196/59924 %U http://www.ncbi.nlm.nih.gov/pubmed/39137032 %0 Journal Article %@ 2369-3762 %I JMIR Publications %V 10 %N %P e50667 %T Identifying Learning Preferences and Strategies in Health Data Science Courses: Systematic Review %A Rohani,Narjes %A Sowa,Stephen %A Manataki,Areti %+ Usher Institute, University of Edinburgh, Old Medical School, Teviot Place, Edinburgh, EH8 9AG, United Kingdom, 44 131 650 3, Narjes.rohani@ed.ac.uk %K health data science %K bioinformatics %K learning approach %K learning preference %K learning tactic %K learning strategy %K interdisciplinary %K systematic review %K medical education %D 2024 %7 12.8.2024 %9 Review %J JMIR Med Educ %G English %X Background: Learning and teaching interdisciplinary health data science (HDS) is highly challenging, and despite the growing interest in HDS education, little is known about the learning experiences and preferences of HDS students. Objective: We conducted a systematic review to identify learning preferences and strategies in the HDS discipline. Methods: We searched 10 bibliographic databases (PubMed, ACM Digital Library, Web of Science, Cochrane Library, Wiley Online Library, ScienceDirect, SpringerLink, EBSCOhost, ERIC, and IEEE Xplore) from the date of inception until June 2023. We followed the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines and included primary studies written in English that investigated the learning preferences or strategies of students in HDS-related disciplines, such as bioinformatics, at any academic level. 
Risk of bias was independently assessed by 2 screeners using the Mixed Methods Appraisal Tool, and we used narrative data synthesis to present the study results. Results: After abstract screening and full-text reviewing of the 849 papers retrieved from the databases, 8 (0.9%) studies, published between 2009 and 2021, were selected for narrative synthesis. The majority of these papers (7/8, 88%) investigated learning preferences, while only 1 (12%) paper studied learning strategies in HDS courses. The systematic review revealed that most HDS learners prefer visual presentations as their primary learning input. In terms of learning process and organization, they mostly tend to follow logical, linear, and sequential steps. Moreover, they focus more on abstract information, rather than detailed and concrete information. Regarding collaboration, HDS students sometimes prefer teamwork, and sometimes they prefer to work alone. Conclusions: The studies’ quality, assessed using the Mixed Methods Appraisal Tool, ranged between 73% and 100%, indicating excellent quality overall. However, the number of studies in this area is small, and the results of all studies are based on self-reported data. Therefore, more research needs to be conducted to provide insight into HDS education. We provide some suggestions, such as using learning analytics and educational data mining methods, for conducting future research to address gaps in the literature. We also discuss implications for HDS educators, and we make recommendations for HDS course design; for example, we recommend including visual materials, such as diagrams and videos, and offering step-by-step instructions for students. 
%M 39133909 %R 10.2196/50667 %U https://mededu.jmir.org/2024/1/e50667 %U https://doi.org/10.2196/50667 %U http://www.ncbi.nlm.nih.gov/pubmed/39133909 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47100 %T Comparing Federal Communications Commission and Microsoft Estimates of Broadband Access for Mental Health Video Telemedicine Among Veterans: Retrospective Cohort Study %A O'Shea,Amy MJ %A Mulligan,Kailey %A Carter,Knute D %A Haraldsson,Bjarni %A Wray,Charlie M %A Shahnazi,Ariana %A Kaboli,Peter J %+ Center for Access and Delivery Research and Evaluation (CADRE), Iowa City VA Healthcare System, 601 US Hwy 6, Iowa City, IA, 52246, United States, 1 3193380581, amy.oshea@va.gov %K broadband %K telemedicine %K Federal Communications Commission %K veterans %K United States Department of Veterans Affairs %K internet %K mental health care %K veteran health %K broadband access %K web-based %K digital %D 2024 %7 8.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The COVID-19 pandemic highlighted the importance of telemedicine in health care. However, video telemedicine requires adequate broadband internet speeds. As video-based telemedicine grows, variations in broadband access must be accurately measured and characterized. Objective: This study aims to compare the Federal Communications Commission (FCC) and Microsoft US broadband use data sources to measure county-level broadband access among veterans receiving mental health care from the Veterans Health Administration (VHA). Methods: Retrospective observational cohort study using administrative data to identify mental health visits from January 1, 2019, to December 31, 2020, among 1161 VHA mental health clinics. The exposure is county-level broadband percentages calculated as the percentage of the county population with access to adequate broadband speeds (ie, download >25 megabits per second) as measured by the FCC and Microsoft. 
All veterans receiving VHA mental health services during the study period were included and categorized based on their use of video mental health visits. Broadband access was compared between and within data sources, stratified by video versus no video telemedicine use. Results: Over the 2-year study period, 1,474,024 veterans with VHA mental health visits were identified. Average broadband percentages varied by source (FCC mean 91.3%, SD 12.5% vs Microsoft mean 48.2%, SD 18.1%; P<.001). Within each data source, broadband percentages generally increased from 2019 to 2020. Adjusted regression analyses estimated the change after pandemic onset versus before the pandemic in quarterly county-based mental health visit counts at prespecified broadband percentages. Using FCC model estimates, given all other covariates are constant and assuming an FCC percentage set at 70%, the incidence rate ratio (IRR) of county-level quarterly mental health video visits during the COVID-19 pandemic was 6.81 times (95% CI 6.49-7.13) the rate before the pandemic. In comparison, the model using Microsoft data exhibited a stronger association (IRR 7.28; 95% CI 6.78-7.81). This relationship held across all broadband access levels assessed. Conclusions: This study found FCC broadband data estimated higher and less variable county-level broadband percentages compared to those estimated using Microsoft data. Regardless of the data source, veterans without mental health video visits lived in counties with lower broadband access, highlighting the need for accurate broadband speed data to prioritize infrastructure and intervention development based on the greatest community-level impacts. Future work should link broadband access to differences in clinical outcomes. 
%M 39116440 %R 10.2196/47100 %U https://www.jmir.org/2024/1/e47100 %U https://doi.org/10.2196/47100 %U http://www.ncbi.nlm.nih.gov/pubmed/39116440 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53369 %T Assessing Opportunities and Barriers to Improving the Secondary Use of Health Care Data at the National Level: Multicase Study in the Kingdom of Saudi Arabia and Estonia %A Metsallik,Janek %A Draheim,Dirk %A Sabic,Zlatan %A Novak,Thomas %A Ross,Peeter %+ E-Medicine Centre, Department of Health Technologies, School of Information Technologies, Tallinn University of Technology, Akadeemia 15a, Tallinn, 12616, Estonia, 372 56485978, janek.metsallik@taltech.ee %K health data governance %K secondary use %K health information sharing maturity %K large-scale interoperability %K health data stewardship %K health data custodianship %K health information purpose %K health data policy %D 2024 %7 8.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Digitization shall improve the secondary use of health care data. The Government of the Kingdom of Saudi Arabia ordered a project to compile the National Master Plan for Health Data Analytics, while the Government of Estonia ordered a project to compile the Person-Centered Integrated Hospital Master Plan. Objective: This study aims to map these 2 distinct projects’ problems, approaches, and outcomes to find the matching elements for reuse in similar cases. Methods: We assessed both health care systems’ abilities for secondary use of health data by exploratory case studies with purposive sampling and data collection via semistructured interviews and documentation review. The collected content was analyzed qualitatively and coded according to a predefined framework. The analytical framework consisted of data purpose, flow, and sharing. The Estonian project used the Health Information Sharing Maturity Model from the Mitre Corporation as an additional analytical framework. 
The data collection and analysis in the Kingdom of Saudi Arabia took place in 2019 and covered health care facilities, public health institutions, and health care policy. The project in Estonia collected its inputs in 2020 and covered health care facilities, patient engagement, public health institutions, health care financing, health care policy, and health technology innovations. Results: In both cases, the assessments resulted in a set of recommendations focusing on the governance of health care data. In the Kingdom of Saudi Arabia, the health care system consists of multiple isolated sectors, and there is a need for an overarching body coordinating data sets, indicators, and reports at the national level. The National Master Plan of Health Data Analytics proposed a set of organizational agreements for proper stewardship. Despite Estonia’s national Digital Health Platform, the requirements remain uncoordinated between various data consumers. We recommended reconfiguring the stewardship of the national health data to include multipurpose data use into the scope of interoperability standardization. Conclusions: Proper data governance is the key to improving the secondary use of health data at the national level. The data flows from data providers to data consumers shall be coordinated by overarching stewardship structures and supported by interoperable data custodians. %M 39116424 %R 10.2196/53369 %U https://www.jmir.org/2024/1/e53369 %U https://doi.org/10.2196/53369 %U http://www.ncbi.nlm.nih.gov/pubmed/39116424 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52180 %T Ethical, Legal, and Practical Concerns Surrounding the Implementation of New Forms of Consent for Health Data Research: Qualitative Interview Study %A Wiertz,Svenja %A Boldt,Joachim %+ Department of Medical Ethics and the History of Medicine, University of Freiburg, Stefan-Meier-Str. 
26, Freiburg, 79104, Germany, 49 761 203 5044, wiertz@egm.uni-freiburg.de %K health data %K health research %K informed consent %K broad consent %K tiered consent %K consent management %K digital infrastructure %K data safety %K GDPR %D 2024 %7 7.8.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: In Europe, within the scope of the General Data Protection Regulation, more and more digital infrastructures are created to allow for large-scale access to patients’ health data and their use for research. When the research is performed on the basis of patient consent, traditional study-specific consent appears too cumbersome for many researchers. Alternative models of consent are currently being discussed and introduced in different contexts. Objective: This study explores stakeholder perspectives on ethical, legal, and practical concerns regarding models of consent for health data research at German university medical centers. Methods: Semistructured focus group interviews were conducted with medical researchers at German university medical centers, health IT specialists, data protection officers, and patient representatives. The interviews were analyzed using a software-supported structuring qualitative content analysis. Results: Stakeholders regarded broad consent to be only marginally less laborious to implement and manage than tiered consent. Patient representatives favored specific consent, with tiered consent as a possible alternative. All stakeholders lamented that information material was difficult to understand. Oral information and videos were mentioned as a means of improvement. Patient representatives doubted that researchers had a sufficient degree of data security expertise to act as sole information providers. They were afraid of undue pressure if obtaining health data research consent were part of medical appointments. 
IT specialists and other stakeholders regarded the withdrawal of consent to be a major challenge and called for digital consent management solutions. On the one hand, the transfer of health data to non-European countries and for-profit organizations is seen as a necessity for research. On the other hand, there are data security concerns with regard to these actors. Research without consent is legally possible under certain conditions but deemed problematic by all stakeholder groups, albeit for differing reasons and to different degrees. Conclusions: More efforts should be made to determine which options of choice should be included in health data research consent. Digital tools could improve patient information and facilitate consent management. A unified and strict regulation for research without consent is required at the national and European Union level. Obtaining consent for health data research should be independent of medical appointments, and additional personnel should be trained in data security to provide information on health data research. 
%M 39110970 %R 10.2196/52180 %U https://www.jmir.org/2024/1/e52180 %U https://doi.org/10.2196/52180 %U http://www.ncbi.nlm.nih.gov/pubmed/39110970 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e56627 %T Advancing Accuracy in Multimodal Medical Tasks Through Bootstrapped Language-Image Pretraining (BioMedBLIP): Performance Evaluation Study %A Naseem,Usman %A Thapa,Surendrabikram %A Masood,Anum %+ Department of Circulation and Medical Imaging, Norwegian University of Science and Technology, B4-135, Realfagbygget Building., Gloshaugen Campus, Trondheim, 7491, Norway, 47 92093743, anum.msd@gmail.com %K biomedical text mining %K BioNLP %K vision-language pretraining %K multimodal models %K medical image analysis %D 2024 %7 5.8.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Medical image analysis, particularly in the context of visual question answering (VQA) and image captioning, is crucial for accurate diagnosis and educational purposes. Objective: Our study aims to introduce BioMedBLIP models, fine-tuned for VQA tasks using specialized medical data sets such as Radiology Objects in Context and Medical Information Mart for Intensive Care-Chest X-ray, and evaluate their performance in comparison to the state of the art (SOTA) original Bootstrapping Language-Image Pretraining (BLIP) model. Methods: We present 9 versions of BioMedBLIP across 3 downstream tasks in various data sets. The models are trained on a varying number of epochs. The findings indicate the strong overall performance of our models. We proposed BioMedBLIP for the VQA generation model, VQA classification model, and BioMedBLIP image caption model. We conducted pretraining in BLIP using medical data sets, producing an adapted BLIP model tailored for medical applications. 
Results: In VQA generation tasks, BioMedBLIP models outperformed the SOTA on the Semantically-Labeled Knowledge-Enhanced (SLAKE) data set, VQA in Radiology (VQA-RAD), and Image Cross-Language Evaluation Forum data sets. In VQA classification, our models consistently surpassed the SOTA on the SLAKE data set. Our models also showed competitive performance on the VQA-RAD and PathVQA data sets. Similarly, in image captioning tasks, our model beat the SOTA, suggesting the importance of pretraining with medical data sets. Overall, in 20 different data sets and task combinations, our BioMedBLIP excelled in 15 (75%) out of 20 tasks. BioMedBLIP represents a new SOTA in 15 (75%) out of 20 tasks, and our responses were rated higher in all 20 tasks (P<.005) in comparison to SOTA models. Conclusions: Our BioMedBLIP models show promising performance and suggest that incorporating medical knowledge through pretraining with domain-specific medical data sets helps models achieve higher performance. Our models thus demonstrate their potential to advance medical image analysis, impacting diagnosis, medical education, and research. However, data quality, task-specific variability, computational resources, and ethical considerations should be carefully addressed. In conclusion, our models represent a contribution toward the synergy of artificial intelligence and medicine. We have made BioMedBLIP freely available, which will help in further advancing research in multimodal medical tasks. %M 39102281 %R 10.2196/56627 %U https://medinform.jmir.org/2024/1/e56627 %U https://doi.org/10.2196/56627 %U http://www.ncbi.nlm.nih.gov/pubmed/39102281 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 11 %N %P e52257 %T Understanding the Use of Mobility Data in Disasters: Exploratory Qualitative Study of COVID-19 User Feedback %A Chan,Jennifer Lisa %A Tsay,Sarah %A Sambara,Sraavya %A Welch,Sarah B %+ Department of Emergency Medicine, Feinberg School of Medicine, Northwestern University, 211 E. 
Ontario Street, Chicago, IL, 60611, United States, 1 312 694 7000, jennifer-chan@northwestern.edu %K mobility data %K disasters %K surveillance %K COVID-19 %K qualitative %K user feedback %K policy making %K emergency %K pandemic %K disaster response %K data usage %K situational awareness %K data translation %K big data %D 2024 %7 1.8.2024 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Human mobility data have been used as a potential novel data source to guide policies and response planning during the COVID-19 global pandemic. The COVID-19 Mobility Data Network (CMDN) facilitated the use of human mobility data around the world. Both researchers and policy makers assumed that mobility data would provide insights to help policy makers and response planners. However, evidence that human mobility data were operationally useful and provided added value for public health response planners remains largely unknown. Objective: This exploratory study focuses on advancing the understanding of the use of human mobility data during the early phase of the COVID-19 pandemic. The study explored how researchers and practitioners around the world used these data in response planning and policy making, focusing on processing data and human factors enabling or hindering use of the data. Methods: Our project was based on phenomenology and used an inductive approach to thematic analysis. Transcripts were open-coded to create the codebook that was then applied by 2 team members who blind-coded all transcripts. Consensus coding was used for coding discrepancies. Results: Interviews were conducted with 45 individuals during the early period of the COVID-19 pandemic. Although some teams used mobility data for response planning, few were able to describe their uses in policy making, and there were no standardized ways that teams used mobility data. 
Mobility data played a larger role in providing situational awareness for government partners, helping to understand where people were moving in relation to the spread of COVID-19 variants and reactions to stay-at-home orders. Interviewees who felt they were more successful using mobility data often cited an individual who was able to answer general questions about mobility data; provide interactive feedback on results; and enable a 2-way communication exchange about data, meaning, value, and potential use. Conclusions: Human mobility data were used as a novel data source in the COVID-19 pandemic by a network of academic researchers and practitioners using privacy-preserving and anonymized mobility data. This study reflects the processes in analyzing and communicating human mobility data, as well as how these data were used in response planning and how the data were intended for use in policy making. The study reveals several valuable use cases. Ultimately, the role of a data translator was crucial in understanding the complexities of this novel data source. With this role, teams were able to adapt workflows, visualizations, and reports to align with end users and decision makers while communicating this information meaningfully to address the goals of responders and policy makers. 
%M 39088256 %R 10.2196/52257 %U https://humanfactors.jmir.org/2024/1/e52257 %U https://doi.org/10.2196/52257 %U http://www.ncbi.nlm.nih.gov/pubmed/39088256 %0 Journal Article %@ 1947-2579 %I JMIR Publications %V 16 %N %P e56237 %T Making Metadata Machine-Readable as the First Step to Providing Findable, Accessible, Interoperable, and Reusable Population Health Data: Framework Development and Implementation Study %A Amadi,David %A Kiwuwa-Muyingo,Sylvia %A Bhattacharjee,Tathagata %A Taylor,Amelia %A Kiragga,Agnes %A Ochola,Michael %A Kanjala,Chifundo %A Gregory,Arofan %A Tomlin,Keith %A Todd,Jim %A Greenfield,Jay %+ Department of Population Health, Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, Keppel St, London, WC1E 7HT, United Kingdom, 44 20 7636 8636, david.amadi@lshtm.ac.uk %K FAIR data principles %K metadata %K machine-readable metadata %K DDI %K Data Documentation Initiative %K standardization %K JSON-LD %K JavaScript Object Notation for Linked Data %K OMOP CDM %K Observational Medical Outcomes Partnership Common Data Model %K data science %K data models %D 2024 %7 1.8.2024 %9 Original Paper %J Online J Public Health Inform %G English %X Background: Metadata describe and provide context for other data, playing a pivotal role in enabling findability, accessibility, interoperability, and reusability (FAIR) data principles. By providing comprehensive and machine-readable descriptions of digital resources, metadata empower both machines and human users to seamlessly discover, access, integrate, and reuse data or content across diverse platforms and applications. However, the limited accessibility and machine-interpretability of existing metadata for population health data hinder effective data discovery and reuse. 
Objective: To address these challenges, we propose a comprehensive framework using standardized formats, vocabularies, and protocols to render population health data machine-readable, significantly enhancing their FAIRness and enabling seamless discovery, access, and integration across diverse platforms and research applications. Methods: The framework implements a 3-stage approach. The first stage is Data Documentation Initiative (DDI) integration, which involves leveraging the DDI Codebook metadata and documentation of detailed information for data and associated assets, while ensuring transparency and comprehensiveness. The second stage is Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) standardization. In this stage, the data are harmonized and standardized into the OMOP CDM, facilitating unified analysis across heterogeneous data sets. The third stage involves the integration of Schema.org and JavaScript Object Notation for Linked Data (JSON-LD), in which machine-readable metadata are generated using Schema.org entities and embedded within the data using JSON-LD, boosting discoverability and comprehension for both machines and human users. We demonstrated the implementation of these 3 stages using the Integrated Disease Surveillance and Response (IDSR) data from Malawi and Kenya. Results: The implementation of our framework significantly enhanced the FAIRness of population health data, resulting in improved discoverability through seamless integration with platforms such as Google Dataset Search. The adoption of standardized formats and protocols streamlined data accessibility and integration across various research environments, fostering collaboration and knowledge sharing. Additionally, the use of machine-interpretable metadata empowered researchers to efficiently reuse data for targeted analyses and insights, thereby maximizing the overall value of population health resources. 
The JSON-LD codes are accessible via a GitHub repository and the HTML code integrated with JSON-LD is available on the Implementation Network for Sharing Population Information from Research Entities website. Conclusions: The adoption of machine-readable metadata standards is essential for ensuring the FAIRness of population health data. By embracing these standards, organizations can enhance diverse resource visibility, accessibility, and utility, leading to a broader impact, particularly in low- and middle-income countries. Machine-readable metadata can accelerate research, improve health care decision-making, and ultimately promote better health outcomes for populations worldwide. %M 39088253 %R 10.2196/56237 %U https://ojphi.jmir.org/2024/1/e56237 %U https://doi.org/10.2196/56237 %U http://www.ncbi.nlm.nih.gov/pubmed/39088253 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48595 %T Early Detection of Pulmonary Embolism in a General Patient Population Immediately Upon Hospital Admission Using Machine Learning to Identify New, Unidentified Risk Factors: Model Development Study %A Ben Yehuda,Ori %A Itelman,Edward %A Vaisman,Adva %A Segal,Gad %A Lerner,Boaz %+ Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, POB 653, Beer-Sheva, 84105, Israel, 972 544399763, boaz@bgu.ac.il %K pulmonary embolism %K deep vein thrombosis %K venous thromboembolism %K imbalanced data %K clustering %K risk factors %K Wells score %K revised Geneva score %K hospital admission %K machine learning %D 2024 %7 30.7.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Under- or late identification of pulmonary embolism (PE)—a thrombosis of 1 or more pulmonary arteries that seriously threatens patients’ lives—is a major challenge confronting modern medicine. 
Objective: We aimed to establish accurate and informative machine learning (ML) models to identify patients at high risk for PE as they are admitted to the hospital, before their initial clinical checkup, by using only the information in their medical records. Methods: We collected demographics, comorbidities, and medications data for 2568 patients with PE and 52,598 control patients. We focused on data available prior to emergency department admission, as these are the most universally accessible data. We trained an ML random forest algorithm to detect PE at the earliest possible time during a patient’s hospitalization—at the time of his or her admission. We developed and applied 2 ML-based methods specifically to address the data imbalance between PE and non-PE patients, which causes misdiagnosis of PE. Results: The resulting models predicted PE based on age, sex, BMI, past clinical PE events, chronic lung disease, past thrombotic events, and usage of anticoagulants, obtaining an 80% geometric mean value for the PE and non-PE classification accuracies. Although on hospital admission only 4% (1942/46,639) of the patients had a diagnosis of PE, we identified 2 clustering schemes comprising subgroups with more than 61% (705/1120 in clustering scheme 1; 427/701 and 340/549 in clustering scheme 2) positive patients for PE. One subgroup in the first clustering scheme included 36% (705/1942) of all patients with PE who were characterized by a definite past PE diagnosis, a 6-fold higher prevalence of deep vein thrombosis, and a 3-fold higher prevalence of pneumonia, compared with patients of the other subgroups in this scheme. In the second clustering scheme, 2 subgroups (1 of only men and 1 of only women) included patients who all had a past PE diagnosis and a relatively high prevalence of pneumonia, and a third subgroup included only those patients with a past diagnosis of pneumonia. 
Conclusions: This study established an ML tool for early diagnosis of PE almost immediately upon hospital admission. Despite the highly imbalanced scenario undermining accurate PE prediction and using information available only from the patient’s medical history, our models were both accurate and informative, enabling the identification of patients already at high risk for PE upon hospital admission, even before the initial clinical checkup was performed. The fact that we did not restrict our patients to those at high risk for PE according to previously published scales (eg, Wells or revised Geneva scores) enabled us to accurately assess the application of ML on raw medical data and identify new, previously unidentified risk factors for PE, such as previous pulmonary disease, in general populations. %M 39079116 %R 10.2196/48595 %U https://www.jmir.org/2024/1/e48595 %U https://doi.org/10.2196/48595 %U http://www.ncbi.nlm.nih.gov/pubmed/39079116 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e52896 %T Unsupervised Feature Selection to Identify Important ICD-10 and ATC Codes for Machine Learning on a Cohort of Patients With Coronary Heart Disease: Retrospective Study %A Ghasemi,Peyman %A Lee,Joon %K unsupervised feature selection %K ICD-10 %K International Classification of Diseases %K ATC %K Anatomical Therapeutic Chemical %K concrete autoencoder %K Laplacian score %K unsupervised feature selection for multicluster data %K autoencoder-inspired unsupervised feature selection %K principal feature analysis %K machine learning %K artificial intelligence %K case study %K coronary artery disease %K artery disease %K patient cohort %K artery %K mortality prediction %K mortality %K data set %K interpretability %K International Classification of Diseases, Tenth Revision %D 2024 %7 26.7.2024 %9 %J JMIR Med Inform %G English %X Background: The application of machine learning in health care often necessitates the use of hierarchical codes such as the 
International Classification of Diseases (ICD) and Anatomical Therapeutic Chemical (ATC) systems. These codes classify diseases and medications, respectively, thereby forming extensive data dimensions. Unsupervised feature selection tackles the “curse of dimensionality” and helps to improve the accuracy and performance of supervised learning models by reducing the number of irrelevant or redundant features and avoiding overfitting. Techniques for unsupervised feature selection, such as filter, wrapper, and embedded methods, are implemented to select the most important features with the most intrinsic information. However, they face challenges due to the sheer volume of ICD and ATC codes and the hierarchical structures of these systems. Objective: The objective of this study was to compare several unsupervised feature selection methods for ICD and ATC code databases of patients with coronary artery disease in different aspects of performance and complexity and select the best set of features representing these patients. Methods: We compared several unsupervised feature selection methods for 2 ICD and 1 ATC code databases of 51,506 patients with coronary artery disease in Alberta, Canada. Specifically, we used the Laplacian score, unsupervised feature selection for multicluster data, autoencoder-inspired unsupervised feature selection, principal feature analysis, and concrete autoencoders with and without ICD or ATC tree weight adjustment to select the 100 best features from over 9000 ICD and 2000 ATC codes. We assessed the selected features based on their ability to reconstruct the initial feature space and predict 90-day mortality following discharge. We also compared the complexity of the selected features by mean code level in the ICD or ATC tree and the interpretability of the features in the mortality prediction task using Shapley analysis. 
Results: In feature space reconstruction and mortality prediction, the concrete autoencoder–based methods outperformed other techniques. Particularly, a weight-adjusted concrete autoencoder variant demonstrated improved reconstruction accuracy and significant predictive performance enhancement, confirmed by DeLong and McNemar tests (P<.05). Concrete autoencoders preferred more general codes, and they consistently reconstructed all features accurately. Additionally, features selected by weight-adjusted concrete autoencoders yielded higher Shapley values in mortality prediction than most alternatives. Conclusions: This study scrutinized 5 feature selection methods in ICD and ATC code data sets in an unsupervised context. Our findings underscore the superiority of the concrete autoencoder method in selecting salient features that represent the entire data set, offering a potential asset for subsequent machine learning research. We also present a novel weight adjustment approach for the concrete autoencoders specifically tailored for ICD and ATC code data sets to enhance the generalizability and interpretability of the selected features. 
%R 10.2196/52896 %U https://medinform.jmir.org/2024/1/e52896 %U https://doi.org/10.2196/52896 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e54281 %T Pooled Cohort Profile: ReCoDID Consortium’s Harmonized Acute Febrile Illness Arbovirus Meta-Cohort %A Gómez,Gustavo %A Hufstedler,Heather %A Montenegro Morales,Carlos %A Roell,Yannik %A Lozano-Parra,Anyela %A Tami,Adriana %A Magalhaes,Tereza %A Marques,Ernesto T A %A Balmaseda,Angel %A Calvet,Guilherme %A Harris,Eva %A Brasil,Patricia %A Herrera,Victor %A Villar,Luis %A Maxwell,Lauren %A Jaenisch,Thomas %A , %+ Heidelberg Institute of Global Health, Heidelberg University Hospital, Im Neuenheimer Feld 130.3, Heidelberg, 69120, Germany, 49 06221 56 0, heather.hufstedler@uni-heidelberg.de %K infectious disease %K harmonized meta-cohort %K IPD-MA %K arbovirus %K dengue %K zika %K chikungunya %K surveillance %K public health %K open access data %K FAIR principles %K febrile illness %K clinical-epidemiological data %K cross-disease interaction %K epidemiology %K consortium %K innovation %K statistical tool %K Latin America %K Maelstrom's %K methodology %K CDISC %K immunological interaction %K flavivirus %K infection %K arboviral disease %D 2024 %7 23.7.2024 %9 Viewpoint %J JMIR Public Health Surveill %G English %X Infectious disease (ID) cohorts are key to advancing public health surveillance, public policies, and pandemic responses. Unfortunately, ID cohorts often lack funding to store and share clinical-epidemiological (CE) data and high-dimensional laboratory (HDL) data long term, which is evident when the link between these data elements is not kept up to date. This becomes particularly apparent when smaller cohorts fail to successfully address the initial scientific objectives due to limited case numbers, which also limits the potential to pool these studies to monitor long-term cross-disease interactions within and across populations. 
CE data from 9 arbovirus (arthropod-borne virus) cohorts in Latin America were retrospectively harmonized using the Maelstrom Research methodology and standardized to the Clinical Data Interchange Standards Consortium (CDISC) standard. To facilitate advancements in cross-population inference and reuse of cohort data, the Reconciliation of Cohort Data for Infectious Diseases (ReCoDID) Consortium harmonized and standardized the CE and HDL data from these 9 arbovirus cohorts into 1 meta-cohort. Interested parties will be able to access data dictionaries that include information on variables across the data sets via BioStudies. After consultation with each cohort, linked harmonized and curated human cohort data (CE and HDL) will be made accessible through the European Genome-phenome Archive platform to data users after their requests are evaluated by the ReCoDID Data Access Committee. This meta-cohort can facilitate various joint research projects (eg, on immunological interactions between sequential flavivirus infections or on the evaluation of potential biomarkers for severe arboviral disease).
%M 39042429 %R 10.2196/54281 %U https://publichealth.jmir.org/2024/1/e54281 %U https://doi.org/10.2196/54281 %U http://www.ncbi.nlm.nih.gov/pubmed/39042429 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e57005 %T Uncovering Harmonization Potential in Health Care Data Through Iterative Refinement of Fast Healthcare Interoperability Resources Profiles Based on Retrospective Discrepancy Analysis: Case Study %A Rosenau,Lorenz %A Behrend,Paul %A Wiedekopf,Joshua %A Gruendner,Julian %A Ingenerf,Josef %+ IT Center for Clinical Research, University of Lübeck, Gebäude 64, 2.OG, Raum 05, Ratzeburger Allee 160, Lübeck, 23562, Germany, 49 451 3101 5636, lorenz.rosenau@uni-luebeck.de %K Health Level 7 Fast Healthcare Interoperability Resources %K HL7 FHIR %K FHIR profiles %K interoperability %K data harmonization %K discrepancy analysis %K data quality %K cross-institutional data exchange %K Medical Informatics Initiative %K federated data access challenges %D 2024 %7 23.7.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Cross-institutional interoperability between health care providers remains a recurring challenge worldwide. The German Medical Informatics Initiative, a collaboration of 37 university hospitals in Germany, aims to enable interoperability between partner sites by defining Fast Healthcare Interoperability Resources (FHIR) profiles for the cross-institutional exchange of health care data, the Core Data Set (CDS). The current CDS and its extension modules define elements representing patients’ health care records. All university hospitals in Germany have made significant progress in providing routine data in a standardized format based on the CDS. In addition, the central research platform for health, the German Portal for Medical Research Data feasibility tool, allows medical researchers to query the available CDS data items across many participating hospitals. 
Objective: In this study, we aimed to evaluate a novel approach of combining the current top-down generated FHIR profiles with the bottom-up generated knowledge gained by the analysis of respective instance data. This allowed us to derive options for iteratively refining FHIR profiles using the information obtained from a discrepancy analysis. Methods: We developed an FHIR validation pipeline and opted to derive more restrictive profiles from the original CDS profiles. This decision was driven by the need to align more closely with the specific assumptions and requirements of the central feasibility platform’s search ontology. While the original CDS profiles offer a generic framework adaptable for a broad spectrum of medical informatics use cases, they lack the specificity to model the nuanced criteria essential for medical researchers. A key example of this is the necessity to accurately represent the interdependencies between specific laboratory codings and values. The validation results allow us to identify discrepancies between the instance data at the clinical sites and the profiles specified by the feasibility platform, which can be addressed in the future. Results: A total of 20 university hospitals participated in this study. Historical factors, lack of harmonization, a wide range of source systems, and case sensitivity of coding are some of the causes for the discrepancies identified. While in our case study, Conditions, Procedures, and Medications have a high degree of uniformity in the coding of instance data due to legislative requirements for billing in Germany, we found that laboratory values pose a significant data harmonization challenge due to their interdependency between coding and value. Conclusions: While the CDS achieves interoperability, different challenges for federated data access arise, requiring more specificity in the profiles to make assumptions on the instance data.
We argue that further harmonization of the instance data can significantly lower the required retrospective harmonization effort. We recognize that discrepancies cannot be resolved solely at the clinical site; therefore, our findings have a wide range of implications and will require action on multiple levels and by various stakeholders. %M 39042420 %R 10.2196/57005 %U https://medinform.jmir.org/2024/1/e57005 %U https://doi.org/10.2196/57005 %U http://www.ncbi.nlm.nih.gov/pubmed/39042420 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 8 %N %P e54994 %T Identifying Predictors of Heart Failure Readmission in Patients From a Statutory Health Insurance Database: Retrospective Machine Learning Study %A Levinson,Rebecca T %A Paul,Cinara %A Meid,Andreas D %A Schultz,Jobst-Hendrik %A Wild,Beate %+ Department of General Internal Medicine and Psychosomatics, Heidelberg University Hospital, Heidelberg University, Im Neuenheimer Feld 410, Heidelberg, 69120, Germany, 49 6221565888, rebeccaterrall.levinson@med.uni-heidelberg.de %K statutory health insurance %K readmission %K machine learning %K heart failure %K heart %K cardiology %K cardiac %K hospitalization %K insurance %K predict %K predictive %K prediction %K predictions %K predictor %K predictors %K all cause %D 2024 %7 23.7.2024 %9 Original Paper %J JMIR Cardio %G English %X Background: Patients with heart failure (HF) are the most commonly readmitted group of adult patients in Germany. Most patients with HF are readmitted for noncardiovascular reasons. Understanding the relevance of HF management outside the hospital setting is critical to understanding HF and factors that lead to readmission. Application of machine learning (ML) on data from statutory health insurance (SHI) allows the evaluation of large longitudinal data sets representative of the general population to support clinical decision-making.
Objective: This study aims to evaluate the ability of ML methods to predict 1-year all-cause and HF-specific readmission after initial HF-related admission of patients with HF in outpatient SHI data and identify important predictors. Methods: We identified individuals with HF using outpatient data from 2012 to 2018 from the AOK Baden-Württemberg SHI in Germany. We then trained and applied regression and ML algorithms to predict the first all-cause and HF-specific readmission in the year after the first admission for HF. We fitted a random forest, an elastic net, a stepwise regression, and a logistic regression to predict readmission by using diagnosis codes, drug exposures, demographics (age, sex, nationality, and type of coverage within SHI), degree of rurality for residence, and participation in disease management programs for common chronic conditions (diabetes mellitus type 1 and 2, breast cancer, chronic obstructive pulmonary disease, and coronary heart disease). We then evaluated the predictors of HF readmission according to their importance and direction to predict readmission. Results: Our final data set consisted of 97,529 individuals with HF, and 78,044 (80%) were readmitted within the observation period. Of the tested modeling approaches, the random forest approach best predicted 1-year all-cause and HF-specific readmission with a C-statistic of 0.68 and 0.69, respectively. Important predictors for 1-year all-cause readmission included prescription of pantoprazole, chronic obstructive pulmonary disease, atherosclerosis, sex, rurality, and participation in disease management programs for type 2 diabetes mellitus and coronary heart disease. Relevant features for HF-specific readmission included a large number of canonical HF comorbidities. Conclusions: While many of the predictors we identified were known to be relevant comorbidities for HF, we also uncovered several novel associations. 
Disease management programs have widely been shown to be effective at managing chronic disease; however, our results indicate that in the short term they may be useful for targeting patients with HF with comorbidity at increased risk of readmission. Our results also show that living in a more rural location increases the risk of readmission. Overall, factors beyond comorbid disease were relevant for risk of HF readmission. This finding may impact how outpatient physicians identify and monitor patients at risk of HF readmission. %R 10.2196/54994 %U https://cardio.jmir.org/2024/1/e54994 %U https://doi.org/10.2196/54994 %0 Journal Article %@ 2369-3762 %I %V 10 %N %P e53624 %T Data-Driven Fundraising: Strategic Plan for Medical Education %A Jalali,Alireza %A Nyman,Jacline %A Loeffelholz,Ouida %A Courtney,Chantelle %K fundraising %K philanthropy %K crowdfunding %K funding %K charity %K higher education %K university %K medical education %K educators %K advancement %K data analytics %K ethics %K ethical %K education %K medical school %K school %K support %K financial %K community %D 2024 %7 22.7.2024 %9 %J JMIR Med Educ %G English %X Higher education institutions, including medical schools, increasingly rely on fundraising to bridge funding gaps and support their missions. This paper presents a viewpoint on data-driven strategies in fundraising, outlining a 4-step approach for effective planning while considering ethical implications. It outlines a 4-step approach to creating an effective, end-to-end, data-driven fundraising plan, emphasizing the crucial stages of data collection, data analysis, goal establishment, and targeted strategy formulation. By leveraging internal and external data, schools can create tailored outreach initiatives that resonate with potential donors. However, the fundraising process must be grounded in ethical considerations. 
Ethical challenges, particularly in fundraising with grateful medical patients, necessitate transparent and honest practices prioritizing donors’ and beneficiaries’ rights and safeguarding public trust. This paper presents a viewpoint on the critical role of data-driven strategies in fundraising for medical education. It emphasizes integrating comprehensive data analysis with ethical considerations to enhance fundraising efforts in medical schools. By integrating data analytics with fundraising best practices and ensuring ethical practice, medical institutions can ensure financial support and foster enduring, trust-based relationships with their donor communities. %R 10.2196/53624 %U https://mededu.jmir.org/2024/1/e53624 %U https://doi.org/10.2196/53624 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e45030 %T Contraceptive Use Measured in a National Population–Based Approach: Cross-Sectional Study of Administrative Versus Survey Data %A Congy,Juliette %A Rahib,Delphine %A Leroy,Céline %A Bouyer,Jean %A de La Rochebrochard,Elise %+ Sexual and Reproductive Health and Rights Unit, Institut National d'Etudes Démographiques, 9 Cours des Humanités, Aubervilliers, 93300, France, 33 616448773, congyjuliette@gmail.com %K contraception %K administrative data %K health data %K implant %K oral contraceptives %K intrauterine device %K IUD %K contraceptive prevalence %K contraceptive %K birth control %K monitoring %K public health issue %K population-based survey %K prevalence %D 2024 %7 22.7.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Prescribed contraception is used worldwide by over 400 million women of reproductive age. Monitoring contraceptive use is a major public health issue that usually relies on population-based surveys. However, these surveys are conducted on average every 6 years and do not allow close follow-up of contraceptive use. 
Moreover, their sample size is often too limited for the study of specific population subgroups such as people with low income. Health administrative data could be an innovative and less costly source to study contraceptive use. Objective: We aimed to explore the potential of health administrative data to study prescribed contraceptive use and compare these data with observations based on survey data. Methods: We selected all women aged 15-49 years, covered by French health insurance and living in France, in the health administrative database, which covers 98% of the resident population (n=14,788,124), and in the last French population–based representative survey, the Health Barometer Survey, conducted in 2016 (n=4285). In health administrative data, contraceptive use was recorded with detailed information on the product delivered, whereas in the survey, it was self-declared by the women. In both sources, the prevalence of contraceptive use was estimated globally for all prescribed contraceptives and by type of contraceptive: oral contraceptives, intrauterine devices (IUDs), and implants. Prevalences were analyzed by age. Results: There were more low-income women in health administrative data than in the population-based survey (1,576,066/14,770,256, 11% vs 188/4285, 7%, respectively; P<.001). In health administrative data, 47.6% (7,034,710/14,770,256; 95% CI 47.6%-47.7%) of women aged 15-49 years used a prescribed contraceptive versus 50.5% (2297/4285; 95% CI 49.1%-52.0%) in the population-based survey. Considering prevalences by the type of contraceptive in health administrative data versus survey data, they were 26.9% (95% CI 26.9%-26.9%) versus 27.7% (95% CI 26.4%-29.0%) for oral contraceptives, 17.7% (95% CI 17.7%-17.8%) versus 19.6% (95% CI 18.5%-20.8%) for IUDs, and 3% (95% CI 3.0%-3.0%) versus 3.2% (95% CI 2.7%-3.7%) for implants. In both sources, the same overall tendency in prevalence was observed for these 3 contraceptives.
Implants remained little used at all ages; oral contraceptives were highly used among young women, whereas IUD use in this group was low. Conclusions: Compared with survey data, health administrative data exhibited the same overall tendencies for oral contraceptives, IUDs, and implants. One of the main strengths of health administrative data is the high quality of information on contraceptive use and the large number of observations, allowing studies of population subgroups. Health administrative data therefore appear to be a promising new source to monitor contraception in a population-based approach. They could open new perspectives for research and be a valuable new asset to guide public policies on reproductive and sexual health. %M 39037774 %R 10.2196/45030 %U https://publichealth.jmir.org/2024/1/e45030 %U https://doi.org/10.2196/45030 %U http://www.ncbi.nlm.nih.gov/pubmed/39037774 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e54590 %T Data Lake, Data Warehouse, Datamart, and Feature Store: Their Contributions to the Complete Data Reuse Pipeline %A Lamer,Antoine %A Saint-Dizier,Chloé %A Paris,Nicolas %A Chazard,Emmanuel %K data reuse %K data lake %K data warehouse %K feature extraction %K datamart %K feature store %D 2024 %7 17.7.2024 %9 %J JMIR Med Inform %G English %X The growing adoption and use of health information technology has generated a wealth of clinical data in electronic format, offering opportunities for data reuse beyond direct patient care. However, as data are distributed across multiple software systems, it becomes challenging to cross-reference information between sources due to differences in formats, vocabularies, and technologies and the absence of common identifiers among these systems. To address these challenges, hospitals have adopted data warehouses to consolidate and standardize these data for research.
Additionally, as a complement or alternative, data lakes store both source data and metadata in a detailed and unprocessed format, empowering exploration, manipulation, and adaptation of the data to meet specific analytical needs. Subsequently, datamarts are used to further refine data into usable information tailored to specific research questions. However, for efficient analysis, a feature store is essential to pivot and denormalize the data, simplifying queries. In conclusion, while data warehouses are crucial, data lakes, datamarts, and feature stores play essential and complementary roles in facilitating data reuse for research and analysis in health care. %R 10.2196/54590 %U https://medinform.jmir.org/2024/1/e54590 %U https://doi.org/10.2196/54590 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e54044 %T Predictive Model for Extended-Spectrum β-Lactamase–Producing Bacterial Infections Using Natural Language Processing Technique and Open Data in Intensive Care Unit Environment: Retrospective Observational Study %A Ito,Genta %A Yada,Shuntaro %A Wakamiya,Shoko %A Aramaki,Eiji %+ Department of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama-cho, Ikoma City, 8916-5, Japan, 81 0743725204, aramaki@is.naist.jp %K predictive modeling %K MIMIC-3 dataset %K natural language processing %K NLP %K QuickUMLS %K named entity recognition %K ESBL-producing bacterial infections %D 2024 %7 10.7.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Machine learning has advanced medical event prediction, mostly using private data. The public MIMIC-3 (Medical Information Mart for Intensive Care III) data set, which contains detailed data on over 40,000 intensive care unit patients, stands out as it can help develop better models including structured and textual data. 
Objective: This study aimed to build and test a machine learning model using the MIMIC-3 data set to determine the effectiveness of information extracted from electronic medical record text using a named entity recognition tool, specifically QuickUMLS, for predicting important medical events. Using the prediction of extended-spectrum β-lactamase (ESBL)–producing bacterial infections as an example, this study shows how open data sources and simple technology can be useful for making clinically meaningful predictions. Methods: The MIMIC-3 data set, including demographics, vital signs, laboratory results, and textual data, such as discharge summaries, was used. This study specifically targeted patients diagnosed with Klebsiella pneumoniae or Escherichia coli infection. Predictions were based on ESBL-producing bacterial standards and the minimum inhibitory concentration criteria. Both the structured data and extracted patient histories were used as predictors. In total, 2 models, an L1-regularized logistic regression model and a LightGBM model, were evaluated using the receiver operating characteristic area under the curve (ROC-AUC) and the precision-recall curve area under the curve (PR-AUC). Results: Of 46,520 MIMIC-3 patients, 4046 were identified with bacterial cultures, indicating the presence of K pneumoniae or E coli. After excluding patients who lacked discharge summary text, 3614 patients remained. The L1-penalized model, with variables from only the structured data, displayed a ROC-AUC of 0.646 and a PR-AUC of 0.307. The LightGBM model, combining structured and textual data, achieved a ROC-AUC of 0.707 and a PR-AUC of 0.369. Key contributors to the LightGBM model included patient age, duration since hospital admission, and specific medical history such as diabetes. The structured data-based model showed improved performance compared to the reference models. Performance was further improved when textual medical history was included.
Compared to other models predicting drug-resistant bacteria, the results of this study ranked in the middle. Some misidentifications, potentially due to the limitations of QuickUMLS, may have affected the accuracy of the model. Conclusions: This study successfully developed a predictive model for ESBL-producing bacterial infections using the MIMIC-3 data set, yielding results consistent with existing literature. This model stands out for its transparency and reliance on open data and open named entity recognition technology. The performance of the model was enhanced using textual information. With advancements in natural language processing tools such as BERT and GPT, the extraction of medical data from text holds substantial potential for future model optimization. %M 38986131 %R 10.2196/54044 %U https://formative.jmir.org/2024/1/e54044 %U https://doi.org/10.2196/54044 %U http://www.ncbi.nlm.nih.gov/pubmed/38986131 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e51347 %T Research Trends and Evolution in Radiogenomics (2005-2023): Bibliometric Analysis %A Wang,Meng %A Peng,Yun %A Wang,Ya %A Luo,Dehong %+ Department of Radiology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital & Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No.113 Baohe Road, Longgang District, Shenzhen, 518116, China, 86 1 391 001 6309, cjr.luodehong@vip.163.com %K bibliometric %K radiogenomics %K multiomics %K genomics %K radiomics %D 2024 %7 9.7.2024 %9 Original Paper %J Interact J Med Res %G English %X Background: Radiogenomics is an emerging technology that integrates genomics and medical image–based radiomics, which is considered a promising approach toward achieving precision medicine. Objective: The aim of this study was to quantitatively analyze the research status, dynamic trends, and evolutionary trajectory in the radiogenomics field using bibliometric methods.
Methods: The relevant literature published up to 2023 was retrieved from the Web of Science Core Collection. Excel was used to analyze the annual publication trend. VOSviewer was used for constructing the keywords co-occurrence network and the collaboration networks among countries and institutions. CiteSpace was used for citation keywords burst analysis and visualizing the references timeline. Results: A total of 3237 papers were included and exported in plain-text format. The annual number of publications showed an increasing trend. China and the United States have published the most papers in this field, with the highest number of citations in the United States and the highest average number of citations per item in the Netherlands. Keywords burst analysis revealed that several keywords, including “big data,” “magnetic resonance spectroscopy,” “renal cell carcinoma,” “stage,” and “temozolomide,” experienced a citation burst in recent years. The timeline views demonstrated that the references can be categorized into 8 clusters: lower-grade glioma, lung cancer histology, lung adenocarcinoma, breast cancer, radiation-induced lung injury, epidermal growth factor receptor mutation, late radiotherapy toxicity, and artificial intelligence. Conclusions: The field of radiogenomics is attracting increasing attention from researchers worldwide, with the United States and the Netherlands being the most influential countries. Exploration of artificial intelligence methods based on big data to predict the response of tumors to various treatment methods represents a hot spot research topic in this field at present.
%M 38980713 %R 10.2196/51347 %U https://www.i-jmr.org/2024/1/e51347 %U https://doi.org/10.2196/51347 %U http://www.ncbi.nlm.nih.gov/pubmed/38980713 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e55013 %T Nonrepresentativeness of Human Mobility Data and its Impact on Modeling Dynamics of the COVID-19 Pandemic: Systematic Evaluation %A Liu,Chuchu %A Holme,Petter %A Lehmann,Sune %A Yang,Wenchuan %A Lu,Xin %+ College of Systems Engineering, National University of Defense Technology, No 137 Yanwachi Street, Changsha, 410073, China, 86 18627561577, xin.lu.lab@outlook.com %K human mobility %K data representativeness %K population composition %K COVID-19 %K epidemiological modeling %D 2024 %7 28.6.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: In recent years, a range of novel smartphone-derived data streams about human mobility have become available on a near–real-time basis. These data have been used, for example, to perform traffic forecasting and epidemic modeling. During the COVID-19 pandemic in particular, human travel behavior has been considered a key component of epidemiological modeling to provide more reliable estimates about the volumes of the pandemic’s importation and transmission routes, or to identify hot spots. However, nearly universally in the literature, the representativeness of these data, how they relate to the underlying real-world human mobility, has been overlooked. This disconnect between data and reality is especially relevant in the case of socially disadvantaged minorities. Objective: The objective of this study is to illustrate the nonrepresentativeness of data on human mobility and the impact of this nonrepresentativeness on modeling dynamics of the epidemic. 
This study systematically evaluates how real-world travel flows differ from census-based estimations, especially in the case of socially disadvantaged minorities, such as older adults and women, and further measures biases introduced by this difference in epidemiological studies. Methods: To understand the demographic composition of population movements, a nationwide mobility data set from 318 million mobile phone users in China from January 1 to February 29, 2020, was curated. Specifically, we quantified the disparity in the population composition between actual migrations and resident composition according to census data, and showed how this nonrepresentativeness impacts epidemiological modeling by constructing an age-structured SEIR (Susceptible-Exposed-Infected-Recovered) model of COVID-19 transmission. Results: We found a significant difference in the demographic composition between those who travel and the overall population. In the population flows, 59% (n=20,067,526) of travelers are young and 36% (n=12,210,565) of them are middle-aged (P<.001), which is completely different from the overall adult population composition of China (where 36% of individuals are young and 40% of them are middle-aged). This difference would introduce a striking bias in epidemiological studies: the estimation of maximum daily infections differs nearly 3-fold, and the peak time has a large gap of 46 days. Conclusions: The difference between actual migrations and resident composition strongly impacts outcomes of epidemiological forecasts, which typically assume that flows represent underlying demographics. Our findings imply that it is necessary to measure and quantify the inherent biases related to nonrepresentativeness for accurate epidemiological surveillance and forecasting.
%M 38941609 %R 10.2196/55013 %U https://formative.jmir.org/2024/1/e55013 %U https://doi.org/10.2196/55013 %U http://www.ncbi.nlm.nih.gov/pubmed/38941609 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e50437 %T Considerations for Quality Control Monitoring of Machine Learning Models in Clinical Practice %A Faust,Louis %A Wilson,Patrick %A Asai,Shusaku %A Fu,Sunyang %A Liu,Hongfang %A Ruan,Xiaoyang %A Storlie,Curt %+ Robert D and Patricia E Kern Center for the Science of Health Care Delivery, Mayo Clinic, Mayo Clinic, 200 First St. SW, Rochester, MN, 55905, United States, 1 (507) 284 2511, Faust.Louis@mayo.edu %K artificial intelligence %K machine learning %K implementation science %K quality control %K monitoring %K patient safety %D 2024 %7 28.6.2024 %9 Viewpoint %J JMIR Med Inform %G English %X Integrating machine learning (ML) models into clinical practice presents a challenge of maintaining their efficacy over time. While existing literature offers valuable strategies for detecting declining model performance, there is a need to document the broader challenges and solutions associated with the real-world development and integration of model monitoring solutions. This work details the development and use of a platform for monitoring the performance of a production-level ML model operating in Mayo Clinic. In this paper, we aimed to provide a series of considerations and guidelines necessary for integrating such a platform into a team’s technical infrastructure and workflow. We have documented our experiences with this integration process, discussed the broader challenges encountered with real-world implementation and maintenance, and included the source code for the platform. Our monitoring platform was built as an R shiny application, developed and implemented over the course of 6 months. The platform has been used and maintained for 2 years and is still in use as of July 2023. 
The considerations necessary for the implementation of the monitoring platform center around 4 pillars: feasibility (what resources can be used for platform development?); design (through what statistics or models will the model be monitored, and how will these results be efficiently displayed to the end user?); implementation (how will this platform be built, and where will it exist within the IT ecosystem?); and policy (based on monitoring feedback, when and what actions will be taken to fix problems, and how will these problems be translated to clinical staff?). While much of the literature surrounding ML performance monitoring emphasizes methodological approaches for capturing changes in performance, there remains a battery of other challenges and considerations that must be addressed for successful real-world implementation. %M 38941140 %R 10.2196/50437 %U https://medinform.jmir.org/2024/1/e50437 %U https://doi.org/10.2196/50437 %U http://www.ncbi.nlm.nih.gov/pubmed/38941140 %0 Journal Article %@ 2563-6316 %I %V 5 %N %P e56759 %T Dental Tissue Density in Healthy Children Based on Radiological Data: Retrospective Analysis %A Reshetnikov,Aleksey %A Shaikhattarova,Natalia %A Mazurok,Margarita %A Kasatkina,Nadezhda %K density %K teeth %K tooth %K dental %K dentist %K dentists %K dentistry %K oral %K tissue %K enamel %K dentin %K Hounsfield %K pathology %K pathological %K radiology %K radiological %K image %K images %K imaging %K teeth density %K Hounsfield unit %K diagnostic imaging %D 2024 %7 20.6.2024 %9 %J JMIRx Med %G English %X Background: Information about the range of Hounsfield values for healthy teeth tissues could become an additional tool in assessing dental health and could be used, among other data, for subsequent machine learning. Objective: The purpose of our study was to determine dental tissue densities in Hounsfield units (HU). 
Methods: The total sample included 36 healthy children (n=21, 58% girls and n=15, 42% boys) aged 10-11 years at the time of the study. The densities of 320 teeth tissues were analyzed. Data were expressed as means and SDs. The significance was determined using the Student (1-tailed) t test. The statistical significance was set at P<.05. Results: The densities of 320 teeth tissues were analyzed: 72 (22.5%) first permanent molars, 72 (22.5%) permanent central incisors, 27 (8.4%) second primary molars, 40 (12.5%) tooth germs of second premolars, 37 (11.6%) second premolars, 9 (2.8%) second permanent molars, and 63 (19.7%) tooth germs of second permanent molars. The analysis of the data showed that tissues of healthy teeth in children have different density ranges: enamel, from mean 2954.69 (SD 223.77) HU to mean 2071.00 (SD 222.86) HU; dentin, from mean 1899.23 (SD 145.94) HU to mean 1323.10 (SD 201.67) HU; and pulp, from mean 420.29 (SD 196.47) HU to mean 183.63 (SD 97.59) HU. The tissues (enamel and dentin) of permanent central incisors in the mandible and maxilla had the highest mean densities. No gender differences concerning the density of dental tissues were reliably identified. Conclusions: The evaluation of Hounsfield values for dental tissues can be used as an objective method for assessing their densities. If the determined densities of the enamel, dentin, and pulp of the tooth do not correspond to the range of values for healthy tooth tissues, then it may indicate a pathology. 
%R 10.2196/56759 %U https://xmed.jmir.org/2024/1/e56759 %U https://doi.org/10.2196/56759 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e50209 %T Retrieval-Based Diagnostic Decision Support: Mixed Methods Study %A Abdullahi,Tassallah %A Mercurio,Laura %A Singh,Ritambhara %A Eickhoff,Carsten %+ School of Medicine, University of Tübingen, Schaffhausenstr, 77, Tübingen, 72072, Germany, 49 7071 29 843, carsten.eickhoff@uni-tuebingen.de %K clinical decision support %K rare diseases %K ensemble learning %K retrieval-augmented learning %K machine learning %K electronic health records %K natural language processing %K retrieval augmented generation %K RAG %K electronic health record %K EHR %K data sparsity %K information retrieval %D 2024 %7 19.6.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Diagnostic errors pose significant health risks and contribute to patient mortality. With the growing accessibility of electronic health records, machine learning models offer a promising avenue for enhancing diagnosis quality. Current research has primarily focused on a limited set of diseases with ample training data, neglecting diagnostic scenarios with limited data availability. Objective: This study aims to develop an information retrieval (IR)–based framework that accommodates data sparsity to facilitate broader diagnostic decision support. Methods: We introduced an IR-based diagnostic decision support framework called CliniqIR. It uses clinical text records, the Unified Medical Language System Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses independent of training data availability. CliniqIR is designed to be compatible with any IR framework. Therefore, we implemented it using both dense and sparse retrieval approaches. 
We compared CliniqIR’s performance to that of pretrained clinical transformer models such as Clinical Bidirectional Encoder Representations from Transformers (ClinicalBERT) in supervised and zero-shot settings. Subsequently, we combined the strength of supervised fine-tuned ClinicalBERT and CliniqIR to build an ensemble framework that delivers state-of-the-art diagnostic predictions. Results: On a complex diagnosis data set (DC3) without any training data, CliniqIR models returned the correct diagnosis within their top 3 predictions. On the Medical Information Mart for Intensive Care III data set, CliniqIR models surpassed ClinicalBERT in predicting diagnoses with <5 training samples by an average difference in mean reciprocal rank of 0.10. In a zero-shot setting where models received no disease-specific training, CliniqIR still outperformed the pretrained transformer models with a greater mean reciprocal rank of at least 0.10. Furthermore, in most conditions, our ensemble framework surpassed the performance of its individual components, demonstrating its enhanced ability to make precise diagnostic predictions. Conclusions: Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases. 
%M 38896468 %R 10.2196/50209 %U https://medinform.jmir.org/2024/1/e50209 %U https://doi.org/10.2196/50209 %U http://www.ncbi.nlm.nih.gov/pubmed/38896468 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e55118 %T Comparison of Synthetic Data Generation Techniques for Control Group Survival Data in Oncology Clinical Trials: Simulation Study %A Akiya,Ippei %A Ishihara,Takuma %A Yamamoto,Keiichi %K oncology clinical trial %K survival analysis %K synthetic patient data %K machine learning %K SPD %K simulation %D 2024 %7 18.6.2024 %9 %J JMIR Med Inform %G English %X Background: Synthetic patient data (SPD) generation for survival analysis in oncology trials holds significant potential for accelerating clinical development. Various machine learning methods, including classification and regression trees (CART), random forest (RF), Bayesian network (BN), and conditional tabular generative adversarial network (CTGAN), have been used for this purpose, but their performance in reflecting actual patient survival data remains under investigation. Objective: The aim of this study was to determine the most suitable SPD generation method for oncology trials, specifically focusing on both progression-free survival (PFS) and overall survival (OS), which are the primary evaluation end points in oncology trials. To achieve this goal, we conducted a comparative simulation of 4 generation methods, including CART, RF, BN, and the CTGAN, and the performance of each method was evaluated. Methods: Using multiple clinical trial data sets, 1000 data sets were generated by using each method for each clinical trial data set and evaluated as follows: (1) median survival time (MST) of PFS and OS; (2) hazard ratio distance (HRD), which indicates the similarity between the actual survival function and a synthetic survival function; and (3) visual analysis of Kaplan-Meier (KM) plots. Each method’s ability to mimic the statistical properties of real patient data was evaluated from these multiple angles. 
Results: In most simulation cases, CART yielded high percentages of MSTs for synthetic data falling within the 95% CI range of the MST of the actual data. These percentages ranged from 88.8% to 98.0% for PFS and from 60.8% to 96.1% for OS. In the evaluation of HRD, the HRD values for CART were concentrated at approximately 0.9. Conversely, for the other methods, no consistent trend was observed for either PFS or OS. CART showed better similarity to the actual data than RF, in that CART tends to overfit whereas RF (an ensemble learning approach) prevents overfitting; in SPD generation, the focus should be on statistical properties close to the actual data rather than on a well-generalized prediction model. Neither the BN nor the CTGAN method can accurately reflect the statistical properties of the actual data because these methods are not suited to small data sets. Conclusions: As a method for generating SPD for survival data from small data sets, such as clinical trial data, CART was shown to be the most effective method compared with RF, BN, and CTGAN. Additionally, CART-based generation methods may be improved by incorporating feature engineering and other methods in future work. 
%R 10.2196/55118 %U https://medinform.jmir.org/2024/1/e55118 %U https://doi.org/10.2196/55118 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50182 %T Developing a Chatbot to Support Individuals With Neurodevelopmental Disorders: Tutorial %A Singla,Ashwani %A Khanna,Ritvik %A Kaur,Manpreet %A Kelm,Karen %A Zaiane,Osmar %A Rosenfelt,Cory Scott %A Bui,Truong An %A Rezaei,Navid %A Nicholas,David %A Reformat,Marek Z %A Majnemer,Annette %A Ogourtsova,Tatiana %A Bolduc,Francois %+ Department of Pediatrics, University of Alberta, 11315 87th Avenue, Edmonton, AB, T6G 2E1, Canada, 1 780 492 9713, fbolduc@ualberta.ca %K chatbot %K user interface %K knowledge graph %K neurodevelopmental disability %K autism %K intellectual disability %K attention-deficit/hyperactivity disorder %D 2024 %7 18.6.2024 %9 Tutorial %J J Med Internet Res %G English %X Families of individuals with neurodevelopmental disabilities or differences (NDDs) often struggle to find reliable health information on the web. NDDs encompass various conditions affecting up to 14% of children in high-income countries, and most individuals present with complex phenotypes and related conditions. It is challenging for their families to develop literacy solely by searching information on the internet. While in-person coaching can enhance care, it is only available to a minority of those with NDDs. Chatbots, or computer programs that simulate conversation, have emerged in the commercial sector as useful tools for answering questions, but their use in health care remains limited. To address this challenge, the researchers developed a chatbot named CAMI (Coaching Assistant for Medical/Health Information) that can provide information about trusted resources covering core knowledge and services relevant to families of individuals with NDDs. 
The chatbot was developed, in collaboration with individuals with lived experience, to provide information about trusted resources covering core knowledge and services that may be of interest. The developers used the Django framework (Django Software Foundation) and a knowledge graph depicting the key entities in NDDs and their relationships, allowing the chatbot to suggest web resources that may be related to user queries. To identify NDD domain–specific entities from user input, a combination of standard sources (the Unified Medical Language System) and other entities identified by health professionals and collaborators was used. Although most entities were identified in the text, some were not captured in the system and therefore went undetected. Nonetheless, the chatbot was able to provide resources addressing most user queries related to NDDs. The researchers found that enriching the vocabulary with synonyms and lay language terms for specific subdomains enhanced entity detection. By using a data set of numerous individuals with NDDs, the researchers developed a knowledge graph that established meaningful connections between entities, allowing the chatbot to present related symptoms, diagnoses, and resources. To the researchers’ knowledge, CAMI is the first chatbot to provide resources related to NDDs. Our work highlighted the importance of engaging end users to supplement standard generic ontologies with named entities for language recognition. It also demonstrates that complex medical and health-related information can be integrated using knowledge graphs while leveraging existing large data sets. This has multiple implications: generalizability to other health domains as well as reducing the need for experts and optimizing their input while keeping health care professionals in the loop. 
The researchers' work also shows how health and computer science domains need to collaborate to achieve the granularity needed to make chatbots truly useful and impactful. %M 38888947 %R 10.2196/50182 %U https://www.jmir.org/2024/1/e50182 %U https://doi.org/10.2196/50182 %U http://www.ncbi.nlm.nih.gov/pubmed/38888947 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47560 %T Diverse Mentoring Connections Across Institutional Boundaries in the Biomedical Sciences: Innovative Graph Database Analysis %A Syed,Toufeeq Ahmed %A Thompson,Erika L %A Latif,Zainab %A Johnson,Jay %A Javier,Damaris %A Stinson,Katie %A Saleh,Gabrielle %A Vishwanatha,Jamboor K %+ University of Texas Health Science Center at Houston, 7000 Fannin Street, Suite 600, Houston, TX, 77030, United States, 1 713 500 3591, toufeeq.a.syed@uth.tmc.edu %K online platform %K mentorship %K diversity %K network analysis %K graph database %K online communities %D 2024 %7 17.6.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: With an overarching goal of increasing diversity and inclusion in biomedical sciences, the National Research Mentoring Network (NRMN) developed a web-based national mentoring platform (MyNRMN) that seeks to connect mentors and mentees to support the persistence of underrepresented minorities in the biomedical sciences. As of May 15, 2024, the MyNRMN platform, which provides mentoring, networking, and professional development tools, has facilitated more than 12,100 unique mentoring connections between faculty, students, and researchers in the biomedical domain. Objective: This study aimed to examine the large-scale mentoring connections facilitated by our web-based platform between students (mentees) and faculty (mentors) across institutional and geographic boundaries. Using an innovative graph database, we analyzed diverse mentoring connections between mentors and mentees across demographic characteristics in the biomedical sciences. 
Methods: Through the MyNRMN platform, we observed profile data and analyzed mentoring connections made between students and faculty across institutional boundaries by race, ethnicity, gender, institution type, and educational attainment between July 1, 2016, and May 31, 2021. Results: In total, there were 15,024 connections with 2222 mentees and 1652 mentors across 1625 institutions contributing data. Female mentees participated in the highest number of connections (3996/6108, 65%), whereas female mentors participated in 58% (5206/8916) of the connections. Black mentees made up 38% (2297/6108) of the connections, whereas White mentors participated in 56% (5036/8916) of the connections. Mentees were predominately from institutions classified as Research 1 (R1; doctoral universities—very high research activity) and historically Black colleges and universities (556/2222, 25% and 307/2222, 14%, respectively), whereas 31% (504/1652) of mentors were from R1 institutions. Conclusions: To date, the utility of mentoring connections across institutions throughout the United States and how mentors and mentees are connected is unknown. This study examined these connections and the diversity of these connections using an extensive web-based mentoring network. %M 38885013 %R 10.2196/47560 %U https://www.jmir.org/2024/1/e47560 %U https://doi.org/10.2196/47560 %U http://www.ncbi.nlm.nih.gov/pubmed/38885013 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e57209 %T Pulmonary Tuberculosis Notification Rate Within Shenzhen, China, 2010-2019: Spatial-Temporal Analysis %A Lai,Peixuan %A Cai,Weicong %A Qu,Lin %A Hong,Chuangyue %A Lin,Kaihao %A Tan,Weiguo %A Zhao,Zhiguang %+ Shenzhen Center for Chronic Disease Control, No. 
2021 Buxin Road, Shenzhen, 518020, China, 86 0755 2561 8781, 1498384005@qq.com %K tuberculosis %K spatial analysis %K spatial-temporal cluster %K Shenzhen %K China %D 2024 %7 14.6.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Pulmonary tuberculosis (PTB) is a chronic communicable disease of major public health and social concern. Although spatial-temporal analysis has been widely used to describe distribution characteristics and transmission patterns, few studies have revealed the changes in the small-scale clustering of PTB at the street level. Objective: The aim of this study was to analyze the temporal and spatial distribution characteristics and clusters of PTB at the street level in the Shenzhen municipality of China to provide a reference for PTB prevention and control. Methods: Data of reported PTB cases in Shenzhen from January 2010 to December 2019 were extracted from the China Information System for Disease Control and Prevention to describe the epidemiological characteristics. Time-series, spatial-autocorrelation, and spatial-temporal scanning analyses were performed to identify the spatial and temporal patterns and high-risk areas at the street level. Results: A total of 58,122 PTB cases from 2010 to 2019 were notified in Shenzhen. The annual notification rate of PTB decreased significantly from 64.97 per 100,000 population in 2010 to 43.43 per 100,000 population in 2019. PTB cases exhibited seasonal variations with peaks in late spring and summer each year. The PTB notification rate was nonrandomly distributed and spatially clustered with a Moran I value of 0.134 (P=.02). One most-likely cluster and 10 secondary clusters were detected, and the most-likely clustering area was centered at Nanshan Street of Nanshan District covering 6 streets, with the clustering time spanning from January 2010 to November 2012. 
Conclusions: This study identified seasonal patterns and spatial-temporal clusters of PTB cases at the street level in the Shenzhen municipality of China. Resources should be prioritized to the identified high-risk areas for PTB prevention and control. %M 38875687 %R 10.2196/57209 %U https://publichealth.jmir.org/2024/1/e57209 %U https://doi.org/10.2196/57209 %U http://www.ncbi.nlm.nih.gov/pubmed/38875687 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 5 %N %P e55632 %T It Is in Our DNA: Bringing Electronic Health Records and Genomic Data Together for Precision Medicine %A Robertson,Alan J %A Mallett,Andrew J %A Stark,Zornitza %A Sullivan,Clair %+ Queensland Digital Health Centre, University of Queensland, Health Sciences Building, Herston Campus, Royal Brisbane and Women's Hospital, Brisbane, 4029, Australia, 61 733465343, c.sullivan1@uq.edu.au %K genomics %K digital health %K genetics %K precision medicine %K genomic %K genomic data %K electronic health records %K DNA %K supports %K decision-making %K timeliness %K diagnosis %K risk reduction %K electronic medical records %D 2024 %7 13.6.2024 %9 Viewpoint %J JMIR Bioinform Biotech %G English %X Health care is at a turning point. We are shifting from protocolized medicine to precision medicine, and digital health systems are facilitating this shift. By providing clinicians with detailed information for each patient and analytic support for decision-making at the point of care, digital health technologies are enabling a new era of precision medicine. Genomic data also provide clinicians with information that can improve the accuracy and timeliness of diagnosis, optimize prescribing, and target risk reduction strategies, all of which are key elements for precision medicine. However, genomic data are predominantly seen as diagnostic information and are not routinely integrated into the clinical workflows of electronic medical records. 
The use of genomic data holds significant potential for precision medicine; however, as genomic data are fundamentally different from the information collected during routine practice, special considerations are needed to use this information in a digital health setting. This paper outlines the potential of genomic data integration with electronic records, and how these data can enable precision medicine. %M 38935958 %R 10.2196/55632 %U https://bioinform.jmir.org/2024/1/e55632 %U https://doi.org/10.2196/55632 %U http://www.ncbi.nlm.nih.gov/pubmed/38935958 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e56686 %T Integrated Real-World Data Warehouses Across 7 Evolving Asian Health Care Systems: Scoping Review %A Shau,Wen-Yi %A Santoso,Handoko %A Jip,Vincent %A Setia,Sajita %+ Transform Medical Communications Limited, 184 Glasgow Street, Wanganui, 4500, New Zealand, 64 276175433, sajita.setia@transform-medcomms.com %K Asia %K health care databases %K cross-country comparison %K electronic health records %K electronic medical records %K data warehousing %K information storage and retrieval %K real-world data %K real-world evidence %K registries %K scoping review %D 2024 %7 11.6.2024 %9 Review %J J Med Internet Res %G English %X Background: Asia consists of diverse nations with extremely variable health care systems. Integrated real-world data (RWD) research warehouses provide vast interconnected data sets that uphold statistical rigor. Yet, their intricate details remain underexplored, restricting their broader applications. Objective: Building on our previous research that analyzed integrated RWD warehouses in India, Thailand, and Taiwan, this study extends the research to 7 distinct health care systems: Hong Kong, Indonesia, Malaysia, Pakistan, the Philippines, Singapore, and Vietnam. 
We aimed to map the evolving landscape of RWD, preferences for methodologies, and database use and archetype the health systems based on existing intrinsic capability for RWD generation. Methods: A systematic scoping review methodology was used, centering on contemporary English literature on PubMed (search date: May 9, 2023). Rigorous screening as defined by eligibility criteria identified RWD studies from multiple health care facilities in at least 1 of the 7 target Asian nations. Point estimates and their associated errors were determined for the data collected from eligible studies. Results: Of the 1483 real-world evidence citations identified on May 9, 2023, a total of 369 (24.9%) fulfilled the requirements for data extraction and subsequent analysis. Singapore, Hong Kong, and Malaysia contributed to ≥100 publications, with each country marked by a higher proportion of single-country studies at 51% (80/157), 66.2% (86/130), and 50% (50/100), respectively, and were classified as solo scholars. Indonesia, Pakistan, Vietnam, and the Philippines had fewer publications and a higher proportion of cross-country collaboration studies (CCCSs) at 79% (26/33), 58% (18/31), 74% (20/27), and 86% (19/22), respectively, and were classified as global collaborators. Collaboration with countries outside the 7 target nations appeared in 84.2% to 97.7% of the CCCSs of each nation. Among target nations, Singapore and Malaysia emerged as preferred research partners for other nations. From 2018 to 2023, most nations showed an increasing trend in study numbers, with Vietnam (24.5%) and Pakistan (21.2%) leading the growth; the only exception was the Philippines, which declined by 14.5%. Clinical registry databases were predominant across all CCCSs from every target nation. 
For single-country studies, Indonesia, Malaysia, and the Philippines favored clinical registries; Singapore had a balanced use of clinical registries and electronic medical or health records, whereas Hong Kong, Pakistan, and Vietnam leaned toward electronic medical or health records. Overall, 89.9% (310/345) of the studies took >2 years from completion to publication. Conclusions: The observed variations in contemporary RWD publications across the 7 nations in Asia exemplify distinct research landscapes across nations that are partially explained by their diverse economic, clinical, and research settings. Nevertheless, recognizing these variations is pivotal for fostering tailored, synergistic strategies that amplify RWD’s potential in guiding future health care research and policy decisions. International Registered Report Identifier (IRRID): RR2-10.2196/43741 %M 38749399 %R 10.2196/56686 %U https://www.jmir.org/2024/1/e56686 %U https://doi.org/10.2196/56686 %U http://www.ncbi.nlm.nih.gov/pubmed/38749399 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50049 %T Creation of Standardized Common Data Elements for Diagnostic Tests in Infectious Disease Studies: Semantic and Syntactic Mapping %A Stellmach,Caroline %A Hopff,Sina Marie %A Jaenisch,Thomas %A Nunes de Miranda,Susana Marina %A Rinaldi,Eugenia %A , %+ Berlin Institute of Health, Charité - Universitätsmedizin Berlin, Anna-Louisa-Karsch-Str 2, Berlin, 10178, Germany, 49 15752614677, caroline.stellmach@charite.de %K core data element %K CDE %K case report form %K CRF %K interoperability %K semantic standards %K infectious disease %K diagnostic test %K covid19 %K COVID-19 %K mpox %K ZIKV %K patient data %K data model %K syntactic interoperability %K clinical data %K FHIR %K SNOMED CT %K LOINC %K virus infection %K common element %D 2024 %7 10.6.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: It is necessary to harmonize and standardize data variables used in case report 
forms (CRFs) of clinical studies to facilitate the merging and sharing of the collected patient data across several clinical studies. This is particularly true for clinical studies that focus on infectious diseases. Public health may be highly dependent on the findings of such studies. Hence, there is an elevated urgency to generate meaningful, reliable insights, ideally based on a high sample number and quality data. The implementation of core data elements and the incorporation of interoperability standards can facilitate the creation of harmonized clinical data sets. Objective: This study’s objective was to compare, harmonize, and standardize variables focused on diagnostic tests used as part of CRFs in 6 international clinical studies of infectious diseases in order, ultimately, to make the panstudy common data elements (CDEs) available for ongoing and future studies to foster interoperability and comparability of collected data across trials. Methods: We reviewed and compared the metadata that comprised the CRFs used for data collection in and across all 6 infectious disease studies under consideration in order to identify CDEs. We examined the availability of international semantic standard codes within the Systematized Nomenclature of Medicine - Clinical Terms, the National Cancer Institute Thesaurus, and the Logical Observation Identifiers Names and Codes system for the unambiguous representation of diagnostic testing information that makes up the CDEs. We then proposed 2 data models that incorporate semantic and syntactic standards for the identified CDEs. Results: Of 216 variables that were considered in the scope of the analysis, we identified 11 CDEs to describe diagnostic tests (in particular, serology and sequencing) for infectious diseases: viral lineage/clade; test date, type, performer, and manufacturer; target gene; quantitative and qualitative results; and specimen identifier, type, and collection date. 
Conclusions: The identification of CDEs for infectious diseases is the first step in facilitating the exchange and possible merging of a subset of data across clinical studies (and with that, large research projects) for possible shared analysis to increase the power of findings. The path to harmonization and standardization of clinical study data in the interest of interoperability can be paved in 2 ways. First, mapping to standard terminologies ensures that each data element’s (variable’s) definition is unambiguous and that it has a single, unique interpretation across studies. Second, the exchange of these data is assisted by “wrapping” them in a standard exchange format, such as Fast Healthcare Interoperability Resources or the Clinical Data Interchange Standards Consortium’s Clinical Data Acquisition Standards Harmonization Model. %M 38857066 %R 10.2196/50049 %U https://www.jmir.org/2024/1/e50049 %U https://doi.org/10.2196/50049 %U http://www.ncbi.nlm.nih.gov/pubmed/38857066 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e51323 %T Data-Driven Identification of Potentially Successful Intervention Implementations Using 5 Years of Opioid Prescribing Data: Retrospective Database Study %A Hopcroft,Lisa EM %A Curtis,Helen J %A Croker,Richard %A Pretis,Felix %A Inglesby,Peter %A Evans,David %A Bacon,Sebastian %A Goldacre,Ben %A Walker,Alex J %A MacKenna,Brian %+ Nuffield Department of Primary Care Health Sciences, University of Oxford, Radcliffe Primary Care Building, Observatory Quarter, Oxford, OX2 6GG, United Kingdom, 44 01865289313, alex.walker@phc.ox.ac.uk %K electronic health records %K primary care %K general practice %K opioid analgesics %K data science %K implementation science %K data-driven %K identification %K intervention %K implementations %K proof of concept %K opioid %K unbiased %K prescribing data %K analysis tool %D 2024 %7 5.6.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: We have previously 
demonstrated that opioid prescribing increased by 127% between 1998 and 2016. New policies aimed at tackling this increasing trend have been recommended by public health bodies, and there is some evidence that progress is being made. Objective: We sought to extend our previous work and develop a data-driven approach to identify general practices and clinical commissioning groups (CCGs) whose prescribing data suggest that interventions to reduce the prescribing of opioids may have been successfully implemented. Methods: We analyzed 5 years of prescribing data (December 2014 to November 2019) for 3 opioid prescribing measures—total opioid prescribing as oral morphine equivalent per 1000 registered population, the number of high-dose opioids prescribed per 1000 registered population, and the number of high-dose opioids as a percentage of total opioids prescribed. Using a data-driven approach, we applied a modified version of our change detection Python library to identify reductions in these measures over time, which may be consistent with the successful implementation of an intervention to reduce opioid prescribing. This analysis was carried out for general practices and CCGs, and organizations were ranked according to the change in prescribing rate. Results: We identified a reduction in total opioid prescribing in 94 (49.2%) out of 191 CCGs, with a median reduction of 15.1 (IQR 11.8-18.7; range 9.0-32.8) in total oral morphine equivalence per 1000 patients. We present data for the 3 CCGs and practices demonstrating the biggest reduction in opioid prescribing for each of the 3 opioid prescribing measures. We observed a 40% proportional drop (8.9% absolute reduction) in the regular prescribing of high-dose opioids (measured as a percentage of regular opioids) in the highest-ranked CCG (North Tyneside); a 99% drop in this same measure was found in several practices (44%-95% absolute reduction). 
Decile plots demonstrate that CCGs exhibiting large reductions in opioid prescribing do so via slow and gradual reductions over a long period of time (typically over a period of 2 years); in contrast, practices exhibiting large reductions do so rapidly over a much shorter period of time. Conclusions: By applying 1 of our existing analysis tools to a national data set, we were able to identify rapid and maintained changes in opioid prescribing within practices and CCGs and rank organizations by the magnitude of reduction. Highly ranked organizations are candidates for further qualitative research into intervention design and implementation. %M 38838327 %R 10.2196/51323 %U https://publichealth.jmir.org/2024/1/e51323 %U https://doi.org/10.2196/51323 %U http://www.ncbi.nlm.nih.gov/pubmed/38838327 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50976 %T Tracking and Profiling Repeated Users Over Time in Text-Based Counseling: Longitudinal Observational Study With Hierarchical Clustering %A Xu,Yucan %A Chan,Christian Shaunlyn %A Chan,Evangeline %A Chen,Junyou %A Cheung,Florence %A Xu,Zhongzhi %A Liu,Joyce %A Yip,Paul Siu Fai %+ Department of Social Work and Social Administration, The University of Hong Kong, Pokfulam, Hong Kong, China (Hong Kong), 852 91401568, sfpyip@hku.hk %K web-based counseling %K text-based counseling %K repeated users %K frequent users %K hierarchical clustering %K service effectiveness %K risk profiling %K psychological profiles %K psycholinguistic analysis %D 2024 %7 30.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Due to their accessibility and anonymity, web-based counseling services are expanding at an unprecedented rate. One of the most prominent challenges such services face is repeated users, who represent a small fraction of total users but consume significant resources by continually returning to the system and reiterating the same narrative and issues. 
A deeper understanding of repeated users and tailoring interventions may help improve service efficiency and effectiveness. Previous studies on repeated users were mainly on telephone counseling, and the classification of repeated users tended to be arbitrary and failed to capture the heterogeneity in this group of users. Objective: In this study, we aimed to develop a systematic method to profile repeated users and to understand what drives their use of the service. By doing so, we aimed to provide insight and practical implications that can inform the provision of service catering to different types of users and improve service effectiveness. Methods: We extracted session data from 29,400 users from a free 24/7 web-based counseling service from 2018 to 2021. To systematically investigate the heterogeneity of repeated users, hierarchical clustering was used to classify the users based on 3 indicators of service use behaviors, including the duration of their user journey, use frequency, and intensity. We then compared the psychological profile of the identified subgroups including their suicide risks and primary concerns to gain insights into the factors driving their patterns of service use. Results: Three clusters of repeated users with clear psychological profiles were detected: episodic, intermittent, and persistent-intensive users. Generally, compared with one-time users, repeated users showed higher suicide risks and more complicated backgrounds, including more severe presenting issues such as suicide or self-harm, bullying, and addictive behaviors. Higher frequency and intensity of service use were also associated with elevated suicide risk levels and a higher proportion of users citing mental disorders as their primary concerns. Conclusions: This study presents a systematic method of identifying and classifying repeated users in web-based counseling services. 
The proposed bottom-up clustering method identified 3 subgroups of repeated users with distinct service behaviors and psychological profiles. The findings can facilitate frontline personnel in delivering more efficient interventions and the proposed method can also be meaningful to a wider range of services in improving service provision, resource allocation, and service effectiveness. %M 38815258 %R 10.2196/50976 %U https://www.jmir.org/2024/1/e50976 %U https://doi.org/10.2196/50976 %U http://www.ncbi.nlm.nih.gov/pubmed/38815258 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e52655 %T Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation %A Invernici,Francesco %A Bernasconi,Anna %A Ceri,Stefano %+ Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Via Ponzio 34/5, Milan, 20133, Italy, 39 23993494, anna.bernasconi@polimi.it %K big data corpus %K clinical research %K co-occurrence network %K COVID-19 Open Research Dataset %K CORD-19 %K graph search %K Named Entity Recognition %K Neo4j %K text mining %D 2024 %7 30.5.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Since the beginning of the COVID-19 pandemic, >1 million studies have been collected within the COVID-19 Open Research Dataset, a corpus of manuscripts created to accelerate research against the disease. Their related abstracts hold a wealth of information that remains largely unexplored and difficult to search due to its unstructured nature. Keyword-based search is the standard approach, which allows users to retrieve the documents of a corpus that contain (all or some of) the words in a target list. This type of search, however, does not provide visual support to the task and is not suited to expressing complex queries or compensating for missing specifications. 
Objective: This study aims to consider small graphs of concepts and exploit them for expressing graph searches over existing COVID-19–related literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience. Methods: We considered the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications’ abstracts using terms selected from the Unified Medical Language System and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, establishing connections when their mutual information is relevant. A sophisticated graph query engine was built to allow the identification of the best matches of graph queries on the network. It also supports partial matches and suggests potential query completions using shortest paths. Results: We built a large co-occurrence network, consisting of 128,249 entities and 47,198,965 relationships; the GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries; it produces a bibliography of publications, which are globally ranked; and each publication is further associated with the specific parts of the query that it explains, thereby allowing the user to understand each aspect of the matching. Conclusions: Our approach supports the process of query formulation and evidence search upon a large text corpus; it can be reapplied to any scientific domain where document corpora and curated ontologies are made available. 
%M 38814687 %R 10.2196/52655 %U https://www.jmir.org/2024/1/e52655 %U https://doi.org/10.2196/52655 %U http://www.ncbi.nlm.nih.gov/pubmed/38814687 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e46160 %T Efficient Use of Biological Data in the Web 3.0 Era by Applying Nonfungible Token Technology %A Wang,Guanyi %A Chen,Chen %A Jiang,Ziyu %A Li,Gang %A Wu,Can %A Li,Sheng %+ Department of Urology, Cancer Precision Diagnosis and Treatment and Translational Medicine Hubei Engineering Research Center, Zhongnan Hospital, Wuhan University, 169 Donghu Road, Wuchang District, Wuhan, 430062, China, 86 18086601827, lisheng-znyy@whu.edu.cn %K NFTs %K biobanks %K blockchains %K health care %K medical big data %K sustainability %K blockchain platform %K platform %K tracing %K virtual %K biomedical data %K transformation %K development %K promoted %D 2024 %7 28.5.2024 %9 Viewpoint %J J Med Internet Res %G English %X CryptoKitties, a trendy game on Ethereum, an open-source public blockchain platform with a smart contract function, brought nonfungible tokens (NFTs) into the public eye in 2017. NFTs are popular because of their nonfungible properties and their unique and irreplaceable nature in the real world. The embryonic form of NFTs can be traced back to a P2P network protocol, improved from Bitcoin in 2012, that can realize decentralized digital asset transactions. NFTs have recently gained much attention and have shown an unprecedented explosive growth trend. Herein, the concept of digital asset NFTs is introduced into the medical and health field to conduct a subversive discussion on biobank operations. By converting biomedical data into NFTs, the collection and circulation of samples can be accelerated, and the transformation of resources can be promoted. 
In conclusion, the biobank can achieve sustainable development through “decentralization.” %M 38805706 %R 10.2196/46160 %U https://www.jmir.org/2024/1/e46160 %U https://doi.org/10.2196/46160 %U http://www.ncbi.nlm.nih.gov/pubmed/38805706 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 5 %N %P e54332 %T Assessing Privacy Vulnerabilities in Genetic Data Sets: Scoping Review %A Thomas,Mara %A Mackes,Nuria %A Preuss-Dodhy,Asad %A Wieland,Thomas %A Bundschus,Markus %+ F. Hoffmann-La Roche AG, Grenzacherstrasse 124, Basel, 4070, Switzerland, 41 616881111, mara.thomas@roche.com %K genetic privacy %K privacy %K data anonymization %K reidentification %D 2024 %7 27.5.2024 %9 Review %J JMIR Bioinform Biotech %G English %X Background: Genetic data are widely considered inherently identifiable. However, genetic data sets come in many shapes and sizes, and the feasibility of privacy attacks depends on their specific content. Assessing the reidentification risk of genetic data is complex, yet there is a lack of guidelines or recommendations that support data processors in performing such an evaluation. Objective: This study aims to gain a comprehensive understanding of the privacy vulnerabilities of genetic data and create a summary that can guide data processors in assessing the privacy risk of genetic data sets. Methods: We conducted a 2-step search, in which we first identified 21 reviews published between 2017 and 2023 on the topic of genomic privacy and then analyzed all references cited in the reviews (n=1645) to identify 42 unique original research studies that demonstrate a privacy attack on genetic data. We then evaluated the type and components of genetic data exploited for these attacks as well as the effort and resources needed for their implementation and their probability of success. 
Results: From our literature review, we derived 9 nonmutually exclusive features of genetic data that are both inherent to any genetic data set and informative about privacy risk: biological modality, experimental assay, data format or level of processing, germline versus somatic variation content, content of single nucleotide polymorphisms, short tandem repeats, aggregated sample measures, structural variants, and rare single nucleotide variants. Conclusions: On the basis of our literature review, the evaluation of these 9 features covers the great majority of privacy-critical aspects of genetic data and thus provides a foundation and guidance for assessing genetic data risk. %M 38935957 %R 10.2196/54332 %U https://bioinform.jmir.org/2024/1/e54332 %U https://doi.org/10.2196/54332 %U http://www.ncbi.nlm.nih.gov/pubmed/38935957 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e53437 %T Travel Distance Between Participants in US Telemedicine Sessions With Estimates of Emissions Savings: Observational Study %A Cummins,Mollie R %A Shishupal,Sukrut %A Wong,Bob %A Wan,Neng %A Han,Jiuying %A Johnny,Jace D %A Mhatre-Owens,Amy %A Gouripeddi,Ramkiran %A Ivanova,Julia %A Ong,Triton %A Soni,Hiral %A Barrera,Janelle %A Wilczewski,Hattie %A Welch,Brandon M %A Bunnell,Brian E %+ College of Nursing, University of Utah, 10 South 2000 East, Salt Lake City, UT, 84112-5880, United States, 1 8015859740, mollie.cummins@utah.edu %K air pollution %K environmental health %K telemedicine %K greenhouse gases %K clinical research informatics %K informatics %K data science %K telehealth %K eHealth %K travel %K air quality %K pollutant %K pollution %K polluted %K environment %K environmental %K greenhouse gas %K emissions %K retrospective %K observational %K United States %K USA %K North America %K North American %K cost %K costs %K economic %K economics %K saving %K savings %K finance %K financial %K finances %K CO2 %K carbon dioxide %K carbon footprint %D 2024 %7 15.5.2024 %9 
Original Paper %J J Med Internet Res %G English %X Background: Digital health and telemedicine are potentially important strategies to decrease health care’s environmental impact and contribution to climate change by reducing transportation-related air pollution and greenhouse gas emissions. However, we currently lack robust national estimates of emissions savings attributable to telemedicine. Objective: This study aimed to (1) determine the travel distance between participants in US telemedicine sessions and (2) estimate the net reduction in carbon dioxide (CO2) emissions attributable to telemedicine in the United States, based on national observational data describing the geographical characteristics of telemedicine session participants. Methods: We conducted a retrospective observational study of telemedicine sessions in the United States between January 1, 2022, and February 21, 2023, on the doxy.me platform. Using Google Distance Matrix, we determined the median travel distance between participating providers and patients for a proportional sample of sessions. Further, based on the best available public data, we estimated the total annual emissions costs and savings attributable to telemedicine in the United States. Results: The median round trip travel distance between patients and providers was 49 (IQR 21-145) miles. The median CO2 emissions savings per telemedicine session was 20 (IQR 8-59) kg CO2. Accounting for the energy costs of telemedicine and US transportation patterns, among other factors, we estimate that the use of telemedicine in the United States during the years 2021-2022 resulted in approximate annual CO2 emissions savings of 1,443,800 metric tons. Conclusions: These estimates of travel distance and telemedicine-associated CO2 emissions costs and savings, based on national data, indicate that telemedicine may be an important strategy in reducing the health care sector’s carbon footprint. 
%M 38536065 %R 10.2196/53437 %U https://www.jmir.org/2024/1/e53437 %U https://doi.org/10.2196/53437 %U http://www.ncbi.nlm.nih.gov/pubmed/38536065 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e49445 %T The Costs of Anonymization: Case Study Using Clinical Data %A Pilgram,Lisa %A Meurers,Thierry %A Malin,Bradley %A Schaeffner,Elke %A Eckardt,Kai-Uwe %A Prasser,Fabian %A , %+ Junior Digital Clinician Scientist Program, Biomedical Innovation Academy, Berlin Institute of Health at Charité—Universitätsmedizin Berlin, Charitéplatz 1, Berlin, 10117, Germany, 49 30 450543049, lisa.pilgram@charite.de %K data sharing %K anonymization %K deidentification %K privacy-utility trade-off %K privacy-enhancing technologies %K medical informatics %K privacy %K anonymized %K security %K identification %K confidentiality %K data science %D 2024 %7 24.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Sharing data from clinical studies can accelerate scientific progress, improve transparency, and increase the potential for innovation and collaboration. However, privacy concerns remain a barrier to data sharing. Certain concerns, such as reidentification risk, can be addressed through the application of anonymization algorithms, whereby data are altered so that it is no longer reasonably related to a person. Yet, such alterations have the potential to influence the data set’s statistical properties, such that the privacy-utility trade-off must be considered. This has been studied in theory, but evidence based on real-world individual-level clinical data is rare, and anonymization has not broadly been adopted in clinical practice. Objective: The goal of this study is to contribute to a better understanding of anonymization in the real world by comprehensively evaluating the privacy-utility trade-off of differently anonymized data using data and scientific results from the German Chronic Kidney Disease (GCKD) study. 
Methods: The GCKD data set extracted for this study consists of 5217 records and 70 variables. A 2-step procedure was followed to determine which variables constituted reidentification risks. To capture a large portion of the risk-utility space, we decided on risk thresholds ranging from 0.02 to 1. The data were then transformed via generalization and suppression, and the anonymization process was varied using a generic and a use case–specific configuration. To assess the utility of the anonymized GCKD data, general-purpose metrics (ie, data granularity and entropy), as well as use case–specific metrics (ie, reproducibility), were applied. Reproducibility was assessed by measuring the overlap of the 95% CI lengths between anonymized and original results. Results: Reproducibility measured by 95% CI overlap was higher than utility obtained from general-purpose metrics. For example, granularity varied between 68.2% and 87.6%, and entropy varied between 25.5% and 46.2%, whereas the average 95% CI overlap was above 90% for all risk thresholds applied. A nonoverlapping 95% CI was detected in 6 estimates across all analyses, but the overwhelming majority of estimates exhibited an overlap over 50%. The use case–specific configuration outperformed the generic one in terms of actual utility (ie, reproducibility) at the same level of privacy. Conclusions: Our results illustrate the challenges that anonymization faces when aiming to support multiple likely and possibly competing uses, while use case–specific anonymization can provide greater utility. This aspect should be taken into account when evaluating the associated costs of anonymized data and attempting to maintain sufficiently high levels of privacy for anonymized data. 
Trial Registration: German Clinical Trials Register DRKS00003971; https://drks.de/search/en/trial/DRKS00003971 International Registered Report Identifier (IRRID): RR2-10.1093/ndt/gfr456 %M 38657232 %R 10.2196/49445 %U https://www.jmir.org/2024/1/e49445 %U https://doi.org/10.2196/49445 %U http://www.ncbi.nlm.nih.gov/pubmed/38657232 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e51880 %T Now Is the Time to Strengthen Government-Academic Data Infrastructures to Jump-Start Future Public Health Crisis Response %A Lee,Jian-Sin %A Tyler,Allison R B %A Veinot,Tiffany Christine %A Yakel,Elizabeth %+ School of Information, University of Michigan, 105 S State St, Ann Arbor, MI, 48109-1285, United States, 1 734 389 9552, jianslee@umich.edu %K COVID-19 %K crisis response %K cross-sector collaboration %K data infrastructures %K data science %K data sharing %K pandemic %K public health %D 2024 %7 24.4.2024 %9 Viewpoint %J JMIR Public Health Surveill %G English %X During public health crises, the significance of rapid data sharing cannot be overstated. In attempts to accelerate COVID-19 pandemic responses, discussions within society and scholarly research have focused on data sharing among health care providers, across government departments at different levels, and on an international scale. A lesser-addressed yet equally important approach to sharing data during the COVID-19 pandemic and other crises involves cross-sector collaboration between government entities and academic researchers. Specifically, this refers to dedicated projects in which a government entity shares public health data with an academic research team for data analysis to receive data insights to inform policy. In this viewpoint, we identify and outline documented data sharing challenges in the context of COVID-19 and other public health crises, as well as broader crisis scenarios encompassing natural disasters and humanitarian emergencies. 
We then argue that government-academic data collaborations have the potential to alleviate these challenges, which should place them at the forefront of future research attention. In particular, for researchers, data collaborations with government entities should be considered part of the social infrastructure that bolsters their research efforts toward public health crisis response. Looking ahead, we propose a shift from ad hoc, intermittent collaborations to cultivating robust and enduring partnerships. Thus, we need to move beyond viewing government-academic data interactions as 1-time sharing events. Additionally, given the scarcity of scholarly exploration in this domain, we advocate for further investigation into the real-world practices and experiences related to sharing data from government sources with researchers during public health crises. %M 38656780 %R 10.2196/51880 %U https://publichealth.jmir.org/2024/1/e51880 %U https://doi.org/10.2196/51880 %U http://www.ncbi.nlm.nih.gov/pubmed/38656780 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e50958 %T Motivators and Demotivators for COVID-19 Vaccination Based on Co-Occurrence Networks of Verbal Reasons for Vaccination Acceptance and Resistance: Repetitive Cross-Sectional Surveys and Network Analysis %A Liao,Qiuyan %A Yuan,Jiehu %A Wong,Irene Oi Ling %A Ni,Michael Yuxuan %A Cowling,Benjamin John %A Lam,Wendy Wing Tak %+ School of Public Health, Li Ka Shing Faculty of Medicine, The University of Hong Kong, 2/F, Patrick Manson Building (North Wing), 7 Sassoon Road, Pokfulam, Hong Kong, China (Hong Kong), 852 39179289, qyliao11@hku.hk %K COVID-19 %K vaccination acceptance %K vaccine hesitancy %K motivators %K co-occurrence network analysis %D 2024 %7 22.4.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Vaccine hesitancy is complex and multifaceted. 
People may accept or reject a vaccine due to multiple and interconnected reasons, with some reasons being more salient in influencing vaccine acceptance or resistance and hence the most important intervention targets for addressing vaccine hesitancy. Objective: This study was aimed at assessing the connections and relative importance of motivators and demotivators for COVID-19 vaccination in Hong Kong based on co-occurrence networks of verbal reasons for vaccination acceptance and resistance from repetitive cross-sectional surveys. Methods: We conducted a series of random digit dialing telephone surveys to examine COVID-19 vaccine hesitancy among general Hong Kong adults between March 2021 and July 2022. A total of 5559 and 982 participants provided verbal reasons for accepting and resisting (rejecting or hesitating) a COVID-19 vaccine, respectively. The verbal reasons were initially coded to generate categories of motivators and demotivators for COVID-19 vaccination using a bottom-up approach. Then, all the generated codes were mapped onto the 5C model of vaccine hesitancy. On the basis of the identified reasons, we conducted a co-occurrence network analysis to understand how motivating or demotivating reasons were comentioned to shape people’s vaccination decisions. Each reason’s eigenvector centrality was calculated to quantify their relative importance in the network. Analyses were also stratified by age group. Results: The co-occurrence network analysis found that the perception of personal risk to the disease (eigencentrality=0.80) and the social responsibility to protect others (eigencentrality=0.58) were the most important comentioned reasons that motivate COVID-19 vaccination, while lack of vaccine confidence (eigencentrality=0.89) and complacency (perceived low disease risk and low importance of vaccination; eigencentrality=0.45) were the most important comentioned reasons that demotivate COVID-19 vaccination. 
For older people aged ≥65 years, protecting others was a more important motivator (eigencentrality=0.57), while the concern about poor health status was a more important demotivator (eigencentrality=0.42); for young people aged 18 to 24 years, recovering life normalcy (eigencentrality=0.20) and vaccine mandates (eigencentrality=0.26) were the more important motivators, while complacency (eigencentrality=0.77) was a more important demotivator for COVID-19 vaccination uptake. Conclusions: When disease risk is perceived to be high, promoting social responsibility to protect others is more important for boosting vaccination acceptance. However, when disease risk is perceived to be low and complacency exists, fostering confidence in vaccines to address vaccine hesitancy becomes more important. Interventions for promoting vaccination acceptance and reducing vaccine hesitancy should be tailored by age. %M 38648099 %R 10.2196/50958 %U https://publichealth.jmir.org/2024/1/e50958 %U https://doi.org/10.2196/50958 %U http://www.ncbi.nlm.nih.gov/pubmed/38648099 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e46777 %T The Alzheimer’s Knowledge Base: A Knowledge Graph for Alzheimer Disease Research %A Romano,Joseph D %A Truong,Van %A Kumar,Rachit %A Venkatesan,Mythreye %A Graham,Britney E %A Hao,Yun %A Matsumoto,Nick %A Li,Xi %A Wang,Zhiping %A Ritchie,Marylyn D %A Shen,Li %A Moore,Jason H %+ Institute for Biomedical Informatics, Perelman School of Medicine, University of Pennsylvania, 403 Blockley Hall, 423 Guardian Drive, Philadelphia, PA, 19104, United States, 1 2155735571, joseph.romano@pennmedicine.upenn.edu %K Alzheimer disease %K knowledge graph %K knowledge base %K artificial intelligence %K drug repurposing %K drug discovery %K open source %K Alzheimer %K etiology %K heterogeneous graph %K therapeutic targets %K machine learning %K therapeutic discovery %D 2024 %7 18.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: As global populations 
age and become susceptible to neurodegenerative illnesses, new therapies for Alzheimer disease (AD) are urgently needed. Existing data resources for drug discovery and repurposing fail to capture relationships central to the disease’s etiology and response to drugs. Objective: We designed the Alzheimer’s Knowledge Base (AlzKB) to alleviate this need by providing a comprehensive knowledge representation of AD etiology and candidate therapeutics. Methods: We designed the AlzKB as a large, heterogeneous graph knowledge base assembled using 22 diverse external data sources describing biological and pharmaceutical entities at different levels of organization (eg, chemicals, genes, anatomy, and diseases). AlzKB uses a Web Ontology Language 2 ontology to enforce semantic consistency and allow for ontological inference. We provide a public version of AlzKB and allow users to run and modify local versions of the knowledge base. Results: AlzKB is freely available on the web and currently contains 118,902 entities with 1,309,527 relationships between those entities. To demonstrate its value, we used graph data science and machine learning to (1) propose new therapeutic targets based on similarities of AD to Parkinson disease and (2) repurpose existing drugs that may treat AD. For each use case, AlzKB recovers known therapeutic associations while proposing biologically plausible new ones. Conclusions: AlzKB is a new, publicly available knowledge resource that enables researchers to discover complex translational associations for AD drug discovery. Through 2 use cases, we show that it is a valuable tool for proposing novel therapeutic hypotheses based on public biomedical knowledge. 
%M 38635981 %R 10.2196/46777 %U https://www.jmir.org/2024/1/e46777 %U https://doi.org/10.2196/46777 %U http://www.ncbi.nlm.nih.gov/pubmed/38635981 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47125 %T Evaluating Algorithmic Bias in 30-Day Hospital Readmission Models: Retrospective Analysis %A Wang,H Echo %A Weiner,Jonathan P %A Saria,Suchi %A Kharrazi,Hadi %+ Bloomberg School of Public Health, Johns Hopkins University, 624 N Broadway, Hampton House, Baltimore, MD, United States, 1 443 287 8264, kharrazi@jhu.edu %K algorithmic bias %K model bias %K predictive models %K model fairness %K health disparity %K hospital readmission %K retrospective analysis %D 2024 %7 18.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The adoption of predictive algorithms in health care comes with the potential for algorithmic bias, which could exacerbate existing disparities. Fairness metrics have been proposed to measure algorithmic bias, but their application to real-world tasks is limited. Objective: This study aims to evaluate the algorithmic bias associated with the application of common 30-day hospital readmission models and assess the usefulness and interpretability of selected fairness metrics. Methods: We used 10.6 million adult inpatient discharges from Maryland and Florida from 2016 to 2019 in this retrospective study. Models predicting 30-day hospital readmissions were evaluated: LACE Index, modified HOSPITAL score, and modified Centers for Medicare & Medicaid Services (CMS) readmission measure, which were applied as-is (using existing coefficients) and retrained (recalibrated with 50% of the data). Predictive performances and bias measures were evaluated for all, between Black and White populations, and between low- and other-income groups. Bias measures included the parity of false negative rate (FNR), false positive rate (FPR), 0-1 loss, and generalized entropy index. 
Racial bias represented by FNR and FPR differences was stratified to explore shifts in algorithmic bias in different populations. Results: The retrained CMS model demonstrated the best predictive performance (area under the curve: 0.74 in Maryland and 0.68-0.70 in Florida), and the modified HOSPITAL score demonstrated the best calibration (Brier score: 0.16-0.19 in Maryland and 0.19-0.21 in Florida). Calibration was better in White (compared to Black) populations and other-income (compared to low-income) groups, and the area under the curve was higher or similar in the Black (compared to White) populations. The retrained CMS and modified HOSPITAL score had the lowest racial and income bias in Maryland. In Florida, both of these models overall had the lowest income bias and the modified HOSPITAL score showed the lowest racial bias. In both states, the White and higher-income populations showed a higher FNR, while the Black and low-income populations resulted in a higher FPR and a higher 0-1 loss. When stratified by hospital and population composition, these models demonstrated heterogeneous algorithmic bias in different contexts and populations. Conclusions: Caution must be taken when interpreting fairness measures’ face value. A higher FNR or FPR could potentially reflect missed opportunities or wasted resources, but these measures could also reflect health care use patterns and gaps in care. Simply relying on the statistical notions of bias could obscure or underplay the causes of health disparity. The imperfect health data, analytic frameworks, and the underlying health systems must be carefully considered. Fairness measures can serve as a useful routine assessment to detect disparate model performances but are insufficient to inform mechanisms or policy changes. However, such an assessment is an important first step toward data-driven improvement to address existing health disparities. 
%M 38422347 %R 10.2196/47125 %U https://www.jmir.org/2024/1/e47125 %U https://doi.org/10.2196/47125 %U http://www.ncbi.nlm.nih.gov/pubmed/38422347 %0 Journal Article %@ 2291-9694 %I %V 12 %N %P e53075 %T Development of a Trusted Third Party at a Large University Hospital: Design and Implementation Study %A Wündisch,Eric %A Hufnagl,Peter %A Brunecker,Peter %A Meier zu Ummeln,Sophie %A Träger,Sarah %A Kopp,Marcus %A Prasser,Fabian %A Weber,Joachim %K pseudonymisation %K architecture %K scalability %K trusted third party %K application %K security %K consent %K identifying data %K infrastructure %K modular %K software %K implementation %K user interface %K health platform %K data management %K data privacy %K health record %K electronic health record %K EHR %K pseudonymization %D 2024 %7 17.4.2024 %9 %J JMIR Med Inform %G English %X Background: Pseudonymization has become a best practice to securely manage the identities of patients and study participants in medical research projects and data sharing initiatives. This method offers the advantage of not requiring the direct identification of data to support various research processes while still allowing for advanced processing activities, such as data linkage. Often, pseudonymization and related functionalities are bundled in specific technical and organizational units known as trusted third parties (TTPs). However, pseudonymization can significantly increase the complexity of data management and research workflows, necessitating adequate tool support. Common tasks of TTPs include supporting the secure registration and pseudonymization of patient and sample identities as well as managing consent. Objective: Despite the challenges involved, little has been published about successful architectures and functional tools for implementing TTPs in large university hospitals. 
The aim of this paper is to fill this research gap by describing the software architecture and tool set developed and deployed as part of a TTP established at Charité – Universitätsmedizin Berlin. Methods: The infrastructure for the TTP was designed to provide a modular structure while keeping maintenance requirements low. Basic functionalities were realized with the free MOSAIC tools. However, supporting common study processes requires implementing workflows that span different basic services, such as patient registration, followed by pseudonym generation and concluded by consent collection. To achieve this, an integration layer was developed to provide a unified Representational State Transfer (REST) application programming interface (API) as a basis for more complex workflows. Based on this API, a unified graphical user interface was also implemented, providing an integrated view of information objects and workflows supported by the TTP. The API was implemented using Java and Spring Boot, while the graphical user interface was implemented in PHP and Laravel. Both services use a shared Keycloak instance as a unified management system for roles and rights. Results: By the end of 2022, the TTP had already supported more than 10 research projects since its launch in December 2019. Within these projects, more than 3000 identities were stored, more than 30,000 pseudonyms were generated, and more than 1500 consent forms were submitted. In total, more than 150 people regularly work with the software platform. By implementing the integration layer and the unified user interface, together with comprehensive roles and rights management, the effort for operating the TTP could be significantly reduced, as personnel of the supported research projects can use many functionalities independently. Conclusions: With the architecture and components described, we created a user-friendly and compliant environment for supporting research projects. 
We believe that the insights into the design and implementation of our TTP can help other institutions to efficiently and effectively set up corresponding structures. %R 10.2196/53075 %U https://medinform.jmir.org/2024/1/e53075 %U https://doi.org/10.2196/53075 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e48330 %T Comparing Open-Access Database and Traditional Intensive Care Studies Using Machine Learning: Bibliometric Analysis Study %A Ke,Yuhe %A Yang,Rui %A Liu,Nan %+ Centre for Quantitative Medicine, Duke-NUS Medical School, National University of Singapore, 8 College Road, Singapore, 169857, Singapore, 65 66016503, liu.nan@duke-nus.edu.sg %K BERTopic %K critical care %K eICU %K machine learning %K MIMIC %K Medical Information Mart for Intensive Care %K natural language processing %D 2024 %7 17.4.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: Intensive care research has predominantly relied on conventional methods like randomized controlled trials. However, the increasing popularity of open-access, free databases in the past decade has opened new avenues for research, offering fresh insights. Leveraging machine learning (ML) techniques enables the analysis of trends in a vast number of studies. Objective: This study aims to conduct a comprehensive bibliometric analysis using ML to compare trends and research topics in traditional intensive care unit (ICU) studies and those done with open-access databases (OADs). Methods: We used ML for the analysis of publications in the Web of Science database in this study. Articles were categorized into “OAD” and “traditional intensive care” (TIC) studies. OAD studies included those using the Medical Information Mart for Intensive Care (MIMIC), eICU Collaborative Research Database (eICU-CRD), Amsterdam University Medical Centers Database (AmsterdamUMCdb), High Time Resolution ICU Dataset (HiRID), and Pediatric Intensive Care databases. 
TIC studies included all other intensive care studies. Uniform manifold approximation and projection was used to visualize the corpus distribution. The BERTopic technique was used to generate 30 unique topic identifiers and to categorize topics into 22 topic families. Results: A total of 227,893 records were extracted. After exclusions, 145,426 articles were identified as TIC and 1301 articles as OAD studies. TIC studies experienced exponential growth over the last 2 decades, culminating in a peak of 16,378 articles in 2021, while OAD studies demonstrated a consistent upsurge since 2018. Sepsis, ventilation-related research, and pediatric intensive care were the most frequently discussed topics. TIC studies exhibited broader coverage than OAD studies, suggesting a more extensive research scope. Conclusions: This study analyzed ICU research, providing valuable insights from a large number of publications. OAD studies complement TIC studies, focusing on predictive modeling, while TIC studies capture essential qualitative information. Integrating both approaches in a complementary manner is the future direction for ICU research. Additionally, natural language processing techniques offer a transformative alternative for literature review and bibliometric analysis. 
%M 38630522 %R 10.2196/48330 %U https://www.jmir.org/2024/1/e48330 %U https://doi.org/10.2196/48330 %U http://www.ncbi.nlm.nih.gov/pubmed/38630522 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e50897 %T Experiences, Lessons, and Challenges With Adapting REDCap for COVID-19 Laboratory Data Management in a Resource-Limited Country: Descriptive Study %A Ndlovu,Kagiso %A Mauco,Kabelo Leonard %A Makhura,Onalenna %A Hu,Robin %A Motlogelwa,Nkwebi Peace %A Masizana,Audrey %A Lo,Emily %A Mphoyakgosi,Thongbotho %A Moyo,Sikhulile %+ Department of Computer Science, University of Botswana, Private Bag UB 0022, Gaborone, 00267, Botswana, 267 71786953, ndlovuk@ub.ac.bw %K REDCap %K DHIS2 %K COVID-19 %K National Health Laboratory %K eHealth %K interoperability %K data management %K Botswana %D 2024 %7 16.4.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: The COVID-19 pandemic brought challenges requiring timely health data sharing to inform accurate decision-making at national levels. In Botswana, we adapted and integrated the Research Electronic Data Capture (REDCap) and the District Health Information System version 2 (DHIS2) platforms to support timely collection and reporting of COVID-19 cases. We focused on establishing an effective COVID-19 data flow at the national public health laboratory, guided by the needs of health care professionals at the National Health Laboratory (NHL). This integration contributed to automated centralized reporting of COVID-19 results at the Ministry of Health (MOH). Objective: This paper reports the experiences, challenges, and lessons learned while designing, adapting, and implementing the REDCap and DHIS2 platforms to support COVID-19 data management at the NHL in Botswana. Methods: A participatory design approach was adopted to guide the design, customization, and implementation of the REDCap platform in support of COVID-19 data management at the NHL. 
Study participants included 29 NHL and 4 MOH personnel, and the study was conducted from March 2, 2020, to June 30, 2020. Participants’ requirements for an ideal COVID-19 data management system were established. NVivo 11 software supported thematic analysis of the challenges and resolutions identified during this study. These were categorized according to the 4 themes of infrastructure, capacity development, platform constraints, and interoperability. Results: Overall, REDCap supported the majority of perceived technical and nontechnical requirements for an ideal COVID-19 data management system at the NHL. Although some implementation challenges were identified, each had mitigation strategies such as procurement of mobile Internet routers, engagement of senior management to resolve conflicting policies, continuous REDCap training, and the development of a third-party web application to enhance REDCap’s capabilities. Lessons learned informed next steps and further refinement of the REDCap platform. Conclusions: Implementation of REDCap at the NHL to streamline COVID-19 data collection and integration with the DHIS2 platform was feasible despite the urgency of implementation during the pandemic. By implementing the REDCap platform at the NHL, we demonstrated the possibility of achieving a centralized reporting system of COVID-19 cases, hence enabling timely and informed decision-making at a national level. Challenges faced presented lessons learned to inform sustainable implementation of digital health innovations in Botswana and similar resource-limited countries. 
%M 38625736 %R 10.2196/50897 %U https://formative.jmir.org/2024/1/e50897 %U https://doi.org/10.2196/50897 %U http://www.ncbi.nlm.nih.gov/pubmed/38625736 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 13 %N %P e54490 %T Application of AI in Sepsis: Citation Network Analysis and Evidence Synthesis %A Wu,MeiJung %A Islam,Md Mohaimenul %A Poly,Tahmina Nasrin %A Lin,Ming-Chin %+ Graduate Institute of Biomedical Informatics, College of Medical Science and Technology, Taipei Medical University, 250 Wuxing St, Xinyi District, Taipei, 110, Taiwan, 886 66202589, arbiter@tmu.edu.tw %K AI %K artificial intelligence %K bibliometric analysis %K bibliometric %K citation %K deep learning %K machine learning %K network analysis %K publication %K sepsis %K trend %K visualization %K VOSviewer %K Web of Science %K WoS %D 2024 %7 15.4.2024 %9 Original %J Interact J Med Res %G English %X Background: Artificial intelligence (AI) has garnered considerable attention in the context of sepsis research, particularly in personalized diagnosis and treatment. Conducting a bibliometric analysis of existing publications can offer a broad overview of the field and identify current research trends and future research directions. Objective: The objective of this study is to leverage bibliometric data to provide a comprehensive overview of the application of AI in sepsis. Methods: We conducted a search in the Web of Science Core Collection database to identify relevant articles published in English until August 31, 2023. A predefined search strategy was used, evaluating titles, abstracts, and full texts as needed. We used the Bibliometrix and VOSviewer tools to visualize networks showcasing the co-occurrence of authors, research institutions, countries, citations, and keywords. Results: A total of 259 relevant articles published between 2014 and 2023 (until August) were identified. Over the past decade, the annual publication count has consistently risen. 
Leading journals in this domain include Critical Care Medicine (17/259, 6.6%), Frontiers in Medicine (17/259, 6.6%), and Scientific Reports (11/259, 4.2%). The United States (103/259, 39.8%), China (83/259, 32%), United Kingdom (14/259, 5.4%), and Taiwan (12/259, 4.6%) emerged as the most prolific countries in terms of publications. Notable institutions in this field include the University of California System, Emory University, and Harvard University. The key researchers working in this area include Ritankar Das, Chris Barton, and Rishikesan Kamaleswaran. Although the initial period witnessed a relatively low number of articles focused on AI applications for sepsis, there has been a significant surge in research within this area in recent years (2014-2023). Conclusions: This comprehensive analysis provides valuable insights into AI-related research conducted in the field of sepsis, aiding health care policy makers and researchers in understanding the potential of AI and formulating effective research plans. Such analysis serves as a valuable resource for determining the advantages, sustainability, scope, and potential impact of AI models in sepsis. 
%M 38621231 %R 10.2196/54490 %U https://www.i-jmr.org/2024/1/e54490 %U https://doi.org/10.2196/54490 %U http://www.ncbi.nlm.nih.gov/pubmed/38621231 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 11 %N %P e55988 %T Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz’s Theory of Basic Values %A Hadar-Shoval,Dorit %A Asraf,Kfir %A Mizrachi,Yonathan %A Haber,Yuval %A Elyoseph,Zohar %+ Department of Brain Sciences, Faculty of Medicine, Imperial College London, Fulham Palace Rd, London, W6 8RF, United Kingdom, 44 547836088, Zohar.j.a@gmail.com %K large language models %K LLMs %K large language model %K LLM %K machine learning %K ML %K natural language processing %K NLP %K deep learning %K ChatGPT %K Chat-GPT %K chatbot %K chatbots %K chat-bot %K chat-bots %K Claude %K values %K Bard %K artificial intelligence %K AI %K algorithm %K algorithms %K predictive model %K predictive models %K predictive analytics %K predictive system %K practical model %K practical models %K mental health %K mental illness %K mental illnesses %K mental disease %K mental diseases %K mental disorder %K mental disorders %K mobile health %K mHealth %K eHealth %K mood disorder %K mood disorders %D 2024 %7 9.4.2024 %9 Original Paper %J JMIR Ment Health %G English %X Background: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz’s theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics. 
Objective: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other. Methods: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire—Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs’ value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests. Results: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs’ value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans. Successful discriminant analysis differentiated the 4 LLMs’ distinct value profiles. Further examination found the biased value profiles strongly predicted the LLMs’ responses when presented with mental health dilemmas requiring choosing between opposing values. This provided further validation for the models embedding distinct motivational value-like constructs that shape their decision-making. Conclusions: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. 
Although the study demonstrated the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards. For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values. %M 38593424 %R 10.2196/55988 %U https://mental.jmir.org/2024/1/e55988 %U https://doi.org/10.2196/55988 %U http://www.ncbi.nlm.nih.gov/pubmed/38593424 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e55779 %T Converge or Collide? Making Sense of a Plethora of Open Data Standards in Health Care %A Tsafnat,Guy %A Dunscombe,Rachel %A Gabriel,Davera %A Grieve,Grahame %A Reich,Christian %+ Evidentli Pty Ltd, 50 Holt St, Suite 516, Surry Hills, 2010, Australia, 61 415481043, guyt@evidentli.com %K interoperability %K clinical data %K open data standards %K health care %K digital health %K health care data %D 2024 %7 9.4.2024 %9 Editorial %J J Med Internet Res %G English %X Practitioners of digital health are familiar with disjointed data environments that often inhibit effective communication among different elements of the ecosystem. This fragmentation in turn leads to issues such as inconsistencies between services and payments, wastage, and, notably, delivered care that falls short of best practice. 
Despite the long-standing recognition of interoperable data as a potential solution, efforts in achieving interoperability have been disjointed and inconsistent, resulting in numerous incompatible standards, even though it is widely agreed that fewer standards would enhance interoperability. This paper introduces a framework for understanding health care data needs, discussing the challenges and opportunities of open data standards in the field. It emphasizes the necessity of acknowledging diverse data standards, each catering to specific viewpoints and needs, and proposes a categorization of health care data into three domains, each with distinct characteristics and challenges. It also outlines overarching design requirements applicable to all domains as well as specific requirements unique to each domain. %M 38593431 %R 10.2196/55779 %U https://www.jmir.org/2024/1/e55779 %U https://doi.org/10.2196/55779 %U http://www.ncbi.nlm.nih.gov/pubmed/38593431 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e48963 %T Leveraging Routinely Collected Program Data to Inform Extrapolated Size Estimates for Key Populations in Namibia: Small Area Estimation Study %A Loeb,Talia %A Willis,Kalai %A Velishavo,Frans %A Lee,Daniel %A Rao,Amrita %A Baral,Stefan %A Rucinski,Katherine %+ Data for Implementation (Data.FI), Department of Epidemiology, Johns Hopkins Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD, 21205, United States, 1 410 955 3227, tloeb2@jh.edu %K female sex workers %K HIV %K key populations %K men who have sex with men %K Namibia %K population size estimation %K small area estimation %D 2024 %7 4.4.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Estimating the size of key populations, including female sex workers (FSW) and men who have sex with men (MSM), can inform planning and resource allocation for HIV programs at local and national levels. 
In geographic areas where direct population size estimates (PSEs) for key populations have not been collected, small area estimation (SAE) can help fill in gaps using supplemental data sources known as auxiliary data. However, routinely collected program data have not historically been used as auxiliary data to generate subnational estimates for key populations, including in Namibia. Objective: To systematically generate regional size estimates for FSW and MSM in Namibia, we used a consensus-informed estimation approach with local stakeholders that included the integration of routinely collected HIV program data provided by key populations’ HIV service providers. Methods: We used quarterly program data reported by key population implementing partners, including counts of the number of individuals accessing HIV services over time, to weight existing PSEs collected through bio-behavioral surveys using a Bayesian triangulation approach. SAEs were generated through simple imputation, stratified imputation, and multivariable Poisson regression models. We selected final estimates using an iterative qualitative ranking process with local key population implementing partners. Results: Extrapolated national estimates for FSW ranged from 4777 to 13,148 across Namibia, comprising 1.5% to 3.6% of female individuals aged between 15 and 49 years. For MSM, estimates ranged from 4611 to 10,171, comprising 0.7% to 1.5% of male individuals aged between 15 and 49 years. After the inclusion of program data as priors, the estimated proportion of FSW derived from simple imputation increased from 1.9% to 2.8%, and the proportion of MSM decreased from 1.5% to 0.75%. When stratified imputation was implemented using HIV prevalence to inform strata, the inclusion of program data increased the proportion of FSW from 2.6% to 4.0% in regions with high prevalence and decreased the proportion from 1.4% to 1.2% in regions with low prevalence. 
When population density was used to inform strata, the inclusion of program data also increased the proportion of FSW in high-density regions (from 1.1% to 3.4%) and decreased the proportion of MSM in all regions. Conclusions: Using SAE approaches, we combined epidemiologic and program data to generate subnational size estimates for key populations in Namibia. Overall, estimates were highly sensitive to the inclusion of program data. Program data represent a supplemental source of information that can be used to align PSEs with real-world HIV programs, particularly in regions where population-based data collection methods are challenging to implement. Future work is needed to determine how best to include and validate program data in target settings and in key population size estimation studies, ultimately bridging research with practice to support a more comprehensive HIV response. %M 38573760 %R 10.2196/48963 %U https://publichealth.jmir.org/2024/1/e48963 %U https://doi.org/10.2196/48963 %U http://www.ncbi.nlm.nih.gov/pubmed/38573760 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e54580 %T An Entity Extraction Pipeline for Medical Text Records Using Large Language Models: Analytical Study %A Wang,Lei %A Ma,Yinyao %A Bi,Wenshuai %A Lv,Hanlin %A Li,Yuxiang %+ BGI Research, 1-2F, Building 2, Wuhan Optics Valley International Biomedical Enterprise Accelerator Phase 3.1, No 388 Gaoxin Road 2, Donghu New Technology Development Zone, Wuhan, 430074, China, 86 18707190886, lvhanlin@genomics.cn %K clinical data extraction %K large language models %K feature hallucination %K modular approach %K unstructured data processing %D 2024 %7 29.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The study of disease progression relies on clinical data, including text data, and extracting valuable features from text data has been a research hot spot. 
With the rise of large language models (LLMs), semantic-based extraction pipelines are gaining acceptance in clinical research. However, the security and feature hallucination issues of LLMs require further attention. Objective: This study aimed to introduce a novel modular LLM pipeline, which could semantically extract features from textual patient admission records. Methods: The pipeline was designed to process a systematic succession of concept extraction, aggregation, question generation, corpus extraction, and question-and-answer scale extraction, which was tested via 2 low-parameter LLMs: Qwen-14B-Chat (QWEN) and Baichuan2-13B-Chat (BAICHUAN). A data set of 25,709 pregnancy cases from the People’s Hospital of Guangxi Zhuang Autonomous Region, China, was used for evaluation with the help of a local expert’s annotation. The pipeline was evaluated with the metrics of accuracy and precision, null ratio, and time consumption. Additionally, we evaluated its performance via a quantized version of Qwen-14B-Chat on a consumer-grade GPU. Results: The pipeline demonstrates a high level of precision in feature extraction, as evidenced by the accuracy and precision results of Qwen-14B-Chat (95.52% and 92.93%, respectively) and Baichuan2-13B-Chat (95.86% and 90.08%, respectively). Furthermore, the pipeline exhibited low null ratios and variable time consumption. The INT4-quantized version of QWEN delivered an enhanced performance with 97.28% accuracy and a 0% null ratio. Conclusions: The pipeline exhibited consistent performance across different LLMs and efficiently extracted clinical features from textual data. It also showed reliable performance on consumer-grade hardware. This approach offers a viable and effective solution for mining clinical research data from textual records. 
%M 38551633 %R 10.2196/54580 %U https://www.jmir.org/2024/1/e54580 %U https://doi.org/10.2196/54580 %U http://www.ncbi.nlm.nih.gov/pubmed/38551633 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e49822 %T Investigating the Roles and Responsibilities of Institutional Signing Officials After Data Sharing Policy Reform for Federally Funded Research in the United States: National Survey %A Baek,Jinyoung %A Lawson,Jonathan %A Rahimzadeh,Vasiliki %+ Center for Medical Ethics and Health Policy, Baylor College of Medicine, 1 Baylor Plaza, Suite 310DF, Houston, TX, 77030, United States, 1 (713) 798 3500, vasiliki.rahimzadeh@bcm.edu %K biomedical research %K survey %K surveys %K data sharing %K data management %K secondary use %K National Institutes of Health %K signing official %K information sharing %K exchange %K access %K data science %K accessibility %K policy %K policies %D 2024 %7 20.3.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: New federal policies along with rapid growth in data generation, storage, and analysis tools are together driving scientific data sharing in the United States. At the same time, triangulating human research data from diverse sources can also create situations where data are used for future research in ways that individuals and communities may consider objectionable. Institutional gatekeepers, namely, signing officials (SOs), are therefore at the helm of compliant management and sharing of human data for research. Of those with data governance responsibilities, SOs most often serve as signatories for investigators who deposit, access, and share research data between institutions. Although SOs play important leadership roles in compliant data sharing, we know surprisingly little about their scope of work, roles, and oversight responsibilities. 
Objective: The purpose of this study was to describe existing institutional policies and practices of US SOs who manage human genomic data access, as well as how these may change in the wake of new Data Management and Sharing requirements for National Institutes of Health–funded research in the United States. Methods: We administered an anonymous survey to institutional SOs recruited from biomedical research institutions across the United States. Survey items probed where data generated from extramurally funded research are deposited, how researchers outside the institution access these data, and what happens to these data after extramural funding ends. Results: In total, 56 institutional SOs participated in the survey. We found that SOs frequently approve duplicate data deposits and impose stricter access controls when data use limitations are unclear or unspecified. In addition, 21% (n=12) of SOs knew where data from federally funded projects are deposited after project funding sunsets. As a consequence, most investigators deposit their scientific data into “a National Institutes of Health–funded repository” to meet the Data Management and Sharing requirements but also within the “institution’s own repository” or a third-party repository. Conclusions: Our findings inform 5 policy recommendations and best practices for US SOs to improve coordination and develop comprehensive and consistent data governance policies that balance the need for scientific progress with effective human data protections. 
%M 38506894 %R 10.2196/49822 %U https://formative.jmir.org/2024/1/e49822 %U https://doi.org/10.2196/49822 %U http://www.ncbi.nlm.nih.gov/pubmed/38506894 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50518 %T Agendas on Nursing in South Korea Media: Natural Language Processing and Network Analysis of News From 2005 to 2022 %A Park,Daemin %A Kim,Dasom %A Park,Ah-hyun %+ Home Visit Healthcare Team, Expert Group on Health Promotion for Seoul Metropolitan Government, #410, Life Science Building.Annex, 120, Neungdong-ro, Gwangjin-gu, Seoul, 05029, Republic of Korea, 82 1072040418, dudurdaram@naver.com %K nurses %K news %K South Korea %K natural language processing %K NLP %K network analysis %K politicization %D 2024 %7 19.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: In recent years, Korean society has increasingly recognized the importance of nurses in the context of population aging and infectious disease control. However, nurses still face difficulties with regard to policy activities that are aimed at improving the nursing workforce structure and working environment. Media coverage plays an important role in public awareness of a particular issue and can be an important strategy in policy activities. Objective: This study analyzed data from 18 years of news coverage on nursing-related issues. The focus of this study was to examine the drivers of the social, local, economic, and political agendas that were emphasized in the media by the analysis of main sources and their quotes. This analysis revealed which nursing media agendas were emphasized (eg, social aspects), neglected (eg, policy aspects), and negotiated. Methods: Descriptive analysis, natural language processing, and semantic network analysis were applied to analyze data collected from 2005 to 2022. 
BigKinds was used for the collection of data, automatic multi-categorization of news, named entity recognition of news sources, and extraction and topic modeling of quotes. The main news sources were identified by conducting a 1-mode network analysis with SNAnalyzer. The main agendas of nursing-related news coverage were examined through the qualitative analysis of major sources’ quotes by section. The common and individual interests of the top-ranked sources were analyzed through a 2-mode network analysis using UCINET. Results: In total, 128,339 articles from 54 media outlets on nursing-related issues were analyzed. Descriptive analysis showed that nursing-related news was mainly covered in social (99,868/128,339, 77.82%) and local (48,056/128,339, 48.56%) sections, whereas it was rarely covered in economic (9439/128,339, 7.35%) and political (7301/128,339, 5.69%) sections. Furthermore, 445 sources that had made the top 20 list at least once by year and section were analyzed. Other than “nurse,” the main sources for each section were “labor union,” “local resident,” “government,” and “Moon Jae-in.” “Nursing Bill” emerged as a common interest among nurses and doctors, although the topic did not garner considerable attention from the Ministry of Health and Welfare. Analyzing quotes showed that nurses were portrayed as heroes, laborers, survivors of abuse, and perpetrators. The economic section focused on employment of youth and women in nursing. In the political section, conflicts between nurses and doctors, which may have caused policy confusion, were highlighted. Policy formulation processes were not adequately reported. Media coverage of the enactment of nursing laws tended to relate to confrontations between political parties. Conclusions: The media plays a crucial role in highlighting various aspects of nursing practice. However, policy formulation processes to solve nursing issues were not adequately reported in South Korea. 
This study suggests that nurses should secure policy compliance by persuading the public to understand their professional perspectives. %M 38393293 %R 10.2196/50518 %U https://www.jmir.org/2024/1/e50518 %U https://doi.org/10.2196/50518 %U http://www.ncbi.nlm.nih.gov/pubmed/38393293 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e42904 %T Validation of 3 Computer-Aided Facial Phenotyping Tools (DeepGestalt, GestaltMatcher, and D-Score): Comparative Diagnostic Accuracy Study %A Reiter,Alisa Maria Vittoria %A Pantel,Jean Tori %A Danyel,Magdalena %A Horn,Denise %A Ott,Claus-Eric %A Mensah,Martin Atta %+ Institute of Medical Genetics and Human Genetics, Charité - Universitätsmedizin Berlin, corporate member of Freie Universität Berlin and Humboldt-Universität zu Berlin, Augustenburger Platz 1, Berlin, 13353, Germany, 49 30450569132, martin-atta.mensah@charite.de %K facial phenotyping %K DeepGestalt %K facial recognition %K Face2Gene %K medical genetics %K diagnostic accuracy %K genetic syndrome %K machine learning %K GestaltMatcher %K D-Score %K genetics %D 2024 %7 13.3.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: While characteristic facial features provide important clues for finding the correct diagnosis in genetic syndromes, valid assessment can be challenging. The next-generation phenotyping algorithm DeepGestalt analyzes patient images and provides syndrome suggestions. GestaltMatcher matches patient images with similar facial features. The new D-Score provides a score for the degree of facial dysmorphism. Objective: We aimed to test state-of-the-art facial phenotyping tools by benchmarking GestaltMatcher and D-Score and comparing them to DeepGestalt. 
Methods: Using a retrospective sample of 4796 images of patients with 486 different genetic syndromes (London Medical Database, GestaltMatcher Database, and literature images) and 323 inconspicuous control images, we determined the clinical use of D-Score, GestaltMatcher, and DeepGestalt, evaluating sensitivity; specificity; accuracy; the number of supported diagnoses; and potential biases such as age, sex, and ethnicity. Results: DeepGestalt suggested 340 distinct syndromes and GestaltMatcher suggested 1128 syndromes. The top-30 sensitivity was higher for DeepGestalt (88%, SD 18%) than for GestaltMatcher (76%, SD 26%). DeepGestalt generally assigned lower scores but provided higher scores for patient images than for inconspicuous control images, thus allowing the 2 cohorts to be separated with an area under the receiver operating characteristic curve (AUROC) of 0.73. GestaltMatcher could not separate the 2 classes (AUROC 0.55). Trained for this purpose, D-Score achieved the highest discriminatory power (AUROC 0.86). D-Score’s levels increased with the age of the depicted individuals. Male individuals yielded higher D-scores than female individuals. Ethnicity did not appear to influence D-scores. Conclusions: If used with caution, algorithms such as D-score could help clinicians with constrained resources or limited experience in syndromology to decide whether a patient needs further genetic evaluation. Algorithms such as DeepGestalt could support diagnosing rather common genetic syndromes with facial abnormalities, whereas algorithms such as GestaltMatcher could suggest rare diagnoses that are unknown to the clinician in patients with a characteristic, dysmorphic face. 
%M 38477981 %R 10.2196/42904 %U https://www.jmir.org/2024/1/e42904 %U https://doi.org/10.2196/42904 %U http://www.ncbi.nlm.nih.gov/pubmed/38477981 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e48186 %T Improving the Efficiency of Inferences From Hybrid Samples for Effective Health Surveillance Surveys: Comprehensive Review of Quantitative Methods %A Fahimi,Mansour %A Hair,Elizabeth C %A Do,Elizabeth K %A Kreslake,Jennifer M %A Yan,Xiaolu %A Chan,Elisa %A Barlas,Frances M %A Giles,Abigail %A Osborn,Larry %+ Marketing Systems Group, 755 Business Center Drive, Suite 200, Horsham, PA, 19044, United States, 1 2156202880, mfahimi@m-s-g.com %K hybrid samples %K composite estimation %K optimal composition factor %K unequal weighting effect %K composite weighting %K weighting %K surveillance %K sample survey %K data collection %K risk factor %D 2024 %7 7.3.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Increasingly, survey researchers rely on hybrid samples to improve coverage and increase the number of respondents by combining independent samples. For instance, it is possible to combine 2 probability samples with one relying on telephone and another on mail. More commonly, however, researchers are now supplementing probability samples with those from online panels that are less costly. Setting aside ad hoc approaches that are void of rigor, traditionally, the method of composite estimation has been used to blend results from different sample surveys. This means individual point estimates from different surveys are pooled together, 1 estimate at a time. Given that for a typical study many estimates must be produced, this piecemeal approach is computationally burdensome and subject to the inferential limitations of the individual surveys that are used in this process. Objective: In this paper, we will provide a comprehensive review of the traditional method of composite estimation. 
Subsequently, the method of composite weighting is introduced, which is significantly more efficient, both computationally and inferentially when pooling data from multiple surveys. With the growing interest in hybrid sampling alternatives, we hope to offer an accessible methodology for improving the efficiency of inferences from such sample surveys without sacrificing rigor. Methods: Specifically, we will illustrate why the many ad hoc procedures for blending survey data from multiple surveys are void of scientific integrity and subject to misleading inferences. Moreover, we will demonstrate how the traditional approach of composite estimation fails to offer a pragmatic and scalable solution in practice. By relying on theoretical and empirical justifications, in contrast, we will show how our proposed methodology of composite weighting is both scientifically sound and inferentially and computationally superior to the old method of composite estimation. Results: Using data from 3 large surveys that have relied on hybrid samples composed of probability-based and supplemental sample components from online panels, we illustrate that our proposed method of composite weighting is superior to the traditional method of composite estimation in 2 distinct ways. Computationally, it is vastly less demanding and hence more accessible for practitioners. Inferentially, it produces more efficient estimates with higher levels of external validity when pooling data from multiple surveys. Conclusions: The new realities of the digital age have brought about a number of resilient challenges for survey researchers, which in turn have exposed some of the inefficiencies associated with the traditional methods this community has relied upon for decades. The resilience of such challenges suggests that piecemeal approaches that may have limited applicability or restricted accessibility will prove to be inadequate and transient. 
It is from this perspective that our proposed method of composite weighting has aimed to introduce a durable and accessible solution for hybrid sample surveys. %M 38451620 %R 10.2196/48186 %U https://publichealth.jmir.org/2024/1/e48186 %U https://doi.org/10.2196/48186 %U http://www.ncbi.nlm.nih.gov/pubmed/38451620 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e50421 %T Patient and Public Willingness to Share Personal Health Data for Third-Party or Secondary Uses: Systematic Review %A Baines,Rebecca %A Stevens,Sebastian %A Austin,Daniela %A Anil,Krithika %A Bradwell,Hannah %A Cooper,Leonie %A Maramba,Inocencio Daniel %A Chatterjee,Arunangsu %A Leigh,Simon %+ Centre for Health Technology, University of Plymouth, Portland Square, Drakes Circus, Plymouth, PL4 8AA, United Kingdom, 44 07508916450, rebecca.baines@plymouth.ac.uk %K data sharing %K personal health data %K patient %K public attitudes %K systematic review %K secondary use %K third party %K willingness to share %K data privacy and security %D 2024 %7 5.3.2024 %9 Review %J J Med Internet Res %G English %X Background: International advances in information communication, eHealth, and other digital health technologies have led to significant expansions in the collection and analysis of personal health data. However, following a series of high-profile data sharing scandals and the emergence of COVID-19, critical exploration of public willingness to share personal health data remains limited, particularly for third-party or secondary uses. Objective: This systematic review aims to explore factors that affect public willingness to share personal health data for third-party or secondary uses. Methods: A systematic search of 6 databases (MEDLINE, Embase, PsycINFO, CINAHL, Scopus, and SocINDEX) was conducted with review findings analyzed using inductive-thematic analysis and synthesized using a narrative approach. Results: Of the 13,949 papers identified, 135 were included. 
Factors most commonly identified as a barrier to data sharing from a public perspective included data privacy, security, and management concerns. Other factors found to influence willingness to share personal health data included the type of data being collected (ie, perceived sensitivity); the type of user requesting their data to be shared, including their perceived motivation, profit prioritization, and ability to directly impact patient care; trust in the data user, as well as in associated processes, often established through individual choice and control over what data are shared with whom, when, and for how long, supported by appropriate models of dynamic consent; the presence of a feedback loop; and clearly articulated benefits or issue relevance including valued incentivization and compensation at both an individual and collective or societal level. Conclusions: There is general, yet conditional public support for sharing personal health data for third-party or secondary use. Clarity, transparency, and individual control over who has access to what data, when, and for how long are widely regarded as essential prerequisites for public data sharing support. Individual levels of control and choice need to operate within the auspices of assured data privacy and security processes, underpinned by dynamic and responsive models of consent that prioritize individual or collective benefits over and above commercial gain. Failure to understand, design, and refine data sharing approaches in response to changeable patient preferences will only jeopardize the tangible benefits of data sharing practices being fully realized. 
%M 38441944 %R 10.2196/50421 %U https://www.jmir.org/2024/1/e50421 %U https://doi.org/10.2196/50421 %U http://www.ncbi.nlm.nih.gov/pubmed/38441944 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e53627 %T Data Visualization Support for Tumor Boards and Clinical Oncology: Protocol for a Scoping Review %A Boehm,Dominik %A Strantz,Cosima %A Christoph,Jan %A Busch,Hauke %A Ganslandt,Thomas %A Unberath,Philipp %+ Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Glückstraße 11, Erlangen, 91054, Germany, 49 91318546478, dominik.boehm@uk-erlangen.de %K clinical oncology %K tumor board %K cancer conference %K multidisciplinary %K visualization %K software %K tool %K scoping review %K tumor %K malignant %K benign %K data sets %K oncology %K interactive visualization %K data %K patient %K patients %K physicians %K medical practitioners %K medical practitioner %K conference %D 2024 %7 5.3.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Complex and expanding data sets in clinical oncology applications require flexible and interactive visualization of patient data to provide the maximum amount of information to physicians and other medical practitioners. Interdisciplinary tumor conferences in particular profit from customized tools to integrate, link, and visualize relevant data from all professions involved. Objective: The scoping review proposed in this protocol aims to identify and present currently available data visualization tools for tumor boards and related areas. The objective of the review will be to provide not only an overview of digital tools currently used in tumor board settings, but also the data included, the respective visualization solutions, and their integration into hospital processes. Methods: The planned scoping review process is based on the Arksey and O’Malley scoping study framework. 
The following electronic databases will be searched for articles published in English: PubMed, Web of Knowledge, and Scopus. Eligible articles will first undergo a deduplication step, followed by the screening of titles and abstracts. Then, a full-text screening will be used to reach the final decision about article selection. At least 2 reviewers will independently screen titles, abstracts, and full-text reports. Conflicting inclusion decisions will be resolved by a third reviewer. The remaining literature will be analyzed using a data extraction template proposed in this protocol. The template includes a variety of meta information as well as specific questions aiming to answer the research question: “What are the key features of data visualization solutions used in molecular and organ tumor boards, and how are these elements integrated and used within the clinical setting?” The findings will be compiled, charted, and presented as specified in the scoping study framework. Data for included tools may be supplemented with additional manual literature searches. The entire review process will be documented in alignment with the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) flowchart. Results: The results of this scoping review will be reported per the expanded PRISMA-ScR guidelines. A preliminary search using PubMed, Web of Knowledge, and Scopus resulted in 1320 articles after deduplication that will be included in the further review process. We expect the results to be published during the second quarter of 2024. Conclusions: Visualization is a key process in leveraging a data set’s potentially available information and enabling its use in an interdisciplinary setting. The scoping review described in this protocol aims to present the status quo of visualization solutions for tumor board and clinical oncology applications and their integration into hospital processes. 
International Registered Report Identifier (IRRID): DERR1-10.2196/53627 %M 38441925 %R 10.2196/53627 %U https://www.researchprotocols.org/2024/1/e53627 %U https://doi.org/10.2196/53627 %U http://www.ncbi.nlm.nih.gov/pubmed/38441925 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e47846 %T Integration of Patient-Reported Outcome Data Collected Via Web Applications and Mobile Apps Into a Nation-Wide COVID-19 Research Platform Using Fast Healthcare Interoperability Resources: Development Study %A Oehm,Johannes Benedict %A Riepenhausen,Sarah Luise %A Storck,Michael %A Dugas,Martin %A Pryss,Rüdiger %A Varghese,Julian %+ Institute of Medical Informatics, University of Münster, Albert-Schweizer-Campus 1, Gebäude 11, Münster, 48149, Germany, 49 251 83 58247, johannes.oehm@uni-muenster.de %K Fast Healthcare Interoperability Resources %K FHIR %K FHIR Questionnaire %K patient-reported outcome %K mobile health %K mHealth %K research compatibility %K interoperability %K Germany %K harmonized data collection %K findable, accessible, interoperable, and reusable %K FAIR data %K mobile phone %D 2024 %7 27.2.2024 %9 Original Paper %J J Med Internet Res %G English %X Background: The Network University Medicine projects are an important part of the German COVID-19 research infrastructure. They comprise 2 subprojects: COVID-19 Data Exchange (CODEX) and Coordination on Mobile Pandemic Apps Best Practice and Solution Sharing (COMPASS). CODEX provides a centralized and secure data storage platform for research data, whereas in COMPASS, expert panels were gathered to develop a reference app framework for capturing patient-reported outcomes (PROs) that can be used by any researcher. Objective: Our study aims to integrate the data collected with the COMPASS reference app framework into the central CODEX platform, so that they can be used by secondary researchers. 
Although both projects used the Fast Healthcare Interoperability Resources (FHIR) standard, it was not used in a way that data could be shared directly. Given the short time frame and the parallel developments within the CODEX platform, a pragmatic and robust solution for an interface component was required. Methods: We have developed a means to facilitate and promote the use of the German Corona Consensus (GECCO) data set, a core data set for COVID-19 research in Germany. In this way, we ensured semantic interoperability for the app-collected PRO data with the COMPASS app. We also developed an interface component to sustain syntactic interoperability. Results: The use of different FHIR types by the COMPASS reference app framework (the general-purpose FHIR Questionnaire) and the CODEX platform (eg, Patient, Condition, and Observation) was found to be the most significant obstacle. Therefore, we developed an interface component that realigns the Questionnaire items with the corresponding items in the GECCO data set and provides the correct resources for the CODEX platform. We extended the existing COMPASS questionnaire editor with an import function for GECCO items, which also tags them for the interface component. This ensures syntactic interoperability and eases the reuse of the GECCO data set for researchers. Conclusions: This paper shows how PRO data, which are collected across various studies conducted by different researchers, can be captured in a research-compatible way. This means that the data can be shared with a central research infrastructure and be reused by other researchers to gain more insights about COVID-19 and its sequelae. 
%M 38411999 %R 10.2196/47846 %U https://www.jmir.org/2024/1/e47846 %U https://doi.org/10.2196/47846 %U http://www.ncbi.nlm.nih.gov/pubmed/38411999 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e49381 %T AIDSVu Cities’ Progress Toward HIV Care Continuum Goals: Cross-Sectional Study %A Hood,Nicole %A Benbow,Nanette %A Jaggi,Chandni %A Whitby,Shamaya %A Sullivan,Patrick Sean %A , %+ Department of Epidemiology, Rollins School of Public Health, Emory University, 1518 Clifton Rd, Atlanta, GA, 30329, United States, 1 4047273956, nicole.hood@emory.edu %K HIV %K epidemiology %K surveillance %K HIV care continuum %K cities %K HIV public health %K HIV prevention %K diagnosis %K HIV late diagnosis %D 2024 %7 26.2.2024 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Public health surveillance data are critical to understanding the current state of the HIV and AIDS epidemics. Surveillance data provide significant insight into patterns within and progress toward achieving targets for each of the steps in the HIV care continuum. Such targets include those outlined in the National HIV/AIDS Strategy (NHAS) goals. If these data are disseminated, they can be used to prioritize certain steps in the continuum, geographic locations, and groups of people. Objective: We sought to develop and report indicators of progress toward the NHAS goals for US cities and to characterize progress toward those goals with categorical metrics. Methods: Health departments used standardized SAS code to calculate care continuum indicators from their HIV surveillance data to ensure comparability across jurisdictions. We report 2018 descriptive statistics for continuum steps (timely diagnosis, linkage to medical care, receipt of medical care, and HIV viral load suppression) for 36 US cities and their progress toward 2020 NHAS goals as of 2018. 
Indicators are reported categorically as met or surpassed the goal, within 25% of attaining the goal, or further than 25% from achieving the goal. Results: Cities were closest to meeting NHAS goals for timely diagnosis compared to the goals for linkage to care, receipt of care, and viral load suppression, with all cities (n=36, 100%) within 25% of meeting the goal for timely diagnosis. Only 8% (n=3) of cities were >25% from achieving the goal for receipt of care, but 69% (n=25) of cities were >25% from achieving the goal for viral suppression. Conclusions: Display of progress with graphical indicators enables communication of progress to stakeholders. AIDSVu analyses of HIV surveillance data facilitate cities’ ability to benchmark their progress against that of other cities with similar characteristics. By identifying peer cities (eg, cities with analogous populations or similar NHAS goal concerns), the public display of indicators can promote dialogue between cities with comparable challenges and opportunities. 
%M 38407961 %R 10.2196/49381 %U https://publichealth.jmir.org/2024/1/e49381 %U https://doi.org/10.2196/49381 %U http://www.ncbi.nlm.nih.gov/pubmed/38407961 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e54681 %T Exploring Shared Implementation Leadership of Point of Care Nursing Leadership Teams on Inpatient Hospital Units: Protocol for a Collective Case Study %A Castiglione,Sonia Angela %A Lavoie-Tremblay,Mélanie %A Kilpatrick,Kelley %A Gifford,Wendy %A Semenic,Sonia Elizabeth %+ Ingram School of Nursing, McGill University, #1800, 680 Rue Sherbrooke O, Montreal, QC, H3A 2M7, Canada, 1 514 398 4144, sonia.castiglione@mcgill.ca %K case study %K evidence-based practices %K implementation leadership %K inpatient hospital units %K nursing leadership %K point of care %D 2024 %7 19.2.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Nursing leadership teams at the point of care (POC), consisting of both formal and informal leaders, are regularly called upon to support the implementation of evidence-based practices (EBPs) in hospital units. However, current conceptualizations of effective leadership for successful implementation typically focus on the behaviors of individual leaders in managerial roles. Little is known about how multiple nursing leaders in formal and informal roles share implementation leadership (IL), representing an important knowledge gap. Objective: This study aims to explore shared IL among formal and informal nursing leaders in inpatient hospital units. The central research question is as follows: How is IL shared among members of POC nursing leadership teams on inpatient hospital units? The subquestions are as follows: (1) What IL behaviors are enacted and shared by formal and informal leaders? (2) What social processes enable shared IL by formal and informal leaders? and (3) What factors influence shared IL in nursing leadership teams? 
Methods: We will use a collective case study approach to describe and generate an in-depth understanding of shared IL in nursing. We will select nursing leadership teams on 2 inpatient hospital units that have successfully implemented an EBP as instrumental cases. We will construct data through focus groups and individual interviews with key informants (leaders, unit staff, and senior nurse leaders), review of organizational documents, and researcher-generated field notes. We have developed a conceptual framework of shared IL to guide data analysis, which describes effective IL behaviors, formal and informal nursing leaders’ roles at the POC, and social processes generating shared leadership and influencing contextual factors. We will use the Framework Method to systematically generate data matrices from deductive and inductive thematic analysis of each case. We will then generate assertions about shared IL following a cross-case analysis. Results: The study protocol received research ethics approval (2022-8408) on February 24, 2022. Data collection began in June 2022, and we have recruited 2 inpatient hospital units and 25 participants. Data collection was completed in December 2023, and data analysis is ongoing. We anticipate findings to be published in a peer-reviewed journal by late 2024. Conclusions: The anticipated results will shed light on how multiple and diverse members of the POC nursing leadership team enact and share IL. This study addresses calls to advance knowledge in promoting effective implementation of EBPs to ensure high-quality health care delivery by further developing the concept of shared IL in a nursing context. We will identify strategies to strengthen shared IL in nursing leadership teams at the POC, informing future intervention studies. 
International Registered Report Identifier (IRRID): DERR1-10.2196/54681 %M 38373024 %R 10.2196/54681 %U https://www.researchprotocols.org/2024/1/e54681 %U https://doi.org/10.2196/54681 %U http://www.ncbi.nlm.nih.gov/pubmed/38373024 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e47703 %T Designing Electronic Data Capture Systems for Sustainability in Low-Resource Settings: Viewpoint With Lessons Learned From Ethiopia and Myanmar %A Benda,Natalie %A Dougherty,Kylie %A Gebremariam Gobezayehu,Abebe %A Cranmer,John N %A Zawtha,Sakie %A Andreadis,Katerina %A Biza,Heran %A Masterson Creber,Ruth %+ School of Nursing, Columbia University, 560 W 168th St, New York, NY, 10032, United States, 1 212 305 5756, nb3115@cumc.columbia.edu %K low and middle income countries %K LMIC %K electronic data capture %K population health surveillance %K sociotechnical system %K data infrastructure %K electronic data system %K health care system %K technology %K information system %K health program development %K intervention %D 2024 %7 12.2.2024 %9 Viewpoint %J JMIR Public Health Surveill %G English %X Electronic data capture (EDC) is a crucial component in the design, evaluation, and sustainment of population health interventions. Low-resource settings, however, present unique challenges for developing a robust EDC system due to limited financial capital, differences in technological infrastructure, and insufficient involvement of those who understand the local context. Current literature focuses on the evaluation of health interventions using EDC but does not provide an in-depth description of the systems used or how they are developed. In this viewpoint, we present case descriptions from 2 low- and middle-income countries: Ethiopia and Myanmar. We address a gap in evidence by describing each EDC system in detail and discussing the pros and cons of different approaches. 
We then present common lessons learned from the 2 case descriptions as recommendations for considerations in developing and implementing EDC in low-resource settings, using a sociotechnical framework for studying health information technology in complex adaptive health care systems. Our recommendations highlight the importance of selecting hardware compatible with local infrastructure, using flexible software systems that facilitate communication across different languages and levels of literacy, and conducting iterative, participatory design with individuals with deep knowledge of local clinical and cultural norms. %M 38345833 %R 10.2196/47703 %U https://publichealth.jmir.org/2024/1/e47703 %U https://doi.org/10.2196/47703 %U http://www.ncbi.nlm.nih.gov/pubmed/38345833 %0 Journal Article %@ 2561-6722 %I JMIR Publications %V 7 %N %P e47092 %T Building a Sustainable Learning Health Care System for Pregnant and Lactating People: Interview Study Among Data Access Providers %A Hollestelle,Marieke J %A van der Graaf,Rieke %A Sturkenboom,Miriam C J M %A Cunnington,Marianne %A van Delden,Johannes J M %+ Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, Heidelberglaan 100, Utrecht, 3508 CA, Netherlands, 31 88755555, m.j.hollestelle-2@umcutrecht.nl %K ethics %K learning health care systems %K pregnancy %K lactation %K real-world data %K governance %K qualitative research %D 2024 %7 8.2.2024 %9 Original Paper %J JMIR Pediatr Parent %G English %X Background: In many areas of health care, learning health care systems (LHSs) are seen as promising ways to accelerate research and outcomes for patients by reusing health and research data. For example, considering pregnant and lactating people, for whom there is still a poor evidence base for medication safety and efficacy, an LHS presents an interesting way forward. 
Combining unique data sources across Europe in an LHS could help clarify how medications affect pregnancy outcomes and lactation exposures. In general, a remaining challenge of data-intensive health research, which is at the core of an LHS, has been obtaining meaningful access to data. These unique data sources, also called data access providers (DAPs), are both public and private organizations and are important stakeholders in the development of a sustainable and ethically responsible LHS. Sustainability is often discussed as a challenge in LHS development. Moreover, DAPs are increasingly expected to move beyond regulatory compliance and are seen as moral agents tasked with upholding ethical principles, such as transparency, trustworthiness, responsibility, and community engagement. Objective: This study aims to explore the views of people working for DAPs who participate in a public-private partnership to build a sustainable and ethically responsible LHS. Methods: Using a qualitative interview design, we interviewed 14 people involved in the Innovative Medicines Initiative (IMI) ConcePTION (Continuum of Evidence from Pregnancy Exposures, Reproductive Toxicology and Breastfeeding to Improve Outcomes Now) project, a public-private collaboration with the goal of building an LHS for pregnant and lactating people. The pseudonymized transcripts were analyzed thematically. Results: A total of 3 themes were identified: opportunities and responsibilities, conditions for participation and commitment, and challenges for a knowledge-generating ecosystem. The respondents generally regarded the collaboration as an opportunity for various reasons beyond the primary goal of generating knowledge about medication safety during pregnancy and lactation. Respondents had different interpretations of responsibility in the context of data-intensive research in a public-private network. 
Respondents explained that resources (financial and other), scientific output, motivation, agreements on collaboration with the pharmaceutical industry, trust, and transparency are important conditions for participating in and committing to the ConcePTION LHS. Respondents also discussed the challenges of an LHS, including the limitations to (real-world) data analyses and governance procedures. Conclusions: Our respondents were motivated by diverse opportunities to contribute to an LHS for pregnant and lactating people, primarily centered on advancing knowledge on medication safety. Although a shared responsibility for enabling real-world data analyses is acknowledged, their focus remains on their work and contribution to the project rather than on safeguarding ethical data handling. The results of our interviews underline the importance of a transparent governance structure, emphasizing the trust between DAPs and the public for the success and sustainability of an LHS. %M 38329780 %R 10.2196/47092 %U https://pediatrics.jmir.org/2024/1/e47092 %U https://doi.org/10.2196/47092 %U http://www.ncbi.nlm.nih.gov/pubmed/38329780 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 26 %N %P e51640 %T Limitations of the Cough Sound-Based COVID-19 Diagnosis Artificial Intelligence Model and its Future Direction: Longitudinal Observation Study %A Kim,Jina %A Choi,Yong Sung %A Lee,Young Joo %A Yeo,Seung Geun %A Kim,Kyung Won %A Kim,Min Seo %A Rahmati,Masoud %A Yon,Dong Keon %A Lee,Jinseok %+ Department of Biomedical Engineering, Kyung Hee University, 1732 Deogyeong-daero, Giheung-gu, Seoul, 17104, Republic of Korea, 82 2 6935 2476, gonasago@khu.ac.kr %K COVID-19 variants %K cough sound %K artificial intelligence %K diagnosis %K human lifestyle %K SARS-CoV-2 %K AI model %K cough %K sound-based %K diagnosis %K sounds app %K development %K COVID-19 %K AI %D 2024 %7 6.2.2024 %9 Short Paper %J J Med Internet Res %G English %X Background: The outbreak of SARS-CoV-2 in 2019 has 
necessitated the rapid and accurate detection of COVID-19 to manage patients effectively and implement public health measures. Artificial intelligence (AI) models analyzing cough sounds have emerged as promising tools for large-scale screening and early identification of potential cases. Objective: This study aimed to investigate the efficacy of using cough sounds as a diagnostic tool for COVID-19, considering the unique acoustic features that differentiate positive and negative cases. We investigated whether an AI model trained on cough sound recordings from specific periods, especially the early stages of the COVID-19 pandemic, was applicable to the ongoing situation with persistent variants. Methods: We used cough sound recordings from 3 data sets (Cambridge, Coswara, and Virufy) representing different stages of the pandemic and variants. Our AI model was trained using the Cambridge data set with subsequent evaluation against all data sets. The performance was analyzed based on the area under the receiver operating curve (AUC) across different data measurement periods and COVID-19 variants. Results: The AI model demonstrated a high AUC when tested with the Cambridge data set, indicative of its initial effectiveness. However, the performance varied significantly with other data sets, particularly in detecting later variants such as Delta and Omicron, with a marked decline in AUC observed for the latter. These results highlight the challenges in maintaining the efficacy of AI models against the backdrop of an evolving virus. Conclusions: While AI models analyzing cough sounds offer a promising noninvasive and rapid screening method for COVID-19, their effectiveness is challenged by the emergence of new virus variants. Ongoing research and adaptations in AI methodologies are crucial to address these limitations. 
The adaptability of AI models to evolve with the virus underscores their potential as a foundational technology for not only the current pandemic but also future outbreaks, contributing to a more agile and resilient global health infrastructure. %M 38319694 %R 10.2196/51640 %U https://www.jmir.org/2024/1/e51640 %U https://doi.org/10.2196/51640 %U http://www.ncbi.nlm.nih.gov/pubmed/38319694 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e52080 %T The Current Status and Promotional Strategies for Cloud Migration of Hospital Information Systems in China: Strengths, Weaknesses, Opportunities, and Threats Analysis %A Xu,Jian %+ Department of Health Policy, Beijing Municipal Health Big Data and Policy Research Center, Building 1, Number 6 Daji Street, Tongzhou District, Beijing, 101160, China, 86 01055532146, _xujian@163.com %K hospital information system %K HIS %K cloud computing %K cloud migration %K Strengths, Weaknesses, Opportunities, and Threats analysis %D 2024 %7 5.2.2024 %9 Viewpoint %J JMIR Med Inform %G English %X Background: In the 21st century, Chinese hospitals have witnessed innovative medical business models, such as online diagnosis and treatment, cross-regional multidepartment consultation, and real-time sharing of medical test results, that surpass traditional hospital information systems (HISs). The introduction of cloud computing provides an excellent opportunity for hospitals to address these challenges. However, there is currently no comprehensive research assessing the cloud migration of HISs in China. This lack may hinder the widespread adoption and secure implementation of cloud computing in hospitals. Objective: The objective of this study is to comprehensively assess external and internal factors influencing the cloud migration of HISs in China and propose promotional strategies. 
Methods: Academic articles from January 1, 2007, to February 21, 2023, on the topic were searched in PubMed and HuiyiMd databases, and relevant documents such as national policy documents, white papers, and survey reports were collected from authoritative sources for analysis. A systematic assessment of factors influencing cloud migration of HISs in China was conducted by combining a Strengths, Weaknesses, Opportunities, and Threats (SWOT) analysis and literature review methods. Then, various promotional strategies based on different combinations of external and internal factors were proposed. Results: After conducting a thorough search and review, this study included 94 academic articles and 37 relevant documents. The analysis of these documents reveals the increasing application of and research on cloud computing in Chinese hospitals, and that it has expanded to 22 disciplinary domains. However, more than half (n=49, 52%) of the documents primarily focused on task-specific cloud-based systems in hospitals, while only 22% (n=21 articles) discussed integrated cloud platforms shared across the entire hospital, medical alliance, or region. The SWOT analysis showed that cloud computing adoption in Chinese hospitals benefits from policy support, capital investment, and social demand for new technology. However, it also faces threats like loss of digital sovereignty, supplier competition, cyber risks, and insufficient supervision. Factors driving cloud migration for HISs include medical big data analytics and use, interdisciplinary collaboration, health-centered medical service provision, and successful cases. Barriers include system complexity, security threats, lack of strategic planning and resource allocation, relevant personnel shortages, and inadequate investment. 
This study proposes 4 promotional strategies: encouraging more hospitals to migrate, enhancing hospitals’ capabilities for migration, establishing a provincial-level unified medical hybrid multi-cloud platform, and strengthening legal frameworks while providing robust technical support. Conclusions: Cloud computing is an innovative technology that has gained significant attention from both the Chinese government and the global community. In order to effectively support the rapid growth of a novel, health-centered medical industry, it is imperative for Chinese health authorities and hospitals to seize this opportunity by implementing comprehensive strategies aimed at encouraging hospitals to migrate their HISs to the cloud. %M 38315519 %R 10.2196/52080 %U https://medinform.jmir.org/2024/1/e52080 %U https://doi.org/10.2196/52080 %U http://www.ncbi.nlm.nih.gov/pubmed/38315519 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e53302 %T Clinical Informatics Team Members’ Perspectives on Health Information Technology Safety After Experiential Learning and Safety Process Development: Qualitative Descriptive Study %A Recsky,Chantelle %A Rush,Kathy L %A MacPhee,Maura %A Stowe,Megan %A Blackburn,Lorraine %A Muniak,Allison %A Currie,Leanne M %+ School of Nursing, University of British Columbia, T201-2211 Wesbrook Mall, Vancouver, BC, V6T 2B5, Canada, 1 604 822 7417, chantelle.recsky@ubc.ca %K informatics %K community health services %K knowledge translation %K qualitative research %K patient safety %D 2024 %7 5.2.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: Although intended to support improvement, the rapid adoption and evolution of technologies in health care can also bring about unintended consequences related to safety. 
In this project, an embedded researcher with expertise in patient safety and clinical education worked with a clinical informatics team to examine safety and harm related to health information technologies (HITs) in primary and community care settings. The clinical informatics team participated in learning activities around relevant topics (eg, human factors, high reliability organizations, and sociotechnical systems) and cocreated a process to address safety events related to technology (ie, safety huddles and sociotechnical analysis of safety events). Objective: This study aimed to explore clinical informaticians’ experiences of incorporating safety practices into their work. Methods: We used a qualitative descriptive design and conducted web-based focus groups with clinical informaticians. Thematic analysis was used to analyze the data. Results: A total of 10 informants participated. Barriers to addressing safety and harm in their context included limited prior knowledge of HIT safety, previous assumptions and perspectives, competing priorities and organizational barriers, difficulty with the reporting system and processes, and a limited number of reports for learning. Enablers to promoting safety and mitigating harm included participating in learning sessions, gaining experience analyzing reported events, participating in safety huddles, and role modeling and leadership from the embedded researcher. Individual outcomes included increased ownership and interest in HIT safety, the development of a sociotechnical systems perspective, thinking differently about safety, and increased consideration for user perspectives. Team outcomes included enhanced communication within the team, using safety events to inform future work and strategic planning, and an overall promotion of a culture of safety. Conclusions: As HITs are integrated into care delivery, it is important for clinical informaticians to recognize the risks related to safety. 
Experiential learning activities, including reviewing safety event reports and participating in safety huddles, were identified as particularly impactful. An HIT safety learning initiative is a feasible approach for clinical informaticians to become more knowledgeable and engaged in HIT safety issues in their work. %M 38315544 %R 10.2196/53302 %U https://formative.jmir.org/2024/1/e53302 %U https://doi.org/10.2196/53302 %U http://www.ncbi.nlm.nih.gov/pubmed/38315544 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 13 %N %P e50339 %T Blockchain-Based Dynamic Consent and its Applications for Patient-Centric Research and Health Information Sharing: Protocol for an Integrative Review %A Charles,Wendy M %A van der Waal,Mark B %A Flach,Joost %A Bisschop,Arno %A van der Waal,Raymond X %A Es-Sbai,Hadil %A McLeod,Christopher J %+ Health Administration Program, Business School, University of Colorado, Denver, 1475 Lawrence Street, Denver, CO, 80202, United States, 1 303 250 1148, wendy.charles@cuanschutz.edu %K best practices %K blockchain %K clinical trial %K data reuse %K data sharing %K dynamic consent %K health care data %K integrative research review %K scientific rigor %K technology implementation %D 2024 %7 5.2.2024 %9 Protocol %J JMIR Res Protoc %G English %X Background: Blockchain has been proposed as a critical technology to facilitate more patient-centric research and health information sharing. For instance, it can be applied to coordinate and document dynamic informed consent, a procedure that allows individuals to continuously review and renew their consent to the collection, use, or sharing of their private health information. This has been suggested to facilitate ethical, compliant longitudinal research and patient engagement. However, blockchain-based dynamic consent is a relatively new concept, and it is not yet clear how well the suggested implementations will work in practice. 
Efforts to critically evaluate implementations in health research contexts are limited. Objective: The objective of this protocol is to guide the identification and critical appraisal of implementations of blockchain-based dynamic consent in health research contexts, thereby facilitating the development of best practices for future research, innovation, and implementation. Methods: The protocol describes methods for an integrative review to allow evaluation of a broad range of quantitative and qualitative research designs. The PRISMA-P (Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols) framework guided the review’s structure and nature of reporting findings. We developed search strategies and syntax with the help of an academic librarian. Multiple databases were selected to identify pertinent academic literature (CINAHL, Embase, Ovid MEDLINE, PubMed, Scopus, and Web of Science) and gray literature (Electronic Theses Online Service, ProQuest Dissertations and Theses, Open Access Theses and Dissertations, and Google Scholar) for a comprehensive picture of the field’s progress. Eligibility criteria were defined based on PROSPERO (International Prospective Register of Systematic Reviews) requirements and a criteria framework for technology readiness. A total of 2 reviewers will independently review and extract data, while a third reviewer will adjudicate discrepancies. Quality appraisal of articles and discussed implementations will proceed based on the validated Mixed Method Appraisal Tool, and themes will be identified through thematic data synthesis. Results: Literature searches were conducted, and after duplicates were removed, 492 articles were eligible for screening. Title and abstract screening allowed the removal of 312 articles, leaving 180 eligible articles for full-text review against inclusion criteria and confirming a sufficient body of literature for project feasibility. 
Results will synthesize the quality of evidence on blockchain-based dynamic consent for patient-centric research and health information sharing, covering effectiveness, efficiency, satisfaction, regulatory compliance, and methods of managing identity. Conclusions: The review will provide a comprehensive picture of the progress of emerging blockchain-based dynamic consent technologies and the rigor with which implementations are approached. Resulting insights are expected to inform best practices for future research, innovation, and implementation to benefit patient-centric research and health information sharing. Trial Registration: PROSPERO CRD42023396983; http://tinyurl.com/cn8a5x7t International Registered Report Identifier (IRRID): DERR1-10.2196/50339 %M 38315514 %R 10.2196/50339 %U https://www.researchprotocols.org/2024/1/e50339 %U https://doi.org/10.2196/50339 %U http://www.ncbi.nlm.nih.gov/pubmed/38315514 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 8 %N %P e49031 %T Use of Machine Learning Tools in Evidence Synthesis of Tobacco Use Among Sexual and Gender Diverse Populations: Algorithm Development and Validation %A Ma,Shaoying %A Jiang,Shuning %A Yang,Olivia %A Zhang,Xuanzhi %A Fu,Yu %A Zhang,Yusen %A Kaareen,Aadeeba %A Ling,Meng %A Chen,Jian %A Shang,Ce %+ Center for Tobacco Research, The Ohio State University Comprehensive Cancer Center, 3650 Olentangy River Road, 1st Floor, Suite 110, Columbus, OH, 43214, United States, 1 6148976063, shaoying.ma@osumc.edu %K machine learning %K natural language processing %K tobacco control %K sexual and gender diverse populations %K lesbian %K gay %K bisexual %K transgender %K queer %K LGBTQ+ %K evidence synthesis %D 2024 %7 24.1.2024 %9 Original Paper %J JMIR Form Res %G English %X Background: From 2016 to 2021, the volume of peer-reviewed publications related to tobacco has experienced a significant increase. 
This presents a considerable challenge in efficiently summarizing, synthesizing, and disseminating research findings, especially when it comes to addressing specific target populations, such as the LGBTQ+ (lesbian, gay, bisexual, transgender, queer, intersex, asexual, Two Spirit, and other persons who identify as part of this community) populations. Objective: In order to expedite evidence synthesis and research gap discoveries, this pilot study has the following three aims: (1) to compile a specialized semantic database for tobacco policy research to extract information from journal article abstracts, (2) to develop natural language processing (NLP) algorithms that comprehend the literature on nicotine and tobacco product use among sexual and gender diverse populations, and (3) to compare the discoveries of the NLP algorithms with an ongoing systematic review of tobacco policy research among LGBTQ+ populations. Methods: We built a tobacco research domain–specific semantic database using data from 2993 paper abstracts from 4 leading tobacco-specific journals, with enrichment from other publicly available sources. We then trained an NLP model to extract named entities after learning patterns and relationships between words and their context in text, which further enriched the semantic database. Using this iterative process, we extracted and assessed studies relevant to LGBTQ+ tobacco control issues, further comparing our findings with an ongoing systematic review that also focuses on evidence synthesis for this demographic group. Results: In total, 33 studies were identified as relevant to sexual and gender diverse individuals’ nicotine and tobacco product use. Consistent with the ongoing systematic review, the NLP results showed that there is a scarcity of studies assessing policy impact on this demographic using causal inference methods. In addition, the literature is dominated by US data. 
We found that the product drawing the most attention in the body of existing research is cigarettes or cigarette smoking and that the number of studies of various age groups is almost evenly distributed between youth or young adults and adults, consistent with the research needs identified by the US health agencies. Conclusions: Our pilot study serves as a compelling demonstration of the capabilities of NLP tools in expediting the processes of evidence synthesis and the identification of research gaps. While future research is needed to statistically test the NLP tool’s performance, there is potential for NLP tools to fundamentally transform the approach to evidence synthesis. %M 38265858 %R 10.2196/49031 %U https://formative.jmir.org/2024/1/e49031 %U https://doi.org/10.2196/49031 %U http://www.ncbi.nlm.nih.gov/pubmed/38265858 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e47761 %T The Implementation of an Electronic Medical Record in a German Hospital and the Change in Completeness of Documentation: Longitudinal Document Analysis %A Wurster,Florian %A Beckmann,Marina %A Cecon-Stabel,Natalia %A Dittmer,Kerstin %A Hansen,Till Jes %A Jaschke,Julia %A Köberlein-Neu,Juliane %A Okumu,Mi-Ran %A Rusniok,Carsten %A Pfaff,Holger %A Karbach,Ute %+ Chair of Quality Development and Evaluation in Rehabilitation, Institute of Medical Sociology, Health Services Research, and Rehabilitation Science, Faculty of Human Sciences & Faculty of Medicine and University Hospital Cologne, University of Cologne, Eupener Str. 129, Cologne, 50933, Germany, 49 22147897116, florian.wurster@uni-koeln.de %K clinical documentation %K digital transformation %K document analysis %K electronic medical record %K EMR %K Germany %K health services research %K hospital %K implementation %D 2024 %7 19.1.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Electronic medical records (EMR) are considered a key component of the health care system’s digital transformation. 
The implementation of an EMR promises various improvements, for example, in the availability of information, coordination of care, or patient safety, and is required for big data analytics. To ensure those possibilities, the included documentation must be of high quality. In this matter, the most frequently described dimension of data quality is the completeness of documentation. In this regard, little is known about how and why the completeness of documentation might change after the implementation of an EMR. Objective: This study aims to compare the completeness of documentation in paper-based medical records and EMRs and to discuss the possible impact of an EMR on the completeness of documentation. Methods: A retrospective document analysis was conducted, comparing the completeness of paper-based medical records and EMRs. Data were collected before and after the implementation of an EMR on an orthopaedic ward in a German academic teaching hospital. The anonymized records represent all treated patients for a 3-week period each. Unpaired, 2-tailed t tests, chi-square tests, and relative risks were calculated to analyze and compare the mean completeness of the 2 record types in general and of 10 specific items in detail (blood pressure, body temperature, diagnosis, diet, excretions, height, pain, pulse, reanimation status, and weight). For this purpose, each of the 10 items received a dichotomous score of 1 if it was documented on the first day of patient care on the ward; otherwise, it was scored as 0. Results: The analysis consisted of 180 medical records. The average completeness was 6.25 (SD 2.15) out of 10 in the paper-based medical record, significantly rising to an average of 7.13 (SD 2.01) in the EMR (t178=–2.469; P=.01; d=–0.428). 
When looking at the significant changes of the 10 items in detail, the documentation of diet (P<.001), height (P<.001), and weight (P<.001) was more complete in the EMR, while the documentation of diagnosis (P<.001), excretions (P=.02), and pain (P=.008) was less complete in the EMR. The completeness remained unchanged for the documentation of pulse (P=.28), blood pressure (P=.47), body temperature (P=.497), and reanimation status (P=.73). Conclusions: Implementing EMRs can influence the completeness of documentation, with a possible change in both increased and decreased completeness. However, the mechanisms that determine those changes are often neglected. There are mechanisms that might facilitate an improved completeness of documentation and could decrease or increase the staff’s burden caused by documentation tasks. Research is needed to take advantage of these mechanisms and use them for mutual profit in the interests of all stakeholders. Trial Registration: German Clinical Trials Register DRKS00023343; https://drks.de/search/de/trial/DRKS00023343 %M 38241076 %R 10.2196/47761 %U https://medinform.jmir.org/2024/1/e47761 %U https://doi.org/10.2196/47761 %U http://www.ncbi.nlm.nih.gov/pubmed/38241076 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 12 %N %P e49007 %T Additional Value From Free-Text Diagnoses in Electronic Health Records: Hybrid Dictionary and Machine Learning Classification Study %A Mehra,Tarun %A Wekhof,Tobias %A Keller,Dagmar Iris %+ Department for Medical Oncology and Hematology, University Hospital of Zurich, Rämistrasse 100, Zurich, 8091, Switzerland, 41 44255 ext 1111, tarun.mehra@usz.ch %K electronic health records %K free text %K natural language processing %K NLP %K artificial intelligence %K AI %D 2024 %7 17.1.2024 %9 Original Paper %J JMIR Med Inform %G English %X Background: Physicians are hesitant to forgo the opportunity of entering unstructured clinical notes for structured data entry in electronic health records. 
Does free text increase informational value in comparison with structured data? Objective: This study aims to compare information from unstructured text-based chief complaints harvested and processed by a natural language processing (NLP) algorithm with clinician-entered structured diagnoses in terms of their potential utility for automated improvement of patient workflows. Methods: Electronic health records of 293,298 patient visits at the emergency department of a Swiss university hospital from January 2014 to October 2021 were analyzed. Using emergency department overcrowding as a case in point, we compared supervised NLP-based keyword dictionaries of symptom clusters from unstructured clinical notes and clinician-entered chief complaints from a structured drop-down menu with the following 2 outcomes: hospitalization and high Emergency Severity Index (ESI) score. Results: Of 12 symptom clusters, the NLP cluster was substantial in predicting hospitalization in 11 (92%) clusters; 8 (67%) clusters remained significant even after controlling for the cluster of clinician-determined chief complaints in the model. All 12 NLP symptom clusters were significant in predicting a low ESI score, of which 9 (75%) remained significant when controlling for clinician-determined chief complaints. The correlation between NLP clusters and chief complaints was low (r=−0.04 to 0.6), indicating complementarity of information. Conclusions: The NLP-derived features and clinicians’ knowledge were complementary in explaining patient outcome heterogeneity. They can provide an efficient approach to patient flow management, for example, in an emergency medicine setting. We further demonstrated the feasibility of creating extensive and precise keyword dictionaries with NLP by medical experts without requiring programming knowledge. Using the dictionary, we could classify short and unstructured clinical texts into diagnostic categories defined by the clinician. 
%M 38231569 %R 10.2196/49007 %U https://medinform.jmir.org/2024/1/e49007 %U https://doi.org/10.2196/49007 %U http://www.ncbi.nlm.nih.gov/pubmed/38231569 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e47673 %T Combining Digital and Molecular Approaches Using Health and Alternate Data Sources in a Next-Generation Surveillance System for Anticipating Outbreaks of Pandemic Potential %A Ramos,Pablo Ivan P %A Marcilio,Izabel %A Bento,Ana I %A Penna,Gerson O %A de Oliveira,Juliane F %A Khouri,Ricardo %A Andrade,Roberto F S %A Carreiro,Roberto P %A Oliveira,Vinicius de A %A Galvão,Luiz Augusto C %A Landau,Luiz %A Barreto,Mauricio L %A van der Horst,Kay %A Barral-Netto,Manoel %A , %+ Center for Data and Knowledge Integration for Health (CIDACS), Instituto Gonçalo Moniz, Fundação Oswaldo Cruz (Fiocruz), Rua Mundo, 121, Parque Tecnológico da Bahia, Edf. Tecnocentro, Trobogy, Salvador, 41745-715, Brazil, 55 7131762234, manoel.barral@fiocruz.br %K data integration %K digital public health %K infectious disease surveillance %K pandemic preparedness %K prevention %K response %D 2024 %7 9.1.2024 %9 Viewpoint %J JMIR Public Health Surveill %G English %X Globally, millions of lives are impacted every year by infectious disease outbreaks. Comprehensive and innovative surveillance strategies aiming at early alert and timely containment of emerging and reemerging pathogens are a pressing priority. Shortcomings and delays in current pathogen surveillance practices have further hindered informed responses, interventions, and mitigation during recent pandemics, including H1N1 influenza and SARS-CoV-2. We present the design principles of the architecture for an early-alert surveillance system that leverages the vast available data landscape, including syndromic data from primary health care, drug sales, and rumors from the lay media and social media to identify areas with an increased number of cases of respiratory disease. 
In these potentially affected areas, an intensive and fast sample collection and advanced high-throughput genome sequencing analyses would inform on circulating known or novel pathogens by metagenomics-enabled pathogen characterization. Concurrently, the integration of bioclimatic and socioeconomic data, as well as transportation and mobility network data, into a data analytics platform, coupled with advanced mathematical modeling using artificial intelligence or machine learning, will enable more accurate estimation of outbreak spread risk. Such an approach aims to readily identify and characterize regions in the early stages of an outbreak development, as well as model risk and patterns of spread, informing targeted mitigation and control measures. A fully operational system must integrate diverse and robust data streams to translate data into actionable intelligence and actions, ultimately paving the way toward constructing next-generation surveillance systems. %M 38194263 %R 10.2196/47673 %U https://publichealth.jmir.org/2024/1/e47673 %U https://doi.org/10.2196/47673 %U http://www.ncbi.nlm.nih.gov/pubmed/38194263 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 10 %N %P e50379 %T Generating Contextual Variables From Web-Based Data for Health Research: Tutorial on Web Scraping, Text Mining, and Spatial Overlay Analysis %A Galvez-Hernandez,Pablo %A Gonzalez-Viana,Angelina %A Gonzalez-de Paz,Luis %A Shankardass,Ketan %A Muntaner,Carles %+ Institute of Health Policy, Management and Evaluation, Dalla Lana School of Public Health, University of Toronto, Health Sciences Building, 4th Fl., 155 College St, Toronto, ON, M5T 3M6, Canada, 1 6475752195, pau.galvez@utoronto.ca %K web scraping %K text mining %K spatial overlay analysis %K program evaluation %K social environment %K contextual variables %K health assets %K social connection %K multilevel analysis %K health services research %D 2024 %7 8.1.2024 %9 Tutorial %J JMIR Public Health Surveill %G English %X 
Background: Contextual variables that capture the characteristics of delimited geographic or jurisdictional areas are vital for health and social research. However, obtaining data sets with contextual-level data can be challenging in the absence of monitoring systems or public census data. Objective: We describe and implement an 8-step method that combines web scraping, text mining, and spatial overlay analysis (WeTMS) to transform extensive text data from government websites into analyzable data sets containing contextual data for jurisdictional areas. Methods: This tutorial describes the method and provides resources for its application by health and social researchers. We used this method to create data sets of health assets aimed at enhancing older adults’ social connections (eg, activities and resources such as walking groups and senior clubs) across the 374 health jurisdictions in Catalonia from 2015 to 2022. These assets are registered on a web-based government platform by local stakeholders from various health and nonhealth organizations as part of a national public health program. Steps 1 to 3 involved defining the variables of interest, identifying data sources, and using Python to extract information from 50,000 websites linked to the platform. Steps 4 to 6 comprised preprocessing the scraped text, defining new variables to classify health assets based on social connection constructs, analyzing word frequencies in titles and descriptions of the assets, creating topic-specific dictionaries, implementing a rule-based classifier in R, and verifying the results. Steps 7 and 8 integrated the spatial overlay analysis to determine the geographic location of each asset. We conducted a descriptive analysis of the data sets to report the characteristics of the assets identified and the patterns of asset registrations across areas. Results: We identified and extracted data from 17,305 websites describing health assets. 
The titles and descriptions of the activities and resources contained 12,560 and 7301 unique words, respectively. After applying our classifier and spatial analysis algorithm, we generated 2 data sets containing 9546 health assets (5022 activities and 4524 resources) with the potential to enhance social connections among older adults. Stakeholders from 318 health jurisdictions registered identified assets on the platform between July 2015 and December 2022. The agreement rate between the classification algorithm and verified data sets ranged from 62.02% to 99.47% across variables. Leisure and skill development activities were the most prevalent (1844/5022, 36.72%). Leisure and cultural associations, such as social clubs for older adults, were the most common resources (878/4524, 19.41%). Health asset registration varied across areas, ranging between 0 and 263 activities and 0 and 265 resources. Conclusions: The sequential use of WeTMS offers a robust method for generating data sets containing contextual-level variables from internet text data. This study can guide health and social researchers in efficiently generating ready-to-analyze data sets containing contextual variables. 
%M 38190245 %R 10.2196/50379 %U https://publichealth.jmir.org/2024/1/e50379 %U https://doi.org/10.2196/50379 %U http://www.ncbi.nlm.nih.gov/pubmed/38190245 %0 Journal Article %I JMIR Publications %V 5 %N %P e53365 %T Development of Depression Data Sets and a Language Model for Depression Detection: Mixed Methods Study %A Tumaliuan,Faye Beatriz %A Grepo,Lorelie %A Jalao,Eugene Rex %+ Department of Industrial Engineering and Operations Research, University of the Philippines Diliman, Melchor Hall, Magsaysay Avenue, Quezon City, 1101, Philippines, 63 9176593613, fayetumaliuan@gmail.com %K depression data set %K depression detection %K social media %K natural language processing %K Filipino %D 2024 %7 4.9.2024 %9 Original Paper %J JMIR Data %G English %X Background: Depression detection in social media has gained attention in recent years with the help of natural language processing (NLP) techniques. Because of the low-resource standing of Filipino depression data, valid data sets need to be created to aid various machine learning techniques in depression detection classification tasks. Objective: The primary objective is to build a depression corpus of Philippine Twitter users who were clinically diagnosed with depression by mental health professionals and develop from this a corpus of depression symptoms that can later serve as a baseline for predicting depression symptoms in the Filipino and English languages. Methods: The proposed process included the implementation of clinical screening methods with the help of clinical psychologists in the recruitment of study participants who were young adults aged 18 to 30 years. A total of 72 participants were assessed by clinical psychologists and provided their Twitter data: 60 with depression and 12 with no depression. Six participants provided 2 Twitter accounts each, making 78 Twitter accounts. A data set was developed consisting of depression symptom–annotated tweets with 13 depression categories. 
These were created through manual annotation in a process constructed, guided, and validated by clinical psychologists. Results: Three annotators completed the process for approximately 79,614 tweets, resulting in a substantial interannotator agreement score of 0.735 using Fleiss κ and a 95.59% psychologist validation score. A word2vec language model was developed using Filipino and English data sets to create a 300-feature word embedding that can be used in various machine learning techniques for NLP. Conclusions: This study contributes to depression research by constructing depression data sets from social media to aid NLP in the Philippine setting. These 2 validated data sets can be significant in user detection or tweet-level detection of depression in young adults in further studies. %R 10.2196/53365 %U https://data.jmir.org/2024/1/e53365 %U https://doi.org/10.2196/53365 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 8 %N 12 %P e39336 %T Crowdsourced Perceptions of Human Behavior to Improve Computational Forecasts of US National Incident Cases of COVID-19: Survey Study %A Braun,David %A Ingram,Daniel %A Ingram,David %A Khan,Bilal %A Marsh,Jessecae %A McAndrew,Thomas %+ Department of Psychology, Lehigh University, 17 Memorial Dr E, Bethlehem, PA, 18015, United States, 1 6107583000, dab414@lehigh.edu %K crowdsourcing %K COVID-19 %K forecasting %K human judgment %D 2022 %7 30.12.2022 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Past research has shown that various signals associated with human behavior (eg, social media engagement) can benefit computational forecasts of COVID-19. One behavior that has been shown to reduce the spread of infectious agents is compliance with nonpharmaceutical interventions (NPIs). However, the extent to which the public adheres to NPIs is difficult to measure and consequently difficult to incorporate into computational forecasts of infectious diseases. 
Soliciting judgments from many individuals (ie, crowdsourcing) can lead to surprisingly accurate estimates of both current and future targets of interest. Therefore, asking a crowd to estimate community-level compliance with NPIs may prove to be an accurate and predictive signal of an infectious disease such as COVID-19. Objective: We aimed to show that crowdsourced perceptions of compliance with NPIs can be a fast and reliable signal that can predict the spread of an infectious agent. We showed this by measuring the correlation between crowdsourced perceptions of NPIs and US incident cases of COVID-19 1-4 weeks ahead, and evaluating whether incorporating crowdsourced perceptions improves the predictive performance of a computational forecast of incident cases. Methods: For 36 weeks from September 2020 to April 2021, we asked 2 crowds 21 questions about their perceptions of community adherence to NPIs and public health guidelines, and collected 10,120 responses. Self-reported state residency was compared to estimates from the US census to determine the representativeness of the crowds. Crowdsourced NPI signals were mapped to 21 mean perceived adherence (MEPA) signals and analyzed descriptively to investigate features, such as how MEPA signals changed over time and whether MEPA time series could be clustered into groups based on response patterns. We investigated whether MEPA signals were associated with incident cases of COVID-19 1-4 weeks ahead by (1) estimating correlations between MEPA and incident cases, and (2) including MEPA into computational forecasts. Results: The crowds were mostly geographically representative of the US population with slight overrepresentation in the Northeast. MEPA signals tended to converge toward moderate levels of compliance throughout the survey period, and an unsupervised analysis revealed signals clustered into 4 groups roughly based on the type of question being asked. 
Several MEPA signals linearly correlated with incident cases of COVID-19 1-4 weeks ahead at the US national level. Including questions related to social distancing, testing, and limiting large gatherings increased out-of-sample predictive performance for probabilistic forecasts of incident cases of COVID-19 1-3 weeks ahead when compared to a model that was trained on only past incident cases. Conclusions: Crowdsourced perceptions of nonpharmaceutical adherence may be an important signal to improve forecasts of the trajectory of an infectious agent and increase public health situational awareness. %M 36219845 %R 10.2196/39336 %U https://publichealth.jmir.org/2022/12/e39336 %U https://doi.org/10.2196/39336 %U http://www.ncbi.nlm.nih.gov/pubmed/36219845 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 12 %P e42754 %T Clinical Source Data Production and Quality Control in Real-world Studies: Proposal for Development of the eSource Record System %A Wang,Bin %A Lai,Junkai %A Jin,Feifei %A Liao,Xiwen %A Zhu,Huan %A Yao,Chen %+ Peking University Clinical Research Institute, Peking University First Hospital, No. 8, Xishiku Street, Xicheng District, Beijing, 100034, China, 86 01083325822, yaochen@hsc.pku.edu.cn %K electronic medical record %K electronic health record %K eSource %K real-world data %K eSource record %K clinical research %K data collection %K data transcription %K data quality %K interoperability %D 2022 %7 23.12.2022 %9 Proposal %J JMIR Res Protoc %G English %X Background: An eSource generally includes the direct capture, collection, and storage of electronic data to simplify clinical research. It can improve data quality and patient safety and reduce clinical trial costs. There has been some eSource-related research progress in relatively large projects. However, most of these studies focused on technical explorations to improve interoperability among systems to reuse retrospective data for research. 
Few studies have explored source data collection and quality control during prospective data collection from a methodological perspective. Objective: This study aimed to design a clinical source data collection method that is suitable for real-world studies and meets the data quality standards for clinical research and to improve efficiency when writing electronic medical records (EMRs). Methods: On the basis of our group’s previous research experience, TransCelerate BioPharm Inc eSource logical architecture, and relevant regulations and guidelines, we designed a source data collection method and invited relevant stakeholders to optimize it. On the basis of this method, we proposed the eSource record (ESR) system as a solution and invited experts with different roles in the contract research organization company to discuss and design a flowchart for data connection between the ESR and electronic data capture (EDC). Results: The ESR method included 5 steps: research project preparation, initial survey collection, in-hospital medical record writing, out-of-hospital follow-up, and electronic case report form (eCRF) traceability. The data connection between the ESR and EDC covered the clinical research process from creating the eCRF to collecting data for the analysis. The intelligent data acquisition function of the ESR will automatically complete the empty eCRF to create an eCRF with values. When the clinical research associate and data manager conduct data verification, they can query the certified copy database through interface traceability and send data queries. The data queries are transmitted to the ESR through the EDC interface. The EDC and EMR systems interoperate through the ESR. The EMR and EDC systems transmit data to the ESR system through the data standards of the Health Level Seven Clinical Document Architecture and the Clinical Data Interchange Standards Consortium operational data model, respectively. 
When the implemented data standards for a given system are not consistent, the ESR will approach the problem by first automating mappings between standards and then handling extensions or corrections to a given data format through human evaluation. Conclusions: The source data collection method proposed in this study will help to realize eSource’s new strategy. The ESR solution is standardized and sustainable. It aims to ensure that research data meet the attributable, legible, contemporaneous, original, accurate, complete, consistent, enduring, and available standards for clinical research data quality and to provide a new model for prospective data collection in real-world studies. %M 36563036 %R 10.2196/42754 %U https://www.researchprotocols.org/2022/12/e42754 %U https://doi.org/10.2196/42754 %U http://www.ncbi.nlm.nih.gov/pubmed/36563036 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 12 %P e41200 %T Identifying Patterns of Clinical Interest in Clinicians’ Treatment Preferences: Hypothesis-free Data Science Approach to Prioritizing Prescribing Outliers for Clinical Review %A MacKenna,Brian %A Curtis,Helen J %A Hopcroft,Lisa E M %A Walker,Alex J %A Croker,Richard %A Macdonald,Orla %A Evans,Stephen J W %A Inglesby,Peter %A Evans,David %A Morley,Jessica %A Bacon,Sebastian C J %A Goldacre,Ben %+ Bennett Institute for Applied Data Science, Nuffield Department of Primary Care Health Sciences, University of Oxford, Radcliffe Primary Care Building, Radcliffe Observatory Quarter, 32 Woodstock Road, Oxford, OX2 6GG, United Kingdom, 44 01865 617855, ben.goldacre@phc.ox.ac.uk %K prescribing %K NHS England %K antipsychotics %K promazine hydrochloride %K pericyazine %K clinical audit %K data science %D 2022 %7 20.12.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Data analysis is used to identify signals suggestive of variation in treatment choice or clinical outcome. Analyses to date have generally focused on a hypothesis-driven approach. 
Objective: This study aimed to develop a hypothesis-free approach to identify unusual prescribing behavior in primary care data. We aimed to apply this methodology to a national data set in a cross-sectional study to identify chemicals with significant variation in use across Clinical Commissioning Groups (CCGs) for further clinical review, thereby demonstrating proof of concept for prioritization approaches. Methods: Our data-driven approach first applies a set of filtering steps to identify chemicals with prescribing rate distributions likely to contain outliers and then applies 2 ranking approaches to identify the most extreme outliers among those candidates. We applied this methodology to 3 months of national prescribing data (June-August 2017). Results: Our methodology provides rankings for all chemicals by administrative region. We provide illustrative results for 2 antipsychotic drugs of particular clinical interest: promazine hydrochloride and pericyazine, which rank highly by outlier metrics. Specifically, our method identifies that, while promazine hydrochloride and pericyazine are barely used by most clinicians (with national prescribing rates of 11.1 and 6.2 per 1000 antipsychotic prescriptions, respectively), they make up a substantial proportion of antipsychotic prescribing in 2 small geographic regions in England during the study period (with maximum regional prescribing rates of 298.7 and 241.1 per 1000 antipsychotic prescriptions, respectively). Conclusions: Our hypothesis-free approach is able to identify candidates for audit and review in clinical practice. To illustrate this, we provide examples of 2 very unusual antipsychotics used disproportionately in 2 small geographic areas of England.
%M 36538350 %R 10.2196/41200 %U https://medinform.jmir.org/2022/12/e41200 %U https://doi.org/10.2196/41200 %U http://www.ncbi.nlm.nih.gov/pubmed/36538350 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 12 %P e40743 %T State-of-the-Art Evidence Retriever for Precision Medicine: Algorithm Development and Validation %A Jin,Qiao %A Tan,Chuanqi %A Chen,Mosha %A Yan,Ming %A Zhang,Ningyu %A Huang,Songfang %A Liu,Xiaozhong %+ Alibaba Group, No. 969 West Wen Yi Road, Yuhang District, Hangzhou, 311121, China, 86 15201162567, chuanqi.tcq@alibaba-inc.com %K precision medicine %K evidence-based medicine %K information retrieval %K active learning %K pretrained language models %K digital health intervention %K data retrieval %K big data %K algorithm development %D 2022 %7 15.12.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Under the paradigm of precision medicine (PM), patients with the same disease can receive different personalized therapies according to their clinical and genetic features. These therapies are determined by the totality of all available clinical evidence, including results from case reports, clinical trials, and systematic reviews. However, it is increasingly difficult for physicians to find such evidence from scientific publications, whose size is growing at an unprecedented pace. Objective: In this work, we propose the PM-Search system to facilitate the retrieval of clinical literature that contains critical evidence for or against giving specific therapies to certain cancer patients. Methods: The PM-Search system combines a baseline retriever that selects document candidates at a large scale and an evidence reranker that finely reorders the candidates based on their evidence quality. The baseline retriever uses query expansion and keyword matching with the ElasticSearch retrieval engine, and the evidence reranker fits pretrained language models to expert annotations that are derived from an active learning strategy. 
Results: The PM-Search system achieved the best performance in the retrieval of high-quality clinical evidence at the Text Retrieval Conference PM Track 2020, outperforming the second-ranking systems by large margins (0.4780 vs 0.4238 for standard normalized discounted cumulative gain at rank 30 and 0.4519 vs 0.4193 for exponential normalized discounted cumulative gain at rank 30). Conclusions: We present PM-Search, a state-of-the-art search engine to assist the practicing of evidence-based PM. PM-Search uses a novel Bidirectional Encoder Representations from Transformers for Biomedical Text Mining–based active learning strategy that models evidence quality and improves the model performance. Our analyses show that evidence quality is a distinct aspect from general relevance, and specific modeling of evidence quality beyond general relevance is required for a PM search engine. %M 36409468 %R 10.2196/40743 %U https://medinform.jmir.org/2022/12/e40743 %U https://doi.org/10.2196/40743 %U http://www.ncbi.nlm.nih.gov/pubmed/36409468 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 12 %P e37239 %T A Framework for Modeling and Interpreting Patient Subgroups Applied to Hospital Readmission: Visual Analytical Approach %A Bhavnani,Suresh K %A Zhang,Weibin %A Visweswaran,Shyam %A Raji,Mukaila %A Kuo,Yong-Fang %+ School of Public and Population Health, University of Texas Medical Branch, Institute for Translational Sciences, 301 University Blvd., Galveston, TX, 77555-0129, United States, 1 (409) 772 1928, subhavna@utmb.edu %K visual analytics %K Bipartite Network analysis %K hospital readmission %K precision medicine %K modeling %K Medicare %D 2022 %7 7.12.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: A primary goal of precision medicine is to identify patient subgroups and infer their underlying disease processes with the aim of designing targeted interventions. 
Although several studies have identified patient subgroups, there is a considerable gap between the identification of patient subgroups and their modeling and interpretation for clinical applications. Objective: This study aimed to develop and evaluate a novel analytical framework for modeling and interpreting patient subgroups (MIPS) using a 3-step modeling approach: visual analytical modeling to automatically identify patient subgroups and their co-occurring comorbidities and determine their statistical significance and clinical interpretability; classification modeling to classify patients into subgroups and measure its accuracy; and prediction modeling to predict a patient’s risk of an adverse outcome and compare its accuracy with and without patient subgroup information. Methods: The MIPS framework was developed using bipartite networks to identify patient subgroups based on frequently co-occurring high-risk comorbidities, multinomial logistic regression to classify patients into subgroups, and hierarchical logistic regression to predict the risk of an adverse outcome using subgroup membership compared with standard logistic regression without subgroup membership. The MIPS framework was evaluated for 3 hospital readmission conditions: chronic obstructive pulmonary disease (COPD), congestive heart failure (CHF), and total hip arthroplasty/total knee arthroplasty (THA/TKA) (COPD: n=29,016; CHF: n=51,550; THA/TKA: n=16,498). For each condition, we extracted cases defined as patients readmitted within 30 days of hospital discharge. Controls were defined as patients not readmitted within 90 days of discharge, matched by age, sex, race, and Medicaid eligibility. Results: In each condition, the visual analytical model identified patient subgroups that were statistically significant (Q=0.17, 0.17, 0.31; P<.001, <.001, <.05), significantly replicated (Rand Index=0.92, 0.94, 0.89; P<.001, <.001, <.01), and clinically meaningful to clinicians. 
In each condition, the classification model had high accuracy in classifying patients into subgroups (mean accuracy=99.6%, 99.34%, 99.86%). In 2 conditions (COPD and THA/TKA), the hierarchical prediction model had a small but statistically significant improvement in discriminating between readmitted and not readmitted patients as measured by net reclassification improvement (0.059, 0.11) but not as measured by the C-statistic or integrated discrimination improvement. Conclusions: Although the visual analytical models identified statistically and clinically significant patient subgroups, the results pinpoint the need to analyze subgroups at different levels of granularity for improving the interpretability of intra- and intercluster associations. The high accuracy of the classification models reflects the strong separation of patient subgroups, despite the size and density of the data sets. Finally, the small improvement in predictive accuracy suggests that comorbidities alone were not strong predictors of hospital readmission, and the need for more sophisticated subgroup modeling methods. Such advances could improve the interpretability and predictive accuracy of patient subgroup models for reducing the risk of hospital readmission, and beyond. 
%M 35537203 %R 10.2196/37239 %U https://medinform.jmir.org/2022/12/e37239 %U https://doi.org/10.2196/37239 %U http://www.ncbi.nlm.nih.gov/pubmed/35537203 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e42185 %T Artificial Intelligence in Intensive Care Medicine: Bibliometric Analysis %A Tang,Ri %A Zhang,Shuyi %A Ding,Chenling %A Zhu,Mingli %A Gao,Yuan %+ Department of Intensive Care Medicine, Renji Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, No 1630, Dongfang Road, Pudong New District, Shanghai, 200127, China, 86 13917816250, shuishui286@qq.com %K intensive care medicine %K artificial intelligence %K bibliometric analysis %K machine learning %K sepsis %D 2022 %7 30.11.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Interest in critical care–related artificial intelligence (AI) research is growing rapidly. However, the literature is still lacking in comprehensive bibliometric studies that measure and analyze scientific publications globally. Objective: The objective of this study was to assess the global research trends in AI in intensive care medicine based on publication outputs, citations, coauthorships between nations, and co-occurrences of author keywords. Methods: A total of 3619 documents published until March 2022 were retrieved from the Scopus database. After selecting the document type as articles, the titles and abstracts were checked for eligibility. In the final bibliometric study using VOSviewer, 1198 papers were included. The growth rate of publications, preferred journals, leading research countries, international collaborations, and top institutions were computed. Results: The number of publications increased steeply between 2018 and 2022, accounting for 72.53% (869/1198) of all the included papers. The United States and China contributed to approximately 55.17% (661/1198) of the total publications. 
Of the 15 most productive institutions, 9 were among the top 100 universities worldwide. Detecting clinical deterioration, monitoring, predicting disease progression, mortality, prognosis, and classifying disease phenotypes or subtypes were some of the research hot spots for AI in patients who are critically ill. Neural networks, decision support systems, machine learning, and deep learning were all commonly used AI technologies. Conclusions: This study highlights popular areas in AI research aimed at improving health care in intensive care units, offers a comprehensive look at the research trend in AI application in the intensive care unit, and provides an insight into potential collaboration and prospects for future research. The 30 articles that received the most citations were listed in detail. For AI-based clinical research to be sufficiently convincing for routine critical care practice, collaborative research efforts are needed to increase the maturity and robustness of AI-driven models. %M 36449345 %R 10.2196/42185 %U https://www.jmir.org/2022/11/e42185 %U https://doi.org/10.2196/42185 %U http://www.ncbi.nlm.nih.gov/pubmed/36449345 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 11 %P e42261 %T Uncovering the Reasons Behind COVID-19 Vaccine Hesitancy in Serbia: Sentiment-Based Topic Modeling %A Ljajić,Adela %A Prodanović,Nikola %A Medvecki,Darija %A Bašaragin,Bojana %A Mitrović,Jelena %+ The Institute for Artificial Intelligence Research and Development of Serbia, Fruškogorska 1, Novi Sad, 21000, 381 652626347, adela.ljajic@ivi.ac.rs %K topic modeling %K sentiment analysis %K LDA %K NMF %K BERT %K vaccine hesitancy %K COVID-19 %K Twitter %K Serbian language processing %K vaccine %K public health %K NLP %K vaccination %K Serbia %D 2022 %7 17.11.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Since the first COVID-19 vaccine appeared, there has been a growing tendency to automatically determine public attitudes toward it. 
In particular, it was important to find the reasons for vaccine hesitancy, since it was directly correlated with pandemic protraction. Natural language processing (NLP) and public health researchers have turned to social media (eg, Twitter, Reddit, and Facebook) for user-created content from which they can gauge public opinion on vaccination. To automatically process such content, they use a number of NLP techniques, most notably topic modeling. Topic modeling enables the automatic uncovering and grouping of hidden topics in the text. When applied to content that expresses a negative sentiment toward vaccination, it can give direct insight into the reasons for vaccine hesitancy. Objective: This study applies NLP methods to classify vaccination-related tweets by sentiment polarity and uncover the reasons for vaccine hesitancy among the negative tweets in the Serbian language. Methods: To study the attitudes and beliefs behind vaccine hesitancy, we collected 2 batches of tweets that mention some aspects of COVID-19 vaccination. The first batch of 8817 tweets was manually annotated as either relevant or irrelevant regarding the COVID-19 vaccination sentiment, and then the relevant tweets were annotated as positive, negative, or neutral. We used the annotated tweets to train a sequential bidirectional encoder representations from transformers (BERT)-based classifier for 2 tweet classification tasks to augment this initial data set. The first classifier distinguished between relevant and irrelevant tweets. The second classifier used the relevant tweets and classified them as negative, positive, or neutral. This sequential classifier was used to annotate the second batch of tweets. The combined data sets resulted in 3286 tweets with a negative sentiment: 1770 (53.9%) from the manually annotated data set and 1516 (46.1%) as a result of automatic classification. 
Topic modeling methods (latent Dirichlet allocation [LDA] and nonnegative matrix factorization [NMF]) were applied using the 3286 preprocessed tweets to detect the reasons for vaccine hesitancy. Results: The relevance classifier achieved an F-score of 0.91 and 0.96 for relevant and irrelevant tweets, respectively. The sentiment polarity classifier achieved an F-score of 0.87, 0.85, and 0.85 for negative, neutral, and positive sentiments, respectively. By summarizing the topics obtained in both models, we extracted 5 main groups of reasons for vaccine hesitancy: concern over vaccine side effects, concern over vaccine effectiveness, concern over insufficiently tested vaccines, mistrust of authorities, and conspiracy theories. Conclusions: This paper presents a combination of NLP methods applied to find the reasons for vaccine hesitancy in Serbia. Given these reasons, it is now possible to better understand the concerns of people regarding the vaccination process. %M 36301673 %R 10.2196/42261 %U https://www.jmir.org/2022/11/e42261 %U https://doi.org/10.2196/42261 %U http://www.ncbi.nlm.nih.gov/pubmed/36301673 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 8 %N 10 %P e38450 %T Prediction of COVID-19 Infections for Municipalities in the Netherlands: Algorithm Development and Interpretation %A van der Ploeg,Tjeerd %A Gobbens,Robbert J J %+ Faculty of Health, Sports and Social Work, Inholland University of Applied Sciences, De Boelelaan 1109, Amsterdam, 1081 HV, Netherlands, 31 653519264, tvdploeg@quicknet.nl %K municipality properties %K data merging %K modeling technique %K variable selection %K prediction model %K public health %K COVID-19 %K surveillance %K static data %K Dutch public domain %K pandemic %K Wuhan %K virus %K public %K infections %K fever %K cough %K congestion %K fatigue %K symptoms %K pneumonia %K dyspnea %K death %D 2022 %7 20.10.2022 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: COVID-19 was first identified in 
December 2019 in the city of Wuhan, China. The virus quickly spread and was declared a pandemic on March 11, 2020. After infection, symptoms such as fever, a (dry) cough, nasal congestion, and fatigue can develop. In some cases, the virus causes severe complications such as pneumonia and dyspnea and could result in death. The virus also spread rapidly in the Netherlands, a small and densely populated country with an aging population. Health care in the Netherlands is of a high standard, but there were nevertheless problems with hospital capacity, such as the number of available beds and staff. There were also regions and municipalities that were hit harder than others. In the Netherlands, there are important data sources available for daily COVID-19 numbers and information about municipalities. Objective: We aimed to predict the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands, using a data set with the properties of 355 municipalities in the Netherlands and advanced modeling techniques. Methods: We collected relevant static data per municipality from data sources that were available in the Dutch public domain and merged these data with the dynamic daily number of infections from January 1, 2020, to May 9, 2021, resulting in a data set with 355 municipalities in the Netherlands and variables grouped into 20 topics. The modeling techniques random forest and multiple fractional polynomials were used to construct a prediction model for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants per municipality in the Netherlands. Results: The final prediction model had an R2 of 0.63. 
Important properties for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality in the Netherlands were exposure to particulate matter with diameters <10 μm (PM10) in the air, the percentage of Labour party voters, and the number of children in a household. Conclusions: Data about municipality properties in relation to the cumulative number of confirmed infections in a municipality in the Netherlands can give insight into the most important properties of a municipality for predicting the cumulative number of confirmed COVID-19 infections per 10,000 inhabitants in a municipality. This insight can provide policy makers with tools to cope with COVID-19 and may also be of value in the event of a future pandemic, so that municipalities are better prepared. %M 36219835 %R 10.2196/38450 %U https://publichealth.jmir.org/2022/10/e38450 %U https://doi.org/10.2196/38450 %U http://www.ncbi.nlm.nih.gov/pubmed/36219835 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 10 %P e39373 %T Mental Illness Concordance Between Hospital Clinical Records and Mentions in Domestic Violence Police Narratives: Data Linkage Study %A Karystianis,George %A Cabral,Rina Carines %A Adily,Armita %A Lukmanjaya,Wilson %A Schofield,Peter %A Buchan,Iain %A Nenadic,Goran %A Butler,Tony %+ School of Population Health, University of New South Wales, Level 3, Samuels Building, Gate 11, Botany Street, UNSW Kensington Campus, Sydney, 2052, Australia, 61 93852517, g.karystianis@unsw.edu.au %K data linkage %K mental health %K domestic violence %K police records %K hospital records %K text mining %D 2022 %7 20.10.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: To better understand domestic violence, data sources from multiple sectors such as police, justice, health, and welfare are needed. 
Linking police data to data collections from other agencies could provide unique insights and promote an all-of-government response to domestic violence. The New South Wales Police Force attends domestic violence events and records information in the form of both structured data and a free-text narrative, with the latter shown to be a rich source of information on the mental health status of persons of interest (POIs) and victims, abuse types, and sustained injuries. Objective: This study aims to examine the concordance (ie, matching) between mental illness mentions extracted from the police’s event narratives and mental health diagnoses from hospital and emergency department records. Methods: We applied a rule-based text mining method on 416,441 domestic violence police event narratives between December 2005 and January 2016 to identify mental illness mentions for POIs and victims. Using different window periods (1, 3, 6, and 12 months) before and after a domestic violence event, we linked the extracted mental illness mentions of victims and POIs to clinical records from the Emergency Department Data Collection and the Admitted Patient Data Collection in New South Wales, Australia using a unique identifier for each individual in the same cohort. Results: Using a 2-year window period (ie, 12 months before and after the domestic violence event), less than 1% (3020/416,441, 0.73%) of events had a mental illness mention and also a corresponding hospital record. About 16% of domestic violence events for both POIs (382/2395, 15.95%) and victims (101/631, 16.01%) had an agreement between hospital records and police narrative mentions of mental illness. A total of 51,025/416,441 (12.25%) events for POIs and 14,802/416,441 (3.55%) events for victims had mental illness mentions in their narratives but no hospital record. Only 841 events for POIs and 919 events for victims had a documented hospital record within 48 hours of the domestic violence event. 
Conclusions: Our findings suggest that current surveillance systems used to report on domestic violence may be enhanced by accessing rich information (ie, mental illness) contained in police text narratives, made available for both POIs and victims through the application of text mining. Additional insights can be gained by linkage to other health and welfare data collections. %M 36264613 %R 10.2196/39373 %U https://formative.jmir.org/2022/10/e39373 %U https://doi.org/10.2196/39373 %U http://www.ncbi.nlm.nih.gov/pubmed/36264613 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 9 %P e33720 %T Next-Generation Capabilities in Trusted Research Environments: Interview Study %A Kavianpour,Sanaz %A Sutherland,James %A Mansouri-Benssassi,Esma %A Coull,Natalie %A Jefferson,Emily %+ Health Informatics Centre, University of Dundee, Dundee, DD1 4HN, United Kingdom, 1 5153054219, j.a.sutherland@dundee.ac.uk %K data safe haven %K health data analysis %K trusted research environment %K TRE %D 2022 %7 20.9.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: A Trusted Research Environment (TRE; also known as a Safe Haven) is an environment supported by trained staff and agreed processes (principles and standards), providing access to data for research while protecting patient confidentiality. Accessing sensitive data without compromising the privacy and security of the data is a complex process. Objective: This paper presents the security measures, administrative procedures, and technical approaches adopted by TREs. Methods: We contacted 73 TRE operators, 22 (30%) of whom, in the United Kingdom and internationally, agreed to be interviewed remotely under a nondisclosure agreement and to complete a questionnaire about their TRE. Results: We observed many similar processes and standards that TREs follow to adhere to the Seven Safes principles. 
The security processes and TRE capabilities for supporting observational studies using classical statistical methods were mature, and the requirements were well understood. However, we identified limitations in the security measures and capabilities of TREs to support “next-generation” requirements such as wide ranges of data types, ability to develop artificial intelligence algorithms and software within the environment, handling of big data, and timely import and export of data. Conclusions: We found a lack of software or other automation tools to support the community and limited knowledge of how to meet the next-generation requirements from the research community. Disclosure control for exporting artificial intelligence algorithms and software was found to be particularly challenging, and there is a clear need for additional controls to support this capability within TREs. %M 36125859 %R 10.2196/33720 %U https://www.jmir.org/2022/9/e33720 %U https://doi.org/10.2196/33720 %U http://www.ncbi.nlm.nih.gov/pubmed/36125859 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e38319 %T Assessing Social Media Data as a Resource for Firearm Research: Analysis of Tweets Pertaining to Firearm Deaths %A Singh,Lisa %A Gresenz,Carole Roan %A Wang,Yanchen %A Hu,Sonya %+ Department of Computer Science, Massive Data Institute, Georgetown University, 3700 O Street, NW, Washington, DC, 20057, United States, 1 2026879253, lisa.singh@georgetown.edu %K firearms %K fatalities %K Twitter %K firearm research %K social media data %D 2022 %7 25.8.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Historic constraints on research dollars and reliable information have limited firearm research. At the same time, interest in the power and potential of social media analytics, particularly in health contexts, has surged. 
Objective: The aim of this study is to contribute toward the goal of establishing a foundation for how social media data may best be used, alone or in conjunction with other data resources, to improve the information base for firearm research. Methods: We examined the value of social media data for estimating a firearm outcome for which robust benchmark data exist—specifically, firearm mortality, which is captured in the National Vital Statistics System (NVSS). We hand curated tweet data from the Twitter application programming interface spanning January 1, 2017, to December 31, 2018. We developed machine learning classifiers to identify tweets that pertain to firearm deaths and develop estimates of the volume of Twitter firearm discussion by month. We compared within-state variation over time in the volume of tweets pertaining to firearm deaths with within-state trends in NVSS-based estimates of firearm fatalities using Pearson linear correlations. Results: The correlation between the monthly number of firearm fatalities measured by the NVSS and the monthly volume of tweets pertaining to firearm deaths was weak (median 0.081) and highly dispersed across states (range –0.31 to 0.535). The median correlation between month-to-month changes in firearm fatalities in the NVSS and firearm deaths discussed in tweets was moderate (median 0.30) and exhibited less dispersion among states (range –0.06 to 0.69). Conclusions: Our findings suggest that Twitter data may hold value for tracking dynamics in firearm-related outcomes, particularly for relatively populous cities that are identifiable through location mentions in tweet content. The data are likely to be particularly valuable for understanding firearm outcomes not currently measured, not measured well, or not measurable through other available means. This research provides an important building block for future work that continues to develop the usefulness of social media data for firearm research. 
%M 36006693 %R 10.2196/38319 %U https://www.jmir.org/2022/8/e38319 %U https://doi.org/10.2196/38319 %U http://www.ncbi.nlm.nih.gov/pubmed/36006693 %0 Journal Article %@ 2817-092X %I JMIR Publications %V 1 %N 1 %P e41122 %T JMIR Neurotechnology: Connecting Clinical Neuroscience and (Information) Technology %A Kubben,Pieter %+ Faculty of Health, Medicine and Life Sciences, School for Mental Health and Neuroscience, Maastricht University Medical Center, PO Box 616, Maastricht, 6200 MD, Netherlands, 31 43 388 2222, pieter.kubben@maastrichtuniversity.nl %K neurotechnology %K neurological disorders %K treatment tools %K chronic neurological disease %K information technology %D 2022 %7 11.8.2022 %9 Editorial %J JMIR Neurotech %G English %X %R 10.2196/41122 %U https://neuro.jmir.org/2022/1/e41122 %U https://doi.org/10.2196/41122 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e39888 %T Deciphering the Diversity of Mental Models in Neurodevelopmental Disorders: Knowledge Graph Representation of Public Data Using Natural Language Processing %A Kaur,Manpreet %A Costello,Jeremy %A Willis,Elyse %A Kelm,Karen %A Reformat,Marek Z %A Bolduc,Francois V %+ Department of Pediatrics, University of Alberta, 3-020 Katz Building, 11315 87 Avenue, Edmonton, AB, T6G 2E1, Canada, 1 780 492 9713, fbolduc@ualberta.ca %K concept map %K neurodevelopmental disorder %K knowledge graph %K text analysis %K semantic relatedness %K PubMed %K forums %K mental model %D 2022 %7 5.8.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Understanding how individuals think about a topic, known as the mental model, can significantly improve communication, especially in the medical domain where emotions and implications are high. Neurodevelopmental disorders (NDDs) represent a group of diagnoses, affecting up to 18% of the global population, involving differences in the development of cognitive or social functions. 
In this study, we focus on 2 NDDs, attention deficit hyperactivity disorder (ADHD) and autism spectrum disorder (ASD), which involve multiple symptoms and interventions requiring interactions between 2 important stakeholders: parents and health professionals. There is a gap in our understanding of differences between mental models for each stakeholder, making communication between stakeholders more difficult than it could be. Objective: We aim to build knowledge graphs (KGs) from web-based information relevant to each stakeholder as proxies of mental models. These KGs will accelerate the identification of shared and divergent concerns between stakeholders. The developed KGs can help improve knowledge mobilization, communication, and care for individuals with ADHD and ASD. Methods: We created 2 data sets by collecting the posts from web-based forums and PubMed abstracts related to ADHD and ASD. We utilized the Unified Medical Language System (UMLS) to detect biomedical concepts and applied Positive Pointwise Mutual Information followed by truncated Singular Value Decomposition to obtain corpus-based concept embeddings for each data set. Each data set is represented as a KG using a property graph model. Semantic relatedness between concepts is calculated to rank the relation strength of concepts and stored in the KG as relation weights. UMLS disorder-relevant semantic types are used to provide additional categorical information about each concept’s domain. Results: The developed KGs contain concepts from both data sets, with node sizes representing the co-occurrence frequency of concepts and edge sizes representing relevance between concepts. ADHD- and ASD-related concepts from different semantic types shows diverse areas of concerns and complex needs of the conditions. 
The KG identifies converging and diverging concepts between the health professionals' literature (PubMed) and parental concerns (web-based forums), which may correspond to the differences between mental models for each stakeholder. Conclusions: We show for the first time that generating KGs from web-based data can capture the complex needs of families dealing with ADHD or ASD. Moreover, we showed points of convergence between families' and health professionals' KGs. The natural language processing–based KG provides access to a large sample size, which is often a limiting factor for traditional in-person mental model mapping. Our work offers high-throughput access to mental model maps, which could be used for further in-person validation, knowledge mobilization projects, and as a basis for communication about potential blind spots from stakeholders in interactions about NDDs. Future research will be needed to identify how concepts could interact together differently for each stakeholder. %M 35930346 %R 10.2196/39888 %U https://www.jmir.org/2022/8/e39888 %U https://doi.org/10.2196/39888 %U http://www.ncbi.nlm.nih.gov/pubmed/35930346 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 8 %P e37486 %T Improving the Performance of Outcome Prediction for Inpatients With Acute Myocardial Infarction Based on Embedding Representation Learned From Electronic Medical Records: Development and Validation Study %A Huang,Yanqun %A Zheng,Zhimin %A Ma,Moxuan %A Xin,Xin %A Liu,Honglei %A Fei,Xiaolu %A Wei,Lan %A Chen,Hui %+ School of Biomedical Engineering, Capital Medical University, No. 
10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, China, 86 01083911545, chenhui@ccmu.edu.cn %K representation learning %K skip-gram %K feature association strengths %K feature importance %K mortality risk prediction %K acute myocardial infarction %D 2022 %7 3.8.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: The widespread secondary use of electronic medical records (EMRs) promotes health care quality improvement. Representation learning that can automatically extract hidden information from EMR data has gained increasing attention. Objective: We aimed to propose a patient representation with more feature associations and task-specific feature importance to improve the outcome prediction performance for inpatients with acute myocardial infarction (AMI). Methods: Medical concepts, including patients’ age, gender, disease diagnoses, laboratory tests, structured radiological features, procedures, and medications, were first embedded into real-value vectors using the improved skip-gram algorithm, where concepts in the context windows were selected by feature association strengths measured by association rule confidence. Then, each patient was represented as the sum of the feature embeddings weighted by the task-specific feature importance, which was applied to facilitate predictive model prediction from global and local perspectives. We finally applied the proposed patient representation into mortality risk prediction for 3010 and 1671 AMI inpatients from a public data set and a private data set, respectively, and compared it with several reference representation methods in terms of the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), and F1-score. 
Results: Compared with the reference methods, the proposed embedding-based representation showed consistently superior predictive performance on the 2 data sets, achieving mean AUROCs of 0.878 and 0.973, AUPRCs of 0.220 and 0.505, and F1-scores of 0.376 and 0.674 for the public and private data sets, respectively, while the greatest AUROCs, AUPRCs, and F1-scores among the reference methods were 0.847 and 0.939, 0.196 and 0.283, and 0.344 and 0.361 for the public and private data sets, respectively. Feature importance integrated in patient representation reflected features that were also critical in prediction tasks and clinical practice. Conclusions: The introduction of feature associations and feature importance facilitated an effective patient representation and contributed to prediction performance improvement and model interpretation. %M 35921141 %R 10.2196/37486 %U https://www.jmir.org/2022/8/e37486 %U https://doi.org/10.2196/37486 %U http://www.ncbi.nlm.nih.gov/pubmed/35921141 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 8 %P e37817 %T A Syntactic Information–Based Classification Model for Medical Literature: Algorithm Development and Validation Study %A Tang,Wentai %A Wang,Jian %A Lin,Hongfei %A Zhao,Di %A Xu,Bo %A Zhang,Yijia %A Yang,Zhihao %+ College of Computer Science and Technology, Dalian University of Technology, No 2 Linggong Road, Ganjingzi District, Dalian, 116023, China, 86 13604119266, wangjian@dlut.edu.cn %K medical relation extraction %K syntactic features %K pruning method %K neural networks %K medical literature %K medical text %K extraction %K syntactic %K classification %K interaction %K text %K literature %K semantic %D 2022 %7 2.8.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: The ever-increasing volume of medical literature necessitates the classification of medical literature. Medical relation extraction is a typical method of classifying a large volume of medical literature. 
With the growth of computing power, medical relation extraction models have evolved from rule-based models to neural network models. In discarding the traditional rules, however, single neural network models also discard shallow syntactic information. Therefore, we propose a syntactic information–based classification model that complements and equalizes syntactic information to enhance the model. Objective: We aim to complete a syntactic information–based relation extraction model for more efficient medical literature classification. Methods: We devised 2 methods for enhancing syntactic information in the model. First, we introduced shallow syntactic information into the convolutional neural network to enhance nonlocal syntactic interactions. Second, we devised a cross-domain pruning method to equalize local and nonlocal syntactic interactions. Results: We experimented with 3 data sets related to the classification of medical literature. The F1 values were 65.5% and 91.5% on the BioCreative VI CPR and Phenotype-Gene Relationship data sets, respectively, and the accuracy was 88.7% on the PubMed data set. Our model outperforms the current state-of-the-art baseline model in the experiments. Conclusions: Our model based on syntactic information effectively enhances medical relation extraction. Furthermore, the results of the experiments show that shallow syntactic information helps obtain nonlocal interaction in sentences and effectively reinforces syntactic features. It also provides new ideas for future research directions. 
%M 35917162 %R 10.2196/37817 %U https://medinform.jmir.org/2022/8/e37817 %U https://doi.org/10.2196/37817 %U http://www.ncbi.nlm.nih.gov/pubmed/35917162 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 8 %P e27990 %T A Personalized Ontology-Based Decision Support System for Complex Chronic Patients: Retrospective Observational Study %A Román-Villarán,Esther %A Alvarez-Romero,Celia %A Martínez-García,Alicia %A Escobar-Rodríguez,German Antonio %A García-Lozano,María José %A Barón-Franco,Bosco %A Moreno-Gaviño,Lourdes %A Moreno-Conde,Jesús %A Rivas-González,José Antonio %A Parra-Calderón,Carlos Luis %+ Computational Health Informatics Group, Institute of Biomedicine of Seville, Virgen del Rocío University Hospital, Consejo Superior de Investigaciones Científicas, University of Seville, Avenida Manuel Siurot s/n, Seville, 41013, Spain, 34 955 013 662, celia.alvarez@juntadeandalucia.es %K adherence %K ontology %K clinical decision support system %K CDSS %K complex chronic patients %K functional validation %K multimorbidity %K polypharmacy %K atrial fibrillation %K anticoagulants %D 2022 %7 2.8.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Due to an increase in life expectancy, the prevalence of chronic diseases is also on the rise. Clinical practice guidelines (CPGs) provide recommendations for suitable interventions regarding different chronic diseases, but a deficiency in the implementation of these CPGs has been identified. The PITeS-TiiSS (Telemedicine and eHealth Innovation Platform: Information Communications Technology for Research and Information Challenges in Health Services) tool, a personalized ontology-based clinical decision support system (CDSS), aims to reduce variability, prevent errors, and consider interactions between different CPG recommendations, among other benefits. 
Objective: The aim of this study is to design, develop, and validate an ontology-based CDSS that provides personalized recommendations related to drug prescription. The target population is older adult patients with chronic diseases and polypharmacy, and the goal is to reduce complications related to these types of conditions while offering integrated care. Methods: A study scenario about atrial fibrillation and treatment with anticoagulants was selected to validate the tool. After this, a series of knowledge sources were identified, including CPGs, PROFUND index, LESS/CHRON criteria, and STOPP/START criteria, to extract the information. Modeling was carried out using an ontology, and mapping was done with Health Level 7 Fast Healthcare Interoperability Resources (HL7 FHIR) and Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT; International Health Terminology Standards Development Organisation). Once the CDSS was developed, validation was carried out by using a retrospective case study. Results: This project was funded in January 2015 and approved by the Virgen del Rocio University Hospital ethics committee on November 24, 2015. Two different tasks were carried out to test the functioning of the tool. First, retrospective data from a real patient who met the inclusion criteria were used. Second, the analysis of an adoption model was performed through the study of the requirements and characteristics that a CDSS must meet in order to be well accepted and used by health professionals. The results are favorable and allow the proposed research to continue to the next phase. Conclusions: An ontology-based CDSS was successfully designed, developed, and validated. However, in future work, validation in a real environment should be performed to ensure the tool is usable and reliable. 
%M 35916719 %R 10.2196/27990 %U https://formative.jmir.org/2022/8/e27990 %U https://doi.org/10.2196/27990 %U http://www.ncbi.nlm.nih.gov/pubmed/35916719 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 7 %P e38068 %T Predicting Participant Engagement in a Social Media–Delivered Lifestyle Intervention Using Microlevel Conversational Data: Secondary Analysis of Data From a Pilot Randomized Controlled Trial %A Xu,Ran %A Divito,Joseph %A Bannor,Richard %A Schroeder,Matthew %A Pagoto,Sherry %+ Department of Allied Health Sciences, Institute for Collaboration in Health, Interventions, and Policy, University of Connecticut, Koons Hall, Room 326, Storrs, CT, 06269, United States, 1 860 486 2945, ran.2.xu@uconn.edu %K weight loss %K social media intervention %K engagement %K data science %K natural language processing %K NLP %K social media %K lifestyle %K machine learning %K mobile phone %D 2022 %7 28.7.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Social media–delivered lifestyle interventions have shown promising outcomes, often generating modest but significant weight loss. Participant engagement appears to be an important predictor of weight loss outcomes; however, engagement generally declines over time and is highly variable both within and across studies. Research on factors that influence participant engagement remains scant in the context of social media–delivered lifestyle interventions. Objective: This study aimed to identify predictors of participant engagement from the content generated during a social media–delivered lifestyle intervention, including characteristics of the posts, the conversation that followed the post, and participants’ previous engagement patterns. Methods: We performed secondary analyses using data from a pilot randomized trial that delivered 2 lifestyle interventions via Facebook. 
We analyzed 80 participants’ engagement data over a 16-week intervention period and linked them to predictors, including characteristics of the posts, conversations that followed the post, and participants’ previous engagement, using a mixed-effects model. We also performed machine learning–based classification to confirm the importance of the significant predictors previously identified and explore how well these measures can predict whether participants will engage with a specific post. Results: The probability of participants’ engagement with each post decreased by 0.28% each week (P<.001; 95% CI 0.16%-0.4%). The probability of participants engaging with posts generated by interventionists was 6.3% (P<.001; 95% CI 5.1%-7.5%) higher than posts generated by other participants. Participants also had a 6.5% (P<.001; 95% CI 4.9%-8.1%) and 6.1% (P<.001; 95% CI 4.1%-8.1%) higher probability of engaging with posts that directly mentioned weight and goals, respectively, than other types of posts. Participants were 44.8% (P<.001; 95% CI 42.8%-46.9%) and 46% (P<.001; 95% CI 44.1%-48.0%) more likely to engage with a post when they were replied to by other participants and by interventionists, respectively. A 1 SD decrease in the sentiment of the conversation on a specific post was associated with a 5.4% (P<.001; 95% CI 4.9%-5.9%) increase in the probability of participants’ subsequent engagement with the post. Participants’ engagement in previous posts was also a predictor of engagement in subsequent posts (P<.001; 95% CI 0.74%-0.79%). Moreover, using a machine learning approach, we confirmed the importance of the predictors previously identified and achieved an accuracy of 90.9% in terms of predicting participants’ engagement using a balanced testing sample with 1600 observations. Conclusions: Findings revealed several predictors of engagement derived from the content generated by interventionists and other participants. 
Results have implications for increasing engagement in asynchronous, remotely delivered lifestyle interventions, which could improve outcomes. Our results also point to the potential of data science and natural language processing to analyze microlevel conversational data and identify factors influencing participant engagement. Future studies should validate these results in larger trials. Trial Registration: ClinicalTrials.gov NCT02656680; https://clinicaltrials.gov/ct2/show/NCT02656680 %M 35900824 %R 10.2196/38068 %U https://formative.jmir.org/2022/7/e38068 %U https://doi.org/10.2196/38068 %U http://www.ncbi.nlm.nih.gov/pubmed/35900824 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 7 %P e37201 %T Extraction of Explicit and Implicit Cause-Effect Relationships in Patient-Reported Diabetes-Related Tweets From 2017 to 2021: Deep Learning Approach %A Ahne,Adrian %A Khetan,Vivek %A Tannier,Xavier %A Rizvi,Md Imbesat Hassan %A Czernichow,Thomas %A Orchard,Francisco %A Bour,Charline %A Fano,Andrew %A Fagherazzi,Guy %+ Center of Epidemiology and Population Health, Inserm, Hospital Gustave Roussy, Paris-Saclay University, 20 Rue du Dr Pinel, Villejuif, 94800, France, 33 142115386, adrian.ahne@protonmail.com %K causality %K deep learning %K natural language processing %K diabetes %K social media %K causal relation extraction %K social media data %K machine learning %D 2022 %7 19.7.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Intervening in and preventing diabetes distress requires an understanding of its causes and, in particular, from a patient’s perspective. Social media data provide direct access to how patients see and understand their disease and consequently show the causes of diabetes distress. 
Objective: Leveraging machine learning methods, we aim to extract both explicit and implicit cause-effect relationships in patient-reported diabetes-related tweets and provide a methodology to better understand the opinions, feelings, and observations shared within the diabetes online community from a causality perspective. Methods: More than 30 million diabetes-related tweets in English were collected between April 2017 and January 2021. Deep learning and natural language processing methods were applied to focus on tweets with personal and emotional content. A cause-effect tweet data set was manually labeled and used to train (1) a fine-tuned BERTweet model to detect causal sentences containing a causal relation and (2) a conditional random field model with Bidirectional Encoder Representations from Transformers (BERT)-based features to extract possible cause-effect associations. Causes and effects were clustered in a semisupervised approach and visualized in an interactive cause-effect network. Results: Causal sentences were detected with a recall of 68% in an imbalanced data set. A conditional random field model with BERT-based features outperformed a fine-tuned BERT model for cause-effect detection with a macro recall of 68%. This led to 96,676 sentences with cause-effect relationships. “Diabetes” was identified as the central cluster followed by “death” and “insulin.” Insulin pricing–related causes were frequently associated with death. Conclusions: A novel methodology was developed to detect causal sentences and identify both explicit and implicit, single and multiword cause, and the corresponding effect, as expressed in diabetes-related tweets leveraging BERT-based architectures and visualized as cause-effect network. Extracting causal associations in real life, patient-reported outcomes in social media data provide a useful complementary source of information in diabetes research. 
%M 35852829 %R 10.2196/37201 %U https://medinform.jmir.org/2022/7/e37201 %U https://doi.org/10.2196/37201 %U http://www.ncbi.nlm.nih.gov/pubmed/35852829 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 7 %P e29056 %T Use of Multiple Correspondence Analysis and K-means to Explore Associations Between Risk Factors and Likelihood of Colorectal Cancer: Cross-sectional Study %A Florensa,Dídac %A Mateo-Fornés,Jordi %A Solsona,Francesc %A Pedrol Aige,Teresa %A Mesas Julió,Miquel %A Piñol,Ramon %A Godoy,Pere %+ Department of Computer Science, University of Lleida, Jaume II, 69, Lleida, 25001, Spain, 34 973 70 27 00, didac.florensa@gencat.cat %K colorectal cancer %K cancer registry %K multiple correspondence analysis %K k-means %K risk factors %D 2022 %7 19.7.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Previous works have shown that risk factors are associated with an increased likelihood of colorectal cancer. Objective: The purpose of this study was to detect these associations in the region of Lleida (Catalonia) by using multiple correspondence analysis (MCA) and k-means. Methods: This cross-sectional study was made up of 1083 colorectal cancer episodes between 2012 and 2015, extracted from the population-based cancer registry for the province of Lleida (Spain), the Primary Care Centers database, and the Catalan Health Service Register. The data set included risk factors such as smoking and BMI as well as sociodemographic information and tumor details. The relations between the risk factors and patient characteristics were identified using MCA and k-means. Results: The combination of these techniques helps to detect clusters of patients with similar risk factors. Risk of death is associated with being elderly and obesity or being overweight. Stage III cancer is associated with people aged ≥65 years and rural/semiurban populations, while younger people were associated with stage 0. 
Conclusions: MCA and k-means were highly useful for detecting associations between risk factors and patient characteristics. These techniques have proven to be effective tools for analyzing the incidence of some factors in colorectal cancer. The outcomes obtained help corroborate suspected trends and encourage the use of these techniques to explore associations between risk factors and the incidence of other cancers. %M 35852835 %R 10.2196/29056 %U https://www.jmir.org/2022/7/e29056 %U https://doi.org/10.2196/29056 %U http://www.ncbi.nlm.nih.gov/pubmed/35852835 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 7 %P e38584 %T Deep Denoising of Raw Biomedical Knowledge Graph From COVID-19 Literature, LitCovid, and Pubtator: Framework Development and Validation %A Jiang,Chao %A Ngo,Victoria %A Chapman,Richard %A Yu,Yue %A Liu,Hongfang %A Jiang,Guoqian %A Zong,Nansu %+ Department of Artificial Intelligence and Informatics Research, Mayo Clinic, 200 First St SW, Rochester, MN, 55905, United States, 1 507 284 2511, Zong.Nansu@mayo.edu %K adversarial generative network %K knowledge graph %K deep denoising %K machine learning %K COVID-19 %K biomedical %K neural network %K network model %K training data %D 2022 %7 6.7.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Multiple types of biomedical associations of knowledge graphs, including COVID-19–related ones, are constructed based on co-occurring biomedical entities retrieved from recent literature. However, the applications derived from these raw graphs (eg, association predictions among genes, drugs, and diseases) have a high probability of false-positive predictions as co-occurrences in the literature do not always mean there is a true biomedical association between two entities. 
Objective: Data quality plays an important role in training deep neural network models; however, most of the current work in this area has been focused on improving a model’s performance with the assumption that the preprocessed data are clean. Here, we studied how to remove noise from raw knowledge graphs with limited labeled information. Methods: The proposed framework used generative-based deep neural networks to generate a graph that can distinguish the unknown associations in the raw training graph. Two generative adversarial network models, NetGAN and Cross-Entropy Low-rank Logits (CELL), were adopted for the edge classification (ie, link prediction), leveraging unlabeled link information based on a real knowledge graph built from LitCovid and Pubtator. Results: The performance of link prediction, especially in the extreme case of training data versus test data at a ratio of 1:9, demonstrated that the proposed method still achieved favorable results (area under the receiver operating characteristic curve >0.8 for the synthetic data set and 0.7 for the real data set), despite the limited amount of testing data available. Conclusions: Our preliminary findings showed the proposed framework achieved promising results for removing noise during data preprocessing of the biomedical knowledge graph, potentially improving the performance of downstream applications by providing cleaner data. 
%M 35658098 %R 10.2196/38584 %U https://www.jmir.org/2022/7/e38584 %U https://doi.org/10.2196/38584 %U http://www.ncbi.nlm.nih.gov/pubmed/35658098 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 6 %P e32728 %T Research Trends in Immune Checkpoint Blockade for Melanoma: Visualization and Bibliometric Analysis %A Xu,Yantao %A Jiang,Zixi %A Kuang,Xinwei %A Chen,Xiang %A Liu,Hong %+ Department of Dermatology, Xiangya Hospital, Central South University, 87 Xiangya Road, Kaifu District, Changsha, 410005, China, 86 0731 89753999, hongliu1014@csu.edu.cn %K melanoma %K immune checkpoint blockade %K bibliometric %K research trends %K dermatology %K cancer %D 2022 %7 27.6.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Melanoma is one of the most life-threatening skin cancers; immune checkpoint blockade is widely used in the treatment of melanoma because of its remarkable efficacy. Objective: This study aimed to conduct a comprehensive bibliometric analysis of research conducted in recent decades on immune checkpoint blockade for melanoma, while exploring research trends and public interest in this topic. Methods: We summarized the articles in the Web of Science Core Collection on immune checkpoint blockade for melanoma in each year from 1999 to 2020. The R package bibliometrix was used for data extraction and visualization of the distribution of publication year and the top 10 core authors. Keyword citation burst analysis and cocitation networks were calculated with CiteSpace. A Gunn online world map was used to evaluate distribution by country and region. Ranking was performed using the Standard Competition Ranking method. Coauthorship analysis and co-occurrence were analyzed and visualized with VOSviewer. Results: After removing duplicates, a total of 9169 publications were included. 
The distribution of publications by year showed that the number of publications rose sharply from 2015 onwards and either reached a peak in 2020 or has yet to reach a peak. The geographical distribution indicated that there was a large gap between the number of publications in the United States and other countries. The coauthorship analysis showed that the 149 top institutions were grouped into 8 clusters, each covering approximately a single country, suggesting that international cooperation among institutions should be strengthened. The core author extraction revealed changes in the most prolific authors. The keyword analysis revealed clustering and top citation bursts. The cocitation analysis of references from 2010 to 2020 revealed the number of citations and the centrality of the top articles. Conclusions: This study revealed trends in research and public interest in immune checkpoint blockade for melanoma. Our findings suggest that the field is growing rapidly, has several core authors, and that the United States is taking the lead position. Moreover, cooperation between countries should be strengthened, and future research hot spots might focus on deeper exploration of drug mechanisms, prediction of treatment efficacy, prediction of adverse events, and new modes of administration, such as combination therapy, which may pave the way for further research. 
%M 35759331 %R 10.2196/32728 %U https://www.jmir.org/2022/6/e32728 %U https://doi.org/10.2196/32728 %U http://www.ncbi.nlm.nih.gov/pubmed/35759331 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 6 %N 6 %P e34366 %T Fairness in Mobile Phone–Based Mental Health Assessment Algorithms: Exploratory Study %A Park,Jinkyung %A Arunachalam,Ramanathan %A Silenzio,Vincent %A Singh,Vivek K %+ School of Communication & Information, Rutgers University, 4 Huntington Street, New Brunswick, NJ, 08901, United States, 1 848 932 7588, v.singh@rutgers.edu %K algorithmic bias %K mental health %K health equity %K medical informatics %K health information systems %K gender bias %K mobile phone %D 2022 %7 14.6.2022 %9 Original Paper %J JMIR Form Res %G English %X Background: Approximately 1 in 5 American adults experience mental illness every year. Thus, mobile phone–based mental health prediction apps that use phone data and artificial intelligence techniques for mental health assessment have become increasingly important and are being rapidly developed. At the same time, multiple artificial intelligence–related technologies (eg, face recognition and search results) have recently been reported to be biased regarding age, gender, and race. This study moves this discussion to a new domain: phone-based mental health assessment algorithms. It is important to ensure that such algorithms do not contribute to gender disparities through biased predictions across gender groups. Objective: This research aimed to analyze the susceptibility of multiple commonly used machine learning approaches for gender bias in mobile mental health assessment and explore the use of an algorithmic disparate impact remover (DIR) approach to reduce bias levels while maintaining high accuracy. Methods: First, we performed preprocessing and model training using the data set (N=55) obtained from a previous study. 
Accuracy levels and differences in accuracy across genders were computed using 5 different machine learning models. We selected the random forest model, which yielded the highest accuracy, for a more detailed audit and computed multiple metrics that are commonly used for fairness in the machine learning literature. Finally, we applied the DIR approach to reduce bias in the mental health assessment algorithm. Results: The highest observed accuracy for the mental health assessment was 78.57%. Although this accuracy level raises optimism, the audit based on gender revealed that the performance of the algorithm was statistically significantly different between the male and female groups (eg, difference in accuracy across genders was 15.85%; P<.001). Similar trends were obtained for other fairness metrics. This disparity in performance was found to reduce significantly after the application of the DIR approach by adapting the data used for modeling (eg, the difference in accuracy across genders was 1.66%, and the reduction is statistically significant with P<.001). Conclusions: This study grounds the need for algorithmic auditing in phone-based mental health assessment algorithms and the use of gender as a protected attribute to study fairness in such settings. Such audits and remedial steps are the building blocks for the widespread adoption of fair and accurate mental health assessment algorithms in the future. 
%M 35699997 %R 10.2196/34366 %U https://formative.jmir.org/2022/6/e34366 %U https://doi.org/10.2196/34366 %U http://www.ncbi.nlm.nih.gov/pubmed/35699997 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 5 %P e32845 %T Improving Research Patient Data Repositories From a Health Data Industry Viewpoint %A Tang,Chunlei %A Ma,Jing %A Zhou,Li %A Plasek,Joseph %A He,Yuqing %A Xiong,Yun %A Zhu,Yangyong %A Huang,Yajun %A Bates,David %+ Brigham and Women’s Hospital, Harvard Medical School, 1620 Tremont Street, BS-3, Brigham and Women’s Hospital, Boston, MA, 02115, United States, 1 857 366 7211, towne.tang@gmail.com %K data science %K big data %K data mining %K data warehousing %K information storage and retrieval %D 2022 %7 11.5.2022 %9 Viewpoint %J J Med Internet Res %G English %X Organizational, administrative, and educational challenges in establishing and sustaining biomedical data science infrastructures lead to the inefficient use of Research Patient Data Repositories (RPDRs). The challenges, including but not limited to deployment, sustainability, cost optimization, collaboration, governance, security, rapid response, reliability, stability, scalability, and convenience, restrict each other and may not be naturally alleviated through traditional hardware upgrades or protocol enhancements. This article attempts to borrow data science thinking and practices in the business realm, which we call the data industry viewpoint, to improve RPDRs. 
%M 35544299 %R 10.2196/32845 %U https://www.jmir.org/2022/5/e32845 %U https://doi.org/10.2196/32845 %U http://www.ncbi.nlm.nih.gov/pubmed/35544299 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 4 %P e30898 %T Social Networking Service, Patient-Generated Health Data, and Population Health Informatics: National Cross-sectional Study of Patterns and Implications of Leveraging Digital Technologies to Support Mental Health and Well-being %A Ye,Jiancheng %A Wang,Zidan %A Hai,Jiarui %+ Feinberg School of Medicine, Northwestern University, 633 N. Saint Clair St, Chicago, IL, 60611, United States, 1 312 503 3690, jiancheng.ye@u.northwestern.edu %K patient-generated health data %K social network %K population health informatics %K mental health %K social determinants of health %K health data sharing %K technology acceptability %K mobile phone %K mobile health %D 2022 %7 29.4.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: The emerging health technologies and digital services provide effective ways of collecting health information and gathering patient-generated health data (PGHD), which provide a more holistic view of a patient’s health and quality of life over time, increase visibility into a patient’s adherence to a treatment plan or study protocol, and enable timely intervention before a costly care episode. Objective: Through a national cross-sectional survey in the United States, we aimed to describe and compare the characteristics of populations with and without mental health issues (depression or anxiety disorders), including physical health, sleep, and alcohol use. We also examined the patterns of social networking service use, PGHD, and attitudes toward health information sharing and activities among the participants, which provided nationally representative estimates. Methods: We drew data from the 2019 Health Information National Trends Survey of the National Cancer Institute. 
The participants were divided into 2 groups according to mental health status. Then, we described and compared the characteristics of the social determinants of health, health status, sleeping and drinking behaviors, and patterns of social networking service use and health information data sharing between the 2 groups. Multivariable logistic regression models were applied to assess the predictors of mental health. All the analyses were weighted to provide nationally representative estimates. Results: Participants with mental health issues were significantly more likely to be younger, White, female, and lower-income; have a history of chronic diseases; and be less capable of taking care of their own health. Regarding behavioral health, they slept <6 hours on average, had worse sleep quality, and consumed more alcohol. In addition, they were more likely to visit and share health information on social networking sites, write online diary blogs, participate in online forums or support groups, and watch health-related videos. Conclusions: This study illustrates that individuals with mental health issues have inequitable social determinants of health, poor physical health, and poor behavioral health. However, they are more likely to use social networking platforms and services, share their health information, and actively engage with PGHD. Leveraging these digital technologies and services could be beneficial for developing tailored and effective strategies for self-monitoring and self-management. 
%M 35486428 %R 10.2196/30898 %U https://www.jmir.org/2022/4/e30898 %U https://doi.org/10.2196/30898 %U http://www.ncbi.nlm.nih.gov/pubmed/35486428 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 4 %P e35789 %T Generation of a Fast Healthcare Interoperability Resources (FHIR)-based Ontology for Federated Feasibility Queries in the Context of COVID-19: Feasibility Study %A Rosenau,Lorenz %A Majeed,Raphael W %A Ingenerf,Josef %A Kiel,Alexander %A Kroll,Björn %A Köhler,Thomas %A Prokosch,Hans-Ulrich %A Gruendner,Julian %+ IT Center for Clinical Research, Gebäude 64, 2.OG, Raum 05, Ratzeburger Allee 160, Lübeck, 23562, Germany, 49 451 3101 5636, lorenz.rosenau@uni-luebeck.de %K federated queries %K feasibility study %K Fast Healthcare Interoperability Resource %K FHIR Search %K CQL %K ontology %K terminology server %K query %K feasibility %K FHIR %K terminology %K development %K COVID-19 %K automation %K user interface %K map %K input %K hospital %K data %K Germany %K accessibility %K harmonized %D 2022 %7 27.4.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: The COVID-19 pandemic highlighted the importance of making research data from all German hospitals available to scientists to respond to current and future pandemics promptly. The heterogeneous data originating from proprietary systems at hospitals' sites must be harmonized and accessible. The German Corona Consensus Dataset (GECCO) specifies how data for COVID-19 patients will be standardized in Fast Healthcare Interoperability Resources (FHIR) profiles across German hospitals. However, given the complexity of the FHIR standard, the data harmonization is not sufficient to make the data accessible. A simplified visual representation is needed to reduce the technical burden, while allowing feasibility queries. Objective: This study investigates how a search ontology can be automatically generated using FHIR profiles and a terminology server. 
Furthermore, it describes how this ontology can be used in a user interface (UI) and how a mapping and a terminology tree created together with the ontology can translate user input into FHIR queries. Methods: We used the FHIR profiles from the GECCO data set combined with a terminology server to generate an ontology and the required mapping files for the translation. We analyzed the profiles and identified search criteria for the visual representation. In this process, we reduced the complex profiles to code value pairs for improved usability. We enriched our ontology with the necessary information to display it in a UI. We also developed an intermediate query language to transform the queries from the UI to federated FHIR requests. Separation of concerns resulted in discrepancies between the criteria used in the intermediate query format and the target query language. Therefore, a mapping was created to reintroduce all information relevant for creating the query in its target language. Further, we generated a tree representation of the ontology hierarchy, which allows resolving child concepts in the process. Results: In the scope of this project, 82 (99%) of 83 elements defined in the GECCO profile were successfully implemented. We verified our solution based on an independently developed test patient. A discrepancy between the test data and the criteria was found in 6 cases due to different versions used to generate the test data and the UI profiles, the support for specific code systems, and the evaluation of postcoordinated Systematized Nomenclature of Medicine (SNOMED) codes. Our results highlight the need for governance mechanisms for version changes, concept mapping between values from different code systems encoding the same concept, and support for different unit dimensions. Conclusions: We developed an automatic process to generate ontology and mapping files for FHIR-formatted data. 
Our tests found that this process works for most of our chosen FHIR profile criteria. The process established here works directly with FHIR profiles and a terminology server, making it extendable to other FHIR profiles and demonstrating that automatic ontology generation on FHIR profiles is feasible. %M 35380548 %R 10.2196/35789 %U https://medinform.jmir.org/2022/4/e35789 %U https://doi.org/10.2196/35789 %U http://www.ncbi.nlm.nih.gov/pubmed/35380548 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 4 %P e32776 %T Mechanism of Impact of Big Data Resources on Medical Collaborative Networks From the Perspective of Transaction Efficiency of Medical Services: Survey Study %A Yuan,Junyi %A Wang,Sufen %A Pan,Changqing %+ Hospital’s Office, Shanghai Chest Hospital, Shanghai Jiao Tong University, No 241 West Huaihai Road, Xuhui District, Shanghai, 200030, China, 86 21 62805080, panchangqing@shchest.org %K medical collaborative networks %K big data resources %K transaction efficiency %D 2022 %7 21.4.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: The application of big data resources and the development of medical collaborative networks (MCNs) boost each other. However, MCNs are often assumed to be exogenous. How big data resources affect the emergence, development, and evolution of endogenous MCNs has not been well explained. Objective: This study aimed to explore and understand the influence of the mechanism of a wide range of shared and private big data resources on the transaction efficiency of medical services to reveal the impact of big data resources on the emergence and development of endogenous MCNs. Methods: This study was conducted by administering a survey questionnaire to information technology staff and medical staff from 132 medical institutions in China. Data from information technology staff and medical staff were integrated. 
Structural equation modeling was used to test the direct impact of big data resources on transaction efficiency of medical services. For those big data resources that had no direct impact, we analyzed their indirect impact. Results: Sharing of diagnosis and treatment data (β=.222; P=.03) and sharing of medical research data (β=.289; P=.04) at the network level (as big data itself) positively directly affected the transaction efficiency of medical services. Network protection of the external link systems (β=.271; P=.008) at the level of medical institutions (as big data technology) positively directly affected the transaction efficiency of medical services. Encryption security of web-based data (as big data technology) at the level of medical institutions, medical service capacity available for external use, real-time data of diagnosis and treatment services (as big data itself) at the level of medical institutions, and policies and regulations at the network level indirectly affected the transaction efficiency through network protection of the external link systems at the level of medical institutions. Conclusions: This study found that big data technology, big data itself, and policy at the network and organizational levels interact with, and influence, each other to form the transaction efficiency of medical services. On the basis of the theory of neoclassical economics, the study highlighted the implications of big data resources for the emergence and development of endogenous MCNs. 
%M 35318187 %R 10.2196/32776 %U https://www.jmir.org/2022/4/e32776 %U https://doi.org/10.2196/32776 %U http://www.ncbi.nlm.nih.gov/pubmed/35318187 %0 Journal Article %@ 2371-4379 %I JMIR Publications %V 7 %N 1 %P e33213 %T Open-source Web Portal for Managing Self-reported Data and Real-world Data Donation in Diabetes Research: Platform Feasibility Study %A Cooper,Drew %A Ubben,Tebbe %A Knoll,Christine %A Ballhausen,Hanne %A O'Donnell,Shane %A Braune,Katarina %A Lewis,Dana %+ Department of Pediatric Endocrinology and Diabetes, Charité – Universitätsmedizin Berlin, Augustenburger Platz 1, Berlin, 13353, Germany, 49 30450 ext 566261, drew.cooper@charite.de %K diabetes %K type 1 diabetes %K automated insulin delivery %K diabetes technology %K open-source %K patient-reported outcomes %K real-world data %K research methods %K mixed methods %K insulin %K digital health %K web portal %D 2022 %7 31.3.2022 %9 Original Paper %J JMIR Diabetes %G English %X Background: People with diabetes and their support networks have developed open-source automated insulin delivery systems to help manage their diabetes therapy, as well as to improve their quality of life and glycemic outcomes. Under the hashtag #WeAreNotWaiting, a wealth of knowledge and real-world data have been generated by users of these systems but have been left largely untapped by research; opportunities for such multimodal studies remain open. Objective: We aimed to evaluate the feasibility of several aspects of open-source automated insulin delivery systems including challenges related to data management and security across multiple disparate web-based platforms and challenges related to implementing follow-up studies. Methods: We developed a mixed methods study to collect questionnaire responses and anonymized diabetes data donated by participants—which included adults and children with diabetes and their partners or caregivers recruited through multiple diabetes online communities. 
We managed both front-end participant interactions and back-end data management with our web portal (called the Gateway). Participant questionnaire data from electronic data capture (REDCap) and personal device data aggregation (Open Humans) platforms were pseudonymously and securely linked and stored within a custom-built database that used both open-source and commercial software. Participants were later given the option to include their health care providers in the study to validate their questionnaire responses; the database architecture was designed specifically with this kind of extensibility in mind. Results: Of 1052 visitors to the study landing page, 930 participated and completed at least one questionnaire. After the implementation of health care professional validation of self-reported clinical outcomes to the study, an additional 164 individuals visited the landing page, with 142 completing at least one questionnaire. Of the optional study elements, 7 participant–health care professional dyads participated in the survey, and 97 participants who completed the survey donated their anonymized medical device data. Conclusions: The platform was accessible to participants while maintaining compliance with data regulations. The Gateway formalized a system of automated data matching between multiple data sets, which was a major benefit to researchers. Scalability of the platform was demonstrated with the later addition of self-reported data validation. This study demonstrated the feasibility of custom software solutions in addressing complex study designs. The Gateway portal code has been made available open-source and can be leveraged by other research groups. 
%M 35357312 %R 10.2196/33213 %U https://diabetes.jmir.org/2022/1/e33213 %U https://doi.org/10.2196/33213 %U http://www.ncbi.nlm.nih.gov/pubmed/35357312 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 1 %P e30258 %T Designing Formulae for Ranking Search Results: Mixed Methods Evaluation Study %A Douze,Laura %A Pelayo,Sylvia %A Messaadi,Nassir %A Grosjean,Julien %A Kerdelhué,Gaétan %A Marcilly,Romaric %+ Inserm, Centre d'Investigation Clinique pour les Innovations Technologiques 1403, Institut Coeur-Poumon, 3ème étage Aile Est, CS 70001, Bd du Professeur Jules Leclercq, Lille, 59037, France, 33 0362943939, laura.douze@univ-lille.fr %K information retrieval %K search engine %K topical relevance %K search result ranking %K user testing %K human factors %D 2022 %7 25.3.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: A major factor in the success of any search engine is the relevance of the search results; a tool should sort the search results to present the most relevant documents first. Assessing the performance of the ranking formula is an important part of search engine evaluation. However, the methods currently used to evaluate ranking formulae mainly collect quantitative data and do not gather qualitative data, which help to understand what needs to be improved to tailor the formulae to their end users. Objective: This study aims to evaluate 2 different parameter settings of the ranking formula of LiSSa (the French acronym for scientific literature in health care; Department of Medical Informatics and Information), a tool that provides access to health scientific literature in French, to adapt the formula to the needs of the end users. Methods: To collect quantitative and qualitative data, user tests were carried out with representative end users of LiSSa: 10 general practitioners and 10 registrars. Participants first assessed the relevance of the search results and then rated the ranking criteria used in the 2 formulae. 
Verbalizations were analyzed to characterize each criterion. Results: A formula that prioritized articles representing a consensus in the field was preferred. When users assess an article’s relevance, they judge its topic, methods, and value in clinical practice. Conclusions: Following the evaluation, several improvements were implemented to give more weight to articles that match the search topic and to downgrade articles that have less informative or scientific value for the reader. Applying a qualitative methodology generates valuable user inputs to improve the ranking formula and move toward a highly usable search engine. %M 35333180 %R 10.2196/30258 %U https://humanfactors.jmir.org/2022/1/e30258 %U https://doi.org/10.2196/30258 %U http://www.ncbi.nlm.nih.gov/pubmed/35333180 %0 Journal Article %@ 2292-9495 %I JMIR Publications %V 9 %N 1 %P e31021 %T Concept Libraries for Repeatable and Reusable Research: Qualitative Study Exploring the Needs of Users %A Almowil,Zahra %A Zhou,Shang-Ming %A Brophy,Sinead %A Croxall,Jodie %+ Data Science Building, Medical School, Swansea University, Sketty, Swansea, Wales, SA2 8PP, United Kingdom, 44 07552894384, 934467@swansea.ac.uk %K electronic health records %K record linkage %K reproducible research %K clinical codes %K concept libraries %D 2022 %7 15.3.2022 %9 Original Paper %J JMIR Hum Factors %G English %X Background: Big data research in the field of health sciences is hindered by a lack of agreement on how to identify and define different conditions and their medications. This means that researchers and health professionals often have different phenotype definitions for the same condition. This lack of agreement makes it difficult to compare different study findings and hinders the ability to conduct repeatable and reusable research. 
Objective: This study aims to examine the requirements of various users, such as researchers, clinicians, machine learning experts, and managers, in the development of a data portal for phenotypes (a concept library). Methods: This was a qualitative study using interviews and focus group discussion. One-to-one interviews were conducted with researchers, clinicians, machine learning experts, and senior research managers in health data science (N=6) to explore their specific needs in the development of a concept library. In addition, a focus group discussion with researchers (N=14) working with the Secured Anonymized Information Linkage databank, a national eHealth data linkage infrastructure, was held to perform a SWOT (strengths, weaknesses, opportunities, and threats) analysis for the phenotyping system and the proposed concept library. The interviews and focus group discussion were transcribed verbatim, and 2 thematic analyses were performed. Results: Most of the participants thought that the prototype concept library would be a very helpful resource for conducting repeatable research, but they specified that many requirements are needed before its development. Although all the participants stated that they were aware of some existing concept libraries, most of them expressed negative perceptions about them. The participants mentioned several facilitators that would stimulate them to share their work and reuse the work of others, and they pointed out several barriers that could inhibit them from sharing their work and reusing the work of others. The participants suggested some developments that they would like to see to improve reproducible research output using routine data. Conclusions: The study indicated that most interviewees valued a concept library for phenotypes. 
However, only half of the participants felt that they would contribute by providing definitions for the concept library, and they reported many barriers regarding sharing their work on a publicly accessible platform. Analysis of interviews and the focus group discussion revealed that different stakeholders have different requirements, facilitators, barriers, and concerns about a prototype concept library. %M 35289755 %R 10.2196/31021 %U https://humanfactors.jmir.org/2022/1/e31021 %U https://doi.org/10.2196/31021 %U http://www.ncbi.nlm.nih.gov/pubmed/35289755 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 10 %N 3 %P e35104 %T Patient-Level Fall Risk Prediction Using the Observational Medical Outcomes Partnership’s Common Data Model: Pilot Feasibility Study %A Jung,Hyesil %A Yoo,Sooyoung %A Kim,Seok %A Heo,Eunjeong %A Kim,Borham %A Lee,Ho-Young %A Hwang,Hee %+ Office of eHealth Research and Business, Seoul National University Bundang Hospital, 172 Dolma-ro, Bundang-gu, Seongnam-si, 13620, Republic of Korea, 82 31 787 8980, yoosoo0@snubh.org %K common data model %K accidental falls %K Observational Medical Outcomes Partnership %K nursing records %K medical informatics %K health data %K electronic health record %K data model %K prediction model %K risk prediction %K fall risk %D 2022 %7 11.3.2022 %9 Original Paper %J JMIR Med Inform %G English %X Background: Falls in acute care settings threaten patients’ safety. Researchers have been developing fall risk prediction models and exploring risk factors to provide evidence-based fall prevention practices; however, such efforts are hindered by insufficient samples, limited covariates, and a lack of standardized methodologies that aid study replication. 
Objective: The objectives of this study were to (1) convert fall-related electronic health record data into the standardized Observational Medical Outcomes Partnership (OMOP) common data model format and (2) develop models that predict fall risk during 2 time periods. Methods: As a pilot feasibility test, we converted fall-related electronic health record data (nursing notes, fall risk assessment sheet, patient acuity assessment sheet, and clinical observation sheet) into standardized OMOP common data model format using an extraction, transformation, and load process. We developed fall risk prediction models for 2 time periods (within 7 days of admission and during the entire hospital stay) using 2 algorithms (least absolute shrinkage and selection operator logistic regression and random forest). Results: In total, 6277 nursing statements, 747,049,486 clinical observation sheet records, 1,554,775 fall risk scores, and 5,685,011 patient acuity scores were converted into OMOP common data model format. All our models (area under the receiver operating characteristic curve 0.692-0.726) performed better than the Hendrich II Fall Risk Model. Patient acuity score, fall history, age ≥60 years, movement disorder, and central nervous system agents were the most important predictors in the logistic regression models. Conclusions: To enhance model performance further, we are currently converting all nursing records into the OMOP common data model format, which will then be included in the models. Thus, in the near future, the performance of fall risk prediction models could be improved through the application of abundant nursing records and external validation. 
%M 35275076 %R 10.2196/35104 %U https://medinform.jmir.org/2022/3/e35104 %U https://doi.org/10.2196/35104 %U http://www.ncbi.nlm.nih.gov/pubmed/35275076 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 3 %P e31684 %T A National Network of Safe Havens: Scottish Perspective %A Gao,Chuang %A McGilchrist,Mark %A Mumtaz,Shahzad %A Hall,Christopher %A Anderson,Lesley Ann %A Zurowski,John %A Gordon,Sharon %A Lumsden,Joanne %A Munro,Vicky %A Wozniak,Artur %A Sibley,Michael %A Banks,Christopher %A Duncan,Chris %A Linksted,Pamela %A Hume,Alastair %A Stables,Catherine L %A Mayor,Charlie %A Caldwell,Jacqueline %A Wilde,Katie %A Cole,Christian %A Jefferson,Emily %+ Health Informatics Centre, Ninewells Hospital & Medical School, University of Dundee, Mail Box 15, Dundee, DD1 9SY, United Kingdom, 44 (0)1382 383943, e.r.jefferson@dundee.ac.uk %K electronic health records %K Safe Haven %K data governance %D 2022 %7 9.3.2022 %9 Viewpoint %J J Med Internet Res %G English %X For over a decade, Scotland has implemented and operationalized a system of Safe Havens, which provides secure analytics platforms for researchers to access linked, deidentified electronic health records (EHRs) while managing the risk of unauthorized reidentification. In this paper, a perspective is provided on the state-of-the-art Scottish Safe Haven network, including its evolution, to define the key activities required to scale the Scottish Safe Haven network’s capability to facilitate research and health care improvement initiatives. A set of processes related to EHR data and their delivery in Scotland has been discussed. An interview with each Safe Haven was conducted to understand their services in detail, as well as their commonalities. The results show how Safe Havens in Scotland have protected privacy while facilitating the reuse of the EHR data. 
This study provides a common definition of a Safe Haven and promotes a consistent understanding among the Scottish Safe Haven network and the clinical and academic research community. We conclude by identifying areas where efficiencies across the network can be made to meet the needs of population-level studies at scale. %M 35262495 %R 10.2196/31684 %U https://www.jmir.org/2022/3/e31684 %U https://doi.org/10.2196/31684 %U http://www.ncbi.nlm.nih.gov/pubmed/35262495 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 2 %P e27146 %T Age- and Sex-Specific Differences in Multimorbidity Patterns and Temporal Trends on Assessing Hospital Discharge Records in Southwest China: Network-Based Study %A Wang,Liya %A Qiu,Hang %A Luo,Li %A Zhou,Li %+ School of Computer Science and Engineering, University of Electronic Science and Technology of China, No.2006, Xiyuan Ave, West Hi-Tech Zone, Chengdu, 611731, China, 86 28 61830278, qiuhang@uestc.edu.cn %K multimorbidity pattern %K temporal trend %K network analysis %K multimorbidity prevalence %K administrative data %K longitudinal study %K regional research %D 2022 %7 25.2.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Multimorbidity represents a global health challenge, which requires a more global understanding of multimorbidity patterns and trends. However, the majority of studies completed to date have often relied on self-reported conditions, and a simultaneous assessment of the entire spectrum of chronic disease co-occurrence, especially in developing regions, has not yet been performed. Objective: We attempted to provide a multidimensional approach to understand the full spectrum of chronic disease co-occurrence among general inpatients in southwest China, in order to investigate multimorbidity patterns and temporal trends, and assess their age and sex differences. 
Methods: We conducted a retrospective cohort analysis based on 8.8 million hospital discharge records of about 5.0 million individuals of all ages from 2015 to 2019 in a megacity in southwest China. We examined all chronic diagnoses using the ICD-10 (International Classification of Diseases, 10th revision) codes at 3 digits and focused on chronic diseases with ≥1% prevalence for each of the age and sex strata, which resulted in a total of 149 and 145 chronic diseases in males and females, respectively. We constructed multimorbidity networks in the general population based on sex and age, and used the cosine index to measure the co-occurrence of chronic diseases. Then, we divided the networks into communities and assessed their temporal trends. Results: The results showed complex interactions among chronic diseases, with more intensive connections among males and inpatients ≥40 years old. A total of 9 chronic diseases were simultaneously classified as central diseases, hubs, and bursts in the multimorbidity networks. Among them, 5 diseases were common to both males and females, including hypertension, chronic ischemic heart disease, cerebral infarction, other cerebrovascular diseases, and atherosclerosis. The earliest leaps (degree leaps ≥6) appeared at a disorder of glycoprotein metabolism that happened at 25-29 years in males, about 15 years earlier than in females. The number of chronic diseases in the community increased over time, but the new entrants did not replace the root of the community. Conclusions: Our multimorbidity network analysis identified specific differences in the co-occurrence of chronic diagnoses by sex and age, which could help in the design of clinical interventions for inpatient multimorbidity. 
%M 35212632 %R 10.2196/27146 %U https://www.jmir.org/2022/2/e27146 %U https://doi.org/10.2196/27146 %U http://www.ncbi.nlm.nih.gov/pubmed/35212632 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 2 %P e34560 %T Building a Precision Medicine Delivery Platform for Clinics: The University of California, San Francisco, BRIDGE Experience %A Bove,Riley %A Schleimer,Erica %A Sukhanov,Paul %A Gilson,Michael %A Law,Sindy M %A Barnecut,Andrew %A Miller,Bruce L %A Hauser,Stephen L %A Sanders,Stephan J %A Rankin,Katherine P %+ UCSF Weill Institute for Neurosciences, University of California, San Francisco, 1651 4th Street, San Francisco, CA, 94158, United States, 1 415 353 2069, Riley.bove@ucsf.edu %K precision medicine %K clinical implementation %K in silico trials %K clinical dashboard %K precision %K implementation %K dashboard %K design %K experience %K analytic %K tool %K analysis %K decision-making %K real time %K platform %K human-centered design %D 2022 %7 15.2.2022 %9 Viewpoint %J J Med Internet Res %G English %X Despite an ever-expanding number of analytics with the potential to impact clinical care, the field currently lacks point-of-care technological tools that allow clinicians to efficiently select disease-relevant data about their patients, algorithmically derive clinical indices (eg, risk scores), and view these data in straightforward graphical formats to inform real-time clinical decisions. Thus far, solutions to this problem have relied on either bottom-up approaches that are limited to a single clinic or generic top-down approaches that do not address clinical users’ specific setting-relevant or disease-relevant needs. As a road map for developing similar platforms, we describe our experience with building a custom but institution-wide platform that enables economies of time, cost, and expertise. 
The BRIDGE platform was designed to be modular and scalable and was customized to data types relevant to given clinical contexts within a major university medical center. The development process occurred by using a series of human-centered design phases with extensive, consistent stakeholder input. This institution-wide approach yielded a unified, carefully regulated, cross-specialty clinical research platform that can be launched during a patient’s electronic health record encounter. The platform pulls clinical data from the electronic health record (Epic; Epic Systems) as well as other clinical and research sources in real time; analyzes the combined data to derive clinical indices; and displays them in simple, clinician-designed visual formats specific to each disorder and clinic. By integrating an application into the clinical workflow and allowing clinicians to access data sources that would otherwise be cumbersome to assemble, view, and manipulate, institution-wide platforms represent an alternative approach to achieving the vision of true personalized medicine. 
%M 35166689 %R 10.2196/34560 %U https://www.jmir.org/2022/2/e34560 %U https://doi.org/10.2196/34560 %U http://www.ncbi.nlm.nih.gov/pubmed/35166689 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 1 %P e34573 %T Digital Biomarkers for Supporting Transitional Care Decisions: Protocol for a Transnational Feasibility Study %A Petsani,Despoina %A Ahmed,Sara %A Petronikolou,Vasileia %A Kehayia,Eva %A Alastalo,Mika %A Santonen,Teemu %A Merino-Barbancho,Beatriz %A Cea,Gloria %A Segkouli,Sofia %A Stavropoulos,Thanos G %A Billis,Antonis %A Doumas,Michael %A Almeida,Rosa %A Nagy,Enikő %A Broeckx,Leen %A Bamidis,Panagiotis %A Konstantinidis,Evdokimos %+ Medical Physics and Digital Innovation Laboratory, School of Medicine, Aristotle University of Thessaloniki, University Campus, Thessaloniki, 54124, Greece, 30 6986177524, despoinapets@gmail.com %K Living Lab %K cocreation %K transitional care %K technology %K feasibility study %D 2022 %7 19.1.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Virtual Health and Wellbeing Living Lab Infrastructure is a Horizon 2020 project that aims to harmonize Living Lab procedures and facilitate access to European health and well-being research infrastructures. In this context, this study presents a joint research activity that will be conducted within Virtual Health and Wellbeing Living Lab Infrastructure in the transitional care domain to test and validate the harmonized Living Lab procedures and infrastructures. The collection of data from various sources (information and communications technology and clinical and patient-reported outcome measures) demonstrated the capacity to assess risk and support decisions during care transitions, but there is no harmonized way of combining this information. Objective: This study primarily aims to evaluate the feasibility and benefit of collecting multichannel data across Living Labs on the topic of transitional care and to harmonize data processes and collection. 
In addition, the authors aim to investigate the collection and use of digital biomarkers and explore initial patterns in the data that demonstrate the potential to predict transition outcomes, such as readmissions and adverse events. Methods: The current research protocol presents a multicenter, prospective, observational cohort study that will consist of three phases, running consecutively in multiple sites: a cocreation phase, a testing and simulation phase, and a transnational pilot phase. The cocreation phase aims to build a common understanding among different sites, investigate the differences in hospitalization discharge management among countries, and the willingness of different stakeholders to use technological solutions in the transitional care process. The testing and simulation phase aims to explore ways of integrating observation of a patient’s clinical condition, patient involvement, and discharge education in transitional care. The objective of the simulation phase is to evaluate the feasibility and the barriers faced by health care professionals in assessing transition readiness. Results: The cocreation phase will be completed by April 2022. The testing and simulation phase will begin in September 2022 and will partially overlap with the deployment of the transnational pilot phase that will start in the same month. The data collection of the transnational pilots will be finalized by the end of June 2023. Data processing is expected to be completed by March 2024. The results will consist of guidelines and implementation pathways for large-scale studies and an analysis for identifying initial patterns in the acquired data. Conclusions: The knowledge acquired through this research will lead to harmonized procedures and data collection for Living Labs that support transitions in care. 
International Registered Report Identifier (IRRID): PRR1-10.2196/34573 %M 35044303 %R 10.2196/34573 %U https://www.researchprotocols.org/2022/1/e34573 %U https://doi.org/10.2196/34573 %U http://www.ncbi.nlm.nih.gov/pubmed/35044303 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e25440 %T Understanding the Nature of Metadata: Systematic Review %A Ulrich,Hannes %A Kock-Schoppenhauer,Ann-Kristin %A Deppenwiese,Noemi %A Gött,Robert %A Kern,Jori %A Lablans,Martin %A Majeed,Raphael W %A Stöhr,Mark R %A Stausberg,Jürgen %A Varghese,Julian %A Dugas,Martin %A Ingenerf,Josef %+ IT Center for Clinical Research, University of Lübeck, Ratzeburger Allee 160, Lübeck, 23564, Germany, 49 45131015607, h.ulrich@uni-luebeck.de %K metadata %K metadata definition %K systematic review %K data integration %K data identification %K data classification %D 2022 %7 11.1.2022 %9 Review %J J Med Internet Res %G English %X Background: Metadata are created to describe the corresponding data in a detailed and unambiguous way and is used for various applications in different research areas, for example, data identification and classification. However, a clear definition of metadata is crucial for further use. Unfortunately, extensive experience with the processing and management of metadata has shown that the term “metadata” and its use is not always unambiguous. Objective: This study aimed to understand the definition of metadata and the challenges resulting from metadata reuse. Methods: A systematic literature search was performed in this study following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines for reporting on systematic reviews. Five research questions were identified to streamline the review process, addressing metadata characteristics, metadata standards, use cases, and problems encountered. This review was preceded by a harmonization process to achieve a general understanding of the terms used. 
Results: The harmonization process resulted in a clear set of definitions for metadata processing focusing on data integration. The following literature review was conducted by 10 reviewers with different backgrounds and using the harmonized definitions. This study included 81 peer-reviewed papers from the last decade after applying various filtering steps to identify the most relevant papers. The 5 research questions could be answered, resulting in a broad overview of the standards, use cases, problems, and corresponding solutions for the application of metadata in different research areas. Conclusions: Metadata can be a powerful tool for identifying, describing, and processing information, but its meaningful creation is costly and challenging. This review process uncovered many standards, use cases, problems, and solutions for dealing with metadata. The presented harmonized definitions and the new schema have the potential to improve the classification and generation of metadata by creating a shared understanding of metadata and its context. 
%M 35014967 %R 10.2196/25440 %U https://www.jmir.org/2022/1/e25440 %U https://doi.org/10.2196/25440 %U http://www.ncbi.nlm.nih.gov/pubmed/35014967 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 10 %N 1 %P e30557 %T Enabling Research and Clinical Use of Patient-Generated Health Data (the mindLAMP Platform): Digital Phenotyping Study %A Vaidyam,Aditya %A Halamka,John %A Torous,John %+ Beth Israel Deaconess Medical Center, 330 Brrokline Avenue, Boston, MA, 02215, United States, 1 6176676700, jtorous@bidmc.harvard.edu %K digital phenotyping %K mHealth %K apps %K FHIR %K digital health %K health data %K patient-generated health data %K mobile health %K smartphones %K wearables %K mobile apps %K mental health, mobile phone %D 2022 %7 7.1.2022 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: There is a growing need for the integration of patient-generated health data (PGHD) into research and clinical care to enable personalized, preventive, and interactive care, but technical and organizational challenges, such as the lack of standards and easy-to-use tools, preclude the effective use of PGHD generated from consumer devices, such as smartphones and wearables. Objective: This study outlines how we used mobile apps and semantic web standards such as HTTP 2.0, Representational State Transfer, JSON (JavaScript Object Notation), JSON Schema, Transport Layer Security (version 1.3), Advanced Encryption Standard-256, OpenAPI, HTML5, and Vega, in conjunction with patient and provider feedback to completely update a previous version of mindLAMP. Methods: The Learn, Assess, Manage, and Prevent (LAMP) platform addresses the abovementioned challenges in enhancing clinical insight by supporting research, data analysis, and implementation efforts around PGHD as an open-source solution with freely accessible and shared code. 
Results: With a simplified programming interface and novel data representation that captures additional metadata, the LAMP platform enables interoperability with existing Fast Healthcare Interoperability Resources–based health care systems as well as consumer wearables and services such as Apple HealthKit and Google Fit. The companion Cortex data analysis and machine learning toolkit offers robust support for artificial intelligence, behavioral feature extraction, interactive visualizations, and high-performance data processing through parallelization and vectorization techniques. Conclusions: The LAMP platform incorporates feedback from patients and clinicians alongside a standards-based approach to address these needs and functions across a wide range of use cases through its customizable and flexible components. These range from simple survey-based research to international consortiums capturing multimodal data to simple delivery of mindfulness exercises through personalized, just-in-time adaptive interventions. 
%M 34994710 %R 10.2196/30557 %U https://mhealth.jmir.org/2022/1/e30557 %U https://doi.org/10.2196/30557 %U http://www.ncbi.nlm.nih.gov/pubmed/34994710 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 24 %N 1 %P e30720 %T Sequential Data–Based Patient Similarity Framework for Patient Outcome Prediction: Algorithm Development %A Wang,Ni %A Wang,Muyu %A Zhou,Yang %A Liu,Honglei %A Wei,Lan %A Fei,Xiaolu %A Chen,Hui %+ School of Biomedical Engineering, Capital Medical University, No.10, Xitoutiao, You An Men, Fengtai District, Beijing, 100069, China, 86 010 8391 1545, chenhui@ccmu.edu.cn %K patient similarity %K electronic medical records %K time series %K acute myocardial infarction %K natural language processing %K machine learning %K deep learning %K outcome prediction %K informatics %K health data %D 2022 %7 6.1.2022 %9 Original Paper %J J Med Internet Res %G English %X Background: Sequential information in electronic medical records is valuable and helpful for patient outcome prediction but is rarely used for patient similarity measurement because of its unevenness, irregularity, and heterogeneity. Objective: We aimed to develop a patient similarity framework for patient outcome prediction that makes use of sequential and cross-sectional information in electronic medical record systems. Methods: Sequence similarity was calculated from timestamped event sequences using edit distance, and trend similarity was calculated from time series using dynamic time warping and Haar decomposition. We also extracted cross-sectional information, namely, demographic, laboratory test, and radiological report data, for additional similarity calculations. 
We validated the effectiveness of the framework by constructing k–nearest neighbors classifiers to predict mortality and readmission for acute myocardial infarction patients, using data from (1) a public data set and (2) a private data set, at 3 time points—at admission, on Day 7, and at discharge—to provide early warning of patient outcomes. We also constructed state-of-the-art Euclidean-distance k–nearest neighbor, logistic regression, random forest, long short-term memory network, and recurrent neural network models, which were used for comparison. Results: With all available information during a hospitalization episode, predictive models using the similarity model outperformed baseline models based on both public and private data sets. For mortality predictions, all models except for the logistic regression model showed improved performances over time. There were no such increasing trends in predictive performances for readmission predictions. The random forest and logistic regression models performed best for mortality and readmission predictions, respectively, when using information from the first week after admission. Conclusions: For patient outcome predictions, the patient similarity framework facilitated sequential similarity calculations for uneven electronic medical record data and helped improve predictive performance. 
%M 34989682 %R 10.2196/30720 %U https://www.jmir.org/2022/1/e30720 %U https://doi.org/10.2196/30720 %U http://www.ncbi.nlm.nih.gov/pubmed/34989682 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 1 %P e34567 %T Cocreating a Harmonized Living Lab for Big Data–Driven Hybrid Persona Development: Protocol for Cocreating, Testing, and Seeking Consensus %A Santonen,Teemu %A Petsani,Despoina %A Julin,Mikko %A Garschall,Markus %A Kropf,Johannes %A Van der Auwera,Vicky %A Bernaerts,Sylvie %A Losada,Raquel %A Almeida,Rosa %A Garatea,Jokin %A Muñoz,Idoia %A Nagy,Eniko %A Kehayia,Eva %A de Guise,Elaine %A Nadeau,Sylvie %A Azevedo,Nancy %A Segkouli,Sofia %A Lazarou,Ioulietta %A Petronikolou,Vasileia %A Bamidis,Panagiotis %A Konstantinidis,Evdokimos %+ Department of Research, Development, Innovation and Business Development, Laurea University of Applied Sciences, Vanha maantie 9, Espoo, 02650, Finland, 358 503658353, teemu.santonen@laurea.fi %K Living Lab %K everyday living %K technology %K big data %K harmonization %K personas %K small-scale real-life testing %K mobile phone %D 2022 %7 6.1.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: Living Labs are user-centered, open innovation ecosystems based on a systematic user cocreation approach, which integrates research and innovation processes in real-life communities and settings. The Horizon 2020 Project VITALISE (Virtual Health and Wellbeing Living Lab Infrastructure) unites 19 partners across 11 countries. The project aims to harmonize Living Lab procedures and enable effective and convenient transnational and virtual access to key European health and well-being research infrastructures, which are governed by Living Labs. The VITALISE consortium will conduct joint research activities in the fields included in the care pathway of patients: rehabilitation, transitional care, and everyday living environments for older adults. 
This protocol focuses on health and well-being research in everyday living environments. Objective: The main aim of this study is to cocreate and test a harmonized research protocol for developing big data–driven hybrid personas, which are hypothetical user archetypes created to represent a user community. In addition, the use and applicability of innovative technologies will be investigated in the context of various everyday living and Living Lab environments. Methods: In phase 1, surveys and structured interviews will be used to identify the most suitable Living Lab methods, tools, and instruments for health-related research among VITALISE project Living Labs (N=10). A series of web-based cocreation workshops and iterative cowriting processes will be applied to define the initial protocols. In phase 2, five small-scale case studies will be conducted to test the cocreated research protocols in various real-life everyday living settings and Living Lab infrastructures. In phase 3, a cross-case analysis grounded on semistructured interviews will be conducted to identify the challenges and benefits of using the proposed research protocols. Furthermore, a series of cocreation workshops and the consensus-seeking Delphi study process will be conducted in parallel to cocreate and validate the acceptance of the defined harmonized research protocols among wider Living Lab communities. Results: As of September 30, 2021, project deliverables Ethics and safety manual and Living lab standard version 1 have been submitted to the European Commission review process. The study will be finished by March 2024. Conclusions: The outcome of this research will lead to harmonized procedures and protocols in the context of big data–driven hybrid persona development among health and well-being Living Labs in Europe and beyond. Harmonized protocols enable Living Labs to exploit similar research protocols, devices, hardware, and software for interventions and complex data collection purposes. 
Economies of scale and improved use of resources will speed up research, improve its quality, and offer novel possibilities for open data sharing, multidisciplinary research, and comparative studies beyond current practices. Case studies will also provide novel insights for implementing innovative technologies in the context of everyday Living Lab research. International Registered Report Identifier (IRRID): DERR1-10.2196/34567 %M 34989697 %R 10.2196/34567 %U https://www.researchprotocols.org/2022/1/e34567 %U https://doi.org/10.2196/34567 %U http://www.ncbi.nlm.nih.gov/pubmed/34989697 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 11 %N 1 %P e31365 %T A Nationwide Evaluation of the Prevalence of Human Papillomavirus in Brazil (POP-Brazil Study): Protocol for Data Quality Assurance and Control %A Horvath,Jaqueline Driemeyer Correia %A Bessel,Marina %A Kops,Natália Luiza %A Souza,Flávia Moreno Alves %A Pereira,Gerson Mendes %A Wendland,Eliana Marcia %+ Escritório de Projetos, Programa de Apoio ao Desenvolvimento Institucional do Sistema Único de Saúde, Hospital Moinhos de Vento, Rua Ramiro Barcelos 910, Porto Alegre, 90035-004, Brazil, 55 51 33143600, elianawend@gmail.com %K quality control %K quality assurance %K evidence-based medicine %K quality data %D 2022 %7 5.1.2022 %9 Protocol %J JMIR Res Protoc %G English %X Background: The credibility of a study and its internal and external validity depend crucially on the quality of the data produced. An in-depth knowledge of quality control processes is essential as large and integrative epidemiological studies are increasingly prioritized. Objective: This study aimed to describe the stages of quality control in the POP-Brazil study and to present an analysis of the quality indicators. Methods: Quality assurance and control were initiated with the planning of this nationwide, multicentric study and continued through the development of the project. 
All quality control protocol strategies, such as training, protocol implementation, audits, and inspection, were discussed one by one. We highlight the importance of conducting a pilot study, which provides the researcher the opportunity to refine or modify the research methodology, and of validating the results through double data entry, test-retest, and analysis of nonresponse rates. Results: This cross-sectional, nationwide, multicentric study recruited 8628 sexually active young adults (16-25 years old) in 119 public health units between September 2016 and November 2017. The Human Research Ethics Committee of the Moinhos de Vento Hospital approved this project. Conclusions: Quality control processes are a continuum, not restricted to a single event, and are fundamental to the success of data integrity and the minimization of bias in epidemiological studies. The quality control steps described can be used as a guide to implement evidence-based, valid, reliable, and useful procedures in most observational studies to ensure data integrity. 
International Registered Report Identifier (IRRID): RR1-10.2196/31365 %M 34989680 %R 10.2196/31365 %U https://www.researchprotocols.org/2022/1/e31365 %U https://doi.org/10.2196/31365 %U http://www.ncbi.nlm.nih.gov/pubmed/34989680 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e48892 %T NephroCAGE—German-Canadian Consortium on AI for Improved Kidney Transplantation Outcome: Protocol for an Algorithm Development and Validation Study %A Schapranow,Matthieu-P %A Bayat,Mozhgan %A Rasheed,Aadil %A Naik,Marcel %A Graf,Verena %A Schmidt,Danilo %A Budde,Klemens %A Cardinal,Héloïse %A Sapir-Pichhadze,Ruth %A Fenninger,Franz %A Sherwood,Karen %A Keown,Paul %A Günther,Oliver P %A Pandl,Konstantin D %A Leiser,Florian %A Thiebes,Scott %A Sunyaev,Ali %A Niemann,Matthias %A Schimanski,Andreas %A Klein,Thomas %+ Hasso Plattner Institute for Digital Engineering, University of Potsdam, Prof.-Dr.-Helmert-Street 2-3, Potsdam, 14482, Germany, 49 3315509 ext 1331, schapranow@hpi.de %K posttransplant risks %K kidney transplantation %K federated learning infrastructure %K clinical prediction model %K donor-recipient matching %K multinational transplant data set %D 2023 %7 22.12.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Recent advances in hardware and software enabled the use of artificial intelligence (AI) algorithms for analysis of complex data in a wide range of daily-life use cases. We aim to explore the benefits of applying AI to a specific use case in transplant nephrology: risk prediction for severe posttransplant events. For the first time, we combine multinational real-world transplant data, which require specific legal and technical protection measures. 
Objective: The German-Canadian NephroCAGE consortium aims to develop and evaluate specific processes, software tools, and methods to (1) combine transplant data of more than 8000 cases over the past decades from leading transplant centers in Germany and Canada, (2) implement specific measures to protect sensitive transplant data, and (3) use multinational data as a foundation for developing high-quality prognostic AI models. Methods: To protect sensitive transplant data addressing the first and second objectives, we aim to implement a decentralized NephroCAGE federated learning infrastructure upon a private blockchain. Our NephroCAGE federated learning infrastructure enables a switch of paradigms: instead of pooling sensitive data into a central database for analysis, it enables the transfer of clinical prediction models (CPMs) to clinical sites for local data analyses. Thus, sensitive transplant data reside protected in their original sites while the comparatively small algorithms are exchanged instead. For our third objective, we will compare the performance of selected AI algorithms, for example, random forest and extreme gradient boosting, as a foundation for CPMs to predict severe short- and long-term posttransplant risks, for example, graft failure or mortality. The CPMs will be trained on donor and recipient data from retrospective cohorts of kidney transplant patients. Results: We received initial funding for NephroCAGE in February 2021. All clinical partners have applied for and received ethics approval as of 2022. The process of exploring the clinical transplant databases for variable extraction started at all the centers in 2022. In total, 8120 patient records have been retrieved as of August 2023. The development and validation of CPMs are ongoing as of 2023. 
Conclusions: For the first time, we will (1) combine kidney transplant data from nephrology centers in Germany and Canada, (2) implement federated learning as a foundation to use such real-world transplant data as a basis for the training of CPMs in a privacy-preserving way, and (3) develop a learning software system to investigate population specifics, for example, to understand population heterogeneity, treatment specificities, and individual impact on selected posttransplant outcomes. International Registered Report Identifier (IRRID): DERR1-10.2196/48892 %M 38133915 %R 10.2196/48892 %U https://www.researchprotocols.org/2023/1/e48892 %U https://doi.org/10.2196/48892 %U http://www.ncbi.nlm.nih.gov/pubmed/38133915 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e51471 %T Effects of Internal and External Factors on Hospital Data Breaches: Quantitative Study %A Dolezel,Diane %A Beauvais,Brad %A Stigler Granados,Paula %A Fulton,Lawrence %A Kruse,Clemens Scott %+ Health Informatics & Information Management Department, Texas State University, 100 Bobcat Way, Round Rock, TX, 78665, United States, 1 512 716 2840, dd30@txstate.edu %K data breach %K security %K geospatial %K predictive %K mobile phone %D 2023 %7 21.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Health care data breaches are the most rapidly increasing type of cybercrime; however, the predictors of health care data breaches are uncertain. Objective: This quantitative study aims to develop a predictive model to explain the number of hospital data breaches at the county level. Methods: This study evaluated data consolidated at the county level from 1032 short-term acute care hospitals. 
We considered the association between data breach occurrence (a dichotomous variable), predictors based on county demographics, and socioeconomics, average hospital workload, facility type, and average performance on several hospital financial metrics using 3 model types: logistic regression, perceptron, and support vector machine. Results: The model coefficient performance metrics indicated convergent validity across the 3 model types for all variables except bad debt and the factor level accounting for counties with >20% and up to 40% Hispanic populations, both of which had mixed coefficient directionality. The support vector machine model performed the classification task best based on all metrics (accuracy, precision, recall, F1-score). All 3 models performed the classification task well with directional congruence of weights. From the logistic regression model, the top 5 odds ratios (indicating a higher risk of breach) included inpatient workload, medical center status, pediatric trauma center status, accounts receivable, and the number of outpatient visits, in high to low order. The bottom 5 odds ratios (indicating the lowest odds of experiencing a data breach) occurred for counties with Black populations of >20% and <40%, >80% and <100%, and >40% but <60%, as well as counties with ≤20% Asian or between 80% and 100% Hispanic individuals. Our results are in line with those of other studies that determined that patient workload, facility type, and financial outcomes were associated with the likelihood of health care data breach occurrence. Conclusions: The results of this study provide a predictive model for health care data breaches that may guide health care managers to reduce the risk of data breaches by raising awareness of the risk factors. 
%M 38127426 %R 10.2196/51471 %U https://www.jmir.org/2023/1/e51471 %U https://doi.org/10.2196/51471 %U http://www.ncbi.nlm.nih.gov/pubmed/38127426 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44599 %T Tensorial Principal Component Analysis in Detecting Temporal Trajectories of Purchase Patterns in Loyalty Card Data: Retrospective Cohort Study %A Autio,Reija %A Virta,Joni %A Nordhausen,Klaus %A Fogelholm,Mikael %A Erkkola,Maijaliisa %A Nevalainen,Jaakko %+ Faculty of Social Sciences (Health Sciences), Tampere University, P.O. Box 100, Tampere, FI-33014, Finland, 358 50 318 7364, reija.autio@tuni.fi %K tensorial data %K principal components %K loyalty card data %K purchase pattern %K food expenditure %K seasonality %K food %K diet %D 2023 %7 15.12.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Loyalty card data automatically collected by retailers provide an excellent source for evaluating health-related purchase behavior of customers. The data comprise information on every grocery purchase, including expenditures on product groups and the time of purchase for each customer. Such data where customers have an expenditure value for every product group for each time can be formulated as 3D tensorial data. Objective: This study aimed to use the modern tensorial principal component analysis (PCA) method to uncover the characteristics of health-related purchase patterns from loyalty card data. Another aim was to identify card holders with distinct purchase patterns. We also considered the interpretation, advantages, and challenges of tensorial PCA compared with standard PCA. Methods: Loyalty card program members from the largest retailer in Finland were invited to participate in this study. Our LoCard data consist of the purchases of 7251 card holders who consented to the use of their data from the year 2016. The purchases were reclassified into 55 product groups and aggregated across 52 weeks. 
The data were then analyzed using tensorial PCA, allowing us to effectively reduce the time and product group-wise dimensions simultaneously. The augmentation method was used for selecting the suitable number of principal components for the analysis. Results: Using tensorial PCA, we were able to systematically search for typical food purchasing patterns across time and product groups as well as detect different purchasing behaviors across groups of card holders. For example, we identified customers who purchased large amounts of meat products and separated them further into groups based on time profiles, that is, customers whose purchases of meat remained stable, increased, or decreased throughout the year or varied between seasons of the year. Conclusions: Using tensorial PCA, we can effectively examine customers’ purchasing behavior in more detail than with traditional methods because it can handle time and product group dimensions simultaneously. When interpreting the results, both time and product dimensions must be considered. In further analyses, these time and product groups can be directly associated with additional consumer characteristics such as socioeconomic and demographic predictors of dietary patterns. In addition, they can be linked to external factors that impact grocery purchases such as inflation and unexpected pandemics. This enables us to identify what types of people have specific purchasing patterns, which can help in the development of ways in which consumers can be steered toward making healthier food choices. 
%M 38100168 %R 10.2196/44599 %U https://www.jmir.org/2023/1/e44599 %U https://doi.org/10.2196/44599 %U http://www.ncbi.nlm.nih.gov/pubmed/38100168 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e50027 %T Traceable Research Data Sharing in a German Medical Data Integration Center With FAIR (Findability, Accessibility, Interoperability, and Reusability)-Geared Provenance Implementation: Proof-of-Concept Study %A Gierend,Kerstin %A Waltemath,Dagmar %A Ganslandt,Thomas %A Siegel,Fabian %+ Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, Mannheim, 68167, Germany, 49 621383 ext 8087, kerstin.gierend@medma.uni-heidelberg.de %K provenance %K traceability %K data management %K metadata %K data integrity %K data integration center %K medical informatics %D 2023 %7 7.12.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Secondary investigations into digital health records, including electronic patient data from German medical data integration centers (DICs), pave the way for enhanced future patient care. However, only limited information is captured regarding the integrity, traceability, and quality of the (sensitive) data elements. This lack of detail diminishes trust in the validity of the collected data. From a technical standpoint, adhering to the widely accepted FAIR (Findability, Accessibility, Interoperability, and Reusability) principles for data stewardship necessitates enriching data with provenance-related metadata. Provenance offers insights into the readiness for the reuse of a data element and serves as a supplier of data governance. Objective: The primary goal of this study is to augment the reusability of clinical routine data within a medical DIC for secondary utilization in clinical research. 
Our aim is to establish provenance traces that underpin the status of data integrity, reliability, and consequently, trust in electronic health records, thereby enhancing the accountability of the medical DIC. We present the implementation of a proof-of-concept provenance library integrating international standards as an initial step. Methods: We adhered to a customized road map for a provenance framework, and examined the data integration steps across the ETL (extract, transform, and load) phases. Following a maturity model, we derived requirements for a provenance library. Using this research approach, we formulated a provenance model with associated metadata and implemented a proof-of-concept provenance class. Furthermore, we seamlessly incorporated the internationally recognized World Wide Web Consortium (W3C) provenance standard, aligned the resultant provenance records with the interoperable health care standard Fast Healthcare Interoperability Resources, and presented them in various representation formats. Ultimately, we conducted a thorough assessment of provenance trace measurements. Results: This study marks the inaugural implementation of integrated provenance traces at the data element level within a German medical DIC. We devised and executed a practical method that synergizes the robustness of quality- and health standard–guided (meta)data management practices. Our measurements indicate commendable pipeline execution times, attaining notable levels of accuracy and reliability in processing clinical routine data, thereby ensuring accountability in the medical DIC. These findings should inspire the development of additional tools aimed at providing evidence-based and reliable electronic health record services for secondary use. Conclusions: The research method outlined for the proof-of-concept provenance class has been crafted to promote effective and reliable core data management practices. 
It aims to enhance biomedical data by imbuing it with meaningful provenance, thereby bolstering the benefits for both research and society. Additionally, it facilitates the streamlined reuse of biomedical data. As a result, the system mitigates risks, as data analysis without knowledge of the origin and quality of all data elements is rendered futile. While the approach was initially developed for the medical DIC use case, these principles can be universally applied throughout the scientific domain. %M 38060305 %R 10.2196/50027 %U https://formative.jmir.org/2023/1/e50027 %U https://doi.org/10.2196/50027 %U http://www.ncbi.nlm.nih.gov/pubmed/38060305 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e44639 %T Patient Information Summarization in Clinical Settings: Scoping Review %A Keszthelyi,Daniel %A Gaudet-Blavignac,Christophe %A Bjelogrlic,Mina %A Lovis,Christian %+ Division of Medical Information Sciences, University Hospitals of Geneva, Rue Gabrielle-Perret-Gentil 4, Geneva, 1205, Switzerland, 41 223726201, Daniel.Keszthelyi@unige.ch %K summarization %K electronic health records %K EHR %K medical record %K visualization %K dashboard %K natural language processing %D 2023 %7 28.11.2023 %9 Review %J JMIR Med Inform %G English %X Background: Information overflow, a common problem in the present clinical environment, can be mitigated by summarizing clinical data. Although there are several solutions for clinical summarization, there is a lack of a complete overview of the research relevant to this field. Objective: This study aims to identify state-of-the-art solutions for clinical summarization, to analyze their capabilities, and to identify their properties. Methods: A scoping review of articles published between 2005 and 2022 was conducted. With a clinical focus, PubMed and Web of Science were queried to find an initial set of reports, later extended by articles found through a chain of citations. 
The included reports were analyzed to answer the questions of where, what, and how medical information is summarized; whether summarization conserves temporality, uncertainty, and medical pertinence; and how the propositions are evaluated and deployed. To answer how information is summarized, methods were compared through a new framework “collect—synthesize—communicate” referring to information gathering from data, its synthesis, and communication to the end user. Results: Overall, 128 articles were included, representing various medical fields. Exclusively structured data were used as input in 46.1% (59/128) of papers, text in 41.4% (53/128) of articles, and both in 10.2% (13/128) of papers. Using the proposed framework, 42.2% (54/128) of the records contributed to information collection, 27.3% (35/128) contributed to information synthesis, and 46.1% (59/128) presented solutions for summary communication. Numerous summarization approaches have been presented, including extractive (n=13) and abstractive summarization (n=19); topic modeling (n=5); summary specification (n=11); concept and relation extraction (n=30); visual design considerations (n=59); and complete pipelines (n=7) using information extraction, synthesis, and communication. Graphical displays (n=53), short texts (n=41), static reports (n=7), and problem-oriented views (n=7) were the most common types in terms of summary communication. Although temporality and uncertainty information were usually not conserved in most studies (74/128, 57.8% and 113/128, 88.3%, respectively), some studies presented solutions to treat this information. Overall, 115 (89.8%) articles showed results of an evaluation, and methods included evaluations with human participants (median 15, IQR 24 participants): measurements in experiments with human participants (n=31), real situations (n=8), and usability studies (n=28). 
Methods without human involvement included intrinsic evaluation (n=24), performance on a proxy (n=10), or domain-specific tasks (n=11). Overall, 11 (8.6%) reports described a system deployed in clinical settings. Conclusions: The scientific literature contains many propositions for summarizing patient information but reports very few comparisons of these proposals. This work proposes to compare these algorithms through how they conserve essential aspects of clinical information and through the “collect—synthesize—communicate” framework. We found that current propositions usually address these 3 steps only partially. Moreover, they conserve and use temporality, uncertainty, and pertinent medical aspects to varying extents, and solutions are often preliminary. %M 38015588 %R 10.2196/44639 %U https://medinform.jmir.org/2023/1/e44639 %U https://doi.org/10.2196/44639 %U http://www.ncbi.nlm.nih.gov/pubmed/38015588 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e47859 %T Synthetic Tabular Data Based on Generative Adversarial Networks in Health Care: Generation and Validation Using the Divide-and-Conquer Strategy %A Kang,Ha Ye Jin %A Batbaatar,Erdenebileg %A Choi,Dong-Woo %A Choi,Kui Son %A Ko,Minsam %A Ryu,Kwang Sun %+ National Cancer Data Center, National Cancer Control Institute, National Cancer Center, 323 Ilsan-ro, Ilsandong-gu, Goyang-si, Gyeonggi-do, 10408, Republic of Korea, 82 31 920 0652, niceplay13@ncc.re.kr %K generative adversarial networks %K GAN %K synthetic data generation %K synthetic tabular data %K lung cancer %K machine learning %K mortality prediction %D 2023 %7 24.11.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Synthetic data generation (SDG) based on generative adversarial networks (GANs) is used in health care, but research on preserving data with logical relationships with synthetic tabular data (STD) remains challenging. Filtering methods for SDG can lead to the loss of important information. 
Objective: This study proposed a divide-and-conquer (DC) method to generate STD based on the GAN algorithm, while preserving data with logical relationships. Methods: The proposed method was evaluated on data from the Korea Association for Lung Cancer Registry (KALC-R) and 2 benchmark data sets (breast cancer and diabetes). The DC-based SDG strategy comprises 3 steps: (1) We used 2 different partitioning methods (the class-specific criterion distinguished between survival and death groups, while the Cramer V criterion identified the highest correlation between columns in the original data); (2) the entire data set was divided into a number of subsets, which were then used as input for the conditional tabular generative adversarial network and the copula generative adversarial network to generate synthetic data; and (3) the generated synthetic data were consolidated into a single entity. For validation, we compared DC-based SDG and conditional sampling (CS)–based SDG through the performances of machine learning models. In addition, we generated imbalanced and balanced synthetic data for each of the 3 data sets and compared their performance using 4 classifiers: decision tree (DT), random forest (RF), Extreme Gradient Boosting (XGBoost), and light gradient-boosting machine (LGBM) models. Results: The synthetic data of the 3 diseases (non–small cell lung cancer [NSCLC], breast cancer, and diabetes) generated by our proposed model outperformed the 4 classifiers (DT, RF, XGBoost, and LGBM). 
The CS- versus DC-based model performances were compared using the mean area under the curve (SD) values: 74.87 (SD 0.77) versus 63.87 (SD 2.02) for NSCLC, 73.31 (SD 1.11) versus 67.96 (SD 2.15) for breast cancer, and 61.57 (SD 0.09) versus 60.08 (SD 0.17) for diabetes (DT); 85.61 (SD 0.29) versus 79.01 (SD 1.20) for NSCLC, 78.05 (SD 1.59) versus 73.48 (SD 4.73) for breast cancer, and 59.98 (SD 0.24) versus 58.55 (SD 0.17) for diabetes (RF); 85.20 (SD 0.82) versus 76.42 (SD 0.93) for NSCLC, 77.86 (SD 2.27) versus 68.32 (SD 2.37) for breast cancer, and 60.18 (SD 0.20) versus 58.98 (SD 0.29) for diabetes (XGBoost); and 85.14 (SD 0.77) versus 77.62 (SD 1.85) for NSCLC, 78.16 (SD 1.52) versus 70.02 (SD 2.17) for breast cancer, and 61.75 (SD 0.13) versus 61.12 (SD 0.23) for diabetes (LGBM). In addition, we found that balanced synthetic data performed better. Conclusions: This study is the first attempt to generate and validate STD based on a DC approach and shows improved performance using STD. The necessity for balanced SDG was also demonstrated. 
%M 37999942 %R 10.2196/47859 %U https://medinform.jmir.org/2023/1/e47859 %U https://doi.org/10.2196/47859 %U http://www.ncbi.nlm.nih.gov/pubmed/37999942 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e49314 %T A Conference (Missingness in Action) to Address Missingness in Data and AI in Health Care: Qualitative Thematic Analysis %A Rose,Christian %A Barber,Rachel %A Preiksaitis,Carl %A Kim,Ireh %A Mishra,Nikesh %A Kayser,Kristen %A Brown,Italo %A Gisondi,Michael %+ Department of Emergency Medicine, Stanford University School of Medicine, 900 Welch Road, Palo Alto, CA, 94304, United States, 1 415 915 9585, ccrose@stanford.edu %K machine learning %K artificial intelligence %K health care data %K data quality %K thematic analysis %K AI %K implementation %K digital conference %K data quality %K trust %K privacy %K predictive model %K health care community %D 2023 %7 23.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Missingness in health care data poses significant challenges in the development and implementation of artificial intelligence (AI) and machine learning solutions. Identifying and addressing these challenges is critical to ensuring the continued growth and accuracy of these models as well as their equitable and effective use in health care settings. Objective: This study aims to explore the challenges, opportunities, and potential solutions related to missingness in health care data for AI applications through the conduct of a digital conference and thematic analysis of conference proceedings. Methods: A digital conference was held in September 2022, attracting 861 registered participants, with 164 (19%) attending the live event. The conference featured presentations and panel discussions by experts in AI, machine learning, and health care. Transcripts of the event were analyzed using the stepwise framework of Braun and Clark to identify key themes related to missingness in health care data. 
Results: Three principal themes—data quality and bias, human input in model development, and trust and privacy—emerged from the analysis. Topics included the accuracy of predictive models, lack of inclusion of underrepresented communities, partnership with physicians and other populations, challenges with sensitive health care data, and fostering trust with patients and the health care community. Conclusions: Addressing the challenges of data quality, human input, and trust is vital when devising and using machine learning algorithms in health care. Recommendations include expanding data collection efforts to reduce gaps and biases, involving medical professionals in the development and implementation of AI models, and developing clear ethical guidelines to safeguard patient privacy. Further research and ongoing discussions are needed to ensure these conclusions remain relevant as health care and AI continue to evolve. %M 37995113 %R 10.2196/49314 %U https://www.jmir.org/2023/1/e49314 %U https://doi.org/10.2196/49314 %U http://www.ncbi.nlm.nih.gov/pubmed/37995113 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47066 %T Public Preferences for Digital Health Data Sharing: Discrete Choice Experiment Study in 12 European Countries %A Biasiotto,Roberta %A Viberg Johansson,Jennifer %A Alemu,Melaku Birhanu %A Romano,Virginia %A Bentzen,Heidi Beate %A Kaye,Jane %A Ancillotti,Mirko %A Blom,Johanna Maria Catharina %A Chassang,Gauthier %A Hallinan,Dara %A Jónsdóttir,Guðbjörg Andrea %A Monasterio Astobiza,Aníbal %A Rial-Sebbag,Emmanuelle %A Rodríguez-Arias,David %A Shah,Nisha %A Skovgaard,Lea %A Staunton,Ciara %A Tschigg,Katharina %A Veldwijk,Jorien %A Mascalzoni,Deborah %+ Institute for Biomedicine (Affiliated Institute of the University of Lübeck), Eurac Research, Via A. 
Volta 21, Bolzano, Italy, 39 0471 055 488, roberta.biasiotto@eurac.edu %K governance %K digital health data %K preferences %K Europe %K discrete choice experiment %K data use %K data sharing %K secondary use of data %D 2023 %7 23.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: With new technologies, health data can be collected in a variety of different clinical, research, and public health contexts, and then can be used for a range of new purposes. Establishing the public’s views about digital health data sharing is essential for policy makers to develop effective harmonization initiatives for digital health data governance at the European level. Objective: This study investigated public preferences for digital health data sharing. Methods: A discrete choice experiment survey was administered to a sample of European residents in 12 European countries (Austria, Denmark, France, Germany, Iceland, Ireland, Italy, the Netherlands, Norway, Spain, Sweden, and the United Kingdom) from August 2020 to August 2021. Respondents answered whether hypothetical situations of data sharing were acceptable for them. Each hypothetical scenario was defined by 5 attributes (“data collector,” “data user,” “reason for data use,” “information on data sharing and consent,” and “availability of review process”), which had 3 to 4 attribute levels each. A latent class model was run across the whole data set and separately for different European regions (Northern, Central, and Southern Europe). Attribute relative importance was calculated for each latent class’s pooled and regional data sets. Results: A total of 5015 completed surveys were analyzed. In general, the most important attribute for respondents was the availability of information and consent during health data sharing. In the latent class model, 4 classes of preference patterns were identified. 
While respondents in 2 classes strongly expressed their preferences for data sharing with opposing positions, respondents in the other 2 classes preferred not to share their data, but attribute levels of the situation could have had an impact on their preferences. Respondents generally found the following to be the most acceptable: a national authority or academic research project as the data user; being informed and asked to consent; and a review process for data transfer and use, or transfer only. On the other hand, collection of their data by a technological company and data use for commercial communication were the least acceptable. There was preference heterogeneity across Europe and within European regions. Conclusions: This study showed the importance of transparency in data use and oversight of health-related data sharing for European respondents. Regional and intraregional preference heterogeneity for “data collector,” “data user,” “reason,” “type of consent,” and “review” calls for governance solutions that would grant data subjects the ability to control their digital health data being shared within different contexts. These results suggest that the use of data without consent will demand weighty and exceptional reasons. An interactive and dynamic informed consent model combined with oversight mechanisms may be a solution for policy initiatives aiming to harmonize health data use across Europe. 
%M 37995125 %R 10.2196/47066 %U https://www.jmir.org/2023/1/e47066 %U https://doi.org/10.2196/47066 %U http://www.ncbi.nlm.nih.gov/pubmed/37995125 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e50998 %T Potential Schizophrenia Disease-Related Genes Prediction Using Metagraph Representations Based on a Protein-Protein Interaction Keyword Network: Framework Development and Validation %A Yu,Shirui %A Wang,Ziyang %A Nan,Jiale %A Li,Aihua %A Yang,Xuemei %A Tang,Xiaoli %+ Institute of Medical Information, Chinese Academy of Medical Sciences, No 69 Dongdan North Street, Beijing, 100005, China, 86 10 52328902, tang.xiaoli@imicams.ac.cn %K disease gene prediction %K metagraph %K protein representations %K schizophrenia %K keyword network %D 2023 %7 15.11.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Schizophrenia is a serious mental disease. With increased research funding for this disease, schizophrenia has become one of the key areas of focus in the medical field. Searching for associations between diseases and genes is an effective approach to study complex diseases, which may enhance research on schizophrenia pathology and lead to the identification of new treatment targets. Objective: The aim of this study was to identify potential schizophrenia risk genes by employing machine learning methods to extract topological characteristics of proteins and their functional roles in a protein-protein interaction (PPI)-keywords (PPIK) network and understand the complex disease–causing property. Consequently, a PPIK-based metagraph representation approach is proposed. Methods: To enrich the PPI network, we integrated keywords describing protein properties and constructed a PPIK network. We extracted features that describe the topology of this network through metagraphs. We further transformed these metagraphs into vectors and represented proteins with a series of vectors. 
We then trained and optimized our model using random forest (RF), extreme gradient boosting, light gradient boosting machine, and logistic regression models. Results: Comprehensive experiments demonstrated the good performance of our proposed method with an area under the receiver operating characteristic curve (AUC) value between 0.72 and 0.76. Our model also outperformed baseline methods for overall disease protein prediction, including the random walk with restart, average commute time, and Katz models. Compared with the PPI network constructed from the baseline models, complementation of keywords in the PPIK network improved the performance (AUC) by 0.08 on average, and the metagraph-based method improved the AUC by 0.30 on average compared with that of the baseline methods. According to the comprehensive performance of the four models, RF was selected as the best model for disease protein prediction, with precision, recall, F1-score, and AUC values of 0.76, 0.73, 0.72, and 0.76, respectively. We transformed these proteins to their encoding gene IDs and identified the top 20 genes as the most probable schizophrenia-risk genes, including the EYA3, CNTN4, HSPA8, LRRK2, and AFP genes. We further validated these outcomes against metagraph features and evidence from the literature, performed a features analysis, and exploited evidence from the literature to interpret the correlation between the predicted genes and diseases. Conclusions: The metagraph representation based on the PPIK network framework was found to be effective for potential schizophrenia risk genes identification. The results are quite reliable as evidence can be found in the literature to support our prediction. Our approach can provide more biological insights into the pathogenesis of schizophrenia. 
%M 37966892 %R 10.2196/50998 %U https://formative.jmir.org/2023/1/e50998 %U https://doi.org/10.2196/50998 %U http://www.ncbi.nlm.nih.gov/pubmed/37966892 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e48030 %T The Journey of Data Within a Global Data Sharing Initiative: A Federated 3-Layer Data Analysis Pipeline to Scale Up Multiple Sclerosis Research %A Pirmani,Ashkan %A De Brouwer,Edward %A Geys,Lotte %A Parciak,Tina %A Moreau,Yves %A Peeters,Liesbet M %+ Biomedical Research Institute, Hasselt University, Agoralaan, Building C, Diepenbeek, 3590, Belgium, 32 11 26 92 05, liesbet.peeters@uhasselt.be %K data analysis pipeline %K federated model sharing %K real-world data %K evidence-based decision-making %K end-to-end pipeline %K multiple sclerosis %K data analysis %K pipeline %K data science %K federated %K neurology %K brain %K spine %K spinal nervous system %K neuroscience %K data sharing %K rare %K low prevalence %D 2023 %7 9.11.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Investigating low-prevalence diseases such as multiple sclerosis is challenging because of the rather small number of individuals affected by this disease and the scattering of real-world data across numerous data sources. These obstacles impair data integration, standardization, and analysis, which negatively impact the generation of significant meaningful clinical evidence. Objective: This study aims to present a comprehensive, research question–agnostic, multistakeholder-driven end-to-end data analysis pipeline that accommodates 3 prevalent data-sharing streams: individual data sharing, core data set sharing, and federated model sharing. Methods: A demand-driven methodology is employed for standardization, followed by 3 streams of data acquisition, a data quality enhancement process, a data integration procedure, and a concluding analysis stage to fulfill real-world data-sharing requirements. 
This pipeline’s effectiveness was demonstrated through its successful implementation in the COVID-19 and multiple sclerosis global data sharing initiative. Results: The global data sharing initiative yielded multiple scientific publications and provided extensive worldwide guidance for the community with multiple sclerosis. The pipeline facilitated gathering pertinent data from various sources, accommodating distinct sharing streams and assimilating them into a unified data set for subsequent statistical analysis or secure data examination. This pipeline contributed to the assembly of the largest data set of people with multiple sclerosis infected with COVID-19. Conclusions: The proposed data analysis pipeline exemplifies the potential of global stakeholder collaboration and underlines the significance of evidence-based decision-making. It serves as a paradigm for how data sharing initiatives can propel advancements in health care, emphasizing its adaptability and capacity to address diverse research inquiries. 
%M 37943585 %R 10.2196/48030 %U https://medinform.jmir.org/2023/1/e48030 %U https://doi.org/10.2196/48030 %U http://www.ncbi.nlm.nih.gov/pubmed/37943585 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48809 %T The Status of Data Management Practices Across German Medical Data Integration Centers: Mixed Methods Study %A Gierend,Kerstin %A Freiesleben,Sherry %A Kadioglu,Dennis %A Siegel,Fabian %A Ganslandt,Thomas %A Waltemath,Dagmar %+ Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, Mannheim, 68167, Germany, 49 621383 ext 8087, kerstin.gierend@medma.uni-heidelberg.de %K data management %K provenance %K traceability %K metadata %K data integration center %K maturity model %D 2023 %7 8.11.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: In the context of the Medical Informatics Initiative, medical data integration centers (DICs) have implemented complex data flows to transfer routine health care data into research data repositories for secondary use. Data management practices are of importance throughout these processes, and special attention should be given to provenance aspects. Insufficient knowledge can lead to validity risks and reduce the confidence and quality of the processed data. The need to implement maintainable data management practices is undisputed, but there is a great lack of clarity on the status. Objective: Our study examines the current data management practices throughout the data life cycle within the Medical Informatics in Research and Care in University Medicine (MIRACUM) consortium. We present a framework for the maturity status of data management practices and present recommendations to enable a trustful dissemination and reuse of routine health care data. 
Methods: In this mixed methods study, we conducted semistructured interviews with stakeholders from 10 DICs between July and September 2021. We used a self-designed questionnaire, tailored to the MIRACUM DICs, to collect qualitative and quantitative data. Our study method is compliant with the Good Reporting of a Mixed Methods Study (GRAMMS) checklist. Results: Our study provides insights into the data management practices at the MIRACUM DICs. We identify several traceability issues that can be partially explained by a lack of contextual information within nonharmonized workflow steps, unclear responsibilities, missing or incomplete data elements, and incomplete information about the computational environment. Based on the identified shortcomings, we suggest a data management maturity framework to provide more clarity and to help define enhanced data management strategies. Conclusions: The data management maturity framework supports the production and dissemination of accurate and provenance-enriched data for secondary use. Our work serves as a catalyst for the derivation of an overarching data management strategy, with data integrity and provenance characteristics as key factors. We envision that this work will lead to the generation of FAIRer, well-maintained health research data of high quality. 
%M 37938878 %R 10.2196/48809 %U https://www.jmir.org/2023/1/e48809 %U https://doi.org/10.2196/48809 %U http://www.ncbi.nlm.nih.gov/pubmed/37938878 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41446 %T Data Quality in Health Research: Integrative Literature Review %A Bernardi,Filipe Andrade %A Alves,Domingos %A Crepaldi,Nathalia %A Yamada,Diego Bettiol %A Lima,Vinícius Costa %A Rijo,Rui %+ Ribeirão Preto School of Medicine, University of Sao Paulo, Av Bandeirantes, 3900, Ribeirão Preto, 14040-900, Brazil, 55 16997880795, filipepaulista12@usp.br %K data quality %K research %K digital health %K review %K decision-making %K health data %K research network %K artificial intelligence %K e-management %K digital governance %K reliability %K database %K health system %K health services %K health stakeholders %D 2023 %7 31.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Decision-making and strategies to improve service delivery must be supported by reliable health data to generate consistent evidence on health status. The data quality management process must ensure the reliability of collected data. Consequently, various methodologies to improve the quality of services are applied in the health field. At the same time, scientific research is constantly evolving to improve data quality through better reproducibility and empowerment of researchers and offers patient groups tools for secured data sharing and privacy compliance. Objective: Through an integrative literature review, the aim of this work was to identify and evaluate digital health technology interventions designed to support the conducting of health research based on data quality. 
Methods: A search was conducted in 6 electronic scientific databases in January 2022: PubMed, SCOPUS, Web of Science, Institute of Electrical and Electronics Engineers Digital Library, Cumulative Index of Nursing and Allied Health Literature, and Latin American and Caribbean Health Sciences Literature. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist and flowchart were used to visualize the search strategy results in the databases. Results: After analyzing and extracting the outcomes of interest, 33 papers were included in the review. The studies covered the period of 2017-2021 and were conducted in 22 countries. Key findings revealed variability and a lack of consensus in assessing data quality domains and metrics. Data quality factors included the research environment, application time, and development steps. Strategies for improving data quality involved using business intelligence models, statistical analyses, data mining techniques, and qualitative approaches. Conclusions: The main barriers to health data quality are technical, motivational, economic, political, legal, ethical, organizational, human resource, and methodological. The data quality process and techniques, from precollection through data gathering to postcollection and analysis, are critical for the final result of a study or the quality of processes and decision-making in a health care organization. The findings highlight the need for standardized practices and collaborative efforts to enhance data quality in health research. Finally, context guides decisions regarding data quality strategies and techniques. 
International Registered Report Identifier (IRRID): RR2-10.1101/2022.05.31.22275804 %M 37906223 %R 10.2196/41446 %U https://www.jmir.org/2023/1/e41446 %U https://doi.org/10.2196/41446 %U http://www.ncbi.nlm.nih.gov/pubmed/37906223 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45225 %T Potential Target Discovery and Drug Repurposing for Coronaviruses: Study Involving a Knowledge Graph–Based Approach %A Lou,Pei %A Fang,An %A Zhao,Wanqing %A Yao,Kuanda %A Yang,Yusheng %A Hu,Jiahui %+ Institute of Medical Information, Chinese Academy of Medical Sciences & Peking Union Medical College, No. 3 Yabao Road, Chaoyang District, Beijing, 100020, China, 86 01052328782, hu.jiahui@imicams.ac.cn %K coronavirus %K heterogeneous data integration %K knowledge graph embedding %K drug repurposing %K interpretable prediction %K COVID-19 %D 2023 %7 20.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The global pandemics of severe acute respiratory syndrome, Middle East respiratory syndrome, and COVID-19 have caused unprecedented crises for public health. Coronaviruses are constantly evolving, and it is unknown which new coronavirus will emerge and when the next coronavirus will sweep across the world. Knowledge graphs are expected to help discover the pathogenicity and transmission mechanism of viruses. Objective: The aim of this study was to discover potential targets and candidate drugs to repurpose for coronaviruses through a knowledge graph–based approach. Methods: We propose a computational and evidence-based knowledge discovery approach to identify potential targets and candidate drugs for coronaviruses from biomedical literature and well-known knowledge bases. To organize the semantic triples extracted automatically from biomedical literature, a semantic conversion model was designed. 
The literature knowledge was associated and integrated with existing drug and gene knowledge through semantic mapping, and the coronavirus knowledge graph (CovKG) was constructed. We adopted both the knowledge graph embedding model and the semantic reasoning mechanism to discover unrecorded mechanisms of drug action as well as potential targets and drug candidates. Furthermore, we have provided evidence-based support with a scoring and backtracking mechanism. Results: The constructed CovKG contains 17,369,620 triples, of which 641,195 were extracted from biomedical literature, covering 13,065 concept unique identifiers, 209 semantic types, and 97 semantic relations of the Unified Medical Language System. Through multi-source knowledge integration, 475 drugs and 262 targets were mapped to existing knowledge, and 41 new drug mechanisms of action were found by semantic reasoning, which were not recorded in the existing knowledge base. Among the knowledge graph embedding models, TransR outperformed others (mean reciprocal rank=0.2510, Hits@10=0.3505). A total of 33 potential targets and 18 drug candidates were identified for coronaviruses. Among them, 7 novel drugs (ie, quinine, nelfinavir, ivermectin, asunaprevir, tylophorine, Artemisia annua extract, and resveratrol) and 3 highly ranked targets (ie, angiotensin converting enzyme 2, transmembrane serine protease 2, and M protein) were further discussed. Conclusions: We showed the effectiveness of a knowledge graph–based approach in potential target discovery and drug repurposing for coronaviruses. Our approach can be extended to other viruses or diseases for biomedical knowledge discovery and relevant applications. 
%M 37862061 %R 10.2196/45225 %U https://www.jmir.org/2023/1/e45225 %U https://doi.org/10.2196/45225 %U http://www.ncbi.nlm.nih.gov/pubmed/37862061 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47254 %T The BioRef Infrastructure, a Framework for Real-Time, Federated, Privacy-Preserving, and Personalized Reference Intervals: Design, Development, and Application %A Blatter,Tobias Ueli %A Witte,Harald %A Fasquelle-Lopez,Jules %A Nakas,Christos Theodoros %A Raisaro,Jean Louis %A Leichtle,Alexander Benedikt %+ University Institute of Clinical Chemistry, University Hospital Bern, Freiburgstrasse 10, Bern, 3010, Switzerland, 41 31 632 83 30, harald.witte@extern.insel.ch %K personalized health %K laboratory medicine %K reference interval %K research infrastructure %K sensitive data %K confidential data %K data security %K differential privacy %K precision medicine %D 2023 %7 18.10.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Reference intervals (RIs) for patient test results are in standard use across many medical disciplines, allowing physicians to identify measurements indicating potentially pathological states with relative ease. The process of inferring cohort-specific RIs is, however, often ignored because of the high costs and cumbersome efforts associated with it. Sophisticated analysis tools are required to automatically infer relevant and locally specific RIs directly from routine laboratory data. These tools would effectively connect clinical laboratory databases to physicians and provide personalized target ranges for the respective cohort population. Objective: This study aims to describe the BioRef infrastructure, a multicentric governance and IT framework for the estimation and assessment of patient group–specific RIs from routine clinical laboratory data using an innovative decentralized data-sharing approach and a sophisticated, clinically oriented graphical user interface for data analysis. 
Methods: A common governance agreement and interoperability standards have been established, allowing the harmonization of multidimensional laboratory measurements from multiple clinical databases into a unified “big data” resource. International coding systems, such as the International Classification of Diseases, Tenth Revision (ICD-10); unique identifiers for medical devices from the Global Unique Device Identification Database; type identifiers from the Global Medical Device Nomenclature; and a universal transfer logic, such as the Resource Description Framework (RDF), are used to align the routine laboratory data of each data provider for use within the BioRef framework. With a decentralized data-sharing approach, the BioRef data can be evaluated by end users from each cohort site following a strict “no copy, no move” principle, that is, only data aggregates for the intercohort analysis of target ranges are exchanged. Results: The TI4Health distributed and secure analytics system was used to implement the proposed federated and privacy-preserving approach and comply with the limitations applied to sensitive patient data. Under the BioRef interoperability consensus, clinical partners enable the computation of RIs via the TI4Health graphical user interface for query without exposing the underlying raw data. The interface was developed for use by physicians and clinical laboratory specialists and allows intuitive and interactive data stratification by patient factors (age, sex, and personal medical history) as well as laboratory analysis determinants (device, analyzer, and test kit identifier). This consolidated effort enables the creation of extremely detailed and patient group–specific queries, allowing the generation of individualized, covariate-adjusted RIs on the fly. 
Conclusions: With the BioRef-TI4Health infrastructure, a framework for clinical physicians and researchers to define precise RIs immediately in a convenient, privacy-preserving, and reproducible manner has been implemented, promoting a vital part of practicing precision medicine while streamlining compliance and avoiding transfers of raw patient data. This new approach can provide a crucial update on RIs and improve patient care for personalized medicine. %M 37851984 %R 10.2196/47254 %U https://www.jmir.org/2023/1/e47254 %U https://doi.org/10.2196/47254 %U http://www.ncbi.nlm.nih.gov/pubmed/37851984 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e44892 %T A Multilabel Text Classifier of Cancer Literature at the Publication Level: Methods Study of Medical Text Classification %A Zhang,Ying %A Li,Xiaoying %A Liu,Yi %A Li,Aihua %A Yang,Xuemei %A Tang,Xiaoli %+ Institute of Medical Information, Chinese Academy of Medical Sciences, No 69, Dongdan North Street, Beijing, 100020, China, 86 10 52328902, tang.xiaoli@imicams.ac.cn %K text classification %K publication-level classifier %K cancer literature %K deep learning %D 2023 %7 5.10.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Given the threat posed by cancer to human health, there is a rapid growth in the volume of data in the cancer field and interdisciplinary and collaborative research is becoming increasingly important for fine-grained classification. The low-resolution classifier of reported studies at the journal level fails to satisfy advanced searching demands, and a single label does not adequately characterize the literature originated from interdisciplinary research results. There is thus a need to establish a multilabel classifier with higher resolution to support literature retrieval for cancer research and reduce the burden of screening papers for clinical relevance. 
Objective: The primary objective of this research was to address the low-resolution issue of cancer literature classification due to the ambiguity of the existing journal-level classifier in order to support gaining high-relevance evidence for clinical consideration and all-sided results for literature retrieval. Methods: We trained a multilabel classifier with scalability for classifying the literature on cancer research directly at the publication level to assign proper content-derived labels based on the “Bidirectional Encoder Representation from Transformers (BERT) + X” model and obtain the best option for X. First, a corpus of 70,599 cancer publications retrieved from the Dimensions database was divided into a training and a testing set in a ratio of 7:3. Second, using the classification terminology of International Cancer Research Partnership cancer types, we compared the performance of classifiers developed using BERT and 5 classical deep learning models, such as the text recurrent neural network (TextRNN) and FastText, followed by metrics analysis. Results: After comparing various combined deep learning models, we obtained a classifier based on the optimal combination “BERT + TextRNN,” with a precision of 93.09%, a recall of 87.75%, and an F1-score of 90.34%. Moreover, we quantified the distinctive characteristics in the text structure and multilabel distribution in order to generalize the model to other fields with similar characteristics. Conclusions: The “BERT + TextRNN” model was trained for high-resolution classification of cancer literature at the publication level to support accurate retrieval and academic statistics. The model automatically assigns 1 or more labels to each cancer paper, as required. Quantitative comparison verified that the “BERT + TextRNN” model is the best fit for multilabel classification of cancer literature compared to other models. 
More data from diverse fields will be collected to test the scalability and extensibility of the proposed model in the future. %M 37796584 %R 10.2196/44892 %U https://medinform.jmir.org/2023/1/e44892 %U https://doi.org/10.2196/44892 %U http://www.ncbi.nlm.nih.gov/pubmed/37796584 %0 Journal Article %@ 1929-073X %I JMIR Publications %V 12 %N %P e44310 %T Normal Workflow and Key Strategies for Data Cleaning Toward Real-World Data: Viewpoint %A Guo,Manping %A Wang,Yiming %A Yang,Qiaoning %A Li,Rui %A Zhao,Yang %A Li,Chenfei %A Zhu,Mingbo %A Cui,Yao %A Jiang,Xin %A Sheng,Song %A Li,Qingna %A Gao,Rui %+ Xiyuan Hospital, China Academy of Chinese Medical Sciences, GCP, Xiyuan Hospital, 1 Xiyuan Playground, Haidian District, Beijing, 100091, China, 86 010 62835653, ruigao@126.com %K data cleaning %K data quality %K key technologies %K real-world data %K viewpoint %D 2023 %7 21.9.2023 %9 Viewpoint %J Interact J Med Res %G English %X With the rapid development of science, technology, and engineering, large amounts of data have been generated in many fields in the past 20 years. In the process of medical research, data are constantly generated, and large amounts of real-world data form a “data disaster.” Effective data analysis and mining are based on data availability and high data quality. The premise of high data quality is the need to clean the data. Data cleaning is the process of detecting and correcting “dirty data,” which is the basis of data analysis and management. Moreover, data cleaning is a common technology for improving data quality. However, the current literature on real-world research provides little guidance on how to efficiently and ethically set up and perform data cleaning. 
To address this issue, we proposed a data cleaning framework for real-world research, focusing on the 3 most common types of dirty data (duplicate, missing, and outlier data), and a normal workflow for data cleaning to serve as a reference for the application of such technologies in future studies. We also provided relevant suggestions for common problems in data cleaning. %M 37733421 %R 10.2196/44310 %U https://www.i-jmr.org/2023/1/e44310 %U https://doi.org/10.2196/44310 %U http://www.ncbi.nlm.nih.gov/pubmed/37733421 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e48115 %T Large-Scale Biomedical Relation Extraction Across Diverse Relation Types: Model Development and Usability Study on COVID-19 %A Zhang,Zeyu %A Fang,Meng %A Wu,Rebecca %A Zong,Hui %A Huang,Honglian %A Tong,Yuantao %A Xie,Yujia %A Cheng,Shiyang %A Wei,Ziyi %A Crabbe,M James C %A Zhang,Xiaoyan %A Wang,Ying %+ Research Center for Translational Medicine, Shanghai East Hospital, School of Life Sciences and Technology, Tongji University, 1239 Siping Road, Shanghai, 200092, China, 86 21 65980233, nadger_wang@139.com %K biomedical text mining %K biomedical relation extraction %K pretrained language model %K task-adaptive pretraining %K knowledge graph %K knowledge discovery %K clinical drug path %K COVID-19 %D 2023 %7 20.9.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. 
Objective: We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. Methods: Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. Results: The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. Conclusions: This study provided a comprehensive analysis of RE with diverse relation types. Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research. 
%M 37632414 %R 10.2196/48115 %U https://www.jmir.org/2023/1/e48115 %U https://doi.org/10.2196/48115 %U http://www.ncbi.nlm.nih.gov/pubmed/37632414 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e48636 %T The Australian Genetic Heart Disease Registry: Protocol for a Data Linkage Study %A Butters,Alexandra %A Blanch,Bianca %A Kemp-Casey,Anna %A Do,Judy %A Yeates,Laura %A Leslie,Felicity %A Semsarian,Christopher %A Nedkoff,Lee %A Briffa,Tom %A Ingles,Jodie %A Sweeting,Joanna %+ Clinical Genomics Laboratory, Centre for Population Genomics, Garvan Institute of Medical Research, 384 Victoria Street, Darlinghurst, 2010, Australia, 61 2 9359 8049, joanna.sweeting@populationgenomics.org.au %K data linkage %K genetic heart diseases %K health care use %K cardiomyopathies %K arrhythmia %K cardiology %K heart %K genetics %K registry %K registries %K risk %K mortality %K national %K big data %K harmonization %K probabilistic matching %D 2023 %7 20.9.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Genetic heart diseases such as hypertrophic cardiomyopathy can cause significant morbidity and mortality, ranging from syncope, chest pain, and palpitations to heart failure and sudden cardiac death. These diseases are inherited in an autosomal dominant fashion, meaning family members of affected individuals have a 1 in 2 chance of also inheriting the disease (“at-risk relatives”). The health care use patterns of individuals with a genetic heart disease, including emergency department presentations and hospital admissions, are poorly understood. By linking genetic heart disease registry data to routinely collected health data, we aim to provide a more comprehensive clinical data set to examine the burden of disease on individuals, families, and health care systems. 
Objective: The objective of this study is to link the Australian Genetic Heart Disease (AGHD) Registry with routinely collected whole-population health data sets to investigate the health care use of individuals with a genetic heart disease and their at-risk relatives. This linked data set will allow for the investigation of differences in outcomes and health care use due to disease, sex, socioeconomic status, and other factors. Methods: The AGHD Registry is a nationwide data set that began in 2007 and aims to recruit individuals with a genetic heart disease and their family members. In this study, demographic, clinical, and genetic data (available from 2007 to 2019) for AGHD Registry participants and at-risk relatives residing in New South Wales (NSW), Australia, were linked to routinely collected health data. These data included NSW-based data sets covering hospitalizations (2001-2019), emergency department presentations (2005-2019), and both state-wide and national mortality registries (2007-2019). The linkage was performed by the Centre for Health Record Linkage. Investigations stratifying by diagnosis, age, sex, socioeconomic status, and gene status will be undertaken and reported using descriptive statistics. Results: NSW AGHD Registry participants were linked to routinely collected health data sets using probabilistic matching (November 2019). Of 1720 AGHD Registry participants, 1384 had linkages with 11,610 hospital records, 7032 emergency department records, and 60 death records. Data assessment and harmonization were performed, and descriptive data analysis is underway. Conclusions: We intend to provide insights into the health care use patterns of individuals with a genetic heart disease and their at-risk relatives, including frequency of hospital admissions and differences due to factors such as disease, sex, and socioeconomic status. 
Identifying disparities and potential barriers to care may highlight specific health care needs (eg, between sexes) and factors impacting health care access and use. International Registered Report Identifier (IRRID): DERR1-10.2196/48636 %M 37728963 %R 10.2196/48636 %U https://www.researchprotocols.org/2023/1/e48636 %U https://doi.org/10.2196/48636 %U http://www.ncbi.nlm.nih.gov/pubmed/37728963 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e47540 %T Sharing Data With Shared Benefits: Artificial Intelligence Perspective %A Tajabadi,Mohammad %A Grabenhenrich,Linus %A Ribeiro,Adèle %A Leyer,Michael %A Heider,Dominik %+ Department of Data Science in Biomedicine, Faculty of Mathematics and Computer Science, University of Marburg, Hans-Meerwein-Str. 6, Marburg, 35043, Germany, 49 6421 2821579, dominik.heider@uni-marburg.de %K federated learning %K machine learning %K medical data %K fairness %K data sharing %K artificial intelligence %K development %K artificial intelligence model %K applications %K data analysis %K diagnostic tool %K tool %D 2023 %7 29.8.2023 %9 Viewpoint %J J Med Internet Res %G English %X Artificial intelligence (AI) and data sharing go hand in hand. In order to develop powerful AI models for medical and health applications, data need to be collected and brought together over multiple centers. However, due to various reasons, including data privacy, not all data can be made publicly available or shared with other parties. Federated and swarm learning can help in these scenarios. However, in the private sector, such as between companies, the incentive is limited, as the resulting AI models would be available for all partners irrespective of their individual contribution, including the amount of data provided by each party. Here, we explore a potential solution to this challenge as a viewpoint, aiming to establish a fairer approach that encourages companies to engage in collaborative data analysis and AI modeling. 
Within the proposed approach, each individual participant could gain a model commensurate with their respective data contribution, ultimately leading to better diagnostic tools for all participants in a fair manner. %M 37642995 %R 10.2196/47540 %U https://www.jmir.org/2023/1/e47540 %U https://doi.org/10.2196/47540 %U http://www.ncbi.nlm.nih.gov/pubmed/37642995 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45013 %T Initiatives, Concepts, and Implementation Practices of the Findable, Accessible, Interoperable, and Reusable Data Principles in Health Data Stewardship: Scoping Review %A Inau,Esther Thea %A Sack,Jean %A Waltemath,Dagmar %A Zeleke,Atinkut Alamirrew %+ Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Walther-Rathenau-Str 48, Greifswald, D-17475, Germany, 49 3834867548, inaue@uni-greifswald.de %K data stewardship %K findable, accessible, interoperable, and reusable data principles %K FAIR data principles %K health research %K Preferred Reporting Items for Systematic Reviews and Meta-Analyses %K PRISMA %K qualitative analysis %K scoping review %K information retrieval %K health information exchange %D 2023 %7 28.8.2023 %9 Review %J J Med Internet Res %G English %X Background: Thorough data stewardship is a key enabler of comprehensive health research. Processes such as data collection, storage, access, sharing, and analytics require researchers to follow elaborate data management strategies properly and consistently. Studies have shown that findable, accessible, interoperable, and reusable (FAIR) data leads to improved data sharing in different scientific domains. Objective: This scoping review identifies and discusses concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in health research data. Methods: The Arksey and O’Malley stage-based methodological framework for scoping reviews was applied. 
PubMed, Web of Science, and Google Scholar were searched to access relevant publications. Articles written in English, published between 2014 and 2020, and addressing FAIR concepts or practices in the health domain were included. The 3 data sources were deduplicated using reference management software. Two independent authors reviewed the eligibility of each article based on defined inclusion and exclusion criteria. A charting tool was used to extract information from the full-text papers. The results were reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines. Results: A total of 2.18% (34/1561) of the screened articles were included in the final review. The authors reported FAIRification approaches, which include interpolation, inclusion of comprehensive data dictionaries, repository design, semantic interoperability, ontologies, data quality, linked data, and requirement gathering for FAIRification tools. Challenges and mitigation strategies associated with FAIRification, such as high setup costs, data politics, technical and administrative issues, privacy concerns, and difficulties encountered in sharing health data given its sensitive nature, were also reported. We found various workflows, tools, and infrastructures designed by different groups worldwide to facilitate the FAIRification of health research data. We also uncovered a wide range of problems and questions that researchers are trying to address by using the different workflows, tools, and infrastructures. Although the concept of FAIR data stewardship in the health research domain is relatively new, almost all continents have been reached by at least one network trying to achieve health data FAIRness. Documented outcomes of FAIRification efforts include peer-reviewed publications, improved data sharing, facilitated data reuse, return on investment, and new treatments. 
Successful FAIRification of data has informed the management and prognosis of various diseases such as cancer, cardiovascular diseases, and neurological diseases. Efforts to FAIRify data on a wider variety of diseases have been ongoing since the COVID-19 pandemic. Conclusions: This work summarises projects, tools, and workflows for the FAIRification of health research data. The comprehensive review shows that implementing the FAIR concept in health data stewardship carries the promise of improved research data management and transparency in the era of big data and open research publishing. International Registered Report Identifier (IRRID): RR2-10.2196/22505 %M 37639292 %R 10.2196/45013 %U https://www.jmir.org/2023/1/e45013 %U https://doi.org/10.2196/45013 %U http://www.ncbi.nlm.nih.gov/pubmed/37639292 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e46275 %T Visual Analytics of Multidimensional Oral Health Surveys: Data Mining Study %A Xu,Ting %A Ma,Yuming %A Pan,Tianya %A Chen,Yifei %A Liu,Yuhua %A Zhu,Fudong %A Zhou,Zhiguang %A Chen,Qianming %+ School of Media and Design, Hangzhou Dianzi University, Xueyuan Road #18, Hangzhou, 310018, China, 86 15957193211, zhgzhou@hdu.edu.cn %K visual analytics %K oral health data mining %K knowledge graph %K multidimensional data visualization %D 2023 %7 1.8.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Oral health surveys largely facilitate the prevention and treatment of oral diseases as well as the awareness of population health status. As oral health is always surveyed from a variety of perspectives, it is a difficult and complicated task to gain insights from multidimensional oral health surveys. Objective: We aimed to develop a visualization framework for the visual analytics and deep mining of multidimensional oral health surveys. Methods: First, diseases and groups were embedded into data portraits based on their multidimensional attributes. 
Subsequently, group classification and correlation pattern extraction were conducted to explore the correlation features among diseases, behaviors, symptoms, and cognitions. On the basis of the feature mining of diseases, groups, behaviors, and their attributes, a knowledge graph was constructed to reveal semantic information, integrate the graph query function, and describe the features of interest to users. Results: A visualization framework was implemented for the exploration of multidimensional oral health surveys. A series of user-friendly interactions were integrated to propose a visual analysis system that can help users further discern the patterns of oral health conditions. Conclusions: A visualization framework is provided in this paper with a set of meaningful user interactions integrated, enabling users to intuitively understand the oral health situation and conduct in-depth data exploration and analysis. Case studies based on real-world data sets demonstrate the effectiveness of our system in the exploration of oral diseases. %M 37526971 %R 10.2196/46275 %U https://medinform.jmir.org/2023/1/e46275 %U https://doi.org/10.2196/46275 %U http://www.ncbi.nlm.nih.gov/pubmed/37526971 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e42404 %T Predictors and Consequences of Homelessness: Protocol for a Cohort Study Design Using Linked Routine Data %A Mitchell,Eileen %A O’Reilly,Dermot %A O’Donovan,Diarmuid %A Bradley,Declan %+ Centre for Public Health, Queen’s University, University Road, Belfast BT7 1NN, Belfast, , United Kingdom, 44 028 9024 5133, e.mitchell@qub.ac.uk %K administrative data %K data linkage %K health care use %K homelessness %K housing %K mortality %D 2023 %7 27.7.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Homelessness is a global burden, estimated to impact more than 100 million people worldwide. 
Individuals and families experiencing homelessness are more likely to have poorer physical and mental health than the general population. Administrative data are increasingly being used in homelessness research. Objective: The objective of this study is to combine administrative health care data and social housing data to better understand the consequences and predictors associated with being homeless. Methods: We will link health and social care administrative databases from Northern Ireland, United Kingdom. We will conduct descriptive analyses to examine trends in homelessness and investigate risk factors for key outcomes. Results: The results of our analyses will be shared with stakeholders, reported at conferences and in academic journals, and summarized in policy briefing notes for policymakers. Conclusions: This study will aim to identify predictors and consequences of homelessness in Northern Ireland using linked housing, health, and social care data. The study will examine trends and outcomes in this vulnerable population using routinely collected health and social care administrative data. International Registered Report Identifier (IRRID): DERR1-10.2196/42404 %M 37498664 %R 10.2196/42404 %U https://www.researchprotocols.org/2023/1/e42404 %U https://doi.org/10.2196/42404 %U http://www.ncbi.nlm.nih.gov/pubmed/37498664 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41858 %T Using Hypothesis-Led Machine Learning and Hierarchical Cluster Analysis to Identify Disease Pathways Prior to Dementia: Longitudinal Cohort Study %A Huang,Shih-Tsung %A Hsiao,Fei-Yuan %A Tsai,Tsung-Hsien %A Chen,Pei-Jung %A Peng,Li-Ning %A Chen,Liang-Kung %+ Center for Geriatrics and Gerontology, Taipei Veterans General Hospital, No. 
201, Sec 2, Shih-Pai Road, Taipei, 11217, Taiwan, 886 2 28757711, lkchen2@vghtpe.gov.tw %K dementia %K machine learning %K cluster analysis %K disease %K condition %K symptoms %K data %K data set %K cardiovascular %K neuropsychiatric %K infection %K mobility %K mental conditions %K development %D 2023 %7 26.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Dementia development is a complex process in which the occurrence and sequential relationships of different diseases or conditions may construct specific patterns leading to incident dementia. Objective: This study aimed to identify patterns of disease or symptom clusters and their sequences prior to incident dementia using a novel approach incorporating machine learning methods. Methods: Using Taiwan’s National Health Insurance Research Database, data from 15,700 older people with dementia and 15,700 nondementia controls matched on age, sex, and index year (n=10,466, 67% for the training data set and n=5234, 33% for the testing data set) were retrieved for analysis. Using machine learning methods to capture specific hierarchical disease triplet clusters prior to dementia, we designed a study algorithm with four steps: (1) data preprocessing, (2) disease or symptom pathway selection, (3) model construction and optimization, and (4) data visualization. Results: Among 15,700 identified older people with dementia, 10,466 and 5234 subjects were randomly assigned to the training and testing data sets, and 6215 hierarchical disease triplet clusters with positive correlations with dementia onset were identified. 
We subsequently generated 19,438 features to construct prediction models, and the model with the best performance was support vector machine (SVM) with the by-group LASSO (least absolute shrinkage and selection operator) regression method (total corresponding features=2513; accuracy=0.615; sensitivity=0.607; specificity=0.622; positive predictive value=0.612; negative predictive value=0.619; area under the curve=0.639). In total, this study captured 49 hierarchical disease triplet clusters related to dementia development, and the most characteristic patterns leading to incident dementia started with cardiovascular conditions (mainly hypertension), cerebrovascular disease, mobility disorders, or infections, followed by neuropsychiatric conditions. Conclusions: Dementia development in the real world is an intricate process involving various diseases or conditions, their co-occurrence, and sequential relationships. Using a machine learning approach, we identified 49 hierarchical disease triplet clusters with leading roles (cardio- or cerebrovascular disease) and supporting roles (mental conditions, locomotion difficulties, infections, and nonspecific neurological conditions) in dementia development. Further studies using data from other countries are needed to validate the prediction algorithms for dementia development, allowing the development of comprehensive strategies to prevent or care for dementia in the real world. 
%M 37494081 %R 10.2196/41858 %U https://www.jmir.org/2023/1/e41858 %U https://doi.org/10.2196/41858 %U http://www.ncbi.nlm.nih.gov/pubmed/37494081 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e46542 %T Creation of a Laboratory for Statistics and Analysis of Dependence and Chronic Conditions: Protocol for the Bages Territorial Specialization and Competitiveness Project (PECT BAGESS) %A Pujolar-Díaz,Georgina %A Vidal-Alaball,Josep %A Forcada,Anna %A Descals-Singla,Elisabet %A Basora,Josep %A , %+ Unitat de Suport a la Recerca de la Catalunya Central, Fundació Institut Universitari per a la Recerca a l'Atenció Primària de Salut Jordi Gol i Gurina, Carrer Pica d'Estats 13-15, Sant Fruitós de Bages, 08272, Spain, 34 93 693 00 40, jvidal.cc.ics@gencat.cat %K chronic disease %K multiple chronic conditions %K primary health care %K diffusion of innovation %K health data %K data sharing %D 2023 %7 26.7.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: With the increasing prevalence of chronic diseases, partly due to the increase in life expectancy and the aging of the population, the complexity of the approach faced by the structures, dynamics, and actors that are part of the current care and attention systems is evident. The territory of Bages (Catalonia, Spain) presents characteristics of a highly complex ecosystem where there is a need to develop new, more dynamic structures for the various actors in the health and social systems, aimed at incorporating new actors in the technological and business field that would allow innovation in the management of this context. Within the framework of the Bages Territorial Specialization and Competitiveness Project (PECT BAGESS), the aim is to address these challenges through various entities that will develop 7 interrelated operations. 
Of these, the operation of the IDIAP Jordi Gol-Catalan Health Institute focuses on the creation of a Laboratory for Statistics and Analysis of Dependence and Chronic Conditions in the Bages region, in the form of a database that will collect the most relevant information from the different domains that affect the management of chronic conditions and dependence: health, social, economic, and environmental. Objective: This study aims to create a laboratory for the statistical analysis of dependence and chronic conditions in the Bages region, to identify the chronic conditions and the dependence-generating conditions present in the area, in order to propose products and services that respond to the needs of people in these situations. Methods: PECT BAGESS originated from the Shared Agenda initiative, which was established in the Bages region with the goal of enhancing the quality of life and fostering social inclusion for individuals with chronic diseases. This study presents part of this broader project, consisting of the creation of a database. Data from chronic conditions and dependence service providers will be combined, using a unique identifier for the different sources of information. A thorough legal analysis was conducted to establish a secure data sharing mechanism among the entities participating in the project. Results: The laboratory will be a key piece in the structure generated in the environment of the PECT BAGESS, which will allow relevant information to be passed on from the different sectors involved to respond to the needs of people with chronic conditions and dependence, as well as to generate opportunities for products and services. Conclusions: The emerging organizational dynamics and structures are expected to demonstrate a health and social management model that may have a remarkable impact on these sectors. 
Products and services developed may be very useful for generating synergies and facilitating the living conditions of people who can benefit from all these services. However, secure data sharing circuits must be considered. International Registered Report Identifier (IRRID): PRR1-10.2196/46542 %M 37494102 %R 10.2196/46542 %U https://www.researchprotocols.org/2023/1/e46542 %U https://doi.org/10.2196/46542 %U http://www.ncbi.nlm.nih.gov/pubmed/37494102 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45948 %T Ten Topics to Get Started in Medical Informatics Research %A Wolfien,Markus %A Ahmadi,Najia %A Fitzer,Kai %A Grummt,Sophia %A Heine,Kilian-Ludwig %A Jung,Ian-C %A Krefting,Dagmar %A Kühn,Andreas %A Peng,Yuan %A Reinecke,Ines %A Scheel,Julia %A Schmidt,Tobias %A Schmücker,Paul %A Schüttler,Christina %A Waltemath,Dagmar %A Zoch,Michele %A Sedlmayr,Martin %+ Institute for Medical Informatics and Biometry, Faculty of Medicine Carl Gustav Carus, Technische Universität Dresden, Fetscherstraße 74, Dresden, 01307, Germany, 49 3514587723, markus.wolfien@tu-dresden.de %K medical informatics %K health informatics %K interdisciplinary communication %K research data %K clinical data %K digital health %D 2023 %7 24.7.2023 %9 Viewpoint %J J Med Internet Res %G English %X The vast and heterogeneous data being constantly generated in clinics can provide great wealth for patients and research alike. The quickly evolving field of medical informatics research has contributed numerous concepts, algorithms, and standards to facilitate this development. However, these difficult relationships, complex terminologies, and multiple implementations can present obstacles for people who want to get active in the field. 
With a particular focus on medical informatics research conducted in Germany, we present in our Viewpoint a set of 10 important topics to improve the overall interdisciplinary communication between different stakeholders (eg, physicians, computational experts, experimentalists, students, patient representatives). This may lower the barriers to entry and offer a starting point for collaborations at different levels. The suggested topics are briefly introduced, then general best practice guidance is given, and further resources for in-depth reading or hands-on tutorials are recommended. In addition, the topics are set to cover current aspects and open research gaps of the medical informatics domain, including data regulations and concepts; data harmonization and processing; and data evaluation, visualization, and dissemination. In addition, we give an example on how these topics can be integrated in a medical informatics curriculum for higher education. By recognizing these topics, readers will be able to (1) set clinical and research data into the context of medical informatics, understanding what is possible to achieve with data or how data should be handled in terms of data privacy and storage; (2) distinguish current interoperability standards and obtain first insights into the processes leading to effective data transfer and analysis; and (3) value the use of newly developed technical approaches to utilize the full potential of clinical data. 
%M 37486754 %R 10.2196/45948 %U https://www.jmir.org/2023/1/e45948 %U https://doi.org/10.2196/45948 %U http://www.ncbi.nlm.nih.gov/pubmed/37486754 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e47934 %T Data Analysis of Physician Competence Research Trend: Social Network Analysis and Topic Modeling Approach %A Yune,So Jung %A Kim,Youngjon %A Lee,Jea Woog %+ Intelligence Informatics Processing Lab, Chung-Ang University, 84, Heukseok-ro, Dongjak-gu,, Seoul, 06974, Republic of Korea, 82 10 5426 7318, yyizeuks@cau.ac.kr %K physician competency %K research trend %K competency-based education %K professionalism %K topic modeling %K latent Dirichlet allocation %K LDA algorithm %K data science %K social network analysis %D 2023 %7 19.7.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Studies on competency in medical education often explore the acquisition, performance, and evaluation of particular skills, knowledge, or behaviors that constitute physician competency. As physician competency reflects social demands according to changes in the medical environment, analyzing the research trends of physician competency by period is necessary to derive major research topics for future studies. Therefore, a more macroscopic method is required to analyze the core competencies of physicians in this era. Objective: This study aimed to analyze research trends related to physicians’ competency in reflecting social needs according to changes in the medical environment. Methods: We used topic modeling to identify potential research topics by analyzing data from studies related to physician competency published between 2011 and 2020. We preprocessed 1354 articles and extracted 272 keywords. Results: The terms that appeared most frequently in the research related to physician competency since 2010 were knowledge, hospital, family, job, guidelines, management, and communication. 
The terms that appeared in most studies were education, model, knowledge, and hospital. Topic modeling revealed that the main topics about physician competency included Evidence-based clinical practice, Community-based healthcare, Patient care, Career and self-management, Continuous professional development, and Communication and cooperation. We divided the studies into 4 periods (2011-2013, 2014-2016, 2017-2019, and 2020-2021) and performed a linear regression analysis. The results showed a change in topics by period. The hot topics that have shown increased interest among scholars over time include Community-based healthcare, Career and self-management, and Continuous professional development. Conclusions: On the basis of the analysis of research trends, it is predicted that physician professionalism and community-based medicine will continue to be studied in future studies on physician competency. %M 37467028 %R 10.2196/47934 %U https://medinform.jmir.org/2023/1/e47934 %U https://doi.org/10.2196/47934 %U http://www.ncbi.nlm.nih.gov/pubmed/37467028 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45059 %T Establishing a Health CASCADE–Curated Open-Access Database to Consolidate Knowledge About Co-Creation: Novel Artificial Intelligence–Assisted Methodology Based on Systematic Reviews %A Agnello,Danielle Marie %A Loisel,Quentin Emile Armand %A An,Qingfan %A Balaskas,George %A Chrifou,Rabab %A Dall,Philippa %A de Boer,Janneke %A Delfmann,Lea Rahel %A Giné-Garriga,Maria %A Goh,Kunshan %A Longworth,Giuliana Raffaella %A Messiha,Katrina %A McCaffrey,Lauren %A Smith,Niamh %A Steiner,Artur %A Vogelsang,Mira %A Chastin,Sebastien %+ School of Health and Life Sciences, Glasgow Caledonian University, Cowcaddens Road, Glasgow, G4 0BA, United Kingdom, 44 7871788785, danielle.agnello@gcu.ac.uk %K co-creation %K co-production %K co-design %K database %K participatory %K methodology %K artificial intelligence %D 2023 %7 18.7.2023 %9 Review %J J Med Internet Res 
%G English %X Background: Co-creation is an approach that aims to democratize research and bridge the gap between research and practice, but the potential fragmentation of knowledge about co-creation has hindered progress. A comprehensive database of published literature from multidisciplinary sources can address this fragmentation through the integration of diverse perspectives, identification and dissemination of best practices, and increase clarity about co-creation. However, two considerable challenges exist. First, there is uncertainty about co-creation terminology, making it difficult to identify relevant literature. Second, the exponential growth of scientific publications has led to an overwhelming amount of literature that surpasses the human capacity for a comprehensive review. These challenges hinder progress in co-creation research and underscore the need for a novel methodology to consolidate and investigate the literature. Objective: This study aimed to synthesize knowledge about co-creation across various fields through the development and application of an artificial intelligence (AI)–assisted selection process. The ultimate goal of this database was to provide stakeholders interested in co-creation with relevant literature. Methods: We created a novel methodology for establishing a curated database. To accommodate the variation in terminology, we used a broad definition of co-creation that encompassed the essence of existing definitions. To filter out irrelevant information, an AI-assisted selection process was used. In addition, we conducted bibliometric analyses and quality control procedures to assess content and accuracy. Overall, this approach allowed us to develop a robust and reliable database that serves as a valuable resource for stakeholders interested in co-creation. Results: The final version of the database included 13,501 papers, which are indexed in Zenodo and accessible in an open-access downloadable format. 
The quality assessment revealed that 20.3% (140/688) of the database likely contained irrelevant material, whereas the methodology captured 91% (58/64) of the relevant literature. Participatory and variations of the term co-creation were the most frequent terms in the title and abstracts of included literature. The predominant source journals included health sciences, sustainability, environmental sciences, medical research, and health services research. Conclusions: This study produced a high-quality, open-access database about co-creation. The study demonstrates that it is possible to perform a systematic review selection process on a fragmented concept using human-AI collaboration. Our unified concept of co-creation includes the co-approaches (co-creation, co-design, and co-production), forms of participatory research, and user involvement. Our analysis of authorship, citations, and source landscape highlights the potential lack of collaboration among co-creation researchers and underscores the need for future investigation into the different research methodologies. The database provides a resource for relevant literature and can support rapid literature reviews about co-creation. It also offers clarity about the current co-creation landscape and helps to address barriers that researchers may face when seeking evidence about co-creation. 
%M 37463024 %R 10.2196/45059 %U https://www.jmir.org/2023/1/e45059 %U https://doi.org/10.2196/45059 %U http://www.ncbi.nlm.nih.gov/pubmed/37463024 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 4 %N %P e44700 %T Secure Comparisons of Single Nucleotide Polymorphisms Using Secure Multiparty Computation: Method Development %A Woods,Andrew %A Kramer,Skyler T %A Xu,Dong %A Jiang,Wei %+ Department of Electrical Engineering and Computer Science, University of Missouri, 227 Naka Hall, Columbia, MO, 65211-0001, United States, 1 5738822299, xudong@missouri.edu %K secure multiparty computation %K single nucleotide polymorphism %K Variant Call Format %K Jaccard similarity %D 2023 %7 18.7.2023 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: While genomic variations can provide valuable information for health care and ancestry, the privacy of individual genomic data must be protected. Thus, a secure environment is desirable for a human DNA database such that the total data are queryable but not directly accessible to involved parties (eg, data hosts and hospitals) and that the query results are learned only by the user or authorized party. Objective: In this study, we provide efficient and secure computations on panels of single nucleotide polymorphisms (SNPs) from genomic sequences as computed under the following set operations: union, intersection, set difference, and symmetric difference. Methods: Using these operations, we can compute similarity metrics, such as the Jaccard similarity, which could allow querying a DNA database to find the same person and genetic relatives securely. We analyzed various security paradigms and show metrics for the protocols under several security assumptions, such as semihonest, malicious with honest majority, and malicious with a malicious majority. Results: We show that our methods can be used practically on realistically sized data. 
Specifically, we can compute the Jaccard similarity of two genomes when considering sets of SNPs, each with 400,000 SNPs, in 2.16 seconds with the assumption of a malicious adversary in an honest majority and 0.36 seconds under a semihonest model. Conclusions: Our methods may help adopt trusted environments for hosting individual genomic data with end-to-end data security. %R 10.2196/44700 %U https://bioinform.jmir.org/2023/1/e44700 %U https://doi.org/10.2196/44700 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45651 %T Effects of Using Different Indirect Techniques on the Calculation of Reference Intervals: Observational Study %A Yang,Dan %A Su,Zihan %A Mu,Runqing %A Diao,Yingying %A Zhang,Xin %A Liu,Yusi %A Wang,Shuo %A Wang,Xu %A Zhao,Lei %A Wang,Hongyi %A Zhao,Min %+ National Clinical Research Center for Laboratory Medicine, Department of Laboratory Medicine, The First Hospital of China Medical University, Nanjin North Street, No 155, Shenyang, 110001, China, 86 13898169877, minzhao@cmu.edu.cn %K comparative study %K data transformation %K indirect method %K outliers %K reference interval %K clinical decision-making %K complete blood count %K red blood cells %K white blood cells %K platelets %K laboratory %K clinical %D 2023 %7 17.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Reference intervals (RIs) play an important role in clinical decision-making. However, due to the time, labor, and financial costs involved in establishing RIs using direct means, the use of indirect methods, based on big data previously obtained from clinical laboratories, is getting increasing attention. Different indirect techniques combined with different data transformation methods and outlier removal might cause differences in the calculation of RIs. However, there are few systematic evaluations of this. 
Objective: This study used data derived from direct methods as reference standards and evaluated the accuracy of combinations of different data transformation, outlier removal, and indirect techniques in establishing complete blood count (CBC) RIs for large-scale data. Methods: The CBC data of populations aged ≥18 years undergoing physical examination from January 2010 to December 2011 were retrieved from the First Affiliated Hospital of China Medical University in northern China. After exclusion of repeated individuals, we performed parametric, nonparametric, Hoffmann, Bhattacharya, and truncation points and Kolmogorov–Smirnov distance (kosmic) indirect methods, combined with log or Box-Cox transformation, and Reed–Dixon, Tukey, and iterative mean (3SD) outlier removal methods in order to derive the RIs of 8 CBC parameters and compared the results with those directly and previously established. Furthermore, bias ratios (BRs) were calculated to assess which combination of indirect technique, data transformation pattern, and outlier removal method is preferable. Results: Raw data showed that the degrees of skewness of the white blood cell (WBC) count, platelet (PLT) count, mean corpuscular hemoglobin (MCH), mean corpuscular hemoglobin concentration (MCHC), and mean corpuscular volume (MCV) were much more obvious than those of other CBC parameters. After log or Box-Cox transformation combined with Tukey or iterative mean (3SD) processing, the distribution types of these data were close to a Gaussian distribution. Tukey-based outlier removal yielded the maximum number of outliers. The lower-limit bias of WBC (male), PLT (male), hemoglobin (HGB; male), MCH (male/female), and MCV (female) was greater than that of the corresponding upper limit for more than half of the 30 indirect methods. Computational indirect choices of CBC parameters for males and females were inconsistent. The RIs of MCHC established by the direct method for females were narrow. 
For this, the kosmic method was markedly superior, which contrasted with the RI calculation of CBC parameters with high |BR| qualification rates for males. Among the top 10 methodologies for the WBC count, PLT count, HGB, MCV, and MCHC with a high-BR qualification rate among males, the Bhattacharya, Hoffmann, and parametric methods were superior to the other 2 indirect methods. Conclusions: Compared to results derived by the direct method, outlier removal methods and indirect techniques markedly influence the final RIs, whereas data transformation has negligible effects, except for obviously skewed data. Specifically, the outlier removal efficiency of Tukey and iterative mean (3SD) methods is almost equivalent. Furthermore, the choice of indirect techniques depends more on the characteristics of the studied analyte itself. This study provides scientific evidence for clinical laboratories to use their previous data sets to establish RIs. %M 37459170 %R 10.2196/45651 %U https://www.jmir.org/2023/1/e45651 %U https://doi.org/10.2196/45651 %U http://www.ncbi.nlm.nih.gov/pubmed/37459170 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e42262 %T Implementing a Machine Learning Screening Tool for Malnutrition: Insights From Qualitative Research Applicable to Other Machine Learning–Based Clinical Decision Support Systems %A Besculides,Melanie %A Mazumdar,Madhu %A Phlegar,Sydney %A Freeman,Robert %A Wilson,Sara %A Joshi,Himanshu %A Kia,Arash %A Gorbenko,Ksenia %+ Institute for Healthcare Delivery Science, Icahn School of Medicine at Mount Sinai, One Gustave Levy Place, Box 1077, New York, NY, 10029, United States, 1 9176965097, melanie.besculides@mountsinai.org %K machine learning %K AI %K CDSS %K evaluation %K nutrition %K screening %K clinical %K usability %K effectiveness %K treatment %K malnutrition %K decision-making %K tool %K data %K acceptability %D 2023 %7 13.7.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Machine learning 
(ML)–based clinical decision support systems (CDSS) are popular in clinical practice settings but are often criticized for being limited in usability, interpretability, and effectiveness. Evaluating the implementation of ML-based CDSS is critical to ensure CDSS is acceptable and useful to clinicians and helps them deliver high-quality health care. Malnutrition is a common and underdiagnosed condition among hospital patients, which can have serious adverse impacts. Early identification and treatment of malnutrition are important. Objective: This study aims to evaluate the implementation of an ML tool, Malnutrition Universal Screening Tool (MUST)–Plus, that predicts hospital patients at high risk for malnutrition and identify best implementation practices applicable to this and other ML-based CDSS. Methods: We conducted a qualitative postimplementation evaluation using in-depth interviews with registered dietitians (RDs) who use MUST-Plus output in their everyday work. After coding the data, we mapped emergent themes onto select domains of the nonadoption, abandonment, scale-up, spread, and sustainability (NASSS) framework. Results: We interviewed 17 of the 24 RDs approached (71%), representing 37% of those who use MUST-Plus output. Several themes emerged: (1) enhancements to the tool were made to improve accuracy and usability; (2) MUST-Plus helped identify patients that would not otherwise be seen; perceived usefulness was highest in the original site; (3) perceived accuracy varied by respondent and site; (4) RDs valued autonomy in prioritizing patients; (5) depth of tool understanding varied by hospital and level; (6) MUST-Plus was integrated into workflows and electronic health records; and (7) RDs expressed a desire to eventually have 1 automated screener. Conclusions: Our findings suggest that continuous involvement of stakeholders at new sites given staff turnover is vital to ensure buy-in. 
Qualitative research can help identify the potential bias of ML tools and should be widely used to ensure health equity. Ongoing collaboration among CDSS developers, data scientists, and clinical providers may help refine CDSS for optimal use and improve the acceptability of CDSS in the clinical context. %M 37440303 %R 10.2196/42262 %U https://formative.jmir.org/2023/1/e42262 %U https://doi.org/10.2196/42262 %U http://www.ncbi.nlm.nih.gov/pubmed/37440303 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42621 %T The FeatureCloud Platform for Federated Learning in Biomedicine: Unified Approach %A Matschinske,Julian %A Späth,Julian %A Bakhtiari,Mohammad %A Probul,Niklas %A Kazemi Majdabadi,Mohammad Mahdi %A Nasirigerdeh,Reza %A Torkzadehmahani,Reihaneh %A Hartebrodt,Anne %A Orban,Balazs-Attila %A Fejér,Sándor-József %A Zolotareva,Olga %A Das,Supratim %A Baumbach,Linda %A Pauling,Josch K %A Tomašević,Olivera %A Bihari,Béla %A Bloice,Marcus %A Donner,Nina C %A Fdhila,Walid %A Frisch,Tobias %A Hauschild,Anne-Christin %A Heider,Dominik %A Holzinger,Andreas %A Hötzendorfer,Walter %A Hospes,Jan %A Kacprowski,Tim %A Kastelitz,Markus %A List,Markus %A Mayer,Rudolf %A Moga,Mónika %A Müller,Heimo %A Pustozerova,Anastasia %A Röttger,Richard %A Saak,Christina C %A Saranti,Anna %A Schmidt,Harald H H W %A Tschohl,Christof %A Wenke,Nina K %A Baumbach,Jan %+ University of Hamburg, Notkestrasse 9, Hamburg, 22607, Germany, 49 40 42838 ext 7640, julian.matschinske@uni-hamburg.de %K privacy-preserving machine learning %K federated learning %K interactive platform %K artificial intelligence %K AI store %K privacy-enhancing technologies %K additive secret sharing %D 2023 %7 12.7.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Machine learning and artificial intelligence have shown promising results in many areas and are driven by the increasing amount of available data. 
However, these data are often distributed across different institutions and cannot be easily shared owing to strict privacy regulations. Federated learning (FL) allows the training of distributed machine learning models without sharing sensitive data. However, the implementation is time-consuming and requires advanced programming skills and complex technical infrastructures. Objective: Various tools and frameworks have been developed to simplify the development of FL algorithms and provide the necessary technical infrastructure. Although there are many high-quality frameworks, most focus only on a single application case or method. To our knowledge, there are no generic frameworks, meaning that the existing solutions are restricted to a particular type of algorithm or application field. Furthermore, most of these frameworks provide an application programming interface that needs programming knowledge. There is no collection of ready-to-use FL algorithms that are extendable and allow users (eg, researchers) without programming knowledge to apply FL. A central FL platform for both FL algorithm developers and users does not exist. This study aimed to address this gap and make FL available to everyone by developing FeatureCloud, an all-in-one platform for FL in biomedicine and beyond. Methods: The FeatureCloud platform consists of 3 main components: a global frontend, a global backend, and a local controller. Our platform uses Docker to separate the local acting components of the platform from the sensitive data systems. We evaluated our platform using 4 different algorithms on 5 data sets for both accuracy and runtime. Results: FeatureCloud removes the complexity of distributed systems for developers and end users by providing a comprehensive platform for executing multi-institutional FL analyses and implementing FL algorithms. Through its integrated artificial intelligence store, federated algorithms can easily be published and reused by the community. 
To secure sensitive raw data, FeatureCloud supports privacy-enhancing technologies to secure the shared local models and assures high standards in data privacy to comply with the strict General Data Protection Regulation. Our evaluation shows that applications developed in FeatureCloud can produce highly similar results compared with centralized approaches and scale well for an increasing number of participating sites. Conclusions: FeatureCloud provides a ready-to-use platform that integrates the development and execution of FL algorithms while reducing the complexity to a minimum and removing the hurdles of federated infrastructure. Thus, we believe that it has the potential to greatly increase the accessibility of privacy-preserving and distributed data analyses in biomedicine and beyond. %M 37436815 %R 10.2196/42621 %U https://www.jmir.org/2023/1/e42621 %U https://doi.org/10.2196/42621 %U http://www.ncbi.nlm.nih.gov/pubmed/37436815 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46328 %T Global COVID-19 Policy Engagement With Scientific Research Information: Altmetric Data Study %A Park,Han Woo %A Yoon,Ho Young %+ Division of Communication & Media, Ewha Womans University, Posco 620, Ewhayeodae Gil 52, Seodaemungu, Seoul, 03760, Republic of Korea, 82 2 3277 4491, hoyoungyoon@ewha.ac.kr %K altmetrics %K government policy report %K citation analysis %K COVID-19 %K World Health Organization %K WHO %K COVID-19 research %K online citation network %K policy domains %D 2023 %7 29.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Previous studies on COVID-19 scholarly articles have primarily focused on bibliometric characteristics, neglecting the identification of institutional actors that cite recent scientific contributions related to COVID-19 in the policy domain, and their locations. 
Objective: The purpose of this study was to assess the online citation network and knowledge structure of COVID-19 research across policy domains over 2 years from January 2020 to January 2022, with a particular emphasis on geographical frequency. Two research questions were addressed. The first question was related to who has been the most active in policy engagement with science and research information sharing during the COVID-19 pandemic, particularly in terms of countries and organization types. The second question was related to whether there are significant differences in the types of coronavirus research shared among countries and continents. Methods: The Altmetric database was used to collect policy report citations of scientific articles for 3 topic terms (COVID-19, COVID-19 vaccine, and COVID-19 variants). Altmetric provides the URLs of policy agencies that have cited COVID-19 research. The scientific articles used for Altmetric citations are extracted from journals indexed by PubMed. The numbers of COVID-19, COVID-19 vaccine, and COVID-19 variant research outputs between January 1, 2020, and January 31, 2022, were 216,787, 16,748, and 2777, respectively. The study examined the frequency of citations based on policy institutional domains, such as intergovernmental organizations, national and domestic governmental organizations, and nongovernmental organizations (think tanks and academic institutions). Results: The World Health Organization (WHO) stood out as the most notable institution citing COVID-19–related research outputs. The WHO actively sought and disseminated information regarding the COVID-19 pandemic. The COVID-19 vaccine citation network exhibited the most extensive connections in terms of degree centrality, 2-local eigenvector centrality, and eigenvector centrality among the 3 key terms. 
The Netherlands, the United States, the United Kingdom, and Australia were the countries that sought and shared the most information on COVID-19 vaccines, likely due to their high numbers of COVID-19 cases. Developing nations, although gaining quicker access to COVID-19 vaccine information, appeared to be relatively isolated from the enriched COVID-19 pandemic content in the global network. Conclusions: The global scientific network ecology during the COVID-19 pandemic revealed distinct types of links primarily centered around the WHO. Western countries demonstrated effective networking practices in constructing these networks. The prominent position of the key term “COVID-19 vaccine” demonstrates that nation-states align with global authority regardless of their national contexts. In summary, the citation networking practices of policy agencies have the potential to uncover the global knowledge distribution structure as a proxy for the networking strategy employed during a pandemic. %M 37384384 %R 10.2196/46328 %U https://www.jmir.org/2023/1/e46328 %U https://doi.org/10.2196/46328 %U http://www.ncbi.nlm.nih.gov/pubmed/37384384 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 9 %N %P e42149 %T Dashboard With Bump Charts to Visualize the Changes in the Rankings of Leading Causes of Death According to Two Lists: National Population-Based Time-Series Cross-Sectional Study %A Tai,Shu-Yu %A Chi,Ying-Chen %A Chien,Yu-Wen %A Kawachi,Ichiro %A Lu,Tsung-Hsueh %+ Department of Public Health, College of Medicine, National Cheng Kung University, No. 
1, Dah Hsueh Road, East District, Tainan, 701, Taiwan, 886 0928389971, robertlu@mail.ncku.edu.tw %K COVID-19 %K dashboard %K data visualization %K leading causes of death %K mortality/trend %K ranking %K surveillance %K cause of mortality %K cause of death %K monitoring %K surveillance indicator %K health statistics %K mortality data %D 2023 %7 27.6.2023 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: Health advocates and the media often use the rankings of the leading causes of death (CODs) to draw attention to health issues with relatively high mortality burdens in a population. The National Center for Health Statistics (NCHS) publishes “Deaths: leading causes” annually. The ranking list used by the NCHS and statistical offices in several countries includes broad categories such as cancer, heart disease, and accidents. However, the list used by the World Health Organization (WHO) subdivides broad categories (17 for cancer, 8 for heart disease, and 6 for accidents) and classifies Alzheimer disease and related dementias and hypertensive diseases more comprehensively compared to the NCHS list. Regarding the data visualization of the rankings of leading CODs, the bar chart is the most commonly used graph; nevertheless, bar charts may not effectively reveal the changes in the rankings over time. Objective: The aim of this study is to use a dashboard with bump charts to visualize the changes in the rankings of the leading CODs in the United States by sex and age from 1999 to 2021, according to 2 lists (NCHS vs WHO). Methods: Data on the number of deaths in each category from each list for each year were obtained from the Wide-ranging Online Data for Epidemiologic Research system, maintained by the Centers for Disease Control and Prevention. Rankings were based on the absolute number of deaths. The dashboard enables users to filter by list (NCHS or WHO) and demographic characteristics (sex and age) and highlight a particular COD. 
Results: Several CODs that were only on the WHO list, including brain, breast, colon, hematopoietic, lung, pancreas, prostate, and uterus cancer (all classified as cancer on the NCHS list); unintentional transport injury; poisoning; drowning; and falls (all classified as accidents on the NCHS list), were among the 10 leading CODs in several sex and age subgroups. In contrast, several CODs that appeared among the 10 leading CODs according to the NCHS list, such as pneumonia, kidney disease, cirrhosis, and sepsis, were excluded from the 10 leading CODs if the WHO list was used. The rank of Alzheimer disease and related dementias and hypertensive diseases according to the WHO list was higher than their ranks according to the NCHS list. A marked increase in the ranking of unintentional poisoning among men aged 45-64 years was noted from 2008 to 2021. Conclusions: A dashboard with bump charts can be used to improve the visualization of the changes in the rankings of leading CODs according to the WHO and NCHS lists as well as demographic characteristics; the visualization can help users make informed decisions regarding the most appropriate ranking list for their needs. %M 37368475 %R 10.2196/42149 %U https://publichealth.jmir.org/2023/1/e42149 %U https://doi.org/10.2196/42149 %U http://www.ncbi.nlm.nih.gov/pubmed/37368475 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45614 %T Representation Learning and Spectral Clustering for the Development and External Validation of Dynamic Sepsis Phenotypes: Observational Cohort Study %A Boussina,Aaron %A Wardi,Gabriel %A Shashikumar,Supreeth Prajwal %A Malhotra,Atul %A Zheng,Kai %A Nemati,Shamim %+ Division of Biomedical Informatics, University of California, San Diego, 9500 Gilman Dr. 
MC 0990, La Jolla, CA, 92093, United States, 1 858 534 2230, aboussina@health.ucsd.edu %K sepsis %K phenotype %K emergency service, hospital %K disease progression %K artificial intelligence %K machine learning %K emergency %K infection %K clinical phenotype %K clinical phenotyping %K transition model %K transition modeling %D 2023 %7 23.6.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Recent attempts at clinical phenotyping for sepsis have shown promise in identifying groups of patients with distinct treatment responses. Nonetheless, the replicability and actionability of these phenotypes remain an issue because the patient trajectory is a function of both the patient’s physiological state and the interventions they receive. Objective: We aimed to develop a novel approach for deriving clinical phenotypes using unsupervised learning and transition modeling. Methods: Forty commonly used clinical variables from the electronic health record were used as inputs to a feed-forward neural network trained to predict the onset of sepsis. Using spectral clustering on the representations from this network, we derived and validated consistent phenotypes across a diverse cohort of patients with sepsis. We modeled phenotype dynamics as a Markov decision process with transitions as a function of the patient’s current state and the interventions they received. Results: Four consistent and distinct phenotypes were derived from over 11,500 adult patients who were admitted from the University of California, San Diego emergency department (ED) with sepsis between January 1, 2016, and January 31, 2020. Over 2000 adult patients admitted from the University of California, Irvine ED with sepsis between November 4, 2017, and August 4, 2022, were involved in the external validation. We demonstrate that sepsis phenotypes are not static and evolve in response to physiological factors and based on interventions. 
We show that roughly 45% of patients change phenotype membership within the first 6 hours of ED arrival. We observed consistent trends in patient dynamics as a function of interventions including early administration of antibiotics. Conclusions: We derived and describe 4 sepsis phenotypes present within 6 hours of triage in the ED. We observe that the administration of a 30 mL/kg fluid bolus may be associated with worse outcomes in certain phenotypes, whereas prompt antimicrobial therapy is associated with improved outcomes. %M 37351927 %R 10.2196/45614 %U https://www.jmir.org/2023/1/e45614 %U https://doi.org/10.2196/45614 %U http://www.ncbi.nlm.nih.gov/pubmed/37351927 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e41576 %T Using the H2O Automatic Machine Learning Algorithms to Identify Predictors of Web-Based Medical Record Nonuse Among Patients in a Data-Rich Environment: Mixed Methods Study %A Chen,Yang %A Liu,Xuejiao %A Gao,Lei %A Zhu,Miao %A Shia,Ben-Chang %A Chen,Mingchih %A Ye,Linglong %A Qin,Lei %+ School of Statistics, University of International Business and Economics, No.10, Huixin Dongjie, Chaoyang District, Beijing, 100029, China, 86 01064491146, qinlei@uibe.edu.cn %K web-based medical record %K predictors %K H2O’s automatic machine learning %K Health Information National Trends Survey %K HINTS %K mobile phone %D 2023 %7 19.6.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: With the advent of electronic storage of medical records and the internet, patients can access web-based medical records. This has facilitated doctor-patient communication and built trust between them. However, many patients avoid using web-based medical records despite their greater availability and readability. Objective: On the basis of demographic and individual behavioral characteristics, this study explores the predictors of web-based medical record nonuse among patients. 
Methods: Data were collected from the National Cancer Institute 2019 to 2020 Health Information National Trends Survey. First, based on the data-rich environment, the chi-square test (categorical variables) and 2-tailed t tests (continuous variables) were performed on the response variables and the variables in the questionnaire. According to the test results, the variables were initially screened, and those that passed the test were selected for subsequent analysis. Second, participants were excluded from the study if any of the initially screened variables were missing. Third, the data obtained were modeled using 5 machine learning algorithms, namely, logistic regression, automatic generalized linear model, automatic random forest, automatic deep neural network, and automatic gradient boosting machine, to identify and investigate factors affecting web-based medical record nonuse. The aforementioned automatic machine learning algorithms were based on the R interface (R Foundation for Statistical Computing) of the H2O (H2O.ai) scalable machine learning platform. Finally, 5-fold cross-validation was adopted for 80% of the data set, which was used as the training data to determine hyperparameters of 5 algorithms, and 20% of the data set was used as the test data for model comparison. Results: Among the 9072 respondents, 5409 (59.62%) had no experience using web-based medical records. Using the 5 algorithms, 29 variables were identified as crucial predictors of nonuse of web-based medical records. These 29 variables comprised 6 (21%) sociodemographic variables (age, BMI, race, marital status, education, and income) and 23 (79%) variables related to individual lifestyles and behavioral habits (such as electronic and internet use, individuals’ health status and their level of health concern, etc). H2O’s automatic machine learning methods have a high model accuracy. 
On the basis of the performance of the validation data set, the optimal model was the automatic random forest with the highest area under the curve in the validation set (88.52%) and the test set (82.87%). Conclusions: When monitoring web-based medical record use trends, research should focus on social factors such as age, education, BMI, and marital status, as well as personal lifestyle and behavioral habits, including smoking, use of electronic devices and the internet, patients’ personal health status, and their level of health concern. The use of electronic medical records can be targeted to specific patient groups, allowing more people to benefit from their usefulness. %M 37335616 %R 10.2196/41576 %U https://medinform.jmir.org/2023/1/e41576 %U https://doi.org/10.2196/41576 %U http://www.ncbi.nlm.nih.gov/pubmed/37335616 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e45823 %T Comparing Decentralized Learning Methods for Health Data Models to Nondecentralized Alternatives: Protocol for a Systematic Review %A Diniz,José Miguel %A Vasconcelos,Henrique %A Souza,Júlio %A Rb-Silva,Rita %A Ameijeiras-Rodriguez,Carolina %A Freitas,Alberto %+ PhD Program in Health Data Science, Faculty of Medicine, University of Porto, Rua Dr Plácido da Costa, 4200-450, Porto, Portugal, 351 225 513 622, jmdiniz.med@gmail.com %K decentralized learning %K distributed learning %K federated learning %K centralized learning %K privacy %K health %K health data %K secondary data use %K health data model %K blockchain %K health care %K data science %D 2023 %7 19.6.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Considering the soaring health-related costs directed toward a growing, aging, and comorbid population, the health sector needs effective data-driven interventions while managing rising care costs. While health interventions using data mining have become more robust and adopted, they often demand high-quality big data. 
However, growing privacy concerns have hindered large-scale data sharing. In parallel, recently introduced legal instruments require complex implementations, especially when it comes to biomedical data. New privacy-preserving technologies, such as decentralized learning, make it possible to create health models without mobilizing data sets by using distributed computation principles. Several multinational partnerships, including a recent agreement between the United States and the European Union, are adopting these techniques for next-generation data science. While these approaches are promising, there is no clear and robust evidence synthesis of health care applications. Objective: The main aim is to compare the performance among health data models (eg, automated diagnosis and mortality prediction) developed using decentralized learning approaches (eg, federated and blockchain) to those using centralized or local methods. Secondary aims are comparing the privacy compromise and resource use among model architectures. Methods: We will conduct a systematic review using the first-ever registered research protocol for this topic following a robust search methodology, including several biomedical and computational databases. This work will compare health data models differing in development architecture, grouping them according to their clinical applications. For reporting purposes, a PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) 2020 flow diagram will be presented. CHARMS (Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies)–based forms will be used for data extraction and to assess the risk of bias, alongside PROBAST (Prediction Model Risk of Bias Assessment Tool). All effect measures in the original studies will be reported. Results: The queries and data extractions are expected to start on February 28, 2023, and end by July 31, 2023. 
The research protocol was registered with PROSPERO, under the number 393126, on February 3, 2023. With this protocol, we detail how we will conduct the systematic review. With that study, we aim to summarize the progress and findings from state-of-the-art decentralized learning models in health care in comparison to their local and centralized counterparts. Results are expected to clarify the consensuses and heterogeneities reported and help guide the research and development of new robust and sustainable applications to address the health data privacy problem, with applicability in real-world settings. Conclusions: We expect to clearly present the status quo of these privacy-preserving technologies in health care. With this robust synthesis of the currently available scientific evidence, the review will inform health technology assessment and evidence-based decisions, from health professionals, data scientists, and policy makers alike. Importantly, it should also guide the development and application of new tools in service of patients’ privacy and future research. 
Trial Registration: PROSPERO 393126; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=393126 International Registered Report Identifier (IRRID): PRR1-10.2196/45823 %M 37335606 %R 10.2196/45823 %U https://www.researchprotocols.org/2023/1/e45823 %U https://doi.org/10.2196/45823 %U http://www.ncbi.nlm.nih.gov/pubmed/37335606 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e44567 %T Migrating a Well-Established Longitudinal Cohort Database From Oracle SQL to Research Electronic Data Entry (REDCap): Data Management Research and Design Study %A Kusejko,Katharina %A Smith,Daniel %A Scherrer,Alexandra %A Paioni,Paolo %A Kohns Vasconcelos,Malte %A Aebi-Popp,Karoline %A Kouyos,Roger D %A Günthard,Huldrych F %A Kahlert,Christian R %A , %+ Institute of Medical Virology, University of Zurich, Universitaetsstrasse 84, Zurich, 8006, Switzerland, 41 44 634 1913, katharina.kusejko@usz.ch %K REDCap %K cohort study %K data collection %K electronic case report forms %K eCRF %K software %K digital solution %K electronic data entry %K HIV %D 2023 %7 31.5.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Providing user-friendly electronic data collection tools for large multicenter studies is key for obtaining high-quality research data. Research Electronic Data Capture (REDCap) is a software solution developed for setting up research databases with integrated graphical user interfaces for electronic data entry. The Swiss Mother and Child HIV Cohort Study (MoCHiV) is a longitudinal cohort study with around 2 million data entries dating back to the early 1980s. Until 2022, data collection in MoCHiV was paper-based. Objective: The objective of this study was to provide a user-friendly graphical interface for electronic data entry for physicians and study nurses reporting MoCHiV data. Methods: MoCHiV collects information on obstetric events among women living with HIV and children born to mothers living with HIV. 
Until 2022, MoCHiV data were stored in an Oracle SQL relational database. In this project, R and REDCap were used to develop an electronic data entry platform for MoCHiV with migration of already collected data. Results: The key steps for providing an electronic data entry option for MoCHiV were (1) design, (2) data cleaning and formatting, (3) migration and compliance, and (4) add-on features. In the first step, the database structure was defined in REDCap, including the specification of primary and foreign keys, definition of study variables, and the hierarchy of questions (termed “branching logic”). In the second step, data stored in Oracle were cleaned and formatted to adhere to the defined database structure. Systematic data checks ensured compliance to all branching logic and levels of categorical variables. REDCap-specific variables and numbering of repeated events for enabling a relational data structure in REDCap were generated using R. In the third step, data were imported to REDCap and then systematically compared to the original data. In the last step, add-on features, such as data access groups, redirections, and summary reports, were integrated to facilitate data entry in the multicenter MoCHiV study. Conclusions: By combining different software tools—Oracle SQL, R, and REDCap—and building a systematic pipeline for data cleaning, formatting, and comparing, we were able to migrate a multicenter longitudinal cohort study from Oracle SQL to REDCap. REDCap offers a flexible way for developing customized study designs, even in the case of longitudinal studies with different study arms (ie, obstetric events, women, and mother-child pairs). However, REDCap does not offer built-in tools for preprocessing large data sets before data import. Additional software is needed (eg, R) for data formatting and cleaning to achieve the predefined REDCap data structure. 
%M 37256686 %R 10.2196/44567 %U https://formative.jmir.org/2023/1/e44567 %U https://doi.org/10.2196/44567 %U http://www.ncbi.nlm.nih.gov/pubmed/37256686 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45171 %T Web-Based Social Networks of Individuals With Adverse Childhood Experiences: Quantitative Study %A Cao,Yiding %A Rajendran,Suraj %A Sundararajan,Prathic %A Law,Royal %A Bacon,Sarah %A Sumner,Steven A %A Masuda,Naoki %+ Department of Mathematics, State University of New York at Buffalo, North Campus, Buffalo, NY, 14260, United States, 1 716 645 8804, naokimas@gmail.com %K adverse childhood experience %K ACE %K social networks %K Twitter %K Reddit %K childhood %K abuse %K neglect %K violence %K substance use %K coping strategy %K coping %K interpersonal connection %K web-based connection %K behavior %K social connection %K resilience %D 2023 %7 30.5.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Adverse childhood experiences (ACEs), which include abuse and neglect and various household challenges such as exposure to intimate partner violence and substance use in the home, can have negative impacts on the lifelong health of affected individuals. Among various strategies for mitigating the adverse effects of ACEs is to enhance connectedness and social support for those who have experienced them. However, how the social networks of those who experienced ACEs differ from the social networks of those who did not is poorly understood. Objective: In this study, we used Reddit and Twitter data to investigate and compare social networks between individuals with and without ACE exposure. Methods: We first used a neural network classifier to identify the presence or absence of public ACE disclosures in social media posts. We then analyzed egocentric social networks comparing individuals with self-reported ACEs with those with no reported history. 
Results: We found that, although individuals reporting ACEs had fewer total followers in web-based social networks, they had higher reciprocity in following behavior (ie, mutual following with other users), a higher tendency to follow and be followed by other individuals with ACEs, and a higher tendency to follow back individuals with ACEs rather than individuals without ACEs. Conclusions: These results imply that individuals with ACEs may try to actively connect with others who have similar previous traumatic experiences as a positive connection and coping strategy. Supportive interpersonal connections on the web for individuals with ACEs appear to be a prevalent behavior and may be a way to enhance social connectedness and resilience in those who have experienced ACEs. %M 37252791 %R 10.2196/45171 %U https://www.jmir.org/2023/1/e45171 %U https://doi.org/10.2196/45171 %U http://www.ncbi.nlm.nih.gov/pubmed/37252791 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e45662 %T Generate Analysis-Ready Data for Real-world Evidence: Tutorial for Harnessing Electronic Health Records With Advanced Informatic Technologies %A Hou,Jue %A Zhao,Rachel %A Gronsbell,Jessica %A Lin,Yucong %A Bonzel,Clara-Lea %A Zeng,Qingyi %A Zhang,Sinian %A Beaulieu-Jones,Brett K %A Weber,Griffin M %A Jemielita,Thomas %A Wan,Shuyan Sabrina %A Hong,Chuan %A Cai,Tianrun %A Wen,Jun %A Ayakulangara Panickan,Vidul %A Liaw,Kai-Li %A Liao,Katherine %A Cai,Tianxi %+ Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, Room 434, Boston, MA, 02115, United States, 1 617 432 4923, tcai@hsph.harvard.edu %K electronic health records %K real-world evidence %K data curation %K medical informatics %K randomized controlled trials %K reproducibility %D 2023 %7 25.5.2023 %9 Tutorial %J J Med Internet Res %G English %X Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) 
generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
In addition to the workflow proposed in our pipeline, we also develop a reporting guideline for RWE that covers the necessary information to facilitate transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We also showcase our pipeline and provide guidance on the deployment of relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer. We also draw on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR. %M 37227772 %R 10.2196/45662 %U https://www.jmir.org/2023/1/e45662 %U https://doi.org/10.2196/45662 %U http://www.ncbi.nlm.nih.gov/pubmed/37227772 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e44330 %T Text Analysis of Trends in Health Equity and Disparities From the Internal Revenue Service Tax Documentation Submitted by US Nonprofit Hospitals Between 2010 and 2019: Exploratory Study %A Hadley,Emily %A Marcial,Laura Haak %A Quattrone,Wes %A Bobashev,Georgiy %+ RTI International, 3040 East Cornwallis Road, Durham, NC, 27514, United States, 1 919 541 6000, ehadley@rti.org %K text mining %K natural language processing %K health care disparities %K hospital administration %D 2023 %7 24.5.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Many US hospitals are classified as nonprofits and receive tax-exempt status partially in exchange for providing benefits to the community. Proof of compliance is collected with the Schedule H form submitted as part of the annual Internal Revenue Service Form 990 (F990H), including a free-response text section that is known for being ambiguous and difficult to audit. 
This research is among the first to use natural language processing approaches to evaluate this text section with a focus on health equity and disparities. Objective: This study aims to determine the extent to which the free-response text in F990H reveals how nonprofit hospitals address health equity and disparities, including alignment with public priorities. Methods: We used free-response text submitted by hospital reporting entities in Parts V and VI of the Internal Revenue Service Form 990 Schedule H between 2010 and 2019. We identified 29 main themes connected to health equity and disparities, and 152 related key phrases. We tallied occurrences of these phrases through term frequency analysis, calculated the Moran I statistic to assess geographic variation in 2018, analyzed Google Trends use for the same terms during the same period, and used semantic search with Sentence-BERT in Python to understand contextual use. Results: We found increased use from 2010 to 2019 across all 29 phrase themes related to health equity and disparities. More than 90% of hospital reporting entities used terms in 2018 and 2019 related to affordability (2018: 2117/2131, 99.34%; 2019: 1620/1627, 99.57%), government organizations (2018: 2053/2131, 96.33%; 2019: 1577/1627, 96.93%), mental health (2018: 1937/2131, 90.9%; 2019: 1517/1627, 93.24%), and data collection (2018: 1947/2131, 91.37%; 2019: 1502/1627, 92.32%). The themes with the largest relative increase were LGBTQ (lesbian, gay, bisexual, transgender, and queer; 1676%; 2010: 12/2328, 0.51%; 2019: 149/1627, 9.16%) and social determinants of health (958%; 2010: 68/2328, 2.92%; 2019: 503/1627, 30.92%). Terms related to homelessness varied geographically from 2010 to 2018, and terms related to equity, health IT, immigration, LGBTQ, oral health, rural, social determinants of health, and substance use showed statistically significant (P<.05) geographic variation in 2018. 
The largest percentage point increase was for terms related to substance use (2010: 403/2328, 17.31%; 2019: 1149/1627, 70.62%). However, use in themes such as LGBTQ, disability, oral health, and race and ethnicity ranked lower than public interest in these topics, and some increased mentions of themes were to explicitly say that no action was taken. Conclusions: Hospital reporting entities demonstrate an increasing awareness of health equity and disparities in community benefit tax documentation, but these do not necessarily correspond with general population interests or additional action. We propose further investigation of alignment with community health needs assessments and make suggestions for improvements to F990H reporting requirements. %M 37223985 %R 10.2196/44330 %U https://www.jmir.org/2023/1/e44330 %U https://doi.org/10.2196/44330 %U http://www.ncbi.nlm.nih.gov/pubmed/37223985 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42375 %T Risk Factors Associated With Primary Care–Reported Domestic Violence for Women Involved in Family Law Care Proceedings: Data Linkage Observational Study %A Johnson,Rhodri D %A Griffiths,Lucy J %A Cowley,Laura E %A Broadhurst,Karen %A Bailey,Rowena %+ Centre for Child & Family Justice Research, Sociology, Bowland College, Lancaster University, Bowland Ave E, Bailrigg, Lancaster, LA1 4YN, United Kingdom, 44 01524 594126, k.broadhurst@lancaster.ac.uk %K data linkage %K domestic violence %K domestic abuse %K health data %K family justice data %D 2023 %7 24.5.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Domestic violence and abuse (DVA) has a detrimental impact on the health and well-being of children and families but is commonly underreported, with an estimated prevalence of 5.5% in England and Wales in 2020. 
DVA is more common in groups considered vulnerable, including those involved in public law family court proceedings; however, there is a lack of evidence regarding risk factors for DVA among those involved in the family justice system. Objective: This study examines risk factors for DVA within a cohort of mothers involved in public law family court proceedings in Wales and a matched general population comparison group. Methods: We linked family justice data from the Children and Family Court Advisory and Support Service (Cafcass Cymru [Wales]) to demographic and electronic health records within the Secure Anonymised Information Linkage (SAIL) Databank. We constructed 2 study cohorts: mothers involved in public law family court proceedings (2011-2019) and a general population group of mothers not involved in public law family court proceedings, matched on key demographics (age and deprivation). We used published clinical codes to identify mothers with exposure to DVA documented in their primary care records and who therefore reported DVA to their general practitioner. Multiple logistic regression analyses were used to examine risk factors for primary care–recorded DVA. Results: Mothers involved in public law family court proceedings were 8 times more likely to have had exposure to DVA documented in their primary care records than the general population group (adjusted odds ratio [AOR] 8.0, 95% CI 6.6-9.7). Within the cohort of mothers involved in public law family court proceedings, risk factors for DVA with the greatest effect sizes included living in sparsely populated areas (AOR 3.9, 95% CI 2.8-5.5), assault-related emergency department attendances (AOR 2.2, 95% CI 1.5-3.1), and mental health conditions (AOR 1.7, 95% CI 1.3-2.2). An 8-fold increased risk of DVA emphasizes increased vulnerabilities for individuals involved in public law family court proceedings. Conclusions: Previously reported DVA risk factors do not necessarily apply to this group of women. 
The additional risk factors identified in this study could be considered for inclusion in national guidelines. The evidence that living in sparsely populated areas and assault-related emergency department attendances are associated with increased risk of DVA could be used to inform policy and practice interventions targeting prevention as well as tailored support services for those with exposure to DVA. However, further work should also explore other sources of DVA, such as that recorded in secondary health care, family, and criminal justice records, to understand the true scale of the problem. %M 37223967 %R 10.2196/42375 %U https://www.jmir.org/2023/1/e42375 %U https://doi.org/10.2196/42375 %U http://www.ncbi.nlm.nih.gov/pubmed/37223967 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e41808 %T Using a Clinical Data Warehouse to Calculate and Present Key Metrics for the Radiology Department: Implementation and Performance Evaluation %A Liman,Leon %A May,Bernd %A Fette,Georg %A Krebs,Jonathan %A Puppe,Frank %+ Chair of Computer Science VI, Würzburg University, Am Hubland, Würzburg, 97074, Germany, 49 9313189250, leon.liman@uni-wuerzburg.de %K data warehouse %K electronic health records %K radiology %K statistics and numerical data %K hospital data %K eHealth %K medical records %D 2023 %7 22.5.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Due to the importance of radiologic examinations, such as X-rays or computed tomography scans, for many clinical diagnoses, the optimal use of the radiology department is 1 of the primary goals of many hospitals. Objective: This study aims to calculate the key metrics of this use by creating a radiology data warehouse solution, where data from radiology information systems (RISs) can be imported and then queried using a query language as well as a graphical user interface (GUI). 
Methods: Using a simple configuration file, the developed system allowed for the processing of radiology data exported from any kind of RIS into a Microsoft Excel, comma-separated value (CSV), or JavaScript Object Notation (JSON) file. These data were then imported into a clinical data warehouse. Additional values based on the radiology data were calculated during this import process by implementing 1 of several provided interfaces. Afterward, the query language and GUI of the data warehouse were used to configure and calculate reports on these data. For the most common types of requested reports, a web interface was created to view their numbers as graphics. Results: The tool was successfully tested with the data of 4 different German hospitals from 2018 to 2021, with a total of 1,436,111 examinations. The user feedback was good, since all their queries could be answered if the available data were sufficient. The initial processing of the radiology data for using them with the clinical data warehouse took (depending on the amount of data provided by each hospital) between 7 minutes and 1 hour 11 minutes. Calculating 3 reports of different complexities on the data of each hospital was possible in 1-3 seconds for reports with up to 200 individual calculations and in up to 1.5 minutes for reports with up to 8200 individual calculations. Conclusions: A system was developed with the main advantage of being generic concerning the export of different RISs as well as concerning the configuration of queries for various reports. The queries could be configured easily using the GUI of the data warehouse, and their results could be exported into the standard formats Excel and CSV for further processing. 
%M 37213191 %R 10.2196/41808 %U https://medinform.jmir.org/2023/1/e41808 %U https://doi.org/10.2196/41808 %U http://www.ncbi.nlm.nih.gov/pubmed/37213191 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41048 %T Development of Indirect Health Data Linkage on Health Product Use and Care Trajectories in France: Systematic Review %A Ranchon,Florence %A Chanoine,Sébastien %A Lambert-Lacroix,Sophie %A Bosson,Jean-Luc %A Moreau-Gaudry,Alexandre %A Bedouch,Pierrick %+ Hospices Civils de Lyon, Groupement Hospitalier Sud, Unité de pharmacie clinique oncologique, 165 chemin du grand Revoyet, Pierre-Bénite, 69495, France, 33 478864360, florence.ranchon@chu-lyon.fr %K data linkage %K health database %K deterministic approach %K probabilistic approach %K health products %K public health activity %K health data %K linkage %K France %K big data %K usability %K integration %K care trajectories %D 2023 %7 18.5.2023 %9 Review %J J Med Internet Res %G English %X Background: European national disparities in the integration of data linkage (ie, being able to match patient data between databases) into routine public health activities were recently highlighted. In France, the claims database covers almost the whole population from birth to death, offering great research potential for data linkage. As the use of a common unique identifier to directly link personal data is often limited, linkage with a set of indirect key identifiers has been developed, which raises the challenge of ensuring linkage quality by minimizing errors in linked data. Objective: The aim of this systematic review is to analyze the type and quality of research publications on indirect data linkage on health product use and care trajectories in France. Methods: A comprehensive search was performed for all papers published in the PubMed/Medline and Embase databases up to December 31, 2022, involving linked French databases and focusing on health product use or care trajectories. 
Only studies based on the use of indirect identifiers were included (ie, without a unique personal identifier available to easily link the databases). A descriptive analysis of data linkage with quality indicators and adherence to the Bohensky framework for evaluating data linkage studies was also performed. Results: In total, 16 papers were selected. Data linkage was performed at the national level in 7 (43.8%) cases or at the local level in 9 (56.2%) studies. The number of patients included in the different databases and resulting from data linkage varied greatly, respectively, from 713 to 75,000 patients and from 210 to 31,000 linked patients. The diseases studied were mainly chronic diseases and infections. The objectives of the data linkage were multiple: to estimate the risk of adverse drug reactions (ADRs; n=6, 37.5%), to reconstruct the patient’s care trajectory (n=5, 31.3%), to describe therapeutic uses (n=2, 12.5%), to evaluate the benefits of treatments (n=2, 12.5%), and to evaluate treatment adherence (n=1, 6.3%). Registries are the most frequently linked databases with French claims data. No studies have looked at linking with a hospital data warehouse, a clinical trial database, or patient self-reported databases. The linkage approach was deterministic in 7 (43.8%) studies, probabilistic in 4 (25.0%) studies, and not specified in 5 (31.3%) studies. The linkage rate was mainly from 80% to 90% (reported in 11/15, 73.3%, studies). Adherence to the Bohensky framework for evaluating data linkage studies showed that the description of the source databases for the linkage was always performed but that the completion rate and accuracy of the variables to be linked were not systematically described. Conclusions: This review highlights the growing interest in health data linkage in France. Nevertheless, regulatory, technical, and human constraints remain major obstacles to its deployment. 
The volume, variety, and validity of the data represent a real challenge, and advanced expertise and skills in statistical analysis and artificial intelligence are required to process these big data. %M 37200084 %R 10.2196/41048 %U https://www.jmir.org/2023/1/e41048 %U https://doi.org/10.2196/41048 %U http://www.ncbi.nlm.nih.gov/pubmed/37200084 %0 Journal Article %@ 2563-3570 %I JMIR Publications %V 4 %N %P e37306 %T The Identification of Potential Drugs for Dengue Hemorrhagic Fever: Network-Based Drug Reprofiling Study %A Kochuthakidiyel Suresh,Praveenkumar %A Sekar,Gnanasoundari %A Mallady,Kavya %A Wan Ab Rahman,Wan Suriana %A Shima Shahidan,Wan Nazatul %A Venkatesan,Gokulakannan %+ School of Dental Sciences, Universiti Sains Malaysia, Health Campus, 16150 Kubang Kerian, Kota Bharu, Kelantan, Kelantan, 16150, Malaysia, 60 162543854, gokulkannancmr@gmail.com %K dengue hemorrhagic fever %K drug reprofiling %K network pharmacology %K network medicine %K DHF %K repurposable drugs %K viral fevers %K drug repurposing %D 2023 %7 9.5.2023 %9 Original Paper %J JMIR Bioinform Biotech %G English %X Background: Dengue fever can progress to dengue hemorrhagic fever (DHF), a more serious and occasionally fatal form of the disease. Indicators of serious disease arise around the time the fever begins to subside (typically 3 to 7 days following symptom onset). There are currently no effective antivirals available. Drug repurposing is an emerging drug discovery process for rapidly developing effective DHF therapies. Through network pharmacology modeling, several US Food and Drug Administration (FDA)-approved medications have already been researched for various viral outbreaks. Objective: We aimed to identify potentially repurposable drugs for DHF among existing FDA-approved drugs for viral attacks, symptoms of viral fevers, and DHF. 
Methods: Using target identification databases (GeneCards and DrugBank), we identified human–DHF virus interacting genes and drug targets against these genes. We determined hub genes and potential drugs with a network-based analysis. We performed functional enrichment and network analyses to identify pathways, protein-protein interactions, tissues where the gene expression was high, and disease-gene associations. Results: Analyzing virus-host interactions and therapeutic targets in the human genome network revealed 45 repurposable medicines. Hub network analysis of host-virus-drug associations suggested that aspirin, captopril, and rilonacept might efficiently treat DHF. Gene enrichment analysis supported these findings. According to a Mayo Clinic report, using aspirin in the treatment of dengue fever may increase the risk of bleeding complications, but several studies from around the world suggest that thrombosis is associated with DHF. The human interactome contains the genes prostaglandin-endoperoxide synthase 2 (PTGS2), angiotensin converting enzyme (ACE), and coagulation factor II, thrombin (F2), which have been documented to have a role in the pathogenesis of disease progression in DHF, and our analysis of most of the drugs targeting these genes showed that the hub gene module (human-virus-drug) was highly enriched in tissues associated with the immune system (P=7.29 × 10^-24) and human umbilical vein endothelial cells (P=1.83 × 10^-20); this group of tissues acts as an anticoagulant barrier between the vessel walls and blood. KEGG analysis showed an association with genes linked to cancer (P=1.13 × 10^-14) and the advanced glycation end products–receptor for advanced glycation end products signaling pathway in diabetic complications (P=3.52 × 10^-14), which indicates that DHF patients with diabetes and cancer are at risk of higher pathogenicity. Thus, gene-targeting medications may play a significant part in limiting or worsening the condition of DHF patients. 
Conclusions: Aspirin is not usually prescribed for dengue fever because of bleeding complications, but it has been reported that using aspirin in lower doses is beneficial in the management of diseases with thrombosis. Drug repurposing is an emerging field in which clinical validation and dosage identification are required before the drug is prescribed. Further retrospective and collaborative international trials are essential for understanding the pathogenesis of this condition. %R 10.2196/37306 %U https://bioinform.jmir.org/2023/1/e37306 %U https://doi.org/10.2196/37306 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e45534 %T Understanding Views Around the Creation of a Consented, Donated Databank of Clinical Free Text to Develop and Train Natural Language Processing Models for Research: Focus Group Interviews With Stakeholders %A Fitzpatrick,Natalie K %A Dobson,Richard %A Roberts,Angus %A Jones,Kerina %A Shah,Anoop D %A Nenadic,Goran %A Ford,Elizabeth %+ Institute of Health Informatics, University College London, 222 Euston Road, London, NW1 2DA, United Kingdom, 44 7808032697, n.fitzpatrick@ucl.ac.uk %K consent %K databank %K electronic health records %K free text %K governance %K natural language processing %K public involvement %K unstructured text %D 2023 %7 3.5.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: Information stored within electronic health records is often recorded as unstructured text. Special computerized natural language processing (NLP) tools are needed to process this text; however, complex governance arrangements make such data in the National Health Service hard to access, and therefore, it is difficult to use for research in improving NLP methods. The creation of a donated databank of clinical free text could provide an important opportunity for researchers to develop NLP methods and tools and may circumvent delays in accessing the data needed to train the models. 
However, to date, there has been little or no engagement with stakeholders on the acceptability and design considerations of establishing a free-text databank for this purpose. Objective: This study aimed to ascertain stakeholder views around the creation of a consented, donated databank of clinical free text to help create, train, and evaluate NLP for clinical research and to inform the potential next steps for adopting a partner-led approach to establish a national, funded databank of free text for use by the research community. Methods: Web-based in-depth focus group interviews were conducted with 4 stakeholder groups (patients and members of the public, clinicians, information governance leads and research ethics members, and NLP researchers). Results: All stakeholder groups were strongly in favor of the databank and saw great value in creating an environment where NLP tools can be tested and trained to improve their accuracy. Participants highlighted a range of complex issues for consideration as the databank is developed, including communicating the intended purpose, the approach to access and safeguarding the data, who should have access, and how to fund the databank. Participants recommended that a small-scale, gradual approach be adopted to start to gather donations and encouraged further engagement with stakeholders to develop a road map and set of standards for the databank. Conclusions: These findings provide a clear mandate to begin developing the databank and a framework for stakeholder expectations, which we would aim to meet with the databank delivery. 
%M 37133927 %R 10.2196/45534 %U https://medinform.jmir.org/2023/1/e45534 %U https://doi.org/10.2196/45534 %U http://www.ncbi.nlm.nih.gov/pubmed/37133927 %0 Journal Article %@ 2561-1011 %I JMIR Publications %V 7 %N %P e40524 %T Data Quality Degradation on Prediction Models Generated From Continuous Activity and Heart Rate Monitoring: Exploratory Analysis Using Simulation %A Hearn,Jason %A Van den Eynde,Jef %A Chinni,Bhargava %A Cedars,Ari %A Gottlieb Sen,Danielle %A Kutty,Shelby %A Manlhiot,Cedric %+ Blalock-Taussig-Thomas Heart Center, Johns Hopkins University, 1800 Orleans Street, Baltimore, MD, 21287, United States, 1 410 614 8481, cmanlhi1@jhmi.edu %K wearables %K time series %K data reliability %K prediction models %K heart rate %K monitoring %K data %K reliability %K clinical %K sleep %K data set %K cardiac %K physiological %K accuracy %K consumer %K device %D 2023 %7 3.5.2023 %9 Original Paper %J JMIR Cardio %G English %X Background: Limited data accuracy is often cited as a reason for caution in the integration of physiological data obtained from consumer-oriented wearable devices in care management pathways. The effect of decreasing accuracy on predictive models generated from these data has not been previously investigated. Objective: The aim of this study is to simulate the effect of data degradation on the reliability of prediction models generated from those data and thus determine the extent to which lower device accuracy might or might not limit their use in clinical settings. Methods: Using the Multilevel Monitoring of Activity and Sleep in Healthy People data set, which includes continuous free-living step count and heart rate data from 21 healthy volunteers, we trained a random forest model to predict cardiac competence. Model performance in 75 perturbed data sets with increasing missingness, noisiness, bias, and a combination of all 3 perturbations was compared to model performance for the unperturbed data set. 
Results: The unperturbed data set achieved a mean root mean square error (RMSE) of 0.079 (SD 0.001) in predicting cardiac competence index. For all types of perturbations, RMSE remained stable up to 20%-30% perturbation. Above this level, RMSE started increasing and reached the point at which the model was no longer predictive at 80% for noise, 50% for missingness, and 35% for the combination of all perturbations. Introducing systematic bias in the underlying data had no effect on RMSE. Conclusions: In this proof-of-concept study, the performance of predictive models for cardiac competence generated from continuously acquired physiological data was relatively stable with declining quality of the source data. As such, lower accuracy of consumer-oriented wearable devices might not be an absolute contraindication for their use in clinical prediction models. %M 37133921 %R 10.2196/40524 %U https://cardio.jmir.org/2023/1/e40524 %U https://doi.org/10.2196/40524 %U http://www.ncbi.nlm.nih.gov/pubmed/37133921 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43802 %T Health-Related Data Sources Accessible to Health Researchers From the US Government: Mapping Review %A Annis,Ann %A Reaves,Crista %A Sender,Jessica %A Bumpus,Sherry %+ College of Nursing, Michigan State University, 1355 Bogue St, Bott Building, East Lansing, MI, 48824, United States, 1 2488903361, annisann@msu.edu %K data sets as topic %K federal government %K data collection %K survey %K questionnaire %K health surveys %K big data %K government %K data set %K public domain %K data source %K systematic review %K mapping review %K review method %K open data %K health research %D 2023 %7 27.4.2023 %9 Review %J J Med Internet Res %G English %X Background: Big data from large, government-sponsored surveys and data sets offers researchers opportunities to conduct population-based studies of important health issues in the United States, as well as develop preliminary data to support proposed future 
work. Yet, navigating these national data sources is challenging. Despite the widespread availability of national data, there is little guidance for researchers on how to access and evaluate the use of these resources. Objective: Our aim was to identify and summarize a comprehensive list of federally sponsored, health- and health care–related data sources that are accessible in the public domain to facilitate their use by researchers. Methods: We conducted a systematic mapping review of government sources of health-related data on US populations with active or recent (previous 10 years) data collection. The key measures were government sponsor, overview and purpose of data, population of interest, sampling design, sample size, data collection methodology, type and description of data, and cost to obtain data. Convergent synthesis was used to aggregate findings. Results: Among 106 unique data sources, 57 met the inclusion criteria. Data sources were classified as survey or assessment data (n=30, 53%), trends data (n=27, 47%), summative processed data (n=27, 47%), primary registry data (n=17, 30%), and evaluative data (n=11, 19%). Most (n=39, 68%) served more than 1 purpose. The population of interest included individuals/patients (n=40, 70%), providers (n=15, 26%), and health care sites and systems (n=14, 25%). The sources collected data on demographic (n=44, 77%) and clinical information (n=35, 61%), health behaviors (n=24, 42%), provider or practice characteristics (n=22, 39%), health care costs (n=17, 30%), and laboratory tests (n=8, 14%). Most (n=43, 75%) offered free data sets. Conclusions: A broad scope of national health data is accessible to researchers. These data provide insights into important health issues and the nation’s health care system while eliminating the burden of primary data collection. Data standardization and uniformity were uncommon across government entities, highlighting a need to improve data consistency. 
Secondary analyses of national data are a feasible, cost-efficient means to address national health concerns. %M 37103987 %R 10.2196/43802 %U https://www.jmir.org/2023/1/e43802 %U https://doi.org/10.2196/43802 %U http://www.ncbi.nlm.nih.gov/pubmed/37103987 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 7 %N %P e40805 %T Visualization of Traditional Chinese Medicine Formulas: Development and Usability Study %A Wu,Zhiyue %A Peng,Suyuan %A Zhou,Liang %+ National Institute of Health Data Science, Peking University, No. 38 Xueyuan Rd., Beijing, 100191, China, 86 10 82806532, zhoul@bjmu.edu.cn %K visualization %K Chinese medicine formulas %K interactive data analysis %K traditional Chinese medicine %K multifaceted data visualization %K five elements %D 2023 %7 21.4.2023 %9 Original Paper %J JMIR Form Res %G English %X Background: Traditional Chinese medicine (TCM) formulas are combinations of Chinese herbal medicines. Knowledge of classic medicine formulas is the basis of TCM diagnosis and treatment and is the core of TCM inheritance. The large number and flexibility of medicine formulas make memorization difficult, and understanding their composition rules is even more difficult. The multifaceted and multidimensional properties of herbal medicines are important for understanding the formula; however, these are usually separated from the formula information. Furthermore, these data are presented as text and cannot be analyzed jointly and interactively. Objective: We aimed to devise a visualization method for TCM formulas that shows the composition of medicine formulas and the multidimensional properties of herbal medicines involved and supports the comparison of medicine formulas. Methods: A TCM formula visualization method with multiple linked views is proposed and implemented as a web-based tool after close collaboration between visualization and TCM experts. 
The composition of medicine formulas is visualized in a formula view with a similarity-based layout supporting the comparison of constituent herbs; a shared herb view complements the formula view by showing all overlaps of pair-wise formulas; and a dimensionality-reduction plot of herbs enables the visualization of multidimensional herb properties. The usefulness of the tool was evaluated through a usability study with TCM experts. Results: Our method was applied to 2 typical categories of medicine formulas, namely tonic formulas and heat-clearing formulas, which contain 20 and 26 formulas composed of 58 and 73 herbal medicines, respectively. Each herbal medicine has a 23-dimensional characterizing attribute. In the usability study, TCM experts explored the 2 data sets with our web-based tool and quickly gained insight into formulas and herbs of interest, as well as the overall features of the formula groups that are difficult to identify with the traditional text-based method. Moreover, feedback from the experts indicated the usefulness of the proposed method. Conclusions: Our TCM formula visualization method is able to visualize and compare complex medicine formulas and the multidimensional attributes of herbal medicines using a web-based tool. TCM experts gained insights into 2 typical medicine formula categories using our method. Overall, the new method is a promising first step toward new TCM formula education and analysis methodologies. 
%M 37083631 %R 10.2196/40805 %U https://formative.jmir.org/2023/1/e40805 %U https://doi.org/10.2196/40805 %U http://www.ncbi.nlm.nih.gov/pubmed/37083631 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e39259 %T The Influence of Paid Memberships on Physician Rating Websites With the Example of the German Portal Jameda: Descriptive Cross-sectional Study %A Armbruster,Friedrich Aaron David %A Brüggmann,Dörthe %A Groneberg,David Alexander %A Bendels,Michael %+ Institute of Occupational, Social and Environmental Medicine, Goethe University, Theodor-Stern-Kai 7, Frankfurt, 60590, Germany, 49 6963016650, s3357030@stud.uni-frankfurt.de %K physician rating websites %K physician rating portals %K paid influence %K Germany %D 2023 %7 4.4.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The majority of Germans see a deficit in information availability for choosing a physician. An increasing number of people use physician rating websites and decide based on the information provided. In Germany, the most popular physician rating website is Jameda.de, which offers monthly paid membership plans. The platform operator states that paid memberships have no influence on the rating indicators or list placement. Objective: The goal of this study was to investigate whether a physician’s membership status might be related to his or her quantitative evaluation factors and to possibly quantify these effects. Methods: Physician profiles were retrieved through the search mask on the Jameda.de website. Physicians from 8 disciplines in Germany’s 12 most populous cities were specified as search criteria. Data analysis and visualization were done with MATLAB. Significance testing was conducted using a single-factor ANOVA test followed by a multiple comparison test (Tukey test). 
For analysis, the profiles were grouped according to member status (nonpaying, Gold, and Platinum) and analyzed according to the target variables: physician rating score, individual patient ratings, number of evaluations, recommendation quota, number of colleague recommendations, and profile views. Results: A total of 21,837 nonpaying profiles, 2904 Gold, and 808 Platinum member profiles were acquired. Statistically significant differences were found between paying (Gold and Platinum) and nonpaying profiles in all parameters we examined. The distribution of patient reviews also differed by membership status. Paying profiles had more ratings, a better overall physician rating, a higher recommendation quota, and more colleague recommendations, and they were visited more frequently than nonpaying physicians’ profiles. Statistically significant differences were found in most evaluation parameters within the paid membership packages in the sample analyzed. Conclusions: Paid physician profiles appear to be optimized for the decision-making criteria of potential patients. With our data, it is not possible to draw any conclusions about the mechanisms that alter physicians’ ratings. Further research is needed to investigate the causes of the observed effects. 
%M 37014690 %R 10.2196/39259 %U https://www.jmir.org/2023/1/e39259 %U https://doi.org/10.2196/39259 %U http://www.ncbi.nlm.nih.gov/pubmed/37014690 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42615 %T Digital Health Data Quality Issues: Systematic Review %A Syed,Rehan %A Eden,Rebekah %A Makasi,Tendai %A Chukwudi,Ignatius %A Mamudu,Azumah %A Kamalpour,Mostafa %A Kapugama Geeganage,Dakshi %A Sadeghianasl,Sareh %A Leemans,Sander J J %A Goel,Kanika %A Andrews,Robert %A Wynn,Moe Thandar %A ter Hofstede,Arthur %A Myers,Trina %+ School of Information Systems, Faculty of Science, Queensland University of Technology, 2 George Street, Brisbane, 4000, Australia, 61 7 3138 9360, r.syed@qut.edu.au %K data quality %K digital health %K electronic health record %K eHealth %K systematic reviews %D 2023 %7 31.3.2023 %9 Review %J J Med Internet Res %G English %X Background: The promise of digital health is principally dependent on the ability to electronically capture data that can be analyzed to improve decision-making. However, the ability to effectively harness data has proven elusive, largely because of the quality of the data captured. Despite the importance of data quality (DQ), an agreed-upon DQ taxonomy evades the literature. When consolidated frameworks are developed, the dimensions are often fragmented, without consideration of the interrelationships among the dimensions or their resultant impact. Objective: The aim of this study was to develop a consolidated digital health DQ dimension and outcome (DQ-DO) framework to provide insights into 3 research questions: What are the dimensions of digital health DQ? How are the dimensions of digital health DQ related? and What are the impacts of digital health DQ? 
Methods: Following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, a developmental systematic literature review was conducted of peer-reviewed literature focusing on digital health DQ in predominantly hospital settings. A total of 227 relevant articles were retrieved and inductively analyzed to identify digital health DQ dimensions and outcomes. The inductive analysis was performed through open coding, constant comparison, and card sorting with subject matter experts to identify digital health DQ dimensions and digital health DQ outcomes. Subsequently, a computer-assisted analysis was performed and verified by DQ experts to identify the interrelationships among the DQ dimensions and relationships between DQ dimensions and outcomes. The analysis resulted in the development of the DQ-DO framework. Results: The digital health DQ-DO framework consists of 6 dimensions of DQ, namely accessibility, accuracy, completeness, consistency, contextual validity, and currency; interrelationships among the dimensions of digital health DQ, with consistency being the most influential dimension impacting all other digital health DQ dimensions; 5 digital health DQ outcomes, namely clinical, clinician, research-related, business process, and organizational outcomes; and relationships between the digital health DQ dimensions and DQ outcomes, with the consistency and accessibility dimensions impacting all DQ outcomes. Conclusions: The DQ-DO framework developed in this study demonstrates the complexity of digital health DQ and the necessity for reducing digital health DQ issues. The framework further provides health care executives with holistic insights into DQ issues and resultant outcomes, which can help them prioritize which DQ-related problems to tackle first. 
%M 37000497 %R 10.2196/42615 %U https://www.jmir.org/2023/1/e42615 %U https://doi.org/10.2196/42615 %U http://www.ncbi.nlm.nih.gov/pubmed/37000497 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41588 %T Federated Machine Learning, Privacy-Enhancing Technologies, and Data Protection Laws in Medical Research: Scoping Review %A Brauneck,Alissa %A Schmalhorst,Louisa %A Kazemi Majdabadi,Mohammad Mahdi %A Bakhtiari,Mohammad %A Völker,Uwe %A Baumbach,Jan %A Baumbach,Linda %A Buchholtz,Gabriele %+ Hamburg University Faculty of Law, University of Hamburg, Rothenbaumchaussee 33, Hamburg, 20148, Germany, 49 40 42838 2328, alissa.brauneck@uni-hamburg.de %K federated learning %K data protection regulation %K data protection by design %K privacy protection %K General Data Protection Regulation compliance %K GDPR compliance %K privacy-preserving technologies %K differential privacy %K secure multiparty computation %D 2023 %7 30.3.2023 %9 Review %J J Med Internet Res %G English %X Background: The collection, storage, and analysis of large data sets are relevant in many sectors. Especially in the medical field, the processing of patient data promises great progress in personalized health care. However, it is strictly regulated, such as by the General Data Protection Regulation (GDPR). These regulations mandate strict data security and data protection and, thus, create major challenges for collecting and using large data sets. Technologies such as federated learning (FL), especially paired with differential privacy (DP) and secure multiparty computation (SMPC), aim to solve these challenges. Objective: This scoping review aimed to summarize the current discussion on the legal questions and concerns related to FL systems in medical research. 
We were particularly interested in whether and to what extent FL applications and training processes are compliant with the GDPR data protection law and whether the use of the aforementioned privacy-enhancing technologies (DP and SMPC) affects this legal compliance. We placed special emphasis on the consequences for medical research and development. Methods: We performed a scoping review according to the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews). We reviewed articles on Beck-Online, SSRN, ScienceDirect, arXiv, and Google Scholar published in German or English between 2016 and 2022. We examined 4 questions: whether local and global models are “personal data” as per the GDPR; what the “roles” as defined by the GDPR of various parties in FL are; who controls the data at various stages of the training process; and how, if at all, the use of privacy-enhancing technologies affects these findings. Results: We identified and summarized the findings of 56 relevant publications on FL. Local and likely also global models constitute personal data according to the GDPR. FL strengthens data protection but is still vulnerable to a number of attacks and the possibility of data leakage. These concerns can be successfully addressed through the privacy-enhancing technologies SMPC and DP. Conclusions: Combining FL with SMPC and DP is necessary to fulfill the legal data protection requirements (GDPR) in medical research dealing with personal data. Even though some technical and legal challenges remain, for example, the possibility of successful attacks on the system, combining FL with SMPC and DP creates enough security to satisfy the legal requirements of the GDPR. This combination thereby provides an attractive technical solution for health institutions willing to collaborate without exposing their data to risk. 
From a legal perspective, the combination provides enough built-in security measures to satisfy data protection requirements, and from a technical perspective, the combination provides secure systems with performance comparable to that of centralized machine learning applications. %M 36995759 %R 10.2196/41588 %U https://www.jmir.org/2023/1/e41588 %U https://doi.org/10.2196/41588 %U http://www.ncbi.nlm.nih.gov/pubmed/36995759 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e46700 %T mHealth Systems Need a Privacy-by-Design Approach: Commentary on “Federated Machine Learning, Privacy-Enhancing Technologies, and Data Protection Laws in Medical Research: Scoping Review” %A Tewari,Ambuj %+ Department of Statistics, University of Michigan, 1085 S University Ave, Ann Arbor, MI, 48109-1107, United States, 1 734 615 0928, tewaria@umich.edu %K mHealth %K differential privacy %K private synthetic data %K federated learning %K data protection regulation %K data protection by design %K privacy protection %K General Data Protection Regulation %K GDPR compliance %K privacy-preserving technologies %K secure multiparty computation %K multiparty computation %K machine learning %K privacy %D 2023 %7 30.3.2023 %9 Commentary %J J Med Internet Res %G English %X Brauneck and colleagues have combined technical and legal perspectives in their timely and valuable paper “Federated Machine Learning, Privacy-Enhancing Technologies, and Data Protection Laws in Medical Research: Scoping Review.” Researchers who design mobile health (mHealth) systems must adopt the same privacy-by-design approach that privacy regulations (eg, General Data Protection Regulation) do. In order to do this successfully, we will have to overcome implementation challenges in privacy-enhancing technologies such as differential privacy. We will also have to pay close attention to emerging technologies such as private synthetic data generation. 
%M 36995757 %R 10.2196/46700 %U https://www.jmir.org/2023/1/e46700 %U https://doi.org/10.2196/46700 %U http://www.ncbi.nlm.nih.gov/pubmed/36995757 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41882 %T Examining Homophily, Language Coordination, and Analytical Thinking in Web-Based Conversations About Vaccines on Reddit: Study Using Deep Neural Network Language Models and Computer-Assisted Conversational Analyses %A Li,Yue %A Gee,William %A Jin,Kun %A Bond,Robert %+ School of Communication, The Ohio State University, Derby Hall, 3072, 154 N. Oval Mall, Columbus, OH, 43210, United States, 1 6142923400, bond.136@osu.edu %K vaccine hesitancy %K social media %K web-based conversations %K neural network language models %K computer-assisted conversational analyses %D 2023 %7 23.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Vaccine hesitancy has been deemed one of the top 10 threats to global health. Antivaccine information on social media is a major barrier to addressing vaccine hesitancy. Understanding how vaccine proponents and opponents interact with each other on social media may help address vaccine hesitancy. Objective: We aimed to examine conversations between vaccine proponents and opponents on Reddit to understand whether homophily in web-based conversations impedes opinion exchange, whether people are able to accommodate their languages to each other in web-based conversations, and whether engaging with opposing viewpoints stimulates higher levels of analytical thinking. Methods: We analyzed large-scale conversational text data about human vaccines on Reddit from 2016 to 2018. Using deep neural network language models and computer-assisted conversational analyses, we obtained each Redditor’s stance on vaccines, each post’s stance on vaccines, each Redditor’s language coordination score, and each post or comment’s analytical thinking score. 
We then performed chi-square tests, 2-tailed t tests, and multilevel modeling to test 3 questions of interest. Results: The results show that both provaccine and antivaccine Redditors are more likely to selectively respond to Redditors who indicate similar views on vaccines (P<.001). When Redditors interact with others who hold opposing views on vaccines, both provaccine and antivaccine Redditors accommodate their language to out-group members (provaccine Redditors: P=.044; antivaccine Redditors: P=.047) and show no difference in analytical thinking compared with interacting with congruent views (P=.63), suggesting that Redditors do not engage in motivated reasoning. Antivaccine Redditors, on average, showed higher analytical thinking in their posts and comments than provaccine Redditors (P<.001). Conclusions: This study shows that although vaccine proponents and opponents selectively communicate with their in-group members on Reddit, they accommodate their language and do not engage in motivated reasoning when communicating with out-group members. These findings may have implications for the design of provaccine campaigns on social media. 
%M 36951921 %R 10.2196/41882 %U https://www.jmir.org/2023/1/e41882 %U https://doi.org/10.2196/41882 %U http://www.ncbi.nlm.nih.gov/pubmed/36951921 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e35568 %T Automating Quality Assessment of Medical Evidence in Systematic Reviews: Model Development and Validation Study %A Šuster,Simon %A Baldwin,Timothy %A Lau,Jey Han %A Jimeno Yepes,Antonio %A Martinez Iraola,David %A Otmakhova,Yulia %A Verspoor,Karin %+ School of Computing and Information Systems, University of Melbourne, Parkville, Melbourne, 3000, Australia, 61 (03) 9035 4422, simon.suster@unimelb.edu.au %K critical appraisal %K evidence synthesis %K systematic reviews %K bias detection %K automated quality assessment %D 2023 %7 13.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Assessment of the quality of medical evidence available on the web is a critical step in the preparation of systematic reviews. Existing tools that automate parts of this task validate the quality of individual studies but not of entire bodies of evidence and focus on a restricted set of quality criteria. Objective: We proposed a quality assessment task that provides an overall quality rating for each body of evidence (BoE), as well as finer-grained justification for different quality criteria according to the Grading of Recommendation, Assessment, Development, and Evaluation formalization framework. For this purpose, we constructed a new data set and developed a machine learning baseline system (EvidenceGRADEr). Methods: We algorithmically extracted quality-related data from all summaries of findings found in the Cochrane Database of Systematic Reviews. Each BoE was defined by a set of population, intervention, comparison, and outcome criteria and assigned a quality grade (high, moderate, low, or very low) together with quality criteria (justification) that influenced that decision. 
Different statistical data, metadata about the review, and parts of the review text were extracted as support for grading each BoE. After pruning the resulting data set with various quality checks, we used it to train several neural-model variants. The predictions were compared against the labels originally assigned by the authors of the systematic reviews. Results: Our quality assessment data set, Cochrane Database of Systematic Reviews Quality of Evidence, contains 13,440 instances, or BoEs labeled for quality, originating from 2252 systematic reviews published on the internet from 2002 to 2020. On the basis of a 10-fold cross-validation, the best neural binary classifiers for quality criteria detected risk of bias at 0.78 F1 (P=.68; R=0.92) and imprecision at 0.75 F1 (P=.66; R=0.86), while the performance on inconsistency, indirectness, and publication bias criteria was lower (F1 in the range of 0.3-0.4). The prediction of the overall quality grade into 1 of the 4 levels resulted in 0.5 F1. When casting the task as a binary problem by merging the Grading of Recommendation, Assessment, Development, and Evaluation classes (high+moderate vs low+very low-quality evidence), we attained 0.74 F1. We also found that the results varied depending on the supporting information that is provided as an input to the models. Conclusions: Different factors affect the quality of evidence in the context of systematic reviews of medical evidence. Some of these (risk of bias and imprecision) can be automated with reasonable accuracy. Other quality dimensions such as indirectness, inconsistency, and publication bias prove more challenging for machine learning, largely because they are much rarer. This technology could substantially reduce reviewer workload in the future and expedite quality assessment as part of evidence synthesis. 
%M 36722350 %R 10.2196/35568 %U https://www.jmir.org/2023/1/e35568 %U https://doi.org/10.2196/35568 %U http://www.ncbi.nlm.nih.gov/pubmed/36722350 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42822 %T A Data Transformation Methodology to Create Findable, Accessible, Interoperable, and Reusable Health Data: Software Design, Development, and Evaluation Study %A Sinaci,A Anil %A Gencturk,Mert %A Teoman,Huseyin Alper %A Laleci Erturkmen,Gokce Banu %A Alvarez-Romero,Celia %A Martinez-Garcia,Alicia %A Poblador-Plou,Beatriz %A Carmona-Pírez,Jonás %A Löbe,Matthias %A Parra-Calderon,Carlos Luis %+ Software Research & Development and Consultancy Corporation (SRDC), Orta Dogu Teknik Universitesi Teknokent K1-16, Cankaya, 06800, Turkey, 90 3122101763, anil@srdc.com.tr %K Health Level 7 Fast Healthcare Interoperability Resources %K HL7 FHIR %K Findable, Accessible, Interoperable, and Reusable principles %K FAIR principles %K health data sharing %K health data transformation %K secondary use %D 2023 %7 8.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Sharing health data is challenging because of several technical, ethical, and regulatory issues. The Findable, Accessible, Interoperable, and Reusable (FAIR) guiding principles have been conceptualized to enable data interoperability. Many studies provide implementation guidelines, assessment metrics, and software to achieve FAIR-compliant data, especially for health data sets. Health Level 7 (HL7) Fast Healthcare Interoperability Resources (FHIR) is a health data content modeling and exchange standard. Objective: Our goal was to devise a new methodology to extract, transform, and load existing health data sets into HL7 FHIR repositories in line with FAIR principles, develop a Data Curation Tool to implement the methodology, and evaluate it on health data sets from 2 different but complementary institutions. 
We aimed to increase the level of compliance with FAIR principles of existing health data sets through standardization and facilitate health data sharing by eliminating the associated technical barriers. Methods: Our approach automatically processes the capabilities of a given FHIR end point and directs the user while configuring mappings according to the rules enforced by FHIR profile definitions. Code system mappings can be configured for terminology translations through automatic use of FHIR resources. The validity of the created FHIR resources can be automatically checked, and the software does not allow invalid resources to be persisted. At each stage of our data transformation methodology, we used particular FHIR-based techniques so that the resulting data set could be evaluated as FAIR. We performed a data-centric evaluation of our methodology on health data sets from 2 different institutions. Results: Through an intuitive graphical user interface, users are prompted to configure the mappings into FHIR resource types with respect to the restrictions of selected profiles. Once the mappings are developed, our approach can syntactically and semantically transform existing health data sets into HL7 FHIR without loss of data utility according to our privacy-concerned criteria. In addition to the mapped resource types, behind the scenes, we create additional FHIR resources to satisfy several FAIR criteria. According to the data maturity indicators and evaluation methods of the FAIR Data Maturity Model, we achieved the maximum level (level 5) for being Findable, Accessible, and Interoperable and level 3 for being Reusable. Conclusions: We developed and extensively evaluated our data transformation approach to unlock the value of existing health data residing in disparate data silos to make them available for sharing according to the FAIR principles. 
We showed that our method can successfully transform existing health data sets into HL7 FHIR without loss of data utility, and the result is FAIR in terms of the FAIR Data Maturity Model. We support institutional migration to HL7 FHIR, which not only leads to FAIR data sharing but also eases the integration with different research networks. %M 36884270 %R 10.2196/42822 %U https://www.jmir.org/2023/1/e42822 %U https://doi.org/10.2196/42822 %U http://www.ncbi.nlm.nih.gov/pubmed/36884270 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e41100 %T Supervised Relation Extraction Between Suicide-Related Entities and Drugs: Development and Usability Study of an Annotated PubMed Corpus %A Karapetian,Karina %A Jeon,Soo Min %A Kwon,Jin-Won %A Suh,Young-Kyoon %+ School of Computer Science and Engineering, Kyungpook National University, Rm. 520, IT-5, 80 Daehak-ro, Bukgu, Daegu, 41566, Republic of Korea, 82 53 950 6372, yksuh@knu.ac.kr %K suicide %K adverse drug events %K information extraction %K relation classification %K bidirectional encoder representations from transformers %K pharmacovigilance %K natural language processing %K PubMed %K corpus %K language model %D 2023 %7 8.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Drug-induced suicide has been debated as a crucial issue in both clinical and public health research. Published research articles contain valuable data on the drugs associated with suicidal adverse events. An automated process that extracts such information and rapidly detects drugs related to suicide risk is essential but has not been well established. Moreover, few data sets are available for training and validating classification models on drug-induced suicide. Objective: This study aimed to build a corpus of drug-suicide relations containing annotated entities for drugs, suicidal adverse events, and their relations. 
To confirm the effectiveness of the drug-suicide relation corpus, we evaluated the performance of a relation classification model using the corpus in conjunction with various embeddings. Methods: We collected the abstracts and titles of research articles associated with drugs and suicide from PubMed and manually annotated them along with their relations at the sentence level (adverse drug events, treatment, suicide means, or miscellaneous). To reduce the manual annotation effort, we preliminarily selected sentences with a pretrained zero-shot classifier or sentences containing only drug and suicide keywords. We trained a relation classification model using various Bidirectional Encoder Representations from Transformer embeddings with the proposed corpus. We then compared the performances of the model with different Bidirectional Encoder Representations from Transformer–based embeddings and selected the most suitable embedding for our corpus. Results: Our corpus comprised 11,894 sentences extracted from the titles and abstracts of the PubMed research articles. Each sentence was annotated with drug and suicide entities and the relationship between these 2 entities (adverse drug events, treatment, means, and miscellaneous). All of the tested relation classification models that were fine-tuned on the corpus accurately detected sentences of suicidal adverse events regardless of their pretrained type and data set properties. Conclusions: To our knowledge, this is the first and most extensive corpus of drug-suicide relations. 
%M 36884281 %R 10.2196/41100 %U https://www.jmir.org/2023/1/e41100 %U https://doi.org/10.2196/41100 %U http://www.ncbi.nlm.nih.gov/pubmed/36884281 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e42231 %T Safety Concerns in Mobility-Assistive Products for Older Adults: Content Analysis of Online Reviews %A Mali,Namrata %A Restrepo,Felipe %A Abrahams,Alan %A Sands,Laura %A Goldberg,David M %A Gruss,Richard %A Zaman,Nohel %A Shields,Wendy %A Omaki,Elise %A Ehsani,Johnathon %A Ractham,Peter %A Kaewkitipong,Laddawan %+ Center of Excellence in Operations and Information Management, Thammasat Business School, Thammasat University, 2 Prachan Rd., Pranakorn, Bangkok, 10200, Thailand, 66 26132200, laddawan@tbs.tu.ac.th %K injury prevention %K consumer-reported injuries %K older adults %K online reviews %K mobility-assistive devices %K product failures %D 2023 %7 2.3.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: Older adults who have difficulty moving around are commonly advised to adopt mobility-assistive devices to prevent injuries. However, limited evidence exists on the safety of these devices. Existing data sources such as the National Electronic Injury Surveillance System tend to focus on injury description rather than the underlying context, thus providing little to no actionable information regarding the safety of these devices. Although online reviews are often used by consumers to assess the safety of products, prior studies have not explored consumer-reported injuries and safety concerns within online reviews of mobility-assistive devices. Objective: This study aimed to investigate injury types and contexts stemming from the use of mobility-assistive devices, as reported by older adults or their caregivers in online reviews. It not only identified injury severities and mobility-assistive device failure pathways but also shed light on the development of safety information and protocols for these products. 
Methods: Reviews concerning assistive devices were extracted from the “assistive aid” categories, which are typically intended for older adult use, on Amazon’s US website. The extracted reviews were filtered so that only those pertaining to mobility-assistive devices (canes, gait or transfer belts, ramps, walkers or rollators, and wheelchairs or transport chairs) were retained. We conducted large-scale content analysis of these 48,886 retained reviews by coding them according to injury type (no injury, potential future injury, minor injury, and major injury) and injury pathway (device critical component breakage or decoupling; unintended movement; instability; poor, uneven surface handling; and trip hazards). Coding efforts were carried out across 2 separate phases in which the team manually verified all instances coded as minor injury, major injury, or potential future injury and established interrater reliability to validate coding efforts. Results: The content analysis provided a better understanding of the contexts and conditions leading to user injury, as well as the severity of injuries associated with these mobility-assistive devices. Injury pathways—device critical component failures; unintended device movement; poor, uneven surface handling; instability; and trip hazards—were identified for 5 product types (canes, gait and transfer belts, ramps, walkers and rollators, and wheelchairs and transport chairs). Outcomes were normalized per 10,000 posting counts (online reviews) mentioning minor injury, major injury, or potential future injury by product category. Overall, per 10,000 reviews, 240 (2.4%) described mobility-assistive equipment–related user injuries, whereas 2318 (23.18%) revealed potential future injuries. Conclusions: This study highlights mobility-assistive device injury contexts and severities, suggesting that consumers who posted online reviews attribute most serious injuries to a defective item, rather than user misuse. 
It implies that many mobility-assistive device injuries may be preventable through patient and caregiver education on how to evaluate new and existing equipment for risk of potential future injury. %M 36862459 %R 10.2196/42231 %U https://www.jmir.org/2023/1/e42231 %U https://doi.org/10.2196/42231 %U http://www.ncbi.nlm.nih.gov/pubmed/36862459 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e36477 %T A Machine Learning Approach to Support Urgent Stroke Triage Using Administrative Data and Social Determinants of Health at Hospital Presentation: Retrospective Study %A Chen,Min %A Tan,Xuan %A Padman,Rema %+ The H John Heinz III College of Information Systems and Public Policy, Carnegie Mellon University, 4800 Forbes Avenue, Hamburg Hall 2101D, Pittsburgh, PA, 15213, United States, 1 412 268 2180, rpadman@cmu.edu %K stroke %K diagnosis %K triage %K decision support %K social determinants of health %K prediction %K machine learning %K interpretability %K medical decision-making %K retrospective study %K claims data %D 2023 %7 30.1.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: The key to effective stroke management is timely diagnosis and triage. Machine learning (ML) methods developed to assist in detecting stroke have focused on interpreting detailed clinical data such as clinical notes and diagnostic imaging results. However, such information may not be readily available when patients are initially triaged, particularly in rural and underserved communities. Objective: This study aimed to develop an ML stroke prediction algorithm based on data widely available at the time of patients’ hospital presentations and assess the added value of social determinants of health (SDoH) in stroke prediction. Methods: We conducted a retrospective study of the emergency department and hospitalization records from 2012 to 2014 from all the acute care hospitals in the state of Florida, merged with the SDoH data from the American Community Survey. 
A case-control design was adopted to construct stroke and stroke mimic cohorts. We compared the algorithm performance and feature importance measures of the ML models (ie, gradient boosting machine and random forest) with those of the logistic regression model based on 3 sets of predictors. To provide insights into the prediction and ultimately assist care providers in decision-making, we used TreeSHAP for tree-based ML models to explain the stroke prediction. Results: Our analysis included 143,203 hospital visits of unique patients, and it was confirmed based on the principal diagnosis at discharge that 73% (n=104,662) of these patients had a stroke. The approach proposed in this study has high sensitivity and is particularly effective at reducing the misdiagnosis of dangerous stroke chameleons (false-negative rate <4%). ML classifiers consistently outperformed the benchmark logistic regression in all 3 input combinations. We found significant consistency across the models in the features that explain their performance. The most important features are age, the number of chronic conditions on admission, and primary payer (eg, Medicare or private insurance). Although both the individual- and community-level SDoH features helped improve the predictive performance of the models, the inclusion of the individual-level SDoH features led to a much larger improvement (area under the receiver operating characteristic curve increased from 0.694 to 0.823) than the inclusion of the community-level SDoH features (area under the receiver operating characteristic curve increased from 0.823 to 0.829). Conclusions: Using data widely available at the time of patients’ hospital presentations, we developed a stroke prediction model with high sensitivity and reasonable specificity. 
The prediction algorithm uses variables that are routinely collected by providers and payers and might be useful in underresourced hospitals with limited availability of sensitive diagnostic tools or incomplete data-gathering capabilities. %M 36716097 %R 10.2196/36477 %U https://www.jmir.org/2023/1/e36477 %U https://doi.org/10.2196/36477 %U http://www.ncbi.nlm.nih.gov/pubmed/36716097 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 12 %N %P e40565 %T Artificial Intelligence and Precision Health Through Lenses of Ethics and Social Determinants of Health: Protocol for a State-of-the-Art Literature Review %A Wamala-Andersson,Sarah %A Richardson,Matt X %A Landerdahl Stridsberg,Sara %A Ryan,Jillian %A Sukums,Felix %A Goh,Yong-Shian %+ Department of Health and Welfare Technology, School of Health, Care and Social Welfare, Malardalen University, Hamngatan 15, Eskilstuna, 63220, Sweden, 46 0766980150, sarah.wamala.andersson@mdh.se %K artificial intelligence %K clinical outcome %K detection %K diagnosis %K diagnostic %K disease management %K ethical framework %K ethical %K ethics %K health outcome %K health promotion %K literature review %K patient centered %K person centered %K precision health %K precision medicine %K prevention %K review methodology %K search strategy %K social determinant %D 2023 %7 24.1.2023 %9 Protocol %J JMIR Res Protoc %G English %X Background: Precision health is a rapidly developing field, largely driven by the development of artificial intelligence (AI)–related solutions. AI facilitates complex analysis of numerous health data for risk assessment, early detection of disease, and initiation of timely preventative health interventions that can be highly tailored to the individual. Despite such promise, ethical concerns arising from the rapid development and use of AI-related technologies have led to the development of national and international frameworks to address responsible use of AI. 
Objective:  We aimed to address research gaps and provide new knowledge regarding (1) examples of existing AI applications and what role they play regarding precision health, (2) what salient features can be used to categorize them, (3) what evidence exists for their effects on precision health outcomes, (4) how these AI applications comply with established ethical and responsible frameworks, and (5) how these AI applications address equity and social determinants of health (SDOH). Methods:  This protocol delineates a state-of-the-art literature review of novel AI-based applications in precision health. Published and unpublished studies were retrieved from 6 electronic databases. Articles included in this study were from the inception of the databases to January 2023. The review will encompass applications that use AI as a primary or supporting system or method when primarily applied for precision health purposes in human populations. It includes any geographical location or setting, including the internet, community-based, and acute or clinical settings, reporting clinical, behavioral, and psychosocial outcomes, including detection-, diagnosis-, promotion-, prevention-, management-, and treatment-related outcomes. Results:  This is step 1 toward a full state-of-the-art literature review with data analyses, results, and discussion of findings, which will also be published. The anticipated consequences on equity from the perspective of SDOH will be analyzed. Keyword cluster relationships and analyses will be visualized to indicate which research foci are leading the development of the field and where research gaps exist. Results will be presented based on the data analysis plan that includes primary analyses, visualization of sources, and secondary analyses. Implications for future research and person-centered public health will be discussed. 
Conclusions:  Results from the review will potentially guide the continued development of AI applications, future research in reducing the knowledge gaps, and improvement of practice related to precision health. New insights regarding examples of existing AI applications, their salient features, their role regarding precision health, and the evidence that exists for their effects on precision health outcomes will be demonstrated. Additionally, a demonstration of how existing AI applications address equity and SDOH and comply with established ethical and responsible frameworks will be provided. International Registered Report Identifier (IRRID): PRR1-10.2196/40565 %M 36692922 %R 10.2196/40565 %U https://www.researchprotocols.org/2023/1/e40565 %U https://doi.org/10.2196/40565 %U http://www.ncbi.nlm.nih.gov/pubmed/36692922 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 11 %N %P e38590 %T Dealing With Missing, Imbalanced, and Sparse Features During the Development of a Prediction Model for Sudden Death Using Emergency Medicine Data: Machine Learning Approach %A Chen,Xiaojie %A Chen,Han %A Nan,Shan %A Kong,Xiangtian %A Duan,Huilong %A Zhu,Haiyan %+ First Medical Center of Chinese People's Liberation Army General Hospital, 28 Fuxing Road, Haidian District, Beijing, 100037, China, 86 13521361644, xiaoyanzibj301@163.com %K emergency medicine %K prediction model %K data preprocessing %K imbalanced data %K missing value interpolation %K sparse features %K clinical informatics %K machine learning %K medical informatics %D 2023 %7 20.1.2023 %9 Original Paper %J JMIR Med Inform %G English %X Background: In emergency departments (EDs), early diagnosis and timely rescue, which are supported by prediction models using ED data, can increase patients’ chances of survival. Unfortunately, ED data usually contain missing, imbalanced, and sparse features, which makes it challenging to build early identification models for diseases. 
Objective: This study aims to propose a systematic approach to deal with the problems of missing, imbalanced, and sparse features for developing sudden-death prediction models using emergency medicine (or ED) data. Methods: We proposed a 3-step approach to deal with data quality issues: a random forest (RF) for missing values, k-means for imbalanced data, and principal component analysis (PCA) for sparse features. For continuous and discrete variables, the coefficient of determination (R2) and the κ coefficient were used to evaluate performance, respectively. The area under the receiver operating characteristic curve (AUROC) and the area under the precision-recall curve (AUPRC) were used to estimate the model’s performance. To further evaluate the proposed approach, we carried out a case study using an ED data set obtained from the Hainan Hospital of Chinese PLA General Hospital. A logistic regression (LR) prediction model for patient condition worsening was built. Results: A total of 1085 patients with rescue records and 17,959 patients without rescue records were selected, a severely imbalanced sample. We extracted 275, 402, and 891 variables from laboratory tests, medications, and diagnoses, respectively. After data preprocessing, the median R2 of the RF continuous variable interpolation was 0.623 (IQR 0.647), and the median of the κ coefficient for discrete variable interpolation was 0.444 (IQR 0.285). The LR model constructed using the initial diagnostic data showed poor performance and variable separation, which was reflected in the abnormally high odds ratio (OR) values of the 2 variables of cardiac arrest and respiratory arrest (201568034532 and 1211118945, respectively) and an abnormal 95% CI. Using processed data, the recall of the model reached 0.746, the F1-score was 0.73, and the AUROC was 0.708. Conclusions: The proposed systematic approach is valid for building a prediction model for emergency patients. 
%M 36662548 %R 10.2196/38590 %U https://medinform.jmir.org/2023/1/e38590 %U https://doi.org/10.2196/38590 %U http://www.ncbi.nlm.nih.gov/pubmed/36662548 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 25 %N %P e43521 %T An Exploratory Study of Medical Journal’s Twitter Use: Metadata, Networks, and Content Analyses %A Kim,Donghun %A Jung,Woojin %A Jiang,Ting %A Zhu,Yongjun %+ Department of Library and Information Science, Yonsei University, 50 Yonsei-ro Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 02 2123 2409, zhu@yonsei.ac.kr %K medical journals %K social networks %K Twitter %D 2023 %7 19.1.2023 %9 Original Paper %J J Med Internet Res %G English %X Background: An increasing number of medical journals are using social media to promote themselves and communicate with their readers. However, little is known about how medical journals use Twitter and what their social media management strategies are. Objective: This study aimed to understand how medical journals use Twitter from a global standpoint. We conducted a broad, in-depth analysis of all the available Twitter accounts of medical journals indexed by major indexing services, with a particular focus on their social networks and content. Methods: The Twitter profiles and metadata of medical journals were analyzed along with the social networks on their Twitter accounts. Results: The results showed that overall, publishers used different strategies regarding Twitter adoption, Twitter use patterns, and their subsequent decisions. The following specific findings were noted: journals with Twitter accounts had a significantly higher number of publications and a greater impact than their counterparts; subscription journals had a slightly higher Twitter adoption rate (2%) than open access journals; journals with higher impact had more followers; and prestigious journals rarely followed other lesser-known journals on social media. 
In addition, an in-depth analysis of 2000 randomly selected tweets from 4 prestigious journals revealed that The Lancet had dedicated considerable effort to communicating with people about health information and fulfilling its social responsibility by organizing committees and activities to engage with a broad range of health-related issues; The New England Journal of Medicine and the Journal of the American Medical Association focused on promoting research articles and attempting to maximize the visibility of their research articles; and the British Medical Journal provided copious amounts of health information and discussed various health-related social problems to increase social awareness of the field of medicine. Conclusions: Our study used various perspectives to investigate how medical journals use Twitter and explored the Twitter management strategies of 4 of the most prestigious journals. Our study provides a detailed understanding of medical journals’ use of Twitter from various perspectives and can help publishers, journals, and researchers to better use Twitter for their respective purposes. 
%M 36656626 %R 10.2196/43521 %U https://www.jmir.org/2023/1/e43521 %U https://doi.org/10.2196/43521 %U http://www.ncbi.nlm.nih.gov/pubmed/36656626 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 9 %N 12 %P e31618 %T Identifying Data Quality Dimensions for Person-Generated Wearable Device Data: Multi-Method Study %A Cho,Sylvia %A Weng,Chunhua %A Kahn,Michael G %A Natarajan,Karthik %+ Department of Biomedical Informatics, Columbia University, 622 West 168th Street PH20, New York, NY, 10032, United States, 1 212 305 5334, sc3901@cumc.columbia.edu %K patient-generated health data %K data accuracy %K data quality %K wearable device %K fitness trackers %K qualitative research %D 2021 %7 23.12.2021 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: There is a growing interest in using person-generated wearable device data for biomedical research, but there are also concerns regarding the quality of data such as missing or incorrect data. This emphasizes the importance of assessing data quality before conducting research. In order to perform data quality assessments, it is essential to define what data quality means for person-generated wearable device data by identifying the data quality dimensions. Objective: This study aims to identify data quality dimensions for person-generated wearable device data for research purposes. Methods: This study was conducted in 3 phases: literature review, survey, and focus group discussion. The literature review was conducted following the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guideline to identify factors affecting data quality and its associated data quality challenges. In addition, we conducted a survey to confirm and complement results from the literature review and to understand researchers’ perceptions on data quality dimensions that were previously identified as dimensions for the secondary use of electronic health record (EHR) data. 
We sent the survey to researchers with experience in analyzing wearable device data. Focus group discussion sessions were conducted with domain experts to derive data quality dimensions for person-generated wearable device data. On the basis of the results from the literature review and survey, a facilitator proposed potential data quality dimensions relevant to person-generated wearable device data, and the domain experts accepted or rejected the suggested dimensions. Results: In total, 19 studies were included in the literature review, and 3 major themes emerged: device- and technical-related, user-related, and data governance–related factors. The associated data quality problems were incomplete data, incorrect data, and heterogeneous data. A total of 20 respondents answered the survey. The major data quality challenges faced by researchers were completeness, accuracy, and plausibility. The importance ratings on data quality dimensions in an existing framework showed that the dimensions for secondary use of EHR data are applicable to person-generated wearable device data. There were 3 focus group sessions with domain experts in data quality and wearable device research. The experts concluded that intrinsic data quality features, such as conformance, completeness, and plausibility, and contextual and fitness-for-use data quality features, such as completeness (breadth and density) and temporal data granularity, are important data quality dimensions for assessing person-generated wearable device data for research purposes. Conclusions: In this study, intrinsic and contextual and fitness-for-use data quality dimensions for person-generated wearable device data were identified. The dimensions were adapted from data quality terminologies and frameworks for the secondary use of EHR data with a few modifications. Further research on how data quality can be assessed with respect to each dimension is needed. 
%M 34941540 %R 10.2196/31618 %U https://mhealth.jmir.org/2021/12/e31618 %U https://doi.org/10.2196/31618 %U http://www.ncbi.nlm.nih.gov/pubmed/34941540 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e25414 %T Lessons Learned: Beta-Testing the Digital Health Checklist for Researchers Prompts a Call to Action by Behavioral Scientists %A Bartlett Ellis,Rebecca %A Wright,Julie %A Miller,Lisa Soederberg %A Jake-Schoffman,Danielle %A Hekler,Eric B %A Goldstein,Carly M %A Arigo,Danielle %A Nebeker,Camille %+ Herbert Wertheim School of Public Health and Longevity Science, University of California San Diego, 9500 Gilman Drive, La Jolla, CA, 92093-0811, United States, 1 858 534 7786, nebeker@eng.ucsd.edu %K digital health %K mHealth %K research ethics %K institutional review board %K IRB %K behavioral medicine %K wearable sensors %K social media %K bioethics %K data management %K usability %K privacy %K access %K risks and benefits %K mobile phone %D 2021 %7 22.12.2021 %9 Viewpoint %J J Med Internet Res %G English %X Digital technologies offer unique opportunities for health research. For example, Twitter posts can support public health surveillance to identify outbreaks (eg, influenza and COVID-19), and a wearable fitness tracker can provide real-time data collection to assess the effectiveness of a behavior change intervention. With these opportunities, it is necessary to consider the potential risks and benefits to research participants when using digital tools or strategies. Researchers need to be involved in the risk assessment process, as many tools in the marketplace (eg, wellness apps, fitness sensors) are underregulated. However, there is little guidance to assist researchers and institutional review boards in their evaluation of digital tools for research purposes. To address this gap, the Digital Health Checklist for Researchers (DHC-R) was developed as a decision support tool. 
A participatory research approach involving a group of behavioral scientists was used to inform DHC-R development. Scientists beta-tested the checklist by retrospectively evaluating the technologies they had chosen for use in their research. This paper describes the lessons learned from their involvement in the beta-testing process and concludes with recommendations for how the DHC-R could be useful for a variety of digital health stakeholders. Recommendations focus on future research and policy development to support research ethics, including the development of best practices to advance safe and responsible digital health research. %M 34941548 %R 10.2196/25414 %U https://www.jmir.org/2021/12/e25414 %U https://doi.org/10.2196/25414 %U http://www.ncbi.nlm.nih.gov/pubmed/34941548 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e19250 %T Artificial Intelligence–Based Framework for Analyzing Health Care Staff Security Practice: Mapping Review and Simulation Study %A Yeng,Prosper Kandabongee %A Nweke,Livinus Obiora %A Yang,Bian %A Ali Fauzi,Muhammad %A Snekkenes,Einar Arthur %+ Department of Information Security and Communication Technology, Norwegian University of Science and Technology, Teknologivegen 22, Gjovik, 2815, Norway, 47 61135400, prosper.yeng@ntnu.no %K artificial intelligence %K machine learning %K health care %K security practice %K framework %K security %K modeling %K analysis %D 2021 %7 22.12.2021 %9 Review %J JMIR Med Inform %G English %X Background: Blocklisting malicious activities in health care is challenging in relation to access control in health care security practices due to the fear of preventing legitimate access for therapeutic reasons. Inadvertent prevention of legitimate access can contravene the availability trait of the confidentiality, integrity, and availability triad, and may result in worsening health conditions, leading to serious consequences, including deaths. 
Therefore, health care staff are often provided with a wide range of access such as a “breaking-the-glass” or “self-authorization” mechanism for emergency access. However, this broad access can undermine the confidentiality and integrity of sensitive health care data because breaking-the-glass can lead to vast unauthorized access, which could be problematic when determining illegitimate access in security practices. Objective: A review was performed to pinpoint appropriate artificial intelligence (AI) methods and data sources that can be used for effective modeling and analysis of health care staff security practices. Based on knowledge obtained from the review, a framework was developed and implemented with simulated data to provide a comprehensive approach toward effective modeling and analysis of the security practices of health care staff in real access logs. Methods: We first conducted a mapping review to identify AI methods, data sources and their attributes, and other categories as input for framework development. To assess implementation of the framework, electronic health record (EHR) log data were simulated and analyzed, and the performance of various approaches in the framework was compared. Results: Among the total 130 articles initially identified, 18 met the inclusion and exclusion criteria. A thorough assessment and analysis of the included articles revealed that K-nearest neighbor, Bayesian network, and decision tree (C4.5) algorithms were predominantly applied to EHR and network logs with varying input features of health care staff security practices. Based on the review results, a framework was developed and implemented with simulated logs. The decision tree obtained the best precision of 0.655, whereas the best recall was achieved by the support vector machine (SVM) algorithm at 0.977. However, the best F1-score was obtained by random forest at 0.775. 
In brief, 3 classifiers (random forest, decision tree, and SVM) in the two-class approach achieved the best precision of 0.998. Conclusions: The security practices of health care staff can be effectively analyzed using a two-class approach to detect malicious and nonmalicious security practices. Based on our comparative study, the algorithms that can effectively be used in related studies include random forest, decision tree, and SVM. Deviations from health care staff’s required security behavior can be analyzed in a big data context using real access logs to define appropriate incentives for improving conscious care security practice. %M 34941549 %R 10.2196/19250 %U https://medinform.jmir.org/2021/12/e19250 %U https://doi.org/10.2196/19250 %U http://www.ncbi.nlm.nih.gov/pubmed/34941549 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 12 %P e30368 %T The Successful Synchronized Orchestration of an Investigator-Initiated Multicenter Trial Using a Clinical Trial Management System and Team Approach: Design and Utility Study %A Mudaranthakam,Dinesh Pal %A Brown,Alexandra %A Kerling,Elizabeth %A Carlson,Susan E %A Valentine,Christina J %A Gajewski,Byron %+ University of Kansas Medical Center, 3901 Rainbow Blvd, Kansas City, KS, 66160, United States, 1 9139456922, dmudaranthakam@kumc.edu %K data management %K data quality %K metrics %K trial execution %K clinical trials %K cost %K accrual %K accrual inequality %K rare diseases %K healthcare %K health care %K health operations %D 2021 %7 22.12.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: As the cost of clinical trials continues to rise, novel approaches are required to ensure ethical allocation of resources. Multisite trials have been increasingly utilized in phase 1 trials for rare diseases and in phase 2 and 3 trials to meet accrual needs. 
The benefits of multisite trials include easier patient recruitment, expanded generalizability, and more robust statistical analyses. However, there are several problems more likely to arise in multisite trials, including accrual inequality, protocol nonadherence, data entry mistakes, and data integration difficulties. Objective: The Biostatistics & Data Science department at the University of Kansas Medical Center developed a clinical trial management system (comprehensive research information system [CRIS]) specifically designed to streamline multisite clinical trial management. Methods: A National Institute of Child Health and Human Development–funded phase 3 trial, the ADORE (assessment of docosahexaenoic acid [DHA] on reducing early preterm birth) trial, fully utilized CRIS to provide automated accrual reports, centralize data capture, automate trial completion reports, and streamline data harmonization. Results: Using the ADORE trial as an example, we describe the utility of CRIS in database design, regulatory compliance, training standardization, study management, and automated reporting. Our goal is to continue to build a CRIS through use in subsequent multisite trials. Reports generated to suit the needs of future studies will be available as templates. Conclusions: The implementation of similar tools and systems could provide significant cost-saving and operational benefit to multisite trials. 
Trial Registration: ClinicalTrials.gov NCT02626299; https://tinyurl.com/j6erphcj %M 34941552 %R 10.2196/30368 %U https://formative.jmir.org/2021/12/e30368 %U https://doi.org/10.2196/30368 %U http://www.ncbi.nlm.nih.gov/pubmed/34941552 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 12 %P e34286 %T EpiHacks, a Process for Technologists and Health Experts to Cocreate Optimal Solutions for Disease Prevention and Control: User-Centered Design Approach %A Divi,Nomita %A Smolinski,Mark %+ Ending Pandemics, 870 Market Street, Suite 528, San Francisco, CA, 94102, United States, 1 6173591733, nomita@endingpandemics.org %K epidemiology %K public health %K diagnostic %K tool %K disease surveillance %K technology solution %K innovative approaches to disease surveillance %K One Health %K surveillance %K hack %K innovation %K expert %K solution %K prevention %K control %D 2021 %7 15.12.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Technology-based innovations that are created collaboratively by local technology specialists and health experts can optimally address priority needs for disease prevention and control. An EpiHack is a distinct, collaborative approach to developing solutions that combines the science of epidemiology with the format of a hackathon. Since 2013, a total of 12 EpiHacks have collectively brought together over 500 technology and health professionals from 29 countries. Objective: We aimed to define the EpiHack process and summarize the impacts of the technology-based innovations that have been created through this approach. Methods: The key components and timeline of an EpiHack were described in detail. The focus areas, outputs, and impacts of the 12 EpiHacks that were conducted between 2013 and 2021 were summarized. 
Results: EpiHack solutions have served to improve surveillance for influenza, dengue, and mass gatherings, as well as laboratory sample tracking and One Health surveillance, in rural and urban communities. Several EpiHack tools were scaled during the COVID-19 pandemic to support local governments in conducting active surveillance. All tools were designed to be open source to allow for easy replication and adaptation by other governments or parties. Conclusions: EpiHacks provide an efficient, flexible, and replicable new approach to generating relevant and timely innovations that are locally developed and owned, are scalable, and are sustainable. %M 34807832 %R 10.2196/34286 %U https://www.jmir.org/2021/12/e34286 %U https://doi.org/10.2196/34286 %U http://www.ncbi.nlm.nih.gov/pubmed/34807832 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e30970 %T Transformation and Evaluation of the MIMIC Database in the OMOP Common Data Model: Development and Usability Study %A Paris,Nicolas %A Lamer,Antoine %A Parrot,Adrien %+ InterHop, 30 avenue du Maine, Paris, 75015, France, 33 3 20 62 69 69, nicolas.paris@riseup.net %K data reuse %K open data %K OMOP %K common data model %K critical care %K machine learning %K big data %K health informatics %K health data %K health database %K electronic health records %K open access database %K digital health %K intensive care %K health care %D 2021 %7 14.12.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: In the era of big data, the intensive care unit (ICU) is likely to benefit from real-time computer analysis and modeling based on close patient monitoring and electronic health record data. The Medical Information Mart for Intensive Care (MIMIC) is the first open access database in the ICU domain. Many studies have shown that common data models (CDMs) improve database searching by allowing code, tools, and experience to be shared. 
The Observational Medical Outcomes Partnership (OMOP) CDM is being adopted worldwide. Objective: The objective was to transform MIMIC into an OMOP database and to evaluate the benefits of this transformation for analysts. Methods: We transformed MIMIC (version 1.4.21) into OMOP format (version 5.3.3.1) through semantic and structural mapping. The structural mapping aimed at moving the MIMIC data into the right place in OMOP, with some data transformations. The mapping was divided into 3 phases: conception, implementation, and evaluation. The conceptual mapping aimed at aligning the MIMIC local terminologies to OMOP's standard ones. It consisted of 3 phases: integration, alignment, and evaluation. A documented, tested, versioned, exemplified, and open repository was set up to support the transformation and improvement of the MIMIC community's source code. The resulting data set was evaluated over a 48-hour datathon. Results: With an investment of 2 people for 500 hours, 64% of the data items of the 26 MIMIC tables were standardized into the OMOP CDM and 78% of the source concepts mapped to reference terminologies. The model proved its ability to support community contributions and was well received during the datathon, with 160 participants and 15,000 requests executed with a maximum duration of 1 minute. Conclusions: The resulting MIMIC-OMOP data set is the first freely available MIMIC-OMOP data set with real deidentified data ready for replicable intensive care research. This approach can be generalized to any medical field. 
%M 34904958 %R 10.2196/30970 %U https://medinform.jmir.org/2021/12/e30970 %U https://doi.org/10.2196/30970 %U http://www.ncbi.nlm.nih.gov/pubmed/34904958 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e29286 %T Leveraging National Claims and Hospital Big Data: Cohort Study on a Statin-Drug Interaction Use Case %A Bannay,Aurélie %A Bories,Mathilde %A Le Corre,Pascal %A Riou,Christine %A Lemordant,Pierre %A Van Hille,Pascal %A Chazard,Emmanuel %A Dode,Xavier %A Cuggia,Marc %A Bouzillé,Guillaume %+ Inserm, Laboratoire Traitement du Signal et de l'Image - UMR 1099, Centre Hospitalier Universitaire de Rennes, Université de Rennes 1, UFR Santé, laboratoire d'informatique médicale, 2 avenue du Professeur Léon Bernard, Rennes, 35000, France, 33 615711230, guillaume.bouzille@gmail.com %K drug interactions %K statins %K administrative claims %K health care %K big data %K data linking %K data warehousing %D 2021 %7 13.12.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Linking different sources of medical data is a promising approach to analyze care trajectories. The aim of the INSHARE (Integrating and Sharing Health Big Data for Research) project was to provide the blueprint for a technological platform that facilitates integration, sharing, and reuse of data from 2 sources: the clinical data warehouse (CDW) of the Rennes academic hospital, called eHOP (entrepôt Hôpital), and a data set extracted from the French national claim data warehouse (Système National des Données de Santé [SNDS]). Objective: This study aims to demonstrate how the INSHARE platform can support big data analytic tasks in the health field using a pharmacovigilance use case based on statin consumption and statin-drug interactions. Methods: A Spark distributed cluster-computing framework was used for the record linkage procedure and all analyses. 
A semideterministic record linkage method based on the common variables between the chosen data sources was developed to identify all patients discharged after at least one hospital stay at the Rennes academic hospital between 2015 and 2017. The use-case study focused on a cohort of patients treated with statins prescribed by their general practitioner or during their hospital stay. Results: The whole process (record linkage procedure and use-case analyses) required 88 minutes. Of the 161,532 and 164,316 patients from the SNDS and eHOP CDW data sets, respectively, 159,495 patients were successfully linked (98.74% and 97.07% of patients from SNDS and eHOP CDW, respectively). Of the 16,806 patients with at least one statin delivery, 8293 patients started the consumption before and continued during the hospital stay, 6382 patients stopped statin consumption at hospital admission, and 2131 patients initiated statins in hospital. Statin-drug interactions occurred more frequently during hospitalization than in the community (3800/10,424, 36.45% and 3253/14,675, 22.17%, respectively; P<.001). Only 121 patients had the most severe level of statin-drug interaction. Hospital stay burden (length of stay and in-hospital mortality) was more severe in patients with statin-drug interactions during hospitalization. Conclusions: This study demonstrates the added value of combining and reusing clinical and claim data to provide large-scale measures of drug-drug interaction prevalence and care pathways outside hospitals. It builds a path to move the current health care system toward a Learning Health System using knowledge generated from research on real-world health data. 
%M 34898457 %R 10.2196/29286 %U https://medinform.jmir.org/2021/12/e29286 %U https://doi.org/10.2196/29286 %U http://www.ncbi.nlm.nih.gov/pubmed/34898457 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e32698 %T A BERT-Based Generation Model to Transform Medical Texts to SQL Queries for Electronic Medical Records: Model Development and Validation %A Pan,Youcheng %A Wang,Chenghao %A Hu,Baotian %A Xiang,Yang %A Wang,Xiaolong %A Chen,Qingcai %A Chen,Junjie %A Du,Jingcheng %+ Intelligent Computing Research Center, Harbin Institute of Technology, No. 6, Pingshan 1st Road, Shenzhen, 518055, China, 86 136 9164 0856, hubaotian@hit.edu.cn %K electronic medical record %K text-to-SQL generation %K BERT %K grammar-based decoding %K tree-structured intermediate representation %D 2021 %7 8.12.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Electronic medical records (EMRs) are usually stored in relational databases that require SQL queries to retrieve information of interest. Effectively completing such queries can be a challenging task for medical experts due to the barriers in expertise. Existing text-to-SQL generation studies have not been fully embraced in the medical domain. Objective: The objective of this study was to propose a neural generation model that can jointly consider the characteristics of medical text and the SQL structure to automatically transform medical texts to SQL queries for EMRs. Methods: We proposed a medical text–to-SQL model (MedTS), which employed a pretrained Bidirectional Encoder Representations From Transformers model as the encoder and leveraged a grammar-based long short-term memory network as the decoder to predict the intermediate representation that can easily be transformed into the final SQL query. 
We adopted the syntax tree as the intermediate representation rather than directly regarding the SQL query as an ordinary word sequence, which is more in line with the tree-structure nature of SQL and can also effectively reduce the search space during generation. Experiments were conducted on the MIMICSQL dataset, and 5 competitor methods were compared. Results: Experimental results demonstrated that MedTS achieved accuracies of 0.784 and 0.899 on the test set in terms of logic form and execution, respectively, which significantly outperformed the existing state-of-the-art methods. Further analyses proved that the performance on each component of the generated SQL was relatively balanced and offered substantial improvements. Conclusions: The proposed MedTS was effective and robust for improving the performance of medical text–to-SQL generation, indicating strong potential to be applied in real medical scenarios. %M 34889749 %R 10.2196/32698 %U https://medinform.jmir.org/2021/12/e32698 %U https://doi.org/10.2196/32698 %U http://www.ncbi.nlm.nih.gov/pubmed/34889749 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 12 %P e25022 %T On Missingness Features in Machine Learning Models for Critical Care: Observational Study %A Singh,Janmajay %A Sato,Masahiro %A Ohkuma,Tomoko %+ Fuji Xerox Co, Ltd, 6 Chome-1-1 Minatomirai, Nishi Ward, Yokohama, 220-0012, Japan, 81 7041120526, janmajaysingh14@gmail.com %K electronic health records %K informative missingness %K machine learning %K missing data %K hospital mortality %K sepsis %D 2021 %7 8.12.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Missing data in electronic health records is inevitable and considered to be nonrandom. Several studies have found that features indicating missing patterns (missingness) encode useful information about a patient’s health and advocate for their inclusion in clinical prediction models. But their effectiveness has not been comprehensively evaluated. 
Objective: The goal of the research is to study the effect of including informative missingness features in machine learning models for various clinically relevant outcomes and explore robustness of these features across patient subgroups and task settings. Methods: A total of 48,336 electronic health records from the 2012 and 2019 PhysioNet Challenges were used, and mortality, length of stay, and sepsis outcomes were chosen. The latter dataset was multicenter, allowing external validation. Gated recurrent units were used to learn sequential patterns in the data and classify or predict labels of interest. Models were evaluated on various criteria and across population subgroups, assessing discriminative ability and calibration. Results: Generally improved model performance in retrospective tasks was observed when including missingness features. The extent of improvement depended on the outcome of interest (area under the curve of the receiver operating characteristic [AUROC] improved from 1.2% to 7.7%) and even patient subgroup. However, missingness features did not display utility in a simulated prospective setting, being outperformed (0.9% difference in AUROC) by the model relying only on pathological features. This was despite leading to earlier detection of disease (true positives), since including these features led to a concomitant rise in false positive detections. Conclusions: This study comprehensively evaluated the effectiveness of missingness features on machine learning models. A detailed understanding of how these features affect model performance may lead to their informed use in clinical settings, especially for administrative tasks like length of stay prediction, where they present the greatest benefit. While missingness features, representative of health care processes, vary greatly due to intra- and interhospital factors, they may still be used in prediction models for clinically relevant outcomes. 
However, their use in prospective models producing frequent predictions needs to be explored further. %M 34889756 %R 10.2196/25022 %U https://medinform.jmir.org/2021/12/e25022 %U https://doi.org/10.2196/25022 %U http://www.ncbi.nlm.nih.gov/pubmed/34889756 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 12 %P e20767 %T Using Artificial Neural Network Condensation to Facilitate Adaptation of Machine Learning in Medical Settings by Reducing Computational Burden: Model Design and Evaluation Study %A Liu,Dianbo %A Zheng,Ming %A Sepulveda,Nestor Andres %+ Department of Biomedical Informatics, Harvard University, 1 Autumn Street, Boston, MA, 02215, United States, 1 6177101859, dianbo@mit.edu %K artificial neural network %K electronic medical records %K parameter pruning %K machine learning %K computational burden %D 2021 %7 8.12.2021 %9 Original Paper %J JMIR Form Res %G English %X Background: Machine learning applications in the health care domain can have a great impact on people’s lives. At the same time, medical data is usually big, requiring a significant number of computational resources. Although this might not be a problem for the wide adoption of machine learning tools in high-income countries, the availability of computational resources can be limited in low-income countries and on mobile devices. This can limit many people from benefiting from the advancement in machine learning applications in the field of health care. Objective: In this study, we explore three methods to increase the computational efficiency and reduce model sizes of either recurrent neural networks (RNNs) or feedforward deep neural networks (DNNs) without compromising their accuracy. Methods: We used inpatient mortality prediction as our case analysis upon review of an intensive care unit dataset. We reduced the size of RNN and DNN by applying pruning of “unused” neurons. 
Additionally, we modified the RNN structure by adding a hidden layer to the RNN cell but reducing the total number of recurrent layers to accomplish a reduction of the total parameters used in the network. Finally, we implemented quantization on DNN by forcing the weights to be 8 bits instead of 32 bits. Results: We found that all methods increased implementation efficiency, including training speed, memory size, and inference speed, without reducing the accuracy of mortality prediction. Conclusions: Our findings suggest that neural network condensation allows for the implementation of sophisticated neural network algorithms on devices with lower computational resources. %M 34889747 %R 10.2196/20767 %U https://formative.jmir.org/2021/12/e20767 %U https://doi.org/10.2196/20767 %U http://www.ncbi.nlm.nih.gov/pubmed/34889747 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e30308 %T The Collaborative Metadata Repository (CoMetaR) Web App: Quantitative and Qualitative Usability Evaluation %A Stöhr,Mark R %A Günther,Andreas %A Majeed,Raphael W %+ Justus-Liebig-University Giessen, Universities of Giessen and Marburg Lung Center (UGMLC), German Center for Lung Research (DZL), Klinikstraße 36, Gießen, 35392, Germany, 49 641 985 42117, mark.stoehr@innere.med.uni-giessen.de %K usability %K metadata %K data visualization %K semantic web %K data management %K data warehousing %K communication barriers %K quality improvement %K biological ontologies %K data curation %D 2021 %7 29.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: In the field of medicine and medical informatics, the importance of comprehensive metadata has long been recognized, and the composition of metadata has become its own field of profession and research. To ensure sustainable and meaningful metadata are maintained, standards and guidelines such as the FAIR (Findability, Accessibility, Interoperability, Reusability) principles have been published. 
The compilation and maintenance of metadata is performed by field experts supported by metadata management apps. The usability of these apps, for example, in terms of ease of use, efficiency, and error tolerance, crucially determines their benefit to those interested in the data. Objective: This study aims to provide a metadata management app with high usability that assists scientists in compiling and using rich metadata. We aim to evaluate our recently developed interactive web app for our collaborative metadata repository (CoMetaR). This study reflects how real users perceive the app by assessing usability scores and explicit usability issues. Methods: We evaluated the CoMetaR web app by measuring the usability of 3 modules: core module, provenance module, and data integration module. We defined 10 tasks in which users must acquire information specific to their user role. The participants were asked to complete the tasks in a live web meeting. We used the System Usability Scale questionnaire to measure the usability of the app. For qualitative analysis, we applied a modified think aloud method with the following thematic analysis and categorization into the ISO 9241-110 usability categories. Results: A total of 12 individuals participated in the study. We found that over 97% (85/88) of all the tasks were completed successfully. We measured usability scores of 81, 81, and 72 for the 3 evaluated modules. The qualitative analysis resulted in 24 issues with the app. Conclusions: A usability score of 81 implies very good usability for the 2 modules, whereas a usability score of 72 still indicates acceptable usability for the third module. We identified 24 issues that serve as starting points for further development. Our method proved to be effective and efficient in terms of effort and outcome. It can be adapted to evaluate apps within the medical informatics field and potentially beyond. 
%M 34847059 %R 10.2196/30308 %U https://medinform.jmir.org/2021/11/e30308 %U https://doi.org/10.2196/30308 %U http://www.ncbi.nlm.nih.gov/pubmed/34847059 %0 Journal Article %@ 2561-326X %I JMIR Publications %V 5 %N 11 %P e33124 %T Toward Automated Data Extraction According to Tabular Data Structure: Cross-sectional Pilot Survey of the Comparative Clinical Literature %A Holub,Karl %A Hardy,Nicole %A Kallmes,Kevin %+ Nested Knowledge, 1430 Avon Street N, St. Paul, MN, 55117, United States, 1 507 271 7051, kevinkallmes@supedit.com %K table structure %K systematic review %K automated data extraction %K data reporting conventions %K clinical comparative data %K data elements %K statistic formats %D 2021 %7 24.11.2021 %9 Review %J JMIR Form Res %G English %X Background: Systematic reviews depend on time-consuming extraction of data from the PDFs of underlying studies. To date, automation efforts have focused on extracting data from the text, and no approach has yet succeeded in fully automating ingestion of quantitative evidence. However, the majority of relevant data is generally presented in tables, and the tabular structure is more amenable to automated extraction than free text. Objective: The purpose of this study was to classify the structure and format of descriptive statistics reported in tables in the comparative medical literature. Methods: We sampled 100 published randomized controlled trials from 2019 based on a search in PubMed; these results were imported to the AutoLit platform. Studies were excluded if they were nonclinical, noncomparative, not in English, protocols, or not available in full text. In AutoLit, tables reporting baseline or outcome data in all studies were characterized based on reporting practices. 
Measurement context, meaning the structure in which the interventions of interest, patient arm breakdown, measurement time points, and data element descriptions were presented, was classified based on the number of contextual pieces and metadata reported. The statistic formats for reported metrics (specific instances of reporting of data elements) were then classified by location and broken down into reporting strategies for continuous, dichotomous, and categorical metrics. Results: We included 78 of 100 sampled studies, one of which (1.3%) did not report data elements in tables. The remaining 77 studies reported baseline and outcome data in 174 tables, and 96% (69/72) of these tables broke down reporting by patient arms. Fifteen structures were found for the reporting of measurement context, which were broadly grouped into: 1×1 contexts, where two pieces of context are reported in total (eg, arms in columns, data elements in rows); 2×1 contexts, where two pieces of context are given on row headers (eg, time points in columns, arms nested in data elements on rows); and 1×2 contexts, where two pieces of context are given on column headers. The 1×1 contexts were present in 57% of tables (99/174), compared to 20% (34/174) for 2×1 contexts and 15% (26/174) for 1×2 contexts; the remaining 8% (15/174) used unique/other stratification methods. Statistic formats were reported in the headers or descriptions of 84% (65/74) of studies. Conclusions: In this cross-sectional pilot review, we found a high density of information in tables, but with major heterogeneity in presentation of measurement context. The highest-density studies reported both baseline and outcome measures in tables, with arm-level breakout, intervention labels, and arm sizes present, and reported both the statistic formats and units. 
The measurement context formats presented here, broadly classified into three classes that cover 92% (71/78) of studies, form a basis for understanding the frequency of different reporting styles, supporting automated detection of the data format for extraction of metrics. %M 34821562 %R 10.2196/33124 %U https://formative.jmir.org/2021/11/e33124 %U https://doi.org/10.2196/33124 %U http://www.ncbi.nlm.nih.gov/pubmed/34821562 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 11 %P e31750 %T Approaches and Criteria for Provenance in Biomedical Data Sets and Workflows: Protocol for a Scoping Review %A Gierend,Kerstin %A Krüger,Frank %A Waltemath,Dagmar %A Fünfgeld,Maximilian %A Ganslandt,Thomas %A Zeleke,Atinkut Alamirrew %+ Department of Biomedical Informatics at the Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim, Heidelberg University, Theodor-Kutzer-Ufer 1-3, Mannheim, 68167, Germany, 49 0621 383 ext 8087, kerstin.gierend@medma.uni-heidelberg.de %K provenance %K biomedical %K workflow %K data sharing %K lineage %K scoping review %K data genesis %K scientific data %K digital objects %K healthcare data %D 2021 %7 22.11.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Provenance supports the understanding of data genesis, and it is a key factor to ensure the trustworthiness of digital objects containing (sensitive) scientific data. Provenance information contributes to a better understanding of scientific results and fosters collaboration on existing data as well as data sharing. This encompasses defining comprehensive concepts and standards for transparency and traceability, reproducibility, validity, and quality assurance during clinical and scientific data workflows and research. Objective: The aim of this scoping review is to investigate existing evidence regarding approaches and criteria for provenance tracking as well as disclosing current knowledge gaps in the biomedical domain. 
This review covers modeling aspects as well as metadata frameworks for meaningful and usable provenance information during the creation, collection, and processing of (sensitive) scientific biomedical data. This review also covers the examination of quality aspects of provenance criteria. Methods: This scoping review will follow the methodological framework by Arksey and O'Malley. Relevant publications will be obtained by querying PubMed and Web of Science. All papers in the English language published between January 1, 2006, and March 23, 2021, will be included. Data retrieval will be accompanied by a manual search for grey literature. Potential publications will then be exported into reference management software, and duplicates will be removed. Afterwards, the obtained set of papers will be transferred into a systematic review management tool. All publications will be screened, extracted, and analyzed: title and abstract screening will be carried out by 4 independent reviewers. A majority vote is required for consensus on the eligibility of papers based on the defined inclusion and exclusion criteria. Full-text reading will be performed independently by 2 reviewers, and in the last step, key information will be extracted onto a pretested template. If agreement cannot be reached, the conflict will be resolved by a domain expert. Charted data will be analyzed by categorizing and summarizing the individual data items based on the research questions. Tabular or graphical overviews will be given, if applicable. Results: The reporting follows the extension of the Preferred Reporting Items for Systematic reviews and Meta-Analyses statements for Scoping Reviews. Electronic database searches in PubMed and Web of Science resulted in 469 matches after deduplication. As of September 2021, the scoping review is in the full-text screening stage. The data extraction using the pretested charting template will follow the full-text screening stage. 
We expect the scoping review report to be completed by February 2022. Conclusions: Information about the origin of healthcare data has a major impact on the quality and the reusability of scientific results as well as follow-up activities. This protocol outlines plans for a scoping review that will provide information about current approaches, challenges, or knowledge gaps with provenance tracking in biomedical sciences. International Registered Report Identifier (IRRID): DERR1-10.2196/31750 %M 34813494 %R 10.2196/31750 %U https://www.researchprotocols.org/2021/11/e31750 %U https://doi.org/10.2196/31750 %U http://www.ncbi.nlm.nih.gov/pubmed/34813494 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e29176 %T An Open-Source, Standard-Compliant, and Mobile Electronic Data Capture System for Medical Research (OpenEDC): Design and Evaluation Study %A Greulich,Leonard %A Hegselmann,Stefan %A Dugas,Martin %+ Institute of Medical Informatics, University of Münster, Albert-Schweitzer-Campus 1, Building A11, Münster, 48149, Germany, 49 15905368729, leonard.greulich@uni-muenster.de %K electronic data capture %K open science %K data interoperability %K metadata reuse %K mobile health %K data standard %K mobile phone %D 2021 %7 19.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Medical research and machine learning for health care depend on high-quality data. Electronic data capture (EDC) systems have been widely adopted for metadata-driven digital data collection. However, many systems use proprietary and incompatible formats that inhibit clinical data exchange and metadata reuse. In addition, the configuration and financial requirements of typical EDC systems frequently prevent small-scale studies from benefiting from their inherent advantages. Objective: The aim of this study is to develop and publish an open-source EDC system that addresses these issues. We aim to plan a system that is applicable to a wide range of research projects. 
Methods: We conducted a literature-based requirements analysis to identify the academic and regulatory demands for digital data collection. After designing and implementing OpenEDC, we performed a usability evaluation to obtain feedback from users. Results: We identified 20 frequently stated requirements for EDC. According to the International Organization for Standardization/International Electrotechnical Commission (ISO/IEC) 25010 norm, we categorized the requirements into functional suitability, availability, compatibility, usability, and security. We developed OpenEDC based on the regulatory-compliant Clinical Data Interchange Standards Consortium Operational Data Model (CDISC ODM) standard. Mobile device support enables the collection of patient-reported outcomes. OpenEDC is publicly available and released under the MIT open-source license. Conclusions: Adopting an established standard without modifications supports metadata reuse and clinical data exchange, but it limits item layouts. OpenEDC is a stand-alone web app that can be used without a setup or configuration. This should foster compatibility between medical research and open science. OpenEDC is targeted at observational and translational research studies by clinicians. 
%M 34806987 %R 10.2196/29176 %U https://medinform.jmir.org/2021/11/e29176 %U https://doi.org/10.2196/29176 %U http://www.ncbi.nlm.nih.gov/pubmed/34806987 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 11 %P e34493 %T Sensor Data Integration: A New Cross-Industry Collaboration to Articulate Value, Define Needs, and Advance a Framework for Best Practices %A Clay,Ieuan %A Angelopoulos,Christian %A Bailey,Anne Lord %A Blocker,Aaron %A Carini,Simona %A Carvajal,Rodrigo %A Drummond,David %A McManus,Kimberly F %A Oakley-Girvan,Ingrid %A Patel,Krupal B %A Szepietowski,Phillip %A Goldsack,Jennifer C %+ Digital Medicine Society (DiMe), 90 Canal Street, Boston, MA, 02114, United States, 1 765 234 3463, ieuan@dimesociety.org %K digital measures %K data integration %K patient centricity %K utility %D 2021 %7 9.11.2021 %9 Viewpoint %J J Med Internet Res %G English %X Data integration, the processes by which data are aggregated, combined, and made available for use, has been key to the development and growth of many technological solutions. In health care, we are experiencing a revolution in the use of sensors to collect data on patient behaviors and experiences. Yet, the potential of this data to transform health outcomes is being held back. Deficits in standards, lexicons, data rights, permissioning, and security have been well documented, less so the cultural adoption of sensor data integration as a priority for large-scale deployment and impact on patient lives. The use and reuse of trustworthy data to make better and faster decisions across drug development and care delivery will require an understanding of all stakeholder needs and best practices to ensure these needs are met. The Digital Medicine Society is launching a new multistakeholder Sensor Data Integration Tour of Duty to address these challenges and more, providing a clear direction on how sensor data can fulfill its potential to enhance patient lives. 
%M 34751656 %R 10.2196/34493 %U https://www.jmir.org/2021/11/e34493 %U https://doi.org/10.2196/34493 %U http://www.ncbi.nlm.nih.gov/pubmed/34751656 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 11 %P e26914 %T Local Differential Privacy in the Medical Domain to Protect Sensitive Information: Algorithm Development and Real-World Validation %A Sung,MinDong %A Cha,Dongchul %A Park,Yu Rang %+ Department of Biomedical Systems Informatics, Yonsei University College of Medicine, Yonsei-ro 50-1, Seoul, 03722, Republic of Korea, 82 2 228 2363, yurangpark@yuhs.ac %K privacy-preserving %K differential privacy %K medical informatics %K medical data %K privacy %K electronic health record %K algorithm %K development %K validation %K big data %K medical data %K feasibility %K machine learning %K synthetic data %D 2021 %7 8.11.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Privacy is of increasing interest in the present big data era, particularly the privacy of medical data. Specifically, differential privacy has emerged as the standard method for preservation of privacy during data analysis and publishing. Objective: Using machine learning techniques, we applied differential privacy to medical data with diverse parameters and checked the feasibility of our algorithms with synthetic data as well as the balance between data privacy and utility. Methods: All data were normalized to a range between –1 and 1, and the bounded Laplacian method was applied to prevent the generation of out-of-bound values after applying the differential privacy algorithm. To preserve the cardinality of the categorical variables, we performed postprocessing via discretization. The algorithm was evaluated using both synthetic and real-world data (from the eICU Collaborative Research Database). 
We evaluated the difference between the original data and the perturbed data using misclassification rates and the mean squared error for categorical data and continuous data, respectively. Further, we compared the performance of classification models that predict in-hospital mortality using real-world data. Results: The misclassification rate of categorical variables ranged between 0.49 and 0.85 when the value of ε was 0.1, and it converged to 0 as ε increased. When ε was between 10² and 10³, the misclassification rate rapidly dropped to 0. Similarly, the mean squared error of the continuous variables decreased as ε increased. The performance of the model developed from perturbed data converged to that of the model developed from original data as ε increased. In particular, the accuracy of a random forest model developed from the original data was 0.801, and this value ranged from 0.757 to 0.81 when ε was 10⁻¹ and 10⁴, respectively. Conclusions: We applied local differential privacy to medical domain data, which are diverse and high dimensional. Higher noise may offer enhanced privacy, but it simultaneously hinders utility. We should choose an appropriate degree of noise for data perturbation to balance privacy and utility depending on specific situations. 
%M 34747711 %R 10.2196/26914 %U https://medinform.jmir.org/2021/11/e26914 %U https://doi.org/10.2196/26914 %U http://www.ncbi.nlm.nih.gov/pubmed/34747711 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 10 %P e29871 %T Data Anonymization for Pervasive Health Care: Systematic Literature Mapping Study %A Zuo,Zheming %A Watson,Matthew %A Budgen,David %A Hall,Robert %A Kennelly,Chris %A Al Moubayed,Noura %+ Department of Computer Science, Durham University, Lower Mountjoy, South Rd, Durham, DH1 3LE, United Kingdom, 44 1913341749, Noura.al-moubayed@durham.ac.uk %K healthcare %K privacy-preserving %K GDPR %K DPA 2018 %K EHR %K SLM %K data science %K anonymization %K reidentification risk %K usability %D 2021 %7 15.10.2021 %9 Review %J JMIR Med Inform %G English %X Background: Data science offers an unparalleled opportunity to identify new insights into many aspects of human life with recent advances in health care. Using data science in digital health raises significant challenges regarding data privacy, transparency, and trustworthiness. Recent regulations enforce the need for a clear legal basis for collecting, processing, and sharing data, for example, the European Union’s General Data Protection Regulation (2016) and the United Kingdom’s Data Protection Act (2018). For health care providers, legal use of the electronic health record (EHR) is permitted only in clinical care cases. Any other use of the data requires thoughtful considerations of the legal context and direct patient consent. Identifiable personal and sensitive information must be sufficiently anonymized. Raw data are commonly anonymized to be used for research purposes, with risk assessment for reidentification and utility. Although health care organizations have internal policies defined for information governance, there is a significant lack of practical tools and intuitive guidance about the use of data for research and modeling. 
Off-the-shelf data anonymization tools are developed frequently, but privacy-related functionalities are often incomparable with regard to use in different problem domains. In addition, tools to support measuring the risk of the anonymized data with regard to reidentification against the usefulness of the data exist, but there are question marks over their efficacy. Objective: In this systematic literature mapping study, we aim to alleviate the aforementioned issues by reviewing the landscape of data anonymization for digital health care. Methods: We used Google Scholar, Web of Science, Elsevier Scopus, and PubMed to retrieve academic studies published in English up to June 2020. Noteworthy gray literature was also used to initialize the search. We focused on review questions covering 5 bottom-up aspects: basic anonymization operations, privacy models, reidentification risk and usability metrics, off-the-shelf anonymization tools, and the lawful basis for EHR data anonymization. Results: We identified 239 eligible studies, of which 60 were chosen for general background information; 16 were selected for 7 basic anonymization operations; 104 covered 72 conventional and machine learning–based privacy models; 4 and 19 papers included 7 and 15 metrics, respectively, for measuring the reidentification risk and degree of usability; and 36 explored 20 data anonymization software tools. In addition, we evaluated the practical feasibility of performing anonymization on EHR data with reference to their usability in medical decision-making. Furthermore, we summarized the lawful basis for delivering guidance on practical EHR data anonymization. Conclusions: This systematic literature mapping study indicates that anonymization of EHR data is theoretically achievable; yet, it requires more research efforts in practical implementations to balance privacy preservation and usability to ensure more reliable health care applications. 
%M 34652278 %R 10.2196/29871 %U https://medinform.jmir.org/2021/10/e29871 %U https://doi.org/10.2196/29871 %U http://www.ncbi.nlm.nih.gov/pubmed/34652278 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 10 %P e30697 %T The National COVID Cohort Collaborative: Analyses of Original and Computationally Derived Electronic Health Record Data %A Foraker,Randi %A Guo,Aixia %A Thomas,Jason %A Zamstein,Noa %A Payne,Philip RO %A Wilcox,Adam %A , %+ Division of General Medical Sciences, School of Medicine, Washington University in St. Louis, 600 S. Taylor Avenue, Suite 102, Campus Box 8102, St. Louis, MO, 63110, United States, 1 314 273 2211, randi.foraker@wustl.edu %K synthetic data %K protected health information %K COVID-19 %K electronic health records and systems %K data analysis %D 2021 %7 4.10.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Computationally derived (“synthetic”) data can enable the creation and analysis of clinical, laboratory, and diagnostic data as if they were the original electronic health record data. Synthetic data can support data sharing to answer critical research questions to address the COVID-19 pandemic. Objective: We aim to compare the results from analyses of synthetic data to those from original data and assess the strengths and limitations of leveraging computationally derived data for research purposes. Methods: We used the National COVID Cohort Collaborative’s instance of MDClone, a big data platform with data-synthesizing capabilities (MDClone Ltd). We downloaded electronic health record data from 34 National COVID Cohort Collaborative institutional partners and tested three use cases, including (1) exploring the distributions of key features of the COVID-19–positive cohort; (2) training and testing predictive models for assessing the risk of admission among these patients; and (3) determining geospatial and temporal COVID-19–related measures and outcomes, and constructing their epidemic curves. 
We compared the results from synthetic data to those from original data using traditional statistics, machine learning approaches, and temporal and spatial representations of the data. Results: For each use case, the results of the synthetic data analyses successfully mimicked those of the original data such that the distributions of the data were similar and the predictive models demonstrated comparable performance. Although the synthetic and original data yielded overall nearly the same results, there were exceptions that included an odds ratio on either side of the null in multivariable analyses (0.97 vs 1.01) and differences in the magnitude of epidemic curves constructed for zip codes with low population counts. Conclusions: This paper presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in collaborative research for faster insights. %M 34559671 %R 10.2196/30697 %U https://www.jmir.org/2021/10/e30697 %U https://doi.org/10.2196/30697 %U http://www.ncbi.nlm.nih.gov/pubmed/34559671 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 9 %P e15739 %T Exploring the Use of Genomic and Routinely Collected Data: Narrative Literature Review and Interview Study %A Daniels,Helen %A Jones,Kerina Helen %A Heys,Sharon %A Ford,David Vincent %+ Population Data Science, Swansea University, Singleton Park, Swansea, SA2 8PP, United Kingdom, 44 01792606572, h.daniels@swansea.ac.uk %K genomic data %K routine data %K electronic health records %K health data science %K genome %K data regulation %K case study %K eHealth %D 2021 %7 24.9.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Advancing the use of genomic data with routinely collected health data holds great promise for health care and research. Increasing the use of these data is a high priority to understand and address the causes of disease. 
Objective: This study aims to provide an outline of the use of genomic data alongside routinely collected data in health research to date. As this field prepares to move forward, it is important to take stock of the current state of play in order to highlight new avenues for development, identify challenges, and ensure that adequate data governance models are in place for safe and socially acceptable progress. Methods: We conducted a literature review to draw information from past studies that have used genomic and routinely collected data and conducted interviews with individuals who use these data for health research. We collected data on the following: the rationale for using genomic data in conjunction with routinely collected data, types of genomic and routinely collected data used, data sources, project approvals, governance and access models, and challenges encountered. Results: The main purpose of using genomic and routinely collected data was to conduct genome-wide and phenome-wide association studies. Routine data sources included electronic health records, disease and death registries, health insurance systems, and deprivation indices. The types of genomic data included polygenic risk scores, single nucleotide polymorphisms, and measures of genetic activity, and biobanks generally provided these data. Although the literature search showed that biobanks released data to researchers, the case studies revealed a growing tendency for use within a data safe haven. Challenges of working with these data revolved around data collection, data storage, technical issues, and data privacy. Conclusions: Using genomic and routinely collected data holds great promise for progressing health research. Several challenges are involved, particularly in terms of privacy. Overcoming these barriers will ensure that the use of these data to progress health research can be exploited to its full potential. 
%M 34559060 %R 10.2196/15739 %U https://www.jmir.org/2021/9/e15739 %U https://doi.org/10.2196/15739 %U http://www.ncbi.nlm.nih.gov/pubmed/34559060 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 8 %P e28229 %T A Fine-Tuned Bidirectional Encoder Representations From Transformers Model for Food Named-Entity Recognition: Algorithm Development and Validation %A Stojanov,Riste %A Popovski,Gorjan %A Cenikj,Gjorgjina %A Koroušić Seljak,Barbara %A Eftimov,Tome %+ Computer Systems Department, Jožef Stefan Institute, Jamova Cesta 39, Ljubljana, 1000, Slovenia, 386 1 477 3386, tome.eftimov@ijs.si %K food information extraction %K named-entity recognition %K fine-tuning BERT %K semantic annotation %K information extraction %K BERT %K bidirectional encoder representations from transformers %K natural language processing %K machine learning %D 2021 %7 9.8.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Recently, food science has been garnering a lot of attention. There are many open research questions on the interactions of food, one of the main environmental factors, with other health-related entities such as diseases, treatments, and drugs. In the last 2 decades, a large amount of work has been done in natural language processing and machine learning to enable biomedical information extraction. However, machine learning in food science domains remains inadequately resourced, which brings to attention the problem of developing methods for food information extraction. There are only a few food semantic resources and a few rule-based methods for food information extraction, which often depend on external resources. However, an annotated corpus with food entities along with their normalization was published in 2019 by using several food semantic resources. 
Objective: In this study, we investigated how the recently published bidirectional encoder representations from transformers (BERT) model, which provides state-of-the-art results in information extraction, can be fine-tuned for food information extraction. Methods: We introduce FoodNER, which is a collection of corpus-based food named-entity recognition methods. It consists of 15 different models obtained by fine-tuning 3 pretrained BERT models on 5 groups of semantic resources: food versus nonfood entity, 2 subsets of Hansard food semantic tags, FoodOn semantic tags, and Systematized Nomenclature of Medicine Clinical Terms food semantic tags. Results: All BERT models provided very promising results with 93.30% to 94.31% macro F1 scores in the task of distinguishing food versus nonfood entities, which represents the new state of the art in food information extraction. Considering the tasks where semantic tags are predicted, all BERT models obtained very promising results once again, with their macro F1 scores ranging from 73.39% to 78.96%. Conclusions: FoodNER can be used to extract and annotate food entities in 5 different tasks: food versus nonfood entities and distinguishing food entities on the level of food groups by using the closest Hansard semantic tags, the parent Hansard semantic tags, the FoodOn semantic tags, or the Systematized Nomenclature of Medicine Clinical Terms semantic tags. 
%M 34383671 %R 10.2196/28229 %U https://www.jmir.org/2021/8/e28229 %U https://doi.org/10.2196/28229 %U http://www.ncbi.nlm.nih.gov/pubmed/34383671 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 8 %P e19824 %T Deep Learning With Anaphora Resolution for the Detection of Tweeters With Depression: Algorithm Development and Validation Study %A Wongkoblap,Akkapon %A Vadillo,Miguel A %A Curcin,Vasa %+ DIGITECH, Suranaree University of Technology, 111 University Avenue, Muang, Nakhon Ratchasima, 30000, Thailand, 66 44224336, wongkoblap@sut.ac.th %K depression %K mental health %K Twitter %K social media %K deep learning %K anaphora resolution %K multiple-instance learning %K depression markers %D 2021 %7 6.8.2021 %9 Original Paper %J JMIR Ment Health %G English %X Background: Mental health problems are widely recognized as a major public health challenge worldwide. This concern highlights the need to develop effective tools for detecting mental health disorders in the population. Social networks are a promising source of data wherein patients publish rich personal information that can be mined to extract valuable psychological cues; however, these data come with their own set of challenges, such as the need to disambiguate between statements about oneself and third parties. Traditionally, natural language processing techniques for social media have looked at text classifiers and user classification models separately, hence presenting a challenge for researchers who want to combine text sentiment and user sentiment analysis. Objective: The objective of this study is to develop a predictive model that can detect users with depression from Twitter posts and instantly identify textual content associated with mental health topics. The model can also address the problem of anaphoric resolution and highlight anaphoric interpretations. 
Methods: We retrieved the data set from Twitter by using a regular expression or stream of real-time tweets comprising 3682 users, of which 1983 self-declared their depression and 1699 declared no depression. Two multiple instance learning models were developed—one with and one without an anaphoric resolution encoder—to identify users with depression and highlight posts related to the mental health of the author. Several previously published models were applied to our data set, and their performance was compared with that of our models. Results: The maximum accuracy, F1 score, and area under the curve of our anaphoric resolution model were 92%, 92%, and 90%, respectively. The model outperformed alternative predictive models, which ranged from classical machine learning models to deep learning models. Conclusions: Our model with anaphoric resolution shows promising results when compared with other predictive models and provides valuable insights into textual content that is relevant to the mental health of the tweeter. %M 34383688 %R 10.2196/19824 %U https://mental.jmir.org/2021/8/e19824 %U https://doi.org/10.2196/19824 %U http://www.ncbi.nlm.nih.gov/pubmed/34383688 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 7 %P e26823 %T Predicting Biologic Therapy Outcome of Patients With Spondyloarthritis: Joint Models for Longitudinal and Survival Analysis %A Barata,Carolina %A Rodrigues,Ana Maria %A Canhão,Helena %A Vinga,Susana %A Carvalho,Alexandra M %+ Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal, 351 218 418 454, alexandra.carvalho@tecnico.ulisboa.pt %K data mining %K survival analysis %K joint models %K spondyloarthritis %K drug survival %K rheumatic disease %K electronic medical records %K medical records %D 2021 %7 30.7.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Rheumatic diseases are one of the most common chronic diseases worldwide. 
Among them, spondyloarthritis (SpA) is a group of highly debilitating diseases, with an early onset age, which significantly impacts patients’ quality of life, health care systems, and society in general. Recent treatment options consist of using biologic therapies, and establishing the most beneficial option according to the patients’ characteristics is a challenge that needs to be overcome. Meanwhile, the emerging availability of electronic medical records has made necessary the development of methods that can extract insightful information while handling all the challenges of dealing with complex, real-world data. Objective: The aim of this study was to achieve a better understanding of SpA patients’ therapy responses and identify the predictors that affect them, thereby enabling the prognosis of therapy success or failure. Methods: A data mining approach based on joint models for the survival analysis of the biologic therapy failure is proposed, which considers the information of both baseline and time-varying variables extracted from the electronic medical records of SpA patients from the database, Reuma.pt. Results: Our results show that being a male, starting biologic therapy at an older age, having a larger time interval between disease start and initiation of the first biologic drug, and being human leukocyte antigen (HLA)–B27 positive are indicators of a good prognosis for the biological drug survival; meanwhile, having disease onset or biologic therapy initiation occur in more recent years, a larger number of education years, and higher values of C-reactive protein or Bath Ankylosing Spondylitis Functional Index (BASFI) at baseline are all predictors of a greater risk of failure of the first biologic therapy. 
Conclusions: Among this Portuguese subpopulation of SpA patients, those who were male, HLA-B27 positive, and with a later biologic therapy starting date or a larger time interval between disease start and initiation of the first biologic therapy showed longer therapy adherence. Joint models proved to be a valuable tool for the analysis of electronic medical records in the field of rheumatic diseases and may allow for the identification of potential predictors of biologic therapy failure. %M 34328435 %R 10.2196/26823 %U https://medinform.jmir.org/2021/7/e26823 %U https://doi.org/10.2196/26823 %U http://www.ncbi.nlm.nih.gov/pubmed/34328435 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e25482 %T Semantic Linkages of Obsessions From an International Obsessive-Compulsive Disorder Mobile App Data Set: Big Data Analytics Study %A Feusner,Jamie D %A Mohideen,Reza %A Smith,Stephen %A Patanam,Ilyas %A Vaitla,Anil %A Lam,Christopher %A Massi,Michelle %A Leow,Alex %+ Semel Institute for Neuroscience and Human Behavior, University of California Los Angeles, 300 UCLA Medical Plaza, Suite 2200, Los Angeles, CA, 90095, United States, 1 3102064951, jfeusner@mednet.ucla.edu %K OCD %K natural language processing %K clinical subtypes %K semantic %K word embedding %K clustering %D 2021 %7 21.6.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Obsessive-compulsive disorder (OCD) is characterized by recurrent intrusive thoughts, urges, or images (obsessions) and repetitive physical or mental behaviors (compulsions). Previous factor analytic and clustering studies suggest the presence of three or four subtypes of OCD symptoms. However, these studies have relied on predefined symptom checklists, which are limited in breadth and may be biased toward researchers’ previous conceptualizations of OCD. 
Objective: In this study, we examine a large data set of freely reported obsession symptoms obtained from an OCD mobile app as an alternative approach to uncovering potential OCD subtypes. From this, we examine data-driven clusters of obsessions based on their latent semantic relationships in the English language using word embeddings. Methods: We extracted free-text entry words describing obsessions in a large sample of users of a mobile app, NOCD. Semantic vector space modeling was applied using the Global Vectors for Word Representation algorithm. A domain-specific extension, Mittens, was also applied to enhance the corpus with OCD-specific words. The resulting representations provided linear substructures of the word vector in a 100-dimensional space. We applied principal component analysis to the 100-dimensional vector representation of the most frequent words, followed by k-means clustering to obtain clusters of related words. Results: We obtained 7001 unique words representing obsessions from 25,369 individuals. Heuristics for determining the optimal number of clusters pointed to a three-cluster solution for grouping subtypes of OCD. The first had themes relating to relationship and just-right; the second had themes relating to doubt and checking; and the third had themes relating to contamination, somatic, physical harm, and sexual harm. All three clusters showed close semantic relationships with each other in the central area of convergence, with themes relating to harm. An equal-sized split-sample analysis across individuals and a split-sample analysis over time both showed overall stable cluster solutions. Words in the third cluster were the most frequently occurring words, followed by words in the first cluster. Conclusions: The clustering of naturally acquired obsessional words resulted in three major groupings of semantic themes, which partially overlapped with predefined checklists from previous studies. 
Furthermore, the closeness of the overall embedded relationships across clusters and their central convergence on harm suggests that, at least at the level of self-reported obsessional thoughts, most obsessions have close semantic relationships. Harm to self or others may be an underlying organizing theme across many obsessions. Notably, relationship-themed words, not previously included in factor-analytic studies, clustered with just-right words. These novel insights have potential implications for understanding how an apparent multitude of obsessional symptoms are connected by underlying themes. This observation could aid exposure-based treatment approaches and could be used as a conceptual framework for future research. %M 33892466 %R 10.2196/25482 %U https://www.jmir.org/2021/6/e25482 %U https://doi.org/10.2196/25482 %U http://www.ncbi.nlm.nih.gov/pubmed/33892466 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 6 %P e17137 %T Team Science in Precision Medicine: Study of Coleadership and Coauthorship Across Health Organizations %A An,Ning %A Mattison,John %A Chen,Xinyu %A Alterovitz,Gil %+ Brigham and Women's Hospital, Harvard Medical School, 75 Francis Street, Boston, MA, 02115, United States, 1 617 329 1445, gil_alterovitz@hms.harvard.edu %K precision medicine %K team science %D 2021 %7 14.6.2021 %9 Viewpoint %J J Med Internet Res %G English %X Background: Interdisciplinary collaborations bring many benefits to researchers in multiple areas, including precision medicine. Objective: This viewpoint aims to study how cross-institution team science would affect the development of precision medicine. Methods: Publications of organizations on the eHealth Catalogue of Activities were collected in 2015 and 2017. The significance of the correlation between coleadership and coauthorship among different organizations was calculated using the Pearson chi-square test of independence. 
Other nonparametric tests examined whether organizations with coleaders publish more and better papers than organizations without coleaders. Results: A total of 374 publications from 69 organizations were analyzed in 2015, and 7064 papers from 87 organizations were analyzed in 2017. Organizations with coleadership published more papers (P<.001, 2015 and 2017), which received higher citations (Z=–13.547, P<.001, 2017), compared to those without coleadership. Organizations with coleaders tended to publish papers together (P<.001, 2015 and 2017). Conclusions: Our findings suggest that organizations in the field of precision medicine could greatly benefit from institutional-level team science. As a result, stronger collaboration is recommended. %M 34125070 %R 10.2196/17137 %U https://www.jmir.org/2021/6/e17137 %U https://doi.org/10.2196/17137 %U http://www.ncbi.nlm.nih.gov/pubmed/34125070 %0 Journal Article %@ 2368-7959 %I JMIR Publications %V 8 %N 6 %P e26681 %T Design and Implementation of an Informatics Infrastructure for Standardized Data Acquisition, Transfer, Storage, and Export in Psychiatric Clinical Routine: Feasibility Study %A Blitz,Rogério %A Storck,Michael %A Baune,Bernhard T %A Dugas,Martin %A Opel,Nils %+ Department of Psychiatry, University of Münster, Albert-Schweitzer-Str. 11, Münster, 48149, Germany, 49 2518358160, n_opel01@uni-muenster.de %K medical informatics %K digital mental health %K digital data collection %K psychiatry %K single-source metadata architecture transformation %K mental health %K design %K implementation %K feasibility %K informatics %K infrastructure %K data %D 2021 %7 9.6.2021 %9 Original Paper %J JMIR Ment Health %G English %X Background: Empirically driven personalized diagnostic applications and treatment stratification is widely perceived as a major hallmark in psychiatry. 
However, data-based personalized decision making requires standardized data acquisition and data access, which are currently absent in psychiatric clinical routine. Objective: Here, we describe the informatics infrastructure implemented at the psychiatric Münster University Hospital, which allows standardized acquisition, transfer, storage, and export of clinical data for future real-time predictive modelling in psychiatric routine. Methods: We designed and implemented a technical architecture that includes an extension of the electronic health record (EHR) via scalable standardized data collection and data transfer between EHRs and research databases, thus allowing the pooling of EHRs and research data in a unified database and technical solutions for the visual presentation of collected data and analysis results in the EHR. The Single-source Metadata ARchitecture Transformation (SMA:T) was used as the software architecture. SMA:T is an extension of the EHR system and uses module-driven engineering to generate standardized applications and interfaces. The operational data model was used as the standard. Standardized data were entered on iPads via the Mobile Patient Survey (MoPat) and the web application Mopat@home, and the standardized transmission, processing, display, and export of data were realized via SMA:T. Results: The technical feasibility of the informatics infrastructure was demonstrated in the course of this study. We created 19 standardized documentation forms with 241 items. For 317 patients, 6451 instances were automatically transferred to the EHR system without errors. Moreover, 96,323 instances were automatically transferred from the EHR system to the research database for further analyses. Conclusions: In this study, we present the successful implementation of the informatics infrastructure enabling standardized data acquisition and data access for future real-time predictive modelling in clinical routine in psychiatry. 
The technical solution presented here might guide similar initiatives at other sites and thus help to pave the way toward future application of predictive models in psychiatric clinical routine. %M 34106072 %R 10.2196/26681 %U https://mental.jmir.org/2021/6/e26681 %U https://doi.org/10.2196/26681 %U http://www.ncbi.nlm.nih.gov/pubmed/34106072 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e26075 %T Predictability of COVID-19 Hospitalizations, Intensive Care Unit Admissions, and Respiratory Assistance in Portugal: Longitudinal Cohort Study %A Patrício,André %A Costa,Rafael S %A Henriques,Rui %+ LAQV-REQUIMTE, NOVA School of Science and Technology, Universidade NOVA de Lisboa, Campus Caparica, 2829-516, Caparica, 2829-516, Portugal, 351 21 294 8351, rs.costa@fct.unl.pt %K COVID-19 %K machine learning %K intensive care admissions %K respiratory assistance %K predictive models %K data modeling %K clinical informatics %D 2021 %7 28.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: In the face of the current COVID-19 pandemic, the timely prediction of upcoming medical needs for infected individuals enables better and quicker care provision when necessary and management decisions within health care systems. Objective: This work aims to predict the medical needs (hospitalizations, intensive care unit admissions, and respiratory assistance) and survivability of individuals testing positive for SARS-CoV-2 infection in Portugal. Methods: A retrospective cohort of 38,545 infected individuals during 2020 was used. Predictions of medical needs were performed using state-of-the-art machine learning approaches at various stages of a patient’s cycle, namely, at testing (prehospitalization), at posthospitalization, and during postintensive care. 
A thorough optimization of state-of-the-art predictors was undertaken to assess the ability to anticipate medical needs and infection outcomes using demographic and comorbidity variables, as well as dates associated with symptom onset, testing, and hospitalization. Results: For the target cohort, 75% of hospitalization needs could be identified at the time of testing for SARS-CoV-2 infection. Over 60% of respiratory needs could be identified at the time of hospitalization. Both predictions had >50% precision. Conclusions: The conducted study pinpoints the relevance of the proposed predictive models as good candidates to support medical decisions in the Portuguese population, including both monitoring and in-hospital care decisions. A clinical decision support system is further provided to this end. %M 33835931 %R 10.2196/26075 %U https://www.jmir.org/2021/4/e26075 %U https://doi.org/10.2196/26075 %U http://www.ncbi.nlm.nih.gov/pubmed/33835931 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e27275 %T Impact of Big Data Analytics on People’s Health: Overview of Systematic Reviews and Recommendations for Future Studies %A Borges do Nascimento,Israel Júnior %A Marcolino,Milena Soriano %A Abdulazeem,Hebatullah Mohamed %A Weerasekara,Ishanka %A Azzopardi-Muscat,Natasha %A Gonçalves,Marcos André %A Novillo-Ortiz,David %+ Division of Country Health Policies and Systems, World Health Organization, Regional Office for Europe, Marmorej 51, Copenhagen, 2100, Denmark, 45 61614868, dnovillo@who.int %K public health %K big data %K health status %K evidence-based medicine %K big data analytics %K secondary data analysis %K machine learning %K systematic review %K overview %K World Health Organization %D 2021 %7 13.4.2021 %9 Review %J J Med Internet Res %G English %X Background: Although the potential of big data analytics for health care is well recognized, evidence is lacking on its effects on public health. 
Objective: The aim of this study was to assess the impact of the use of big data analytics on people’s health based on the health indicators and core priorities in the World Health Organization (WHO) General Programme of Work 2019/2023 and the European Programme of Work (EPW), approved and adopted by its Member States, in addition to SARS-CoV-2–related studies. Furthermore, we sought to identify the most relevant challenges and opportunities of these tools with respect to people’s health. Methods: Six databases (MEDLINE, Embase, Cochrane Database of Systematic Reviews via Cochrane Library, Web of Science, Scopus, and Epistemonikos) were searched from the inception date to September 21, 2020. Systematic reviews assessing the effects of big data analytics on health indicators were included. Two authors independently performed screening, selection, data extraction, and quality assessment using the AMSTAR-2 (A Measurement Tool to Assess Systematic Reviews 2) checklist. Results: The literature search initially yielded 185 records, 35 of which met the inclusion criteria, involving more than 5,000,000 patients. Most of the included studies used patient data collected from electronic health records, hospital information systems, private patient databases, and imaging datasets, and involved the use of big data analytics for noncommunicable diseases. “Probability of dying from any of cardiovascular, cancer, diabetes or chronic renal disease” and “suicide mortality rate” were the most commonly assessed health indicators and core priorities within the WHO General Programme of Work 2019/2023 and the EPW 2020/2025. Big data analytics have shown moderate to high accuracy for the diagnosis and prediction of complications of diabetes mellitus as well as for the diagnosis and classification of mental disorders; prediction of suicide attempts and behaviors; and the diagnosis, treatment, and prediction of important clinical outcomes of several chronic diseases. 
Confidence in the results was rated as “critically low” for 25 reviews, as “low” for 7 reviews, and as “moderate” for 3 reviews. The most frequently identified challenges were establishment of a well-designed and structured data source, and a secure, transparent, and standardized database for patient data. Conclusions: Although the overall quality of included studies was limited, big data analytics has shown moderate to high accuracy for the diagnosis of certain diseases, improvement in managing chronic diseases, and support for prompt and real-time analyses of large sets of varied input data to diagnose and predict disease outcomes. Trial Registration: International Prospective Register of Systematic Reviews (PROSPERO) CRD42020214048; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=214048 %M 33847586 %R 10.2196/27275 %U https://www.jmir.org/2021/4/e27275 %U https://doi.org/10.2196/27275 %U http://www.ncbi.nlm.nih.gov/pubmed/33847586 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 4 %P e24656 %T An Automatic Ontology-Based Approach to Support Logical Representation of Observable and Measurable Data for Healthy Lifestyle Management: Proof-of-Concept Study %A Chatterjee,Ayan %A Prinz,Andreas %A Gerdes,Martin %A Martinez,Santiago %+ Department of Information and Communication Technologies, Centre for e-Health, University of Agder, Jon Lilletuns Vei 9, Grimstad, 4879, Norway, 47 38141000, ayan.chatterjee@uia.no %K activity %K nutrition %K sensor %K questionnaire %K SSN %K ontology %K SNOMED CT %K eCoach %K personalized %K recommendation %K automated %K CDSS %K healthy lifestyle %K interoperability %K eHealth %K goal setting %K semantics %K simulation %K proposition %D 2021 %7 9.4.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Lifestyle diseases, because of adverse health behavior, are the foremost cause of death worldwide. 
An eCoach system may encourage individuals to lead a healthy lifestyle with early health risk prediction, personalized recommendation generation, and goal evaluation. Such an eCoach system needs to collect and transform distributed heterogeneous health and wellness data into meaningful information to train an artificially intelligent health risk prediction model. However, it may produce a data compatibility dilemma. Our proposed eHealth ontology can increase interoperability between different heterogeneous networks, provide situation awareness, help in data integration, and discover inferred knowledge. This “proof-of-concept” study will help sensor, questionnaire, and interview data to be more organized for health risk prediction and personalized recommendation generation targeting obesity as a study case. Objective: The aim of this study is to develop an OWL-based ontology (UiA eHealth Ontology/UiAeHo) model to annotate personal, physiological, behavioral, and contextual data from heterogeneous sources (sensor, questionnaire, and interview), followed by structuring and standardizing of diverse descriptions to generate meaningful, practical, personalized, and contextual lifestyle recommendations based on the defined rules. Methods: We have developed a simulator to collect dummy personal, physiological, behavioral, and contextual data related to artificial participants involved in health monitoring. We have integrated the concepts of “Semantic Sensor Network Ontology” and “Systematized Nomenclature of Medicine—Clinical Terms” to develop our proposed eHealth ontology. The ontology has been created using Protégé (version 5.x). We have used the Java-based “Jena Framework” (version 3.16) for building a semantic web application that includes resource description framework (RDF) application programming interface (API), OWL API, native tuple store (tuple database), and the SPARQL (Simple Protocol and RDF Query Language) query engine. 
The logical and structural consistency of the proposed ontology has been evaluated with the “HermiT 1.4.3.x” ontology reasoner available in Protégé 5.x. Results: The proposed ontology has been implemented for the study case “obesity.” However, it can be extended further to other lifestyle diseases. “UiA eHealth Ontology” has been constructed using logical axioms, declaration axioms, classes, object properties, and data properties. The ontology can be visualized with “Owl Viz,” and the formal representation has been used to infer a participant’s health status using the “HermiT” reasoner. We have also developed a module for ontology verification that behaves like a rule-based decision support system to predict the probability for health risk, based on the evaluation of the results obtained from SPARQL queries. Furthermore, we discussed the potential lifestyle recommendation generation plan against adverse behavioral risks. Conclusions: This study has led to the creation of a meaningful, context-specific ontology to model massive, unintuitive, raw, unstructured observations for health and wellness data (eg, sensors, interviews, questionnaires) and to annotate them with semantic metadata to create a compact, intelligible abstraction for health risk predictions for individualized recommendation generation. 
%M 33835031 %R 10.2196/24656 %U https://www.jmir.org/2021/4/e24656 %U https://doi.org/10.2196/24656 %U http://www.ncbi.nlm.nih.gov/pubmed/33835031 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 7 %N 4 %P e24288 %T Reporting and Availability of COVID-19 Demographic Data by US Health Departments (April to October 2020): Observational Study %A Ossom-Williamson,Peace %A Williams,Isaac Maximilian %A Kim,Kukhyoung %A Kindratt,Tiffany B %+ Public Health Program, Department of Kinesiology, College of Nursing and Health Innovation, University of Texas at Arlington, 500 W Nedderman Drive, Arlington, TX, 75919, United States, 1 817 272 7917, tiffany.kindratt@uta.edu %K coronavirus disease 2019 %K COVID-19 %K SARS-CoV-2 %K race %K ethnicity %K age %K sex %K health equity %K open data %K dashboards %D 2021 %7 6.4.2021 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: There is an urgent need for consistent collection of demographic data on COVID-19 morbidity and mortality and sharing it with the public in open and accessible ways. Due to the lack of consistency in data reporting during the initial spread of COVID-19, the Equitable Data Collection and Disclosure on COVID-19 Act was introduced into the Congress that mandates collection and reporting of demographic COVID-19 data on testing, treatments, and deaths by age, sex, race and ethnicity, primary language, socioeconomic status, disability, and county. To our knowledge, no studies have evaluated how COVID-19 demographic data have been collected before and after the introduction of this legislation. Objective: This study aimed to evaluate differences in reporting and public availability of COVID-19 demographic data by US state health departments and Washington, District of Columbia (DC) before (pre-Act), immediately after (post-Act), and 6 months after (6-month follow-up) the introduction of the Equitable Data Collection and Disclosure on COVID-19 Act in the Congress on April 21, 2020. 
Methods: We reviewed health department websites of all 50 US states and Washington, DC (N=51). We evaluated how each state reported age, sex, and race and ethnicity data for all confirmed COVID-19 cases and deaths and how they made these data available (ie, charts and tables only or combined with dashboards and machine-actionable downloadable formats) at the three timepoints. Results: We found statistically significant increases in the number of health departments reporting age-specific data for COVID-19 cases (P=.045) and resulting deaths (P=.002), sex-specific data for COVID-19 deaths (P=.003), and race- and ethnicity-specific data for confirmed cases (P=.003) and deaths (P=.005) post-Act and at the 6-month follow-up (P<.05 for all). The largest increases were race and ethnicity state data for confirmed cases (pre-Act: 18/51, 35%; post-Act: 31/51, 61%; 6-month follow-up: 46/51, 90%) and deaths due to COVID-19 (pre-Act: 13/51, 25%; post-Act: 25/51, 49%; and 6-month follow-up: 39/51, 76%). Although more health departments reported race and ethnicity data based on federal requirements (P<.001), over half (29/51, 56.9%) still did not report all racial and ethnic groups as per the Office of Management and Budget guidelines (pre-Act: 5/51, 10%; post-Act: 21/51, 41%; and 6-month follow-up: 27/51, 53%). The number of health departments that made COVID-19 data available for download significantly increased from 7 to 23 (P<.001) from our initial data collection (April 2020) to the 6-month follow-up (October 2020). Conclusions: Although the increased demand for disaggregation has improved public reporting of demographics across health departments, an urgent need persists for the introduced legislation to be passed by the Congress for the US states to consistently collect and make characteristics of COVID-19 cases, deaths, and vaccinations available in order to allocate resources to mitigate disease spread. 
%M 33821804 %R 10.2196/24288 %U https://publichealth.jmir.org/2021/4/e24288 %U https://doi.org/10.2196/24288 %U http://www.ncbi.nlm.nih.gov/pubmed/33821804 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 4 %P e25645 %T A Framework for Criteria-Based Selection and Processing of Fast Healthcare Interoperability Resources (FHIR) Data for Statistical Analysis: Design and Implementation Study %A Gruendner,Julian %A Gulden,Christian %A Kampf,Marvin %A Mate,Sebastian %A Prokosch,Hans-Ulrich %A Zierk,Jakob %+ Chair of Medical Informatics, Department of Medical Informatics, Biometrics and Epidemiology, Friedrich-Alexander University Erlangen-Nürnberg, Wetterkreuz 15, Erlangen-Tennenlohe, Germany, 49 91318526785, julian.gruendner@fau.de %K data analysis %K data science %K data standardization %K digital medical information %K eHealth %K Fast Healthcare Interoperability Resources %K data harmonization %K medical information %K patient privacy %K data repositories %K HL7 FHIR %D 2021 %7 1.4.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: The harmonization and standardization of digital medical information for research purposes is a challenging and ongoing collaborative effort. Current research data repositories typically require extensive efforts in harmonizing and transforming original clinical data. The Fast Healthcare Interoperability Resources (FHIR) format was designed primarily to represent clinical processes; therefore, it closely resembles the clinical data model and is more widely available across modern electronic health records. However, no common standardized data format is directly suitable for statistical analyses, and data need to be preprocessed before statistical analysis. Objective: This study aimed to elucidate how FHIR data can be queried directly with a preprocessing service and be used for statistical analyses. 
Methods: We propose that the binary JavaScript Object Notation format of the PostgreSQL (PSQL) open source database is suitable for not only storing FHIR data, but also extending it with preprocessing and filtering services, which directly transform data stored in FHIR format into prepared data subsets for statistical analysis. We specified an interface for this preprocessor, implemented and deployed it at University Hospital Erlangen-Nürnberg, generated 3 sample data sets, and analyzed the available data. Results: We imported real-world patient data from 2016 to 2018 into a standard PSQL database, generating a dataset of approximately 35.5 million FHIR resources, including “Patient,” “Encounter,” “Condition” (diagnoses specified using International Classification of Diseases codes), “Procedure,” and “Observation” (laboratory test results). We then integrated the developed preprocessing service with the PSQL database and the locally installed web-based KETOS analysis platform. Advanced statistical analyses were feasible using the developed framework using 3 clinically relevant scenarios (data-driven establishment of hemoglobin reference intervals, assessment of anemia prevalence in patients with cancer, and investigation of the adverse effects of drugs). Conclusions: This study shows how the standard open source database PSQL can be used to store FHIR data and be integrated with a specifically developed preprocessing and analysis framework. This enables dataset generation with advanced medical criteria and the integration of subsequent statistical analysis. The web-based preprocessing service can be deployed locally at the hospital level, protecting patients’ privacy while being integrated with existing open source data analysis tools currently being developed across Germany. 
%M 33792554 %R 10.2196/25645 %U https://medinform.jmir.org/2021/4/e25645 %U https://doi.org/10.2196/25645 %U http://www.ncbi.nlm.nih.gov/pubmed/33792554 %0 Journal Article %@ 2152-7202 %I JMIR Publications %V 13 %N 1 %P e23011 %T Data Sharing Goals for Nonprofit Funders of Clinical Trials %A Coetzee,Timothy %A Ball,Mad Price %A Boutin,Marc %A Bronson,Abby %A Dexter,David T %A English,Rebecca A %A Furlong,Patricia %A Goodman,Andrew D %A Grossman,Cynthia %A Hernandez,Adrian F %A Hinners,Jennifer E %A Hudson,Lynn %A Kennedy,Annie %A Marchisotto,Mary Jane %A Matrisian,Lynn %A Myers,Elizabeth %A Nowell,W Benjamin %A Nosek,Brian A %A Sherer,Todd %A Shore,Carolyn %A Sim,Ida %A Smolensky,Luba %A Williams,Christopher %A Wood,Julie %A Terry,Sharon F %+ Genetic Alliance, 4301 Connecticut Ave NW #404, Washington, DC, United States, 1 (202) 966 5557, sterry@geneticalliance.org %K clinical trial %K biomedical research %K data sharing %K patients %D 2021 %7 29.3.2021 %9 Viewpoint %J J Participat Med %G English %X Sharing clinical trial data can provide value to research participants and communities by accelerating the development of new knowledge and therapies as investigators merge data sets to conduct new analyses, reproduce published findings to raise standards for original research, and learn from the work of others to generate new research questions. Nonprofit funders, including disease advocacy and patient-focused organizations, play a pivotal role in the promotion and implementation of data sharing policies. Funders are uniquely positioned to promote and support a culture of data sharing by serving as trusted liaisons between potential research participants and investigators who wish to access these participants’ networks for clinical trial recruitment. In short, nonprofit funders can drive policies and influence research culture. 
The purpose of this paper is to detail a set of aspirational goals and forward-thinking, collaborative data sharing solutions for nonprofit funders to fold into existing funding policies. The goals of this paper convey the complexity of the opportunities and challenges facing nonprofit funders and the appropriate prioritization of data sharing within their organizations, and may serve as a starting point for a data sharing toolkit for nonprofit funders of clinical trials to provide the clarity of mission and mechanisms to enforce the data sharing practices their communities already expect are happening. %M 33779573 %R 10.2196/23011 %U https://jopm.jmir.org/2021/1/e23011 %U https://doi.org/10.2196/23011 %U http://www.ncbi.nlm.nih.gov/pubmed/33779573 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 3 %P e23328 %T Realistic High-Resolution Body Computed Tomography Image Synthesis by Using Progressive Growing Generative Adversarial Network: Visual Turing Test %A Park,Ho Young %A Bae,Hyun-Jin %A Hong,Gil-Sun %A Kim,Minjee %A Yun,JiHye %A Park,Sungwon %A Chung,Won Jung %A Kim,NamKug %+ Department of Radiology and Research Institute of Radiology, University of Ulsan College of Medicine & Asan Medical Center, 88 Olympic-ro 43-gil, Songpa-gu, Seoul, 05505, Republic of Korea, 82 2 3010 1548, hgs2013@gmail.com %K generative adversarial network %K unsupervised deep learning %K computed tomography %K synthetic body images %K visual Turing test %D 2021 %7 17.3.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Generative adversarial network (GAN)–based synthetic images can be viable solutions to current supervised deep learning challenges. However, generating highly realistic images is a prerequisite for these approaches. 
Objective: The aim of this study was to investigate and validate the unsupervised synthesis of highly realistic body computed tomography (CT) images by using a progressive growing GAN (PGGAN) trained to learn the probability distribution of normal data. Methods: We trained the PGGAN by using 11,755 body CT scans. Ten radiologists (4 radiologists with <5 years of experience [Group I], 4 radiologists with 5-10 years of experience [Group II], and 2 radiologists with >10 years of experience [Group III]) evaluated the results in a binary approach by using an independent validation set of 300 images (150 real and 150 synthetic) to judge the authenticity of each image. Results: The mean accuracy of the 10 readers in the entire image set was higher than random guessing (1781/3000, 59.4% vs 1500/3000, 50.0%, respectively; P<.001). However, in terms of identifying synthetic images as fake, there was no significant difference in the specificity between the visual Turing test and random guessing (779/1500, 51.9% vs 750/1500, 50.0%, respectively; P=.29). The accuracy between the 3 reader groups with different experience levels was not significantly different (Group I, 696/1200, 58.0%; Group II, 726/1200, 60.5%; and Group III, 359/600, 59.8%; P=.36). Interreader agreements were poor (κ=0.11) for the entire image set. In subgroup analysis, the discrepancies between real and synthetic CT images occurred mainly in the thoracoabdominal junction and in the anatomical details. Conclusions: The GAN can synthesize highly realistic high-resolution body CT images that are indistinguishable from real images; however, it has limitations in generating body images of the thoracoabdominal junction and lacks accuracy in the anatomical details. 
%M 33609339 %R 10.2196/23328 %U https://medinform.jmir.org/2021/3/e23328 %U https://doi.org/10.2196/23328 %U http://www.ncbi.nlm.nih.gov/pubmed/33609339 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e18766 %T Classification Accuracy of Hepatitis C Virus Infection Outcome: Data Mining Approach %A Frias,Mario %A Moyano,Jose M %A Rivero-Juarez,Antonio %A Luna,Jose M %A Camacho,Ángela %A Fardoun,Habib M %A Machuca,Isabel %A Al-Twijri,Mohamed %A Rivero,Antonio %A Ventura,Sebastian %+ Department of Clinical Virology and Zoonoses, Maimonides Biomedical Research Institute of Córdoba, Avenida Menéndez Pidal s/n, Córdoba, 14004, Spain, 34 957213806, ariveror@gmail.com %K HIV/HCV %K data mining %K PART %K ensemble %K classification accuracy %D 2021 %7 24.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: The dataset from genes used to predict hepatitis C virus outcome was evaluated in a previous study using a conventional statistical methodology. Objective: The aim of this study was to reanalyze this same dataset using the data mining approach in order to find models that improve the classification accuracy of the genes studied. Methods: We built predictive models using different subsets of factors, selected according to their importance in predicting patient classification. We then evaluated each independent model and also a combination of them, leading to a better predictive model. Results: Our data mining approach identified genetic patterns that escaped detection using conventional statistics. More specifically, the partial decision trees and ensemble models increased the classification accuracy of hepatitis C virus outcome compared with conventional methods. Conclusions: Data mining can be used more extensively in biomedicine, facilitating knowledge building and management of human diseases. 
%M 33624609 %R 10.2196/18766 %U https://www.jmir.org/2021/2/e18766 %U https://doi.org/10.2196/18766 %U http://www.ncbi.nlm.nih.gov/pubmed/33624609 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 9 %N 2 %P e21679 %T Lexicon Development for COVID-19-related Concepts Using Open-source Word Embedding Sources: An Intrinsic and Extrinsic Evaluation %A Parikh,Soham %A Davoudi,Anahita %A Yu,Shun %A Giraldo,Carolina %A Schriver,Emily %A Mowery,Danielle %+ Department of Biostatistics, Epidemiology, & Informatics, Institute for Biomedical Informatics, University of Pennsylvania, A206 Richards Hall,, 3700 Hamilton Walk, Philadelphia, PA, 19104-6021, United States, 1 215 746 6677, dlmowery@pennmedicine.upenn.edu %K natural language processing %K word embedding %K COVID-19 %K intrinsic %K open-source %K computation %K model %K prediction %K semantic %K syntactic %K pattern %D 2021 %7 22.2.2021 %9 Original Paper %J JMIR Med Inform %G English %X Background: Scientists are developing new computational methods and prediction models to better clinically understand COVID-19 prevalence, treatment efficacy, and patient outcomes. These efforts could be improved by leveraging documented COVID-19–related symptoms, findings, and disorders from clinical text sources in an electronic health record. Word embeddings can identify terms related to these clinical concepts from both the biomedical and nonbiomedical domains, and are being shared with the open-source community at large. However, it is unclear how useful openly available word embeddings are for developing lexicons for COVID-19–related concepts. Objective: Given an initial lexicon of COVID-19–related terms, this study aims to characterize the returned terms by similarity across various open-source word embeddings and determine common semantic and syntactic patterns between the COVID-19 queried terms and returned terms specific to the word embedding source. Methods: We compared seven openly available word embedding sources. 
Using a series of COVID-19–related terms for associated symptoms, findings, and disorders, we conducted an interannotator agreement study to determine how accurately the most similar returned terms could be classified according to semantic types by three annotators. We conducted a qualitative study of COVID-19 queried terms and their returned terms to detect informative patterns for constructing lexicons. We demonstrated the utility of applying such learned synonyms to discharge summaries by reporting the proportion of patients identified by concept among three patient cohorts: pneumonia (n=6410), acute respiratory distress syndrome (n=8647), and COVID-19 (n=2397). Results: We observed high pairwise interannotator agreement (Cohen kappa) for symptoms (0.86-0.99), findings (0.93-0.99), and disorders (0.93-0.99). Word embedding sources generated based on characters tend to return more synonyms (mean count of 7.2 synonyms) compared to token-based embedding sources (mean counts range from 2.0 to 3.4). Word embedding sources queried using a qualifier term (eg, dry cough or muscle pain) more often returned qualifiers of a similar semantic type (eg, “dry” returns consistency qualifiers like “wet” and “runny”) compared to single-term queries (eg, cough or pain). A higher proportion of patients had documented fever (0.61-0.84), cough (0.41-0.55), shortness of breath (0.40-0.59), and hypoxia (0.51-0.56) retrieved than other clinical features. Terms for dry cough returned a higher proportion of patients with COVID-19 (0.07) than the pneumonia (0.05) and acute respiratory distress syndrome (0.03) populations. Conclusions: Word embeddings are valuable technology for learning related terms, including synonyms. When leveraging openly available word embedding sources, choices made for the construction of the word embeddings can significantly influence the words learned. 
%M 33544689 %R 10.2196/21679 %U https://medinform.jmir.org/2021/2/e21679 %U https://doi.org/10.2196/21679 %U http://www.ncbi.nlm.nih.gov/pubmed/33544689 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 23 %N 2 %P e16348 %T A Social Media Campaign (#datasaveslives) to Promote the Benefits of Using Health Data for Research Purposes: Mixed Methods Analysis %A Hassan,Lamiece %A Nenadic,Goran %A Tully,Mary Patricia %+ Division of Informatics, Imaging and Data Sciences, The University of Manchester, Oxford Road, Manchester, M13 9PL, United Kingdom, 44 01612751160, lamiece.hassan@manchester.ac.uk %K social media %K public engagement %K social network analysis %K medical research %D 2021 %7 16.2.2021 %9 Original Paper %J J Med Internet Res %G English %X Background: Social media provides the potential to engage a wide audience about scientific research, including the public. However, little empirical research exists to guide health scientists regarding what works and how to optimize impact. We examined the social media campaign #datasaveslives established in 2014 to highlight positive examples of the use and reuse of health data in research. Objective: This study aims to examine how the #datasaveslives hashtag was used on social media, how often, and by whom; thus, we aim to provide insights into the impact of a major social media campaign in the UK health informatics research community and further afield. Methods: We analyzed all publicly available posts (tweets) that included the hashtag #datasaveslives (N=13,895) on the microblogging platform Twitter between September 1, 2016, and August 31, 2017. Using a combination of qualitative and quantitative analyses, we determined the frequency and purpose of tweets. Social network analysis was used to analyze and visualize tweet sharing (retweet) networks among hashtag users. Results: Overall, we found 4175 original posts and 9720 retweets featuring #datasaveslives by 3649 unique Twitter users. 
In total, 66.01% (2756/4175) of the original posts were retweeted at least once. Higher frequencies of tweets were observed during the weeks of prominent policy publications, popular conferences, and public engagement events. Cluster analysis based on retweet relationships revealed an interconnected series of groups of #datasaveslives users in academia, health services and policy, and charities and patient networks. Thematic analysis of tweets showed that #datasaveslives was used for a broader range of purposes than indexing information, including event reporting, encouraging participation and action, and showing personal support for data sharing. Conclusions: This study shows that a hashtag-based social media campaign was effective in encouraging a wide audience of stakeholders to disseminate positive examples of health research. Furthermore, the findings suggest that the campaign supported community building and bridging practices within and between the interdisciplinary sectors related to the field of health data science and encouraged individuals to demonstrate personal support for sharing health data. 
%M 33591280 %R 10.2196/16348 %U http://www.jmir.org/2021/2/e16348/ %U https://doi.org/10.2196/16348 %U http://www.ncbi.nlm.nih.gov/pubmed/33591280 %0 Journal Article %@ 1929-0748 %I JMIR Publications %V 10 %N 2 %P e22505 %T Initiatives, Concepts, and Implementation Practices of FAIR (Findable, Accessible, Interoperable, and Reusable) Data Principles in Health Data Stewardship Practice: Protocol for a Scoping Review %A Inau,Esther Thea %A Sack,Jean %A Waltemath,Dagmar %A Zeleke,Atinkut Alamirrew %+ Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Ellernholzstraße 1-2, Greifswald, 17487, Germany, 49 3834 86 7548, inaue@uni-greifswald.de %K data stewardship %K FAIR data principles %K health research %K PRISMA %K scoping review %D 2021 %7 2.2.2021 %9 Protocol %J JMIR Res Protoc %G English %X Background: Data stewardship is an essential driver of research and clinical practice. Data collection, storage, access, sharing, and analytics are dependent on the proper and consistent use of data management principles among the investigators. Since 2016, the FAIR (findable, accessible, interoperable, and reusable) guiding principles for research data management have been resonating in scientific communities. Enabling data to be findable, accessible, interoperable, and reusable is currently believed to strengthen data sharing, reduce duplicated efforts, and move toward harmonization of data from heterogeneous unconnected data silos. FAIR initiatives and implementation trends are rising in different facets of scientific domains. It is important to understand the concepts and implementation practices of the FAIR data principles as applied to human health data by studying the flourishing initiatives and implementation lessons relevant to improved health research, particularly for data sharing during the coronavirus pandemic. 
Objective: This paper aims to conduct a scoping review to identify concepts, approaches, implementation experiences, and lessons learned in FAIR initiatives in the health data domain. Methods: The Arksey and O’Malley stage-based methodological framework for scoping reviews will be used for this review. PubMed, Web of Science, and Google Scholar will be searched to access relevant primary and grey publications. Articles written in English and published from 2014 onwards with FAIR principle concepts or practices in the health domain will be included. Duplicates among the 3 data sources will be removed using reference management software. The articles will then be exported to systematic review management software. At least two independent authors will review the eligibility of each article based on defined inclusion and exclusion criteria. A pretested charting tool will be used to extract relevant information from the full-text papers. Qualitative thematic synthesis analysis methods will be employed by coding and developing themes. Themes will be derived from the research questions and contents in the included papers. Results: The results will be reported using the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-analyses Extension for Scoping Reviews) reporting guidelines. We anticipate finalizing the manuscript for this work in 2021. Conclusions: We believe comprehensive information about the FAIR data principles, initiatives, implementation practices, and lessons learned in the FAIRification process in the health domain is paramount to supporting both evidence-based clinical practice and research transparency in the era of big data and open research publishing. 
International Registered Report Identifier (IRRID): PRR1-10.2196/22505 %M 33528373 %R 10.2196/22505 %U https://www.researchprotocols.org/2021/2/e22505 %U https://doi.org/10.2196/22505 %U http://www.ncbi.nlm.nih.gov/pubmed/33528373 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 11 %P e21252 %T Patient Triage by Topic Modeling of Referral Letters: Feasibility Study %A Spasic,Irena %A Button,Kate %+ School of Computer Science & Informatics, Cardiff University, 5 The Parade, Cardiff, CF24 3AA, United Kingdom, 44 02920870320, spasici@cardiff.ac.uk %K natural language processing %K machine learning %K data science %K medical informatics %K computer-assisted decision making %D 2020 %7 6.11.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Musculoskeletal conditions are managed within primary care, but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for health care resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. Objective: This study aims to explore the feasibility of using natural language processing and machine learning to automate the triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically assorted into latent topics that are clinically relevant, that is, considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing 2 research questions. Can latent topics be used to automatically predict treatment? 
Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experiences such as medical history, demographics, and possible treatments? Methods: We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, a qualitative evaluation was performed to assess the human interpretability of topics. Results: The prediction accuracy of binary classifiers outperformed the stratified random classifier by a large margin, indicating that topic modeling could be used to predict the treatment, thus effectively supporting patient triage. The qualitative evaluation confirmed the high clinical interpretability of the topic model. Conclusions: The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee or hip pain by analyzing information from their referral letters. 
%M 33155985 %R 10.2196/21252 %U https://medinform.jmir.org/2020/11/e21252 %U https://doi.org/10.2196/21252 %U http://www.ncbi.nlm.nih.gov/pubmed/33155985 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 10 %P e20324 %T Institution-Specific Machine Learning Models for Prehospital Assessment to Predict Hospital Admission: Prediction Model Development Study %A Shirakawa,Toru %A Sonoo,Tomohiro %A Ogura,Kentaro %A Fujimori,Ryo %A Hara,Konan %A Goto,Tadahiro %A Hashimoto,Hideki %A Takahashi,Yuji %A Naraba,Hiromu %A Nakamura,Kensuke %+ Department of Emergency Medicine, Hitachi General Hospital, Jounan-cho 2-1-1, Hitachi, 317-0077, Japan, 81 294 23 1111, tomohiro.sonoo@txpmedical.com %K prehospital %K prediction %K hospital admission %K emergency medicine %K machine learning %K data science %D 2020 %7 27.10.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Although multiple prediction models have been developed to predict hospital admission to emergency departments (EDs) to address overcrowding and patient safety, only a few studies have examined prediction models for prehospital use. Development of institution-specific prediction models is feasible in this age of data science, provided that predictor-related information is readily collectable. Objective: We aimed to develop a hospital admission prediction model based on patient information that is commonly available during ambulance transport before hospitalization. Methods: Patients transported by ambulance to our ED from April 2018 through March 2019 were enrolled. Candidate predictors were age, sex, chief complaint, vital signs, and patient medical history, all of which were recorded by emergency medical teams during ambulance transport. Patients were divided into two cohorts for derivation (3601/5145, 70.0%) and validation (1544/5145, 30.0%). For statistical models, logistic regression, logistic lasso, random forest, and gradient boosting machine were used. 
Prediction models were developed in the derivation cohort. Model performance was assessed by area under the receiver operating characteristic curve (AUROC) and association measures in the validation cohort. Results: Of 5145 patients transported by ambulance, including deaths in the ED and hospital transfers, 2699 (52.5%) required hospital admission. Prediction performance was higher with the addition of predictive factors, attaining the best performance with an AUROC of 0.818 (95% CI 0.792-0.839) with a machine learning model and predictive factors of age, sex, chief complaint, and vital signs. Sensitivity and specificity of this model were 0.744 (95% CI 0.716-0.773) and 0.745 (95% CI 0.709-0.776), respectively. Conclusions: For patients transferred to EDs, we developed a well-performing hospital admission prediction model based on routinely collected prehospital information including chief complaints. %M 33107830 %R 10.2196/20324 %U http://medinform.jmir.org/2020/10/e20324/ %U https://doi.org/10.2196/20324 %U http://www.ncbi.nlm.nih.gov/pubmed/33107830 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 10 %P e21980 %T Application of Big Data Technology for COVID-19 Prevention and Control in China: Lessons and Recommendations %A Wu,Jun %A Wang,Jian %A Nicholas,Stephen %A Maitland,Elizabeth %A Fan,Qiuyan %+ Dong Fureng Institute of Economic and Social Development, Wuhan University, No.54 Dongsi Lishi Hutong, Dongcheng District, Beijing, 100010, China, 86 13864157135, 00032954@whu.edu.cn %K big data %K COVID-19 %K disease prevention and control %D 2020 %7 9.10.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: In the prevention and control of infectious diseases, previous research on the application of big data technology has mainly focused on the early warning and early monitoring of infectious diseases. 
Although the application of big data technology for COVID-19 warning and monitoring remain important tasks, prevention of the disease’s rapid spread and reduction of its impact on society are currently the most pressing challenges for the application of big data technology during the COVID-19 pandemic. After the outbreak of COVID-19 in Wuhan, the Chinese government and nongovernmental organizations actively used big data technology to prevent, contain, and control the spread of COVID-19. Objective: The aim of this study is to discuss the application of big data technology to prevent, contain, and control COVID-19 in China; draw lessons; and make recommendations. Methods: We discuss the data collection methods and key data information that existed in China before the outbreak of COVID-19 and how these data contributed to the prevention and control of COVID-19. Next, we discuss China’s new data collection methods and new information assembled after the outbreak of COVID-19. Based on the data and information collected in China, we analyzed the application of big data technology from the perspectives of data sources, data application logic, data application level, and application results. In addition, we analyzed the issues, challenges, and responses encountered by China in the application of big data technology from four perspectives: data access, data use, data sharing, and data protection. Suggestions for improvements are made for data collection, data circulation, data innovation, and data security to help understand China’s response to the epidemic and to provide lessons for other countries’ prevention and control of COVID-19. Results: In the process of the prevention and control of COVID-19 in China, big data technology has played an important role in personal tracking, surveillance and early warning, tracking of the virus’s sources, drug screening, medical treatment, resource allocation, and production recovery. 
The data used included location and travel data, medical and health data, news media data, government data, online consumption data, data collected by intelligent equipment, and epidemic prevention data. We identified a number of big data problems including low efficiency of data collection, difficulty in guaranteeing data quality, low efficiency of data use, lack of timely data sharing, and data privacy protection issues. To address these problems, we suggest unified data collection standards, innovative use of data, accelerated exchange and circulation of data, and a detailed and rigorous data protection system. Conclusions: China has used big data technology to prevent and control COVID-19 in a timely manner. To prevent and control infectious diseases, countries must collect, clean, and integrate data from a wide range of sources; use big data technology to analyze a wide range of big data; create platforms for data analyses and sharing; and address privacy issues in the collection and use of big data. 
%M 33001836 %R 10.2196/21980 %U http://www.jmir.org/2020/10/e21980/ %U https://doi.org/10.2196/21980 %U http://www.ncbi.nlm.nih.gov/pubmed/33001836 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 10 %P e19879 %T Integrating Genomics and Clinical Data for Statistical Analysis by Using GEnome MINIng (GEMINI) and Fast Healthcare Interoperability Resources (FHIR): System Design and Implementation %A Gruendner,Julian %A Wolf,Nicolas %A Tögel,Lars %A Haller,Florian %A Prokosch,Hans-Ulrich %A Christoph,Jan %+ Department of Medical Informatics, Friedrich-Alexander University, Erlangen-Nürnberg, Wetterkreuz 15, Erlangen-Tennenlohe, 91058, Germany, 49 9131 8567787, julian.gruendner@fau.de %K next-generation sequencing %K data analysis %K genetic databases %K GEnome MINIng %K Fast Healthcare Interoperability Resources %K data standardization %D 2020 %7 7.10.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: The introduction of next-generation sequencing (NGS) into molecular cancer diagnostics has led to an increase in the data available for the identification and evaluation of driver mutations and for defining personalized cancer treatment regimens. The meaningful combination of omics data, ie, pathogenic gene variants and alterations with other patient data, to understand the full picture of malignancy has been challenging. Objective: This study describes the implementation of a system capable of processing, analyzing, and subsequently combining NGS data with other clinical patient data for analysis within and across institutions. Methods: On the basis of the already existing NGS analysis workflows for the identification of malignant gene variants at the Institute of Pathology of the University Hospital Erlangen, we defined basic requirements on an NGS processing and analysis pipeline and implemented a pipeline based on the GEMINI (GEnome MINIng) open source genetic variation database. 
For the purpose of validation, this pipeline was applied to data from the 1000 Genomes Project and subsequently to NGS data derived from 206 patients of a local hospital. We further integrated the pipeline into existing structures of data integration centers at the University Hospital Erlangen and combined NGS data with local nongenomic patient-derived data available in Fast Healthcare Interoperability Resources format. Results: Using data from the 1000 Genomes Project and from the patient cohort as input, the implemented system produced the same results as already established methodologies. Further, it satisfied all our identified requirements and was successfully integrated into the existing infrastructure. Finally, we showed in an exemplary analysis how the data could be quickly loaded into and analyzed in KETOS, a web-based analysis platform for statistical analysis and clinical decision support. Conclusions: This study demonstrates that the GEMINI open source database can be augmented to create an NGS analysis pipeline. The pipeline generates high-quality results consistent with the already established workflows for gene variant annotation and pathological evaluation. We further demonstrate how NGS-derived genomic and other clinical data can be combined for further statistical analysis, thereby providing for data integration using standardized vocabularies and methods. Finally, we demonstrate the feasibility of the pipeline integration into hospital workflows by providing an exemplary integration into the data integration center infrastructure, which is currently being established across Germany. 
%M 33026356 %R 10.2196/19879 %U http://www.jmir.org/2020/10/e19879/ %U https://doi.org/10.2196/19879 %U http://www.ncbi.nlm.nih.gov/pubmed/33026356 %0 Journal Article %@ 2291-5222 %I JMIR Publications %V 8 %N 9 %P e17818 %T Using Machine Learning and Smartphone and Smartwatch Data to Detect Emotional States and Transitions: Exploratory Study %A Sultana,Madeena %A Al-Jefri,Majed %A Lee,Joon %+ Data Intelligence for Health Lab, Cumming School of Medicine, University of Calgary, Teaching Research & Wellness 5E17, 3280 Hospital Dr. NW, Calgary, AB, T2N 4Z6, Canada, 1 403 220 2968, joonwu.lee@ucalgary.ca %K mHealth %K mental health %K emotion detection %K emotional transition detection %K spatiotemporal context %K supervised machine learning %K artificial intelligence %K mobile phone %K digital biomarkers %K digital phenotyping %D 2020 %7 29.9.2020 %9 Original Paper %J JMIR Mhealth Uhealth %G English %X Background: Emotional state in everyday life is an essential indicator of health and well-being. However, daily assessment of emotional states largely depends on active self-reports, which are often inconvenient and prone to incomplete information. Automated detection of emotional states and transitions on a daily basis could be an effective solution to this problem. However, the relationship between emotional transitions and everyday context remains unexplored. Objective: This study aims to explore the relationship between contextual information and emotional transitions and states to evaluate the feasibility of detecting emotional transitions and states from daily contextual information using machine learning (ML) techniques. Methods: This study was conducted on the data of 18 individuals from a publicly available data set called ExtraSensory. Contextual and sensor data were collected using smartphone and smartwatch sensors in a free-living condition, where the number of days for each person varied from 3 to 9. 
Sensors included an accelerometer, a gyroscope, a compass, location services, a microphone, a phone state indicator, light, temperature, and a barometer. The users self-reported approximately 49 discrete emotions at different intervals via a smartphone app throughout the data collection period. We mapped the 49 reported discrete emotions to the 3 dimensions of the pleasure, arousal, and dominance model and considered 6 emotional states: discordant, pleased, dissuaded, aroused, submissive, and dominant. We built general and personalized models for detecting emotional transitions and states every 5 min. The transition detection problem is a binary classification problem that detects whether a person’s emotional state has changed over time, whereas state detection is a multiclass classification problem. In both cases, a wide range of supervised ML algorithms were leveraged, in addition to data preprocessing, feature selection, and data imbalance handling techniques. Finally, an assessment was conducted to shed light on the association between everyday context and emotional states. Results: This study obtained promising results for emotional state and transition detection. The best area under the receiver operating characteristic (AUROC) curve for emotional state detection reached 60.55% in the general models and an average of 96.33% across personalized models. Despite the highly imbalanced data, the best AUROC curve for emotional transition detection reached 90.5% in the general models and an average of 88.73% across personalized models. In general, feature analyses show that spatiotemporal context, phone state, and motion-related information are the most informative factors for emotional state and transition detection. Our assessment showed that lifestyle has an impact on the predictability of emotion. 
Conclusions: Our results demonstrate a strong association of daily context with emotional states and transitions as well as the feasibility of detecting emotional states and transitions using data from smartphone and smartwatch sensors. %M 32990638 %R 10.2196/17818 %U http://mhealth.jmir.org/2020/9/e17818/ %U https://doi.org/10.2196/17818 %U http://www.ncbi.nlm.nih.gov/pubmed/32990638 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 9 %P e18920 %T Secure Record Linkage of Large Health Data Sets: Evaluation of a Hybrid Cloud Model %A Brown,Adrian Paul %A Randall,Sean M %+ Centre for Data Linkage, Curtin University, Kent Street, Bentley, 6021, Australia, 61 892669253, adrian.brown@curtin.edu.au %K cloud computing %K medical record linkage %K confidentiality %K data science %D 2020 %7 23.9.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: The linking of administrative data across agencies provides the capability to investigate many health and social issues with the potential to deliver significant public benefit. Despite its advantages, the use of cloud computing resources for linkage purposes is scarce, with the storage of identifiable information on cloud infrastructure assessed as high risk by data custodians. Objective: This study aims to present a model for record linkage that utilizes cloud computing capabilities while assuring custodians that identifiable data sets remain secure and local. Methods: A new hybrid cloud model was developed, including privacy-preserving record linkage techniques and container-based batch processing. An evaluation of this model was conducted with a prototype implementation using large synthetic data sets representative of administrative health data. Results: The cloud model kept identifiers on premises and uses privacy-preserved identifiers to run all linkage computations on cloud infrastructure. 
Our prototype used a managed container cluster in Amazon Web Services to distribute the computation using existing linkage software. Although the cost of computation was relatively low, the use of existing software resulted in a processing overhead of 35.7% (149/417 min execution time). Conclusions: The result of our experimental evaluation shows the operational feasibility of such a model and the exciting opportunities for advancing the analysis of linkage outputs. %M 32965236 %R 10.2196/18920 %U http://medinform.jmir.org/2020/9/e18920/ %U https://doi.org/10.2196/18920 %U http://www.ncbi.nlm.nih.gov/pubmed/32965236 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 8 %P e19572 %T Information Disclosure During the COVID-19 Epidemic in China: City-Level Observational Study %A Hu,Guangyu %A Li,Peiyi %A Yuan,Changzheng %A Tao,Chenglin %A Wen,Hai %A Liu,Qiannan %A Qiu,Wuqi %+ Institute of Medical Information/Center for Health Policy and Management, Chinese Academy of Medical Sciences and Peking Union Medical College, No 3 Bldg 3th Fl, Yabao Road, Chaoyang District, Beijing, 100020, China, 86 01052328735, hu.guangyu@imicams.ac.cn %K information disclosure %K COVID-19 %K website %K risk %K communication %K China %K disclosure %K pandemic %K health information %K public health %D 2020 %7 27.8.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Information disclosure is a top priority for official responses to the COVID-19 pandemic. The timely and standardized information published by authorities as a response to the crisis can better inform the public and enable better preparations for the pandemic; however, there is limited evidence of any systematic analyses of the disclosed epidemic information. This in turn has important implications for risk communication. 
Objective: This study aimed to describe and compare the officially released content regarding local epidemic situations as well as analyze the characteristics of information disclosure through local communication in major cities in China. Methods: The 31 capital cities in mainland China were included in this city-level observational study. Data were retrieved from local municipalities and health commission websites as of March 18, 2020. A checklist was employed as a rapid qualitative assessment tool to analyze the information disclosure performance of each city. Descriptive analyses and data visualizations were produced to present and compare the comparative performances of the cities. Results: In total, 29 of 31 cities (93.5%) established specific COVID-19 webpages to disclose information. Among them, 12 of the city webpages were added to their corresponding municipal websites. A majority of the cities (21/31, 67.7%) published their first cases of infection in a timely manner on the actual day of confirmation. Regarding the information disclosures highlighted on the websites, news updates from local media or press briefings were the most prevalent (28/29, 96.6%), followed by epidemic surveillance (25/29, 86.2%), and advice for the public (25/29, 86.2%). Clarifications of misinformation and frequently asked questions were largely overlooked as only 2 cities provided this valuable information. The median daily update frequency of epidemic surveillance summaries was 1.2 times per day (IQR 1.0-1.3 times), and the majority of these summaries (18/25, 72.0%) also provided detailed information regarding confirmed cases. The reporting of key indicators in the epidemic surveillance summaries, as well as critical facts included in the confirmed case reports, varied substantially between cities. 
In general, the best performance in terms of timely reporting and the transparency of information disclosures was observed in the municipalities directly administered by the central government compared to the other cities. Conclusions: Timely and effective efforts to disclose information related to the COVID-19 epidemic have been made in major cities in China. Continued improvements to local authority reporting will contribute to more effective public communication and efficient public health research responses. The development of protocols and the standardization of epidemic message templates—as well as the use of uniform operating procedures to provide regular information updates—should be prioritized to ensure a coordinated national response. %M 32790640 %R 10.2196/19572 %U http://www.jmir.org/2020/8/e19572/ %U https://doi.org/10.2196/19572 %U http://www.ncbi.nlm.nih.gov/pubmed/32790640 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 7 %P e18087 %T Bringing Code to Data: Do Not Forget Governance %A Suver,Christine %A Thorogood,Adrian %A Doerr,Megan %A Wilbanks,John %A Knoppers,Bartha %+ Sage Bionetworks, 2901 Third Avenue, Suite 330, Seattle, WA, 98121, United States, 1 206 928 8242, cfsuver@gmail.com %K data management %K privacy %K ethics, research %K data science %K machine learning %D 2020 %7 28.7.2020 %9 Viewpoint %J J Med Internet Res %G English %X Developing or independently evaluating algorithms in biomedical research is difficult because of restrictions on access to clinical data. Access is restricted because of privacy concerns, the proprietary treatment of data by institutions (fueled in part by the cost of data hosting, curation, and distribution), concerns over misuse, and the complexities of applicable regulatory frameworks. The use of cloud technology and services can address many of the barriers to data sharing. 
For example, researchers can access data in high performance, secure, and auditable cloud computing environments without the need for copying or downloading. An alternative path to accessing data sets requiring additional protection is the model-to-data approach. In model-to-data, researchers submit algorithms to run on secure data sets that remain hidden. Model-to-data is designed to enhance security and local control while enabling communities of researchers to generate new knowledge from sequestered data. Model-to-data has not yet been widely implemented, but pilots have demonstrated its utility when technical or legal constraints preclude other methods of sharing. We argue that model-to-data can make a valuable addition to our data sharing arsenal, with 2 caveats. First, model-to-data should only be adopted where necessary to supplement rather than replace existing data-sharing approaches given that it requires significant resource commitments from data stewards and limits scientific freedom, reproducibility, and scalability. Second, although model-to-data reduces concerns over data privacy and loss of local control when sharing clinical data, it is not an ethical panacea. Data stewards will remain hesitant to adopt model-to-data approaches without guidance on how to do so responsibly. To address this gap, we explored how commitments to open science, reproducibility, security, respect for data subjects, and research ethics oversight must be re-evaluated in a model-to-data context. 
%M 32540846 %R 10.2196/18087 %U http://www.jmir.org/2020/7/e18087/ %U https://doi.org/10.2196/18087 %U http://www.ncbi.nlm.nih.gov/pubmed/32540846 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 7 %P e14591 %T Information Loss in Harmonizing Granular Race and Ethnicity Data: Descriptive Study of Standards %A Wang,Karen %A Grossetta Nardini,Holly %A Post,Lori %A Edwards,Todd %A Nunez-Smith,Marcella %A Brandt,Cynthia %+ Equity Research and Innovation Center, General Internal Medicine, Yale School of Medicine, 100 Church Street South, A200, New Haven, CT, 06520, United States, 1 203 785 5233, karen.wang@yale.edu %K continental population groups %K multiracial populations %K multiethnic groups %K data standards %K health status disparities %K race factors %K demography %D 2020 %7 20.7.2020 %9 Original Paper %J J Med Internet Res %G English %X Background: Data standards for race and ethnicity have significant implications for health equity research. Objective: We aim to describe a challenge encountered when working with a multiple–race and ethnicity assessment in the Eastern Caribbean Health Outcomes Research Network (ECHORN), a research collaborative of Barbados, Puerto Rico, Trinidad and Tobago, and the US Virgin Islands. Methods: We examined the data standards guiding harmonization of race and ethnicity data for multiracial and multiethnic populations, using the Office of Management and Budget (OMB) Statistical Policy Directive No. 15. Results: Of 1211 participants in the ECHORN cohort study, 901 (74.40%) selected 1 racial category. Of those that selected 1 category, 13.0% (117/901) selected Caribbean; 6.4% (58/901), Puerto Rican or Boricua; and 13.5% (122/901), the mixed or multiracial category. A total of 17.84% (216/1211) of participants selected 2 or more categories, with 15.19% (184/1211) selecting 2 categories and 2.64% (32/1211) selecting 3 or more categories. 
With aggregation of ECHORN data into OMB categories, 27.91% (338/1211) of the participants can be placed in the “more than one race” category. Conclusions: This analysis exposes the fundamental informatics challenges that current race and ethnicity data standards present to meaningful collection, organization, and dissemination of granular data about subgroup populations in diverse and marginalized communities. Current standards should reflect the science of measuring race and ethnicity and the need for multidisciplinary teams to improve evolving standards throughout the data life cycle. %M 32706693 %R 10.2196/14591 %U http://www.jmir.org/2020/7/e14591/ %U https://doi.org/10.2196/14591 %U http://www.ncbi.nlm.nih.gov/pubmed/32706693 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 7 %P e17257 %T Accurate Prediction of Coronary Heart Disease for Patients With Hypertension From Electronic Health Records With Big Data and Machine-Learning Methods: Model Development and Performance Evaluation %A Du,Zhenzhen %A Yang,Yujie %A Zheng,Jing %A Li,Qi %A Lin,Denan %A Li,Ye %A Fan,Jianping %A Cheng,Wen %A Chen,Xie-Hui %A Cai,Yunpeng %+ Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, 1068 Xueyuan Blvd, Nanshan District, Shenzhen, China, 86 755 86392202, yp.cai@siat.ac.cn %K coronary heart disease %K machine learning %K electronic health records %K predictive algorithms %K hypertension %D 2020 %7 6.7.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Predictions of cardiovascular disease risks based on health records have long attracted broad research interests. Despite extensive efforts, the prediction accuracy has remained unsatisfactory. This raises the question as to whether the data insufficiency, statistical and machine-learning methods, or intrinsic noise have hindered the performance of previous approaches, and how these issues can be alleviated. 
Objective: Based on a large population of patients with hypertension in Shenzhen, China, we aimed to establish a high-precision coronary heart disease (CHD) prediction model through big data and machine-learning methods. Methods: Data from a large cohort of 42,676 patients with hypertension, including 20,156 patients with CHD onset, were investigated from electronic health records (EHRs) 1-3 years prior to CHD onset (for CHD-positive cases) or during a disease-free follow-up period of more than 3 years (for CHD-negative cases). The population was divided evenly into independent training and test datasets. Various machine-learning methods were adopted on the training set to achieve high-accuracy prediction models and the results were compared with traditional statistical methods and well-known risk scales. Comparison analyses were performed to investigate the effects of training sample size, factor sets, and modeling approaches on the prediction performance. Results: An ensemble method, XGBoost, achieved high accuracy in predicting 3-year CHD onset for the independent test dataset with an area under the receiver operating characteristic curve (AUC) value of 0.943. Comparison analysis showed that nonlinear models (K-nearest neighbor AUC 0.908, random forest AUC 0.938) outperformed linear models (logistic regression AUC 0.865) on the same datasets, and machine-learning methods significantly surpassed traditional risk scales or fixed models (eg, Framingham cardiovascular disease risk models). Further analyses revealed that using time-dependent features obtained from multiple records, including both statistical variables and changing-trend variables, helped to improve the performance compared to using only static features. Subpopulation analysis showed that the impact of feature design had a more significant effect on model accuracy than the population size. 
Marginal effect analysis showed that both traditional and EHR factors exhibited highly nonlinear characteristics with respect to the risk scores. Conclusions: We demonstrated that accurate risk prediction of CHD from EHRs is possible given a sufficiently large population of training data. Sophisticated machine-learning methods played an important role in tackling the heterogeneity and nonlinear nature of disease prediction. Moreover, accumulated EHR data over multiple time points provided additional features that were valuable for risk prediction. Our study highlights the importance of accumulating big data from EHRs for accurate disease predictions. %M 32628616 %R 10.2196/17257 %U https://medinform.jmir.org/2020/7/e17257 %U https://doi.org/10.2196/17257 %U http://www.ncbi.nlm.nih.gov/pubmed/32628616 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 6 %N 2 %P e19170 %T A Snapshot of SARS-CoV-2 Genome Availability up to April 2020 and its Implications: Data Analysis %A Mavian,Carla %A Marini,Simone %A Prosperi,Mattia %A Salemi,Marco %+ Emerging Pathogens Institute, University of Florida, Mowry Rd 2055, Gainesville, FL, United States, 1 352 273 9567, salemi@pathology.ufl.edu %K covid-19 %K sars-cov-2 %K phylogenetics %K genome %K evolution %K genetics %K pandemic %K infectious disease %K virus %K sequence %K transmission %K tracing %K tracking %D 2020 %7 1.6.2020 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has been growing exponentially, affecting over 4 million people and causing enormous distress to economies and societies worldwide. A plethora of analyses based on viral sequences has already been published both in scientific journals and through non–peer-reviewed channels to investigate the genetic heterogeneity and spatiotemporal dissemination of SARS-CoV-2. 
However, a systematic investigation of phylogenetic information and sampling bias in the available data is lacking. Although the number of available genome sequences of SARS-CoV-2 is growing daily and the sequences show increasing phylogenetic information, country-specific data still present severe limitations and should be interpreted with caution. Objective: The objective of this study was to determine the quality of the currently available SARS-CoV-2 full genome data in terms of sampling bias as well as phylogenetic and temporal signals to inform and guide the scientific community. Methods: We used maximum likelihood–based methods to assess the presence of sufficient information for robust phylogenetic and phylogeographic studies in several SARS-CoV-2 sequence alignments assembled from GISAID (Global Initiative on Sharing All Influenza Data) data released between March and April 2020. Results: Although the number of high-quality full genomes is growing daily, and sequence data released in April 2020 contain sufficient phylogenetic information to allow reliable inference of phylogenetic relationships, country-specific SARS-CoV-2 data sets still present severe limitations. Conclusions: At the present time, studies assessing within-country spread or transmission clusters should be considered preliminary or hypothesis-generating at best. Hence, current reports should be interpreted with caution, and concerted efforts should continue to increase the number and quality of sequences required for robust tracing of the epidemic. 
%M 32412415 %R 10.2196/19170 %U http://publichealth.jmir.org/2020/2/e19170/ %U https://doi.org/10.2196/19170 %U http://www.ncbi.nlm.nih.gov/pubmed/32412415 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 6 %N 2 %P e19273 %T Tracking Social Media Discourse About the COVID-19 Pandemic: Development of a Public Coronavirus Twitter Data Set %A Chen,Emily %A Lerman,Kristina %A Ferrara,Emilio %+ Information Sciences Institute, University of Southern California, 4676 Admiralty Way, #1001, Marina del Rey, CA, 90292, United States, 1 310 448 8661, emiliofe@usc.edu %K COVID-19 %K SARS-CoV-2 %K social media %K network analysis %K computational social sciences %D 2020 %7 29.5.2020 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: At the time of this writing, the coronavirus disease (COVID-19) pandemic outbreak has already put tremendous strain on many countries' citizens, resources, and economies around the world. Social distancing measures, travel bans, self-quarantines, and business closures are changing the very fabric of societies worldwide. With people forced out of public spaces, much of the conversation about these phenomena now occurs online on social media platforms like Twitter. Objective: In this paper, we describe a multilingual COVID-19 Twitter data set that we are making available to the research community via our COVID-19-TweetIDs GitHub repository. Methods: We started this ongoing data collection on January 28, 2020, leveraging Twitter’s streaming application programming interface (API) and Tweepy to follow certain keywords and accounts that were trending at the time data collection began. We used Twitter’s search API to query for past tweets, resulting in the earliest tweets in our collection dating back to January 21, 2020. Results: Since the inception of our collection, we have actively maintained and updated our GitHub repository on a weekly basis. 
We have published over 123 million tweets, with over 60% of the tweets in English. This paper also presents basic statistics that show that Twitter activity responds and reacts to COVID-19-related events. Conclusions: It is our hope that our contribution will enable the study of online conversation dynamics in the context of a planetary-scale epidemic outbreak of unprecedented proportions and implications. This data set could also help track COVID-19-related misinformation and unverified rumors or enable the understanding of fear and panic—and undoubtedly more. %M 32427106 %R 10.2196/19273 %U http://publichealth.jmir.org/2020/2/e19273/ %U https://doi.org/10.2196/19273 %U http://www.ncbi.nlm.nih.gov/pubmed/32427106 %0 Journal Article %@ 2369-2960 %I JMIR Publications %V 6 %N 2 %P e18828 %T Predicting COVID-19 Incidence Through Analysis of Google Trends Data in Iran: Data Mining and Deep Learning Pilot Study %A Ayyoubzadeh,Seyed Mohammad %A Ayyoubzadeh,Seyed Mehdi %A Zahedi,Hoda %A Ahmadi,Mahnaz %A R Niakan Kalhori,Sharareh %+ Department of Health Information Management, School of Allied Medical Sciences, Tehran University of Medical Sciences, 3rd Floor, No 17, Farredanesh Alley, Ghods St, Enghelab Ave, Tehran, Iran, 98 21 88983025, niakan2@gmail.com %K coronavirus %K COVID-19 %K prediction %K incidence %K Google Trends %K linear regression %K LSTM %K pandemic %K outbreak %K public health %D 2020 %7 14.4.2020 %9 Original Paper %J JMIR Public Health Surveill %G English %X Background: The recent global outbreak of coronavirus disease (COVID-19) is affecting many countries worldwide. Iran is one of the top 10 most affected countries. Search engines provide useful data from populations, and these data might be useful to analyze epidemics. Utilizing data mining methods on electronic resources’ data might provide a better insight into the COVID-19 outbreak to manage the health crisis in each country and worldwide. 
Objective: This study aimed to predict the incidence of COVID-19 in Iran. Methods: Data were obtained from the Google Trends website. Linear regression and long short-term memory (LSTM) models were used to estimate the number of positive COVID-19 cases. All models were evaluated using 10-fold cross-validation, and root mean square error (RMSE) was used as the performance metric. Results: The linear regression model predicted the incidence with an RMSE of 7.562 (SD 6.492). The most effective factors besides previous day incidence included the search frequency of handwashing, hand sanitizer, and antiseptic topics. The RMSE of the LSTM model was 27.187 (SD 20.705). Conclusions: Data mining algorithms can be employed to predict trends of outbreaks. This prediction might support policymakers and health care managers to plan and allocate health care resources accordingly. %M 32234709 %R 10.2196/18828 %U http://publichealth.jmir.org/2020/2/e18828/ %U https://doi.org/10.2196/18828 %U http://www.ncbi.nlm.nih.gov/pubmed/32234709 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 3 %P e15816 %T The Value of Data: Applying a Public Value Model to the English National Health Service %A Wilson,James %A Herron,Daniel %A Nachev,Parashkev %A McNally,Nick %A Williams,Bryan %A Rees,Geraint %+ Department of Philosophy, University College London, Gower Street, London, WC1E 6BT, United Kingdom, 44 207 679 0213, james.wilson@ucl.ac.uk %K health policy %K innovation %K public value %K intellectual property %K NHS Constitution %D 2020 %7 27.3.2020 %9 Proposal %J J Med Internet Res %G English %X Research and innovation in biomedicine and health care increasingly depend on electronic data. The emergence of data-driven technologies and associated digital transformations has focused attention on the value of such data. 
Despite broad consensus on the value of health data, there is less consensus on the basis for that value; thus, the nature and extent of health data value remain unclear. Much of the existing literature presupposes that the value of data is to be understood primarily in financial terms, and assumes that a single financial value can be assigned. We argue here that the value of a dataset is instead relational; that is, the value depends on who wants to use it and for what purposes. Moreover, data are valued for both nonfinancial and financial reasons. Thus, it may be more accurate to discuss the values (plural) of a dataset rather than the singular value. This plurality of values opens up an important set of questions about how health data should be valued for the purposes of public policy. We argue that public value models provide a useful approach in this regard. According to public value theory, public value is created, or captured, to the extent that public sector institutions further their democratically established goals and improve the lives of citizens. This article outlines how such an approach might be operationalized within existing health care systems such as the English National Health Service, with particular focus on actionable conclusions. 
%M 32217501 %R 10.2196/15816 %U http://www.jmir.org/2020/3/e15816/ %U https://doi.org/10.2196/15816 %U http://www.ncbi.nlm.nih.gov/pubmed/32217501 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 3 %P e16102 %T Insurance Customers’ Expectations for Sharing Health Data: Qualitative Survey Study %A Grundstrom,Casandra %A Korhonen,Olli %A Väyrynen,Karin %A Isomursu,Minna %+ Faculty of Information Technology and Electrical Engineering, University of Oulu, Pentti Kaiteran katu 1, PO Box 4500, Oulu, FI-90014, Finland, 358 0503536025, casandra.grundstrom@gmail.com %K data sharing %K qualitative research %K survey %K health insurance %K insurance %K medical informatics %K health services %D 2020 %7 26.3.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Insurance organizations are essential stakeholders in health care ecosystems. For addressing future health care needs, insurance companies require access to health data to deliver preventative and proactive digital health services to customers. However, extant research is limited in examining the conditions that incentivize health data sharing. Objective: This study aimed to (1) identify the expectations of insurance customers when sharing health data, (2) determine the perceived intrinsic value of health data, and (3) explore the conditions that aid in incentivizing health data sharing in the relationship between an insurance organization and its customer. Methods: A Web-based survey was distributed to randomly selected customers from a Finnish insurance organization through email. A single open-text answer was used for a qualitative data analysis through inductive coding, followed by a thematic analysis. Furthermore, the 4 constructs of commitment, power, reciprocity, and trust from the social exchange theory (SET) were applied as a framework. Results: From the 5000 customers invited to participate, we received 452 surveys (response rate: 9.0%). 
Customer characteristics were found to reflect customer demographics. Of the 452 surveys, 48 (10.6%) open-text responses were skipped by the customer, 57 (12.6%) customers had no expectations from sharing health data, and 44 (9.7%) customers preferred to abstain from a data sharing relationship. Using the SET framework, we found that customers expected different conditions to be fulfilled by their insurance provider based on the commitment, power, reciprocity, and trust constructs. Of the 452 customers who completed the surveys, 64 (14.2%) customers required that the insurance organization meets their data treatment expectations (commitment). Overall, 4.9% (22/452) of customers were concerned about their health data being used against them to profile their health, to increase insurance prices, or to deny health insurance claims (power). A total of 28.5% (129/452) of customers expected some form of benefit, such as personalized digital health services, and 29.9% (135/452) of customers expected finance-related compensation (reciprocity). Furthermore, 7.5% (34/452) of customers expected some form of empathy from the insurance organization through enhanced transparency or an emotional connection (trust). Conclusions: To aid in the design and development of digital health services, insurance organizations need to address the customers’ expectations when sharing their health data. We established the expectations of customers in the social exchange of health data and explored the perceived values of data as intangible goods. Actions by the insurance organization should aim to increase trust through a culture of transparency, commitment to treat health data in a prescribed manner, provide reciprocal benefits through digital health services that customers deem valuable, and assuage fears of health data being used to prevent providing insurance coverage or increase costs. 
%M 32213467 %R 10.2196/16102 %U http://medinform.jmir.org/2020/3/e16102/ %U https://doi.org/10.2196/16102 %U http://www.ncbi.nlm.nih.gov/pubmed/32213467 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 2 %P e14777 %T Optimizing Antihypertensive Medication Classification in Electronic Health Record-Based Data: Classification System Development and Methodological Comparison %A McDonough,Caitrin W %A Smith,Steven M %A Cooper-DeHoff,Rhonda M %A Hogan,William R %+ Department of Pharmacotherapy and Translational Research, College of Pharmacy, University of Florida, PO Box 100486, Gainesville, FL, , United States, 1 3522736435, cmcdonough@cop.ufl.edu %K antihypertensive agents %K electronic health records %K classification %K RxNorm %K phenotype %D 2020 %7 27.2.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Computable phenotypes have the ability to utilize data within the electronic health record (EHR) to identify patients with certain characteristics. Many computable phenotypes rely on multiple types of data within the EHR including prescription drug information. Hypertension (HTN)-related computable phenotypes are particularly dependent on the correct classification of antihypertensive prescription drug information, as well as corresponding diagnoses and blood pressure information. Objective: This study aimed to create an antihypertensive drug classification system to be utilized with EHR-based data as part of HTN-related computable phenotypes. Methods: We compared 4 different antihypertensive drug classification systems based off of 4 different methodologies and terminologies, including 3 RxNorm Concept Unique Identifier (RxCUI)–based classifications and 1 medication name–based classification. 
The RxCUI-based classifications utilized data from (1) the Drug Ontology, (2) the new Medication Reference Terminology, and (3) the Anatomical Therapeutic Chemical Classification System and DrugBank, whereas the medication name–based classification relied on antihypertensive drug names. Each classification system was applied to EHR-based prescription drug data from hypertensive patients in the OneFlorida Data Trust. Results: There were 13,627 unique RxCUIs and 8025 unique medication names from the 13,879,046 prescriptions. We observed a broad overlap between the 4 methods, with 84.1% (691/822) to 95.3% (695/729) of terms overlapping pairwise between the different classification methods. Key differences arose from drug products with multiple dosage forms, drug products with an indication of benign prostatic hyperplasia, drug products that contain more than 1 ingredient (combination products), and terms within the classification systems corresponding to retired or obsolete RxCUIs. Conclusions: In total, 2 antihypertensive drug classifications were constructed, one based on RxCUIs and one based on medication name, that can be used in future computable phenotypes that require antihypertensive drug classifications. 
%M 32130152 %R 10.2196/14777 %U http://medinform.jmir.org/2020/2/e14777/ %U https://doi.org/10.2196/14777 %U http://www.ncbi.nlm.nih.gov/pubmed/32130152 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 2 %P e16153 %T Explanatory Model of Dry Eye Disease Using Health and Nutrition Examinations: Machine Learning and Network-Based Factor Analysis From a National Survey %A Nam,Sang Min %A Peterson,Thomas A %A Butte,Atul J %A Seo,Kyoung Yul %A Han,Hyun Wook %+ Department of Biomedical Informatics, CHA University School of Medicine, CHA University, 335 Pangyo-ro, Seongnam, 13488, Republic of Korea, 82 318817109, hwhan@chamc.co.kr %K dry eye disease %K epidemiology %K machine learning %K systems analysis %K patient-specific modeling %D 2020 %7 20.2.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Dry eye disease (DED) is a complex disease of the ocular surface, and its associated factors are important for understanding and effectively treating DED. Objective: This study aimed to provide an integrative and personalized model of DED by making an explanatory model of DED using as many factors as possible from the Korea National Health and Nutrition Examination Survey (KNHANES) data. Methods: Using KNHANES data for 2012 (4391 sample cases), a point-based scoring system was created for ranking factors associated with DED and assessing patient-specific DED risk. First, decision trees and lasso were used to classify continuous factors and to select important factors, respectively. Next, a survey-weighted multiple logistic regression was trained using these factors, and points were assigned using the regression coefficients. Finally, network graphs of partial correlations between factors were utilized to study the interrelatedness of DED-associated factors. Results: The point-based model achieved an area under the curve of 0.70 (95% CI 0.61-0.78), and 13 of 78 factors considered were chosen. 
Important factors included sex (+9 points for women), corneal refractive surgery (+9 points), current depression (+7 points), cataract surgery (+7 points), stress (+6 points), age (54-66 years; +4 points), rhinitis (+4 points), lipid-lowering medication (+4 points), and intake of omega-3 (0.43%-0.65% kcal/day; −4 points). Among these, the age group 54 to 66 years had high centrality in the network, whereas omega-3 had low centrality. Conclusions: Integrative understanding of DED was possible using the machine learning–based model and network-based factor analysis. This method for finding important risk factors and identifying patient-specific risk could be applied to other multifactorial diseases. %M 32130150 %R 10.2196/16153 %U http://medinform.jmir.org/2020/2/e16153/ %U https://doi.org/10.2196/16153 %U http://www.ncbi.nlm.nih.gov/pubmed/32130150 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 8 %N 2 %P e13046 %T Evaluation of Privacy Risks of Patients’ Data in China: Case Study %A Gong,Mengchun %A Wang,Shuang %A Wang,Lezi %A Liu,Chao %A Wang,Jianyang %A Guo,Qiang %A Zheng,Hao %A Xie,Kang %A Wang,Chenghong %A Hui,Zhouguang %+ Department of VIP Medical Services, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Panjiayuan Nanli #17, Chaoyang District, Beijing, 100021, China, 86 010 87787656, huizg@cicams.ac.cn %K patient privacy %K privacy risk %K Chinese patients’ data %K data sharing %K re-identification %D 2020 %7 5.2.2020 %9 Original Paper %J JMIR Med Inform %G English %X Background: Patient privacy is a ubiquitous problem around the world. Many existing studies have demonstrated the potential privacy risks associated with sharing of biomedical data. Owing to the increasing need for data sharing and analysis, health care data privacy is drawing more attention. 
However, to better protect biomedical data privacy, it is essential to assess the privacy risk in the first place. Objective: In China, there is no clear regulation for health systems to deidentify data. It is also not known whether a mechanism such as the Health Insurance Portability and Accountability Act (HIPAA) safe harbor policy will achieve sufficient protection. This study aimed to conduct a pilot study using patient data from Chinese hospitals to understand and quantify the privacy risks of Chinese patients. Methods: We used g-distinct analysis to evaluate the reidentification risks with regard to the HIPAA safe harbor approach when applied to Chinese patients’ data. More specifically, we estimated the risks based on the HIPAA safe harbor and limited dataset policies by assuming an attacker has background knowledge of the patient from the public domain. Results: The experiments were conducted on 0.83 million patients (with data fields of date of birth, gender, and surrogate ZIP codes generated based on home address) across 33 provincial-level administrative divisions in China. Under the Limited Dataset policy, 19.58% (163,262/833,235) of the population could be uniquely identifiable under the g-distinct metric (ie, 1-distinct). In contrast, the Safe Harbor policy is able to significantly reduce privacy risk, where only 0.072% (601/833,235) of individuals are uniquely identifiable, and the majority of the population is 3000-indistinguishable (ie, the population is expected to share common attributes with 3000 or fewer people). Conclusions: Through the experiments based on real-world patient data, this work illustrates that the results of g-distinct analysis about Chinese patient privacy risk are similar to those from a previous US study, in which data from different organizations/regions might be vulnerable to different reidentification risks under different policies. 
This work provides a reference for Chinese health care entities estimating patients’ privacy risk during data sharing and lays the foundation for future studies of the privacy risk of Chinese patients’ data. %M 32022691 %R 10.2196/13046 %U https://medinform.jmir.org/2020/2/e13046 %U https://doi.org/10.2196/13046 %U http://www.ncbi.nlm.nih.gov/pubmed/32022691 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 22 %N 1 %P e16816 %T Systematic Evaluation of Research Progress on Natural Language Processing in Medicine Over the Past 20 Years: Bibliometric Study on PubMed %A Wang,Jing %A Deng,Huan %A Liu,Bangtao %A Hu,Anbin %A Liang,Jun %A Fan,Lingye %A Zheng,Xu %A Wang,Tong %A Lei,Jianbo %+ Institute of Medical Technology, Health Science Center, Peking University, 38 Xueyuan Rd, Haidian District, Beijing, China, 86 8280 5901, jblei@hsc.pku.edu.cn %K natural language processing %K clinical %K medicine %K information extraction %K electronic medical record %D 2020 %7 23.1.2020 %9 Review %J J Med Internet Res %G English %X Background: Natural language processing (NLP) is an important traditional field in computer science, but its application in medical research has faced many challenges. With the extensive digitalization of medical information globally and increasing importance of understanding and mining big data in the medical field, NLP is becoming more crucial. Objective: The goal of the research was to perform a systematic review on the use of NLP in medical research with the aim of understanding the global progress on NLP research outcomes, content, methods, and study groups involved. Methods: A systematic review was conducted using the PubMed database as a search platform. All published studies on the application of NLP in medicine (except biomedicine) during the 20 years between 1999 and 2018 were retrieved. The data obtained from these published studies were cleaned and structured. 
Excel (Microsoft Corp) and VOSviewer (Nees Jan van Eck and Ludo Waltman) were used to perform bibliometric analysis of publication trends, author orders, countries, institutions, collaboration relationships, research hot spots, diseases studied, and research methods. Results: A total of 3498 articles were obtained during initial screening, and 2336 articles were found to meet the study criteria after manual screening. The number of publications increased every year, with significant growth after 2012 (number of publications ranged from 148 to a maximum of 302 annually). The United States has occupied the leading position since the inception of the field, with the largest number of articles published. The United States contributed to 63.01% (1472/2336) of all publications, followed by France (5.44%, 127/2336) and the United Kingdom (3.51%, 82/2336). The author with the largest number of articles published was Hongfang Liu (70), while Stéphane Meystre (17) and Hua Xu (33) published the largest number of articles as the first and corresponding authors. Among first authors’ affiliated institutions, Columbia University published the largest number of articles, accounting for 4.54% (106/2336) of the total. Specifically, approximately one-fifth (17.68%, 413/2336) of the articles involved research on specific diseases, and the subject areas primarily focused on mental illness (16.46%, 68/413), breast cancer (5.81%, 24/413), and pneumonia (4.12%, 17/413). Conclusions: NLP is in a period of robust development in the medical field, with an average of approximately 100 publications annually. Electronic medical records were the most used research materials, but social media such as Twitter have become important research materials since 2015. Cancer (24.94%, 103/413) was the most common subject area in NLP-assisted medical research on diseases, with breast cancers (23.30%, 24/103) and lung cancers (14.56%, 15/103) accounting for the highest proportions of studies. 
Columbia University and the researchers trained there were the most active and prolific forces in NLP research in the medical field. %M 32012074 %R 10.2196/16816 %U http://www.jmir.org/2020/1/e16816/ %U https://doi.org/10.2196/16816 %U http://www.ncbi.nlm.nih.gov/pubmed/32012074 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 4 %P e14340 %T Automatic Detection of Hypoglycemic Events From the Electronic Health Record Notes of Diabetes Patients: Empirical Study %A Jin,Yonghao %A Li,Fei %A Vimalananda,Varsha G %A Yu,Hong %+ Department of Computer Science, University of Massachusetts Lowell, 220 Pawtucket St, Lowell, MA, 01854, United States, 1 9789343620, Hong_Yu@uml.edu %K natural language processing %K convolutional neural networks %K hypoglycemia %K adverse events %D 2019 %7 8.11.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: Hypoglycemic events are common and potentially dangerous conditions among patients being treated for diabetes. Automatic detection of such events could improve patient care and is valuable in population studies. Electronic health records (EHRs) are valuable resources for the detection of such events. Objective: In this study, we aim to develop a deep-learning–based natural language processing (NLP) system to automatically detect hypoglycemic events from EHR notes. Our model is called the High-Performing System for Automatically Detecting Hypoglycemic Events (HYPE). Methods: Domain experts reviewed 500 EHR notes of diabetes patients to determine whether each sentence contained a hypoglycemic event or not. We used this annotated corpus to train and evaluate HYPE, the high-performance NLP system for hypoglycemia detection. We built and evaluated both a classical machine learning model (ie, support vector machines [SVMs]) and state-of-the-art neural network models. Results: We found that neural network models outperformed the SVM model. 
The convolutional neural network (CNN) model yielded the highest performance in a 10-fold cross-validation setting: mean precision=0.96 (SD 0.03), mean recall=0.86 (SD 0.03), and mean F1=0.91 (SD 0.03). Conclusions: Despite the challenges posed by small and highly imbalanced data, our CNN-based HYPE system still achieved a high performance for hypoglycemia detection. HYPE can be used for EHR-based hypoglycemia surveillance and population studies in diabetes patients. %M 31702562 %R 10.2196/14340 %U http://medinform.jmir.org/2019/4/e14340/ %U https://doi.org/10.2196/14340 %U http://www.ncbi.nlm.nih.gov/pubmed/31702562 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 11 %P e15511 %T Modeling Research Topics for Artificial Intelligence Applications in Medicine: Latent Dirichlet Allocation Application Study %A Tran,Bach Xuan %A Nghiem,Son %A Sahin,Oz %A Vu,Tuan Manh %A Ha,Giang Hai %A Vu,Giang Thu %A Pham,Hai Quang %A Do,Hoa Thi %A Latkin,Carl A %A Tam,Wilson %A Ho,Cyrus S H %A Ho,Roger C M %+ Institute for Preventive Medicine and Public Health, Hanoi Medical University, No 1 Ton That Tung Street, Hanoi, 100000, Vietnam, 84 98 222 8662, bach.ipmph@gmail.com %K artificial intelligence %K applications %K medicine %K scientometric %K bibliometric %K latent Dirichlet allocation %D 2019 %7 1.11.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Artificial intelligence (AI)–based technologies develop rapidly and have myriad applications in medicine and health care. However, there is a lack of comprehensive reporting on the productivity, workflow, topics, and research landscape of AI in this field. Objective: This study aimed to evaluate the global development of scientific publications and constructed interdisciplinary research topics on the theory and practice of AI in medicine from 1977 to 2018. Methods: We obtained bibliographic data and abstract contents of publications published between 1977 and 2018 from the Web of Science database. 
A total of 27,451 eligible articles were analyzed. Research topics were classified by latent Dirichlet allocation, and principal component analysis was used to identify the construct of the research landscape. Results: The applications of AI have mainly impacted clinical settings (enhanced prognosis and diagnosis, robot-assisted surgery, and rehabilitation), data science and precision medicine (collecting individual data for precision medicine), and policy making (raising ethical and legal issues, especially regarding privacy and confidentiality of data). However, AI applications have not been commonly used in resource-poor settings due to limited infrastructure and human resources. Conclusions: The application of AI in medicine has grown rapidly and focuses on three leading platforms: clinical practices, clinical material, and policies. AI might be one of the methods to narrow the inequality in health care and medicine between developing and developed countries. Technology transfer and support from developed countries are essential measures for the advancement of AI application in health care in developing countries. 
%M 31682577 %R 10.2196/15511 %U https://www.jmir.org/2019/11/e15511 %U https://doi.org/10.2196/15511 %U http://www.ncbi.nlm.nih.gov/pubmed/31682577 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 3 %P e14083 %T Developing a Standardization Algorithm for Categorical Laboratory Tests for Clinical Big Data Research: Retrospective Study %A Kim,Mina %A Shin,Soo-Yong %A Kang,Mira %A Yi,Byoung-Kee %A Chang,Dong Kyung %+ Department of Digital Health, Samsung Advanced Institute for Health Sciences & Technology, Sungkyunkwan University, Samsung Medical Center, Seoul, 2066 16419, Republic of Korea, 82 10 9933 0266, do.chang@samsung.com %K standardization %K electronic health records %K data quality %K data science %D 2019 %7 29.08.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: Data standardization is essential in electronic health records (EHRs) for both clinical practice and retrospective research. However, it is still not easy to standardize EHR data because of nonidentical duplicates, typographical errors, or inconsistencies. To overcome this drawback, standardization efforts have been undertaken for collecting data in a standardized format as well as for curating the stored data in EHRs. To perform clinical big data research, the stored data in EHR should be standardized, starting from laboratory results, given their importance. However, most of the previous efforts have been based on labor-intensive manual methods. Objective: We aimed to develop an automatic standardization method for eliminating the noises of categorical laboratory data, grouping, and mapping of cleaned data using standard terminology. Methods: We developed a method called standardization algorithm for laboratory test–categorical result (SALT-C) that can process categorical laboratory data, such as pos +, 250 4+ (urinalysis results), and reddish (urinalysis color results). SALT-C consists of five steps. 
First, it applies data cleaning rules to categorical laboratory data. Second, it categorizes the cleaned data into 5 predefined groups (urine color, urine dipstick, blood type, presence-finding, and pathogenesis tests). Third, all data in each group are vectorized. Fourth, similarity is calculated between the vectors of data and those of each value in the predefined value sets. Finally, the value closest to the data is assigned. Results: The performance of SALT-C was validated using 59,213,696 data points (167,938 unique values) generated over 23 years from a tertiary hospital. Apart from the data whose original meaning could not be interpreted correctly (eg, ** and _^), SALT-C mapped unique raw data to the correct reference value for each group with accuracies of 97.6% (123/126; urine color tests), 97.5% (198/203; urine dipstick tests), 95% (53/56; blood type tests), 99.68% (162,291/162,805; presence-finding tests), and 99.61% (4643/4661; pathogenesis tests). Conclusions: The proposed SALT-C successfully standardized the categorical laboratory test results with high reliability. SALT-C can be beneficial for clinical big data research by reducing laborious manual standardization efforts. %M 31469075 %R 10.2196/14083 %U http://medinform.jmir.org/2019/3/e14083/ %U https://doi.org/10.2196/14083 %U http://www.ncbi.nlm.nih.gov/pubmed/31469075 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 8 %P e14126 %T Why Do Data Users Say Health Care Data Are Difficult to Use? 
A Cross-Sectional Survey Study %A Kim,Ho Heon %A Kim,Bora %A Joo,Segyeong %A Shin,Soo-Yong %A Cha,Hyo Soung %A Park,Yu Rang %+ Department of Biomedical Systems Informatics, Yonsei University College of Medicine, 50-1 Yonsei-ro, Seodaemun-gu, Seoul, 03722, Republic of Korea, 82 10 5240 3434, yurangpark@yuhs.ac %K data anonymization %K privacy act %K data sharing %K data protection %K data linking %K health care data demand %D 2019 %7 06.08.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: There has been significant effort in attempting to use health care data. However, laws that protect patients’ privacy have restricted data use because health care data contain sensitive information. Thus, discussions on privacy laws now focus on the active use of health care data beyond protection. However, current literature does not clarify the obstacles that make data usage and deidentification processes difficult or elaborate on users’ needs for data linking from practical perspectives. Objective: The objective of this study is to investigate (1) the current status of data use in each medical area, (2) institutional efforts and difficulties in deidentification processes, and (3) users’ data linking needs. Methods: We conducted a cross-sectional online survey. To recruit people who have used health care data, we publicized the promotion campaign and sent official documents to an academic society encouraging participation in the online survey. Results: In total, 128 participants responded to the online survey; 10 participants were excluded for either inconsistent responses or lack of demand for health care data. Finally, 118 participants’ responses were analyzed. The majority of participants worked in general hospitals or universities (62/118, 52.5% and 51/118, 43.2%, respectively, multiple-choice answers). More than half of participants responded that they have a need for clinical data (82/118, 69.5%) and public data (76/118, 64.4%). 
Furthermore, 85.6% (101/118) of respondents conducted deidentification measures when using data, and they considered rigid social culture an obstacle to deidentification (28/101, 27.7%). In addition, they required data linking (98/118, 83.1%), and they noted deregulation and data standardization as necessary to enable health care data linking (33/98, 33.7% and 38/98, 38.8%, respectively). There were no significant differences in the reported data needs and linking needs between groups that used health care data for public purposes and those that used it for commercial purposes. Conclusions: This study provides a cross-sectional view from a practical, user-oriented perspective on the kinds of data users want to utilize, efforts and difficulties in deidentification processes, and the needs for data linking. Most users want to use clinical and public data, and most participants conduct deidentification processes and express a desire to conduct data linking. Our study confirmed that they noted regulation as a primary obstacle whether their purpose was commercial or public. A legal system based on both data utilization and data protection needs is required. 
%M 31389335 %R 10.2196/14126 %U https://www.jmir.org/2019/8/e14126/ %U https://doi.org/10.2196/14126 %U http://www.ncbi.nlm.nih.gov/pubmed/31389335 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 7 %P e11672 %T Data Work: Meaning-Making in the Era of Data-Rich Medicine %A Fiske,Amelia %A Prainsack,Barbara %A Buyx,Alena %+ Institute for History and Ethics of Medicine, Technical University of Munich School of Medicine, Technical University of Munich, Ismaninger Straße 22, Munich, 81675, Germany, 49 8941404041, a.fiske@tum.de %K big data %K data work %K medical informatics %K internet %K data interpretation %K decision support systems %D 2019 %7 09.07.2019 %9 Viewpoint %J J Med Internet Res %G English %X In the era of data-rich medicine, an increasing number of domains of people’s lives are datafied and rendered usable for health care purposes. Yet, deriving insights for clinical practice and individual life choices and deciding what data or information should be used for this purpose pose difficult challenges that require tremendous time, resources, and skill. Thus, big data not only promises new clinical insights but also generates new—and heretofore largely unarticulated—forms of work for patients, families, and health care providers alike. Building on science studies, medical informatics, Anselm Strauss and colleagues’ concept of patient work, and subsequent elaborations of articulation work, in this article, we analyze the forms of work engendered by the need to make data and information actionable for the treatment decisions and lives of individual patients. We outline three areas of data work, which we characterize as the work of supporting digital data practices, the work of interpretation and contextualization, and the work of inclusion and interaction. This is a first step toward naming and making visible these forms of work in order that they can be adequately seen, rewarded, and assessed in the future. 
We argue that making data work visible is also necessary to ensure that the insights of big and diverse datasets can be applied in meaningful and equitable ways for better health care. %M 31290397 %R 10.2196/11672 %U https://www.jmir.org/2019/7/e11672/ %U https://doi.org/10.2196/11672 %U http://www.ncbi.nlm.nih.gov/pubmed/31290397 %0 Journal Article %@ 2291-9694 %I JMIR Publications %V 7 %N 2 %P e12702 %T Privacy-Preserving Analysis of Distributed Biomedical Data: Designing Efficient and Secure Multiparty Computations Using Distributed Statistical Learning Theory %A Dankar,Fida K %A Madathil,Nisha %A Dankar,Samar K %A Boughorbel,Sabri %+ United Arab Emirates University, College of IT, Al Ain, Abu Dhabi, 15551, United Arab Emirates, 971 37673333 ext 5569, fida.dankar@uaeu.ac.ae %K data analytics %K data aggregation %K personal genetic information %K patient data privacy %D 2019 %7 29.04.2019 %9 Original Paper %J JMIR Med Inform %G English %X Background: Biomedical research often requires large cohorts and necessitates the sharing of biomedical data with researchers around the world, which raises many privacy, ethical, and legal concerns. In the face of these concerns, privacy experts are trying to explore approaches to analyzing the distributed data while protecting its privacy. Many of these approaches are based on secure multiparty computations (SMCs). SMC is an attractive approach allowing multiple parties to collectively carry out calculations on their datasets without having to reveal their own raw data; however, it incurs heavy computation time and requires extensive communication between the involved parties. Objective: This study aimed to develop usable and efficient SMC applications that meet the needs of the potential end-users and to raise general awareness about SMC as a tool that supports data sharing. 
Methods: We have introduced distributed statistical computing (DSC) into the design of secure multiparty protocols, which allows us to conduct computations on each of the parties’ sites independently and then combine these computations to form 1 estimator for the collective dataset, thus limiting communication to the final step and reducing complexity. The effectiveness of our privacy-preserving model is demonstrated through a linear regression application. Results: Our secure linear regression algorithm was tested for accuracy and performance using real and synthetic datasets. The results showed no loss of accuracy (over nonsecure regression) and very good performance (20 min for 100 million records). Conclusions: We used DSC to securely calculate a linear regression model over multiple datasets. Our experiments showed very good performance (in terms of the number of records it can handle). We plan to extend our method to other estimators such as logistic regression. %M 31033449 %R 10.2196/12702 %U http://medinform.jmir.org/2019/2/e12702/ %U https://doi.org/10.2196/12702 %U http://www.ncbi.nlm.nih.gov/pubmed/31033449 %0 Journal Article %@ 1438-8871 %I JMIR Publications %V 21 %N 4 %P e13043 %T Health Care and Precision Medicine Research: Analysis of a Scalable Data Science Platform %A McPadden,Jacob %A Durant,Thomas JS %A Bunch,Dustin R %A Coppi,Andreas %A Price,Nathaniel %A Rodgerson,Kris %A Torre Jr,Charles J %A Byron,William %A Hsiao,Allen L %A Krumholz,Harlan M %A Schulz,Wade L %+ Department of Laboratory Medicine, Yale University School of Medicine, 55 Park Street PS345D, New Haven, CT, 06511, United States, 1 (203) 819 8609, wade.schulz@yale.edu %K data science %K monitoring, physiologic %K computational health care %K medical informatics computing %K big data %D 2019 %7 09.04.2019 %9 Original Paper %J J Med Internet Res %G English %X Background: Health care data are increasing in volume and complexity. 
Storing and analyzing these data to implement precision medicine initiatives and data-driven research has exceeded the capabilities of traditional computer systems. Modern big data platforms must be adapted to the specific demands of health care and designed for scalability and growth. Objective: The objectives of our study were to (1) demonstrate the implementation of a data science platform built on open source technology within a large, academic health care system and (2) describe 2 computational health care applications built on such a platform. Methods: We deployed a data science platform based on several open source technologies to support real-time, big data workloads. We developed data-acquisition workflows for Apache Storm and NiFi in Java and Python to capture patient monitoring and laboratory data for downstream analytics. Results: Emerging data management approaches, along with open source technologies such as Hadoop, can be used to create integrated data lakes to store large, real-time datasets. This infrastructure also provides a robust analytics platform where health care and biomedical research data can be analyzed in near real time for precision medicine and computational health care use cases. Conclusions: The implementation and use of integrated data science platforms offer organizations the opportunity to combine traditional datasets, including data from the electronic health record, with emerging big data sources, such as continuous patient monitoring and real-time laboratory results. These platforms can enable cost-effective and scalable analytics for the information that will be key to the delivery of precision medicine initiatives. Organizations that can take advantage of the technical advances found in data science platforms will have the opportunity to provide comprehensive access to health care data for computational health care and precision medicine research. 
%M 30964441 %R 10.2196/13043 %U https://www.jmir.org/2019/4/e13043/ %U https://doi.org/10.2196/13043 %U http://www.ncbi.nlm.nih.gov/pubmed/30964441