Review
Abstract
Background: Artificial intelligence (AI) applied to real-world data (RWD; eg, electronic health care records) has been identified as a potentially promising technical paradigm for the pharmacovigilance field. There are several instances of AI approaches applied to RWD; however, most studies focus on unstructured RWD (conducting natural language processing on various data sources, eg, clinical notes, social media, and blogs). Hence, it is essential to investigate how AI is currently applied to structured RWD in pharmacovigilance and how new approaches could enrich the existing methodology.
Objective: This scoping review depicts the emerging use of AI on structured RWD for pharmacovigilance purposes to identify relevant trends and potential research gaps.
Methods: The scoping review methodology is based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) methodology. We queried the MEDLINE database through the PubMed search engine. Relevant scientific manuscripts published from January 2010 to January 2024 were retrieved. The included studies were “mapped” against a set of evaluation criteria, including applied AI approaches, code availability, description of the data preprocessing pipeline, clinical validation of AI models, and implementation of trustworthy AI criteria following the guidelines of the FUTURE (Fairness, Universality, Traceability, Usability, Robustness, and Explainability)-AI initiative.
Results: The scoping review ultimately yielded 36 studies. There has been a significant increase in relevant studies after 2019. Most of the articles focused on adverse drug reaction detection procedures (23/36, 64%) for specific adverse effects. Furthermore, a substantial number of studies (34/36, 94%) used nonsymbolic AI approaches, emphasizing classification tasks. Random forest was the most popular machine learning approach identified in this review (17/36, 47%). The most common RWD sources used were electronic health care records (28/36, 78%). Typically, these data were not available in a widely acknowledged data model to facilitate interoperability, and they came from proprietary databases, limiting their availability for reproducing results. On the basis of the evaluation criteria classification, 10% (4/36) of the studies published their code in public registries, 16% (6/36) tested their AI models in clinical environments, and 36% (13/36) provided information about the data preprocessing pipeline. In addition, in terms of trustworthy AI, 89% (32/36) of the studies followed at least half of the trustworthy AI initiative guidelines. Finally, selection and confounding biases were the most common biases in the included studies.
Conclusions: AI, along with structured RWD, constitutes a promising line of work for drug safety and pharmacovigilance. However, in terms of AI, some approaches have not been examined extensively in this field (such as explainable AI and causal AI). Moreover, it would be helpful to have a data preprocessing protocol for RWD to support pharmacovigilance processes. Finally, because of personal data sensitivity, evaluation procedures have to be investigated further.
doi:10.2196/57824
Introduction
Background
Pharmacovigilance is defined by the World Health Organization as “the science and activities relating to the detection, assessment, understanding, and prevention of adverse effects or any other drug-related problem” [
]. Pharmacovigilance plays a crucial role in ensuring the safety of medications and protecting the health of patients because it mostly focuses on the identification of potential adverse drug reactions (ADRs) after medicinal products have been licensed and released to the public.

ADRs can range from mild and tolerable side effects to severe and life-threatening events. They constitute 5% to 7% of emergency department consultations [
]. Their impact in terms of public health is significant because there are estimates concluding that ADRs can cause an increase in the duration of hospitalization stays for outpatient (mean 9.2, SD 0.2 d) and inpatient (mean 6.1, SD 2.3 d) settings [ ]. Typically, pharmacovigilance professionals analyze data from individual case safety report (ICSR) databases (such as the Food and Drug Administration Adverse Event Reporting System, the database maintained by the US Food and Drug Administration) to identify potential pharmacovigilance signals, namely potential causal relationships between an ADR and a drug. ICSRs are typically submitted either by patients or by health care or pharmacovigilance professionals, and they are the main data source used today for pharmacovigilance. However, ICSR databases are subject to many biases; in addition, underreporting has been identified as a major issue [ ]. Moreover, such databases frequently lack information that could make a significant difference in the examination of a potential signal (eg, patients’ medical history). Hence, the early detection of potential pharmacovigilance signals by collecting and analyzing data from various sources is critical to prevent serious side effects as soon as possible.

The term “real-world data” (RWD) refers to data collected outside of the controlled environment of clinical trials, such as electronic health records (EHRs), patient registries, insurance claims databases, electronic prescription systems, and so on. There is a growing interest in using RWD for pharmacovigilance signal management to facilitate faster and more efficient postmarketing surveillance [
]. The significance of RWD in pharmacovigilance lies in its potential for representing longitudinal real-world patient experiences and health care practices that can provide insights into drug safety under real-life conditions. Analyzing RWD could also enrich and consolidate the already existing knowledge on ADRs (eg, by detecting new confounders). Indicatively, a federated RWD network was used recently to validate the value of RWD in terms of pharmacovigilance signal management [ ].

To this end, the European Medicines Agency and the US Food and Drug Administration have established infrastructures for the leverage of RWD for drug safety purposes, called Data Analysis and Real World Interrogation Network (DARWIN) [
] and the Sentinel Initiative [ ], respectively. RWD are also being actively investigated for purposes beyond drug safety (eg, epidemiology) [ ]. It should be noted that although RWD could in principle provide a good overview of patients’ clinical course, two major challenges are preventing their use: (1) these datasets typically come with significant data quality risks and usually contain a high proportion of null values and errors; and (2) because of legal, ethical, and regulatory issues (eg, patient privacy issues), it is difficult to access these data sources.

Rationale
Artificial intelligence (AI) is widely acknowledged as a potentially very useful technical breakthrough that could be used to support decisions in health care (eg, clinical decision support systems) due to its ability to efficiently process big data to seek useful information. AI could be used to identify patterns and associations within large amounts of data (eg, RWD) where traditional statistical methods of data analysis may struggle to extract because of the amount and complexity (eg, nonlinear relationships between variables) of the data. AI has been widely investigated regarding its applications in health care (eg, personalized medicine) with promising results [
, ]; however, it is not yet widely applied in clinical practice. In the context of pharmacovigilance, AI could potentially support multiple aspects (eg, the identification of patient subpopulations who may be more vulnerable to specific ADRs), contributing to the vision of personalized drug safety management.

Objectives
The objective of this scoping review (SR) was to identify and characterize the current research trends regarding the use of AI on structured RWD for pharmacovigilance and identify relevant gaps.
Methods
The PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) [
] methodology was applied. The PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) statement provides a road map for authors to describe more precisely the state of the art and the findings of the literature search, as well as to discuss the results.

Eligibility Criteria
Journal and conference articles written in English were selected if they focused on pharmacovigilance and reported the use of symbolic and nonsymbolic AI approaches applied to RWD, specifically EHRs, insurance claims databases, and administrative health data (
).

Inclusion criteria
- Article type: research
- Language: English
- Data type: tabular
- Data analysis method: symbolic artificial intelligence (AI) and nonsymbolic AI
Exclusion criteria
- Article type: review and opinion articles
- Language: other
- Data type: image and text
- Data analysis method: statistical
Review and opinion articles were excluded from the final manuscript selection. Furthermore, research articles focusing on image and text data (eg, social media and clinical notes) were also excluded. In addition, AI methods focusing on the use of natural language processing (NLP), natural language understanding, image processing, or object detection were considered beyond the scope of this work.
A key issue that came up during this SR was the lack of a clear distinction between plain statistical methods and machine learning (ML) approaches because these 2 domains frequently overlap, and these 2 terms are sometimes used interchangeably. In this manuscript, we acknowledge that the difference between AI and statistical methods is that AI creates models that can “learn” from data during iterative training processes, while statistical methods deal with finding relationships between variables. Thus, we considered the iterative “learning” part of an algorithm as the key feature to classify the algorithm as AI and ML. We excluded papers that were based on algorithms with no iterative “learning” scheme because we considered them to be part of the “plain statistical methods” approaches. Finally, we excluded papers that focused on adverse drug events related to medical devices.
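To make this criterion concrete, the hedged sketch below (illustrative only, not drawn from any included study; all counts and features are synthetic) contrasts a disproportionality statistic computed in closed form, which we would treat as a plain statistical method, with a classifier whose parameters are refined through an iterative learning procedure, which we would classify as ML.

import numpy as np
from sklearn.linear_model import SGDClassifier

# Plain statistical method: a reporting odds ratio from a 2x2 contingency
# table (hypothetical report counts a, b, c, d), computed in closed form.
a, b, c, d = 40, 160, 25, 775
reporting_odds_ratio = (a * d) / (b * c)

# ML method: a classifier whose weights are updated over repeated passes
# through the data, ie, an iterative "learning" scheme.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))          # synthetic patient features
y = rng.integers(0, 2, size=500)        # synthetic ADR labels
clf = SGDClassifier(loss="log_loss", max_iter=1000, random_state=0).fit(X, y)

print(f"Reporting odds ratio (closed form): {reporting_odds_ratio:.2f}")
print(f"Training iterations performed by the learner: {clf.n_iter_}")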
Information Sources and Search Strategy
A search query was developed and executed on January 31, 2024, to include research articles from 2010 to 2024 exclusively from the MEDLINE scientific library, given that it is the oldest and largest repository of journal articles in the life sciences.
presents the query structure.

Pharmacovigilance (keywords relevant to known adverse drug reaction [ADR] categories, synonyms of drug safety, pharmacovigilance terminology, and known individual case safety report [ICSR] databases)
V OR “pharmacovigil*” OR “pharmaco-vigil*” OR “side effect*” OR “adverse reaction*” OR “Product Surveillance” OR “postmarket*” OR pharmacoepidemiol* OR pharmaco-epidemiol* OR “drug safety” OR “drug event*” OR “toxicit*” OR “drug reaction*” OR “adverse drug*” OR “allerg*” OR “post-market*” OR “post market*” OR vaccinovigil* OR vaccino-vigil* OR eudravigilance OR “individual case safety report*” OR ICSR OR VAERS OR FAERS OR AERS OR vigibase OR “adverse effect*” OR “adverse event*” OR hypersensitiv* OR “spontaneous report*” OR “yellow card” OR “yellow-card” OR ADR OR “personalized pharmacovigilance” OR “precision pharmacovigilance” OR “pharmacosurveillance” OR “pharmaco-surveillance”
AI (categories of AI, terms that are used in the development of an AI model, explainable and interpretable AI methods, and different AI architectures)
“artificial intelligence” OR AI OR “machine learning” OR ML OR “neural network*” OR NN* OR “deep learning” OR DL OR ontolog* OR “knowledge engineering” OR KE OR reasoning OR inference OR “semantic web” OR “OWL” OR “Web Ontology Language” OR SWRL OR “RDF” OR “Resource Description Framework” OR “prediction” OR “estimation” OR “XAI” OR “SHAP” OR “Shapley value” OR “LIME” OR “Local Interpretable Model-agnostic Explanations” OR “DeepSHAP” OR “DeepLIFT” OR “CXplain” OR “Explainable Artificial Intelligence” OR “Explainable machine learning” OR “Interpretable artificial intelligence” OR “Interpretable machine learning”
RWD or real-world evidence (categories of RWD and data models that are used to store RWD)
“Real World Evidence” OR “Real World Data” OR RWE OR RWD OR “Observational Medical Outcomes Partnership” OR “OMOP” OR “Electronic Healthcare Record*” OR “EHR” OR “Electronic Medical Record*” OR “EMR*” OR EHDEN OR OHDSI OR i2b2 OR Sentinel OR DARWIN OR “Data Analysis and Real World Interrogation Network” OR administrative OR claim* OR “Observational Health Data Sciences and Informatics” OR “European Health Data Evidence Network” OR “multimodal data” OR “multimodal drug data” OR “multidimensional data” OR “multidimensional drug data” OR “multi-modal data” OR “multi-modal drug data” OR “multi-dimensional data” OR “multi-dimensional drug data”
Selection Process
The initial phase (phase 1) focused on screening the titles and abstracts of the articles retrieved from the search query (
) to map those that potentially met our inclusion criteria and exclude irrelevant studies using the Rayyan tool (Rayyan Systems Inc) [ ]. Rayyan is an AI tool designed to facilitate remote collaboration among researchers when conducting systematic literature reviews. The platform gathers the titles and abstracts of all articles selected for the study, and reviewers can evaluate the eligibility (ie, “include,” “exclude,” or “maybe”) of every article based on their review’s objectives in blind mode, that is, each reviewer assesses the articles without prior knowledge of the other reviewers’ decisions. We resolved any conflicts that arose during this process through consensus meetings involving all reviewers.

The second phase focused on the full-text review of the papers selected during phase 1 to decide on the final set for inclusion in this study. In the full-text review of the studies selected based on titles and abstracts, we excluded research papers that did not meet ≥1 of the inclusion criteria (ie, strong focus on AI, RWD, and pharmacovigilance) as well as studies that met the exclusion criteria (eg, studies related to image and text data or those following only statistical approaches).
Data-Charting Process
A standard data extraction form was used to obtain an overview of the 36 selected studies (Tables S1 and S2 in
). For each study, we extracted information about the authors; journal name (where the study was published); publication year; country of origin (where the study was conducted); the objective of the study; types of organizations that participated in the study (based on the authors’ affiliations); and key findings that relate to the scoping review question, which are described in the next subsection (Data Collection Process and Mapping). Any inconsistencies were discussed and resolved among the reviewers.

Data Collection Process and Mapping
The selected studies were further elaborated and mapped against evaluation criteria using a spreadsheet. The main categories of mapping criteria were as follows: pharmacovigilance objectives (drug safety core activities and drug safety special topics), data provenance (data source categories and data sources), countries of origin, AI algorithm categories, data preprocessing methods, the use of explainable AI (XAI) methods, code availability, the use of models in clinical practice, ethical AI, and so on.
presents a detailed description of the mapping criteria.

Category of bias | Explanation
Selection bias | The bias that occurs when the input data of an AI model underrepresent the target population
Measurement bias | Bias arising from how the different features are collected and measured
Temporal bias | Bias arising from how the study processes time-dependent features
Algorithmic bias | The biases produced from AI model outputs
Implicit bias | Bias arising from how stereotypes influence the AI model design and interpretation
Confounding bias | Bias arising from how unaccounted-for confounders influence predictions
Automation bias | The tendency to rely excessively on automated systems
Studying Risk-of-Bias Assessment
To effectively map the risk of bias in each included study, we considered selection, measurement, temporal, algorithmic, implicit, confounding, and automation biases. Furthermore, we translated these categories into more specific categories according to our study (
).

Synthesis Methods
The mapping strategy was designed based on the 3 main pillars of the objective; in addition, we included general information about the research papers (
). Furthermore, we included free-text fields in the mapping Microsoft Excel file to add significant extra details that cannot be easily classified. These fields included “objective,” “methods,” “assessment,” and “interesting results.” The criteria encompassed specific attributes (eg, drug safety core activities) that were defined based on previous experience of conducting an SR in the field [ ] and key interest aspects identified during the review.

Furthermore, in terms of ethical AI, the included studies were evaluated based on trustworthy AI guidelines for solutions in medicine and health care from the FUTURE (Fairness, Universality, Traceability, Usability, Robustness, and Explainability)-AI initiative [
]. These guidelines are separated into 7 categories (fairness, universality, traceability, usability, robustness, explainability, and a general category). For our evaluation procedure, we included only the highly recommended subcategories from each of the 7 main categories for proof of concept (low technology readiness levels for ML models) [ ]. presents the selected criteria and their description.

Categories and criteria | Subcriteria |
General information | ||
PubMed and MEDLINE ID (number) | ID number of articles | |
Authors (text) | List of authors | |
Title (text) | Article title | |
Journal (text) | Name of the journal where the article was published |
Year published (number) | Year of article publication | |
Types of organizations (text) | Types of organizations based on the authors’ affiliation; possible values: health care, government, academia, industry, pharmacovigilance monitoring | |
Country (text) | Country where the research was conducted based on the authors’ affiliations | |
Pharmacovigilance | ||
Drug safety core activities (text) | Possible values: ADEa detection, ADE monitoring, ADE prevention, ADE assessment, ADE information collection, and ADE reporting | |
Drug safety special topics (text) | Possible values: comparative drug analysis, drug interactions, MoAb identification and analysis, personalized drug safety, signal detection, specific (class of) disease, specific (class of) drugs, specific adverse effect, and vaccine safety | |
Drug (text) | Drugs being examined in the research papers | |
Reaction (text) | Reactions being examined in the research papers | |
Indication (text) | Indications being examined in the research papers | |
Reference terminologies (text) | Known health informatics terminologies that are detected in the research papers | |
AIc | ||
AI categories (text) | Possible values: nonsymbolic AI and symbolic AI | |
Nonsymbolic AI (text) | Possible values: classification and regression | |
Classification (text) | Possible values: random forest, logistic regression, artificial neural network, XGBoostd, support vector machine, decision tree, knowledge graph, k-nearest neighbors, gradient boost, naïve Bayes, random survival forest, and extra tree | |
Regression (text) | Possible values: logistic regression, linear regression, LASSOe, and regularized Cox regression | |
Data preprocessing type (text) | Possible values: dimensionality reduction, feature engineering, null imputation, and data cleansing | |
Data cleansing (text) | Possible values: data normalization and remove null values | |
Feature engineering (text) | Possible values: one-hot encoding, binning, splitting, and calculated features | |
Null imputation (text) | Possible values: regression or classification imputation | |
Explainable AI methods (text) | Possible values: LIMEf and SHAPg | |
Knowledge representation formalism (text) | Possible values: OWLh and RDFi | |
Knowledge engineering core activities (text) | Possible values: knowledge extraction, knowledge integration, and knowledge representation | |
Real-world data | ||
Data source categories (text) | Possible values: ADE databases, clinical narratives, clinical trials drug information databases, drug regulation documentation, EHRsj, genetics and biochemical databases, spontaneous reporting systems, dispensing records from pharmacies, and administrative claims data | |
Data source or sources (text) | Possible values: proprietary closed data sources (eg, specific hospital EHR), FAERSk, SIDERl, SMILESm, UK Biobank, Osteoarthritis Initiative dataset, PharmGKBn, TwoSIDES, EU-ADRo reference set, Stockholm Electronic Patient Record Corpus, MIMICp, OMIMq, DisGeNetr, and AEOLUSs | |
Data model (text) | Possible values: OMOP-CDMt, Sentinel, and custom | |
Evaluation criteria | ||
Code availability (text) | The availability of the code in an open registry; possible values: yes and no | |
Data preprocessing | Information about the data preprocessing procedures; possible values: yes and no | |
Clinical use | Information about the evaluation of the produced work pipeline in clinical environments; possible values: yes and no |
aADE: adverse drug event.
bMoA: mechanism of action.
cAI: artificial intelligence.
dXGBoost: extreme gradient boosting.
eLASSO: least absolute shrinkage and selection operator.
fLIME: local interpretable model-agnostic explanations.
gSHAP: Shapley additive explanations.
hOWL: Web Ontology Language.
iRDF: resource description framework.
jEHR: electronic health record.
kFAERS: Food and Drug Administration Adverse Event Reporting System.
lSIDER: Side Effect Resource.
mSMILES: Simplified Molecular Input Line Entry System.
nPharmGKB: Pharmacogenomics Knowledge Base.
oEU-ADR: European Union Adverse Drug Reaction.
pMIMIC: Medical Information Mart for Intensive Care.
qOMIM: Online Mendelian Inheritance in Man.
rDisGeNet: gene-disease association network.
sAEOLUS: Adverse Event Open Learning through Universal Standardization.
tOMOP-CDM: Observational Medical Outcomes Partnership Common Data Model.
Categories and recommendations | Description | |
Fairness | ||
Define sources of bias | Identification of possible types and sources of bias for the AIa tool during the design phase (eg, sex, gender, age, ethnicity, socioeconomics, geography, comorbidities or disability of patients, and human biases during data labeling) | |
Universality | ||
Define clinical settings | Specification of the clinical settings in which the AI tool will be applied (eg, primary health care centers, hospitals, home care, low- vs high-resource settings, and 1 country or multiple countries) | |
Evaluate using external data | Testing of the developed AI model to an external dataset with different characteristics from the training set | |
Traceability | ||
Provide documentation (eg, technical and clinical) | Creation of documentation files that provide technical (eg, public repositories) and clinical information (eg, bias of the model based on its use) | |
Usability | ||
Define user requirements | Specification of the model’s use by health care professionals |
Robustness | ||
Define sources of data variation | Specification of data sources’ variation that may impact the AI tool’s robustness in the real world (differences in equipment, technical fault of the machine, data heterogeneities during data acquisition or annotation, or adversarial attacks) | |
Train with representative data | Data for the training process should represent the population based on the case study for which the AI model has been developed | |
Evaluate and optimize robustness | Risk mitigation measures should be implemented to optimize the robustness of the AI model, such as regularization, data augmentation, data harmonization, or domain adaptation | |
Explainability | ||
Define explainability needs | Use of interpretable or explainable models | |
General | ||
Engage interdisciplinary stakeholders throughout the AI lifecycle | —b | |
Implement measures for data privacy and security | — | |
Define adequate evaluation plan (eg, datasets, metrics, and reference methods) | — |
aAI: artificial intelligence.
bNot applicable.
Reporting Risk-of-Bias Assessment
The selection of only studies written in English and the exclusion of AI studies focused on text mining or NLP, image processing, and statistical analysis could be identified as potential risks for bias. Furthermore, the selection of papers only from the MEDLINE database could be identified as a potential bias risk because it potentially leads to the omission of papers from other databases (eg, AI databases).
Results
Study Selection
The PubMed search query originally returned 4264 studies. During the abstract and title screening process (phase 1), we selected 93 (2.18%) of the 4264 articles for full-text screening (phase 2). During phase 2, based on the inclusion criteria, of these 93 research papers, 36 (39%) were selected. The PRISMA-ScR flowchart (
) presents a detailed overview of the selection procedure. The PRISMA-ScR checklist is presented in .

Study Characteristics
The included studies were published between 2015 and 2023, with a notable increase in the number of studies after 2019 (
).

Of the 36 studies, 19 (53%) originated from the United States, 4 (11%) from South Korea, and 4 (11%) from the United Kingdom, while the rest of the studies (n=9, 25%) were distributed across a variety of other countries (
).

Most of the studies (30/36, 83%) were conducted by academic institutions (
).Years | Studies, n (%) |
2015 | 1 (3) |
2016 | 2 (5) |
2017 | 3 (8) |
2018 | 2 (5) |
2019 | 2 (5) |
2020 | 4 (11) |
2021 | 10 (29) |
2022 | 7 (20) |
2023 | 5 (14) |
aIncludes studies conducted in multiple countries.
Countries | Studies, n (%) |
United States | 19 (53) |
South Korea | 4 (11) |
United Kingdom | 4 (11) |
Canada | 3 (8) |
Sweden | 3 (8) |
China | 3 (8) |
France | 3 (8) |
Australia | 2 (6) |
Netherlands | 2 (6) |
Bangladesh | 1 (3) |
Israel | 1 (3) |
Belgium | 1 (3) |
Denmark | 1 (3) |
Taiwan | 1 (3) |
Ireland | 1 (3) |
Switzerland | 1 (3) |
aIncludes studies that involved >1 type of organization.
Organizations | Studies, n (%) |
Academia | 30 (83) |
Health care | 9 (25) |
Industry | 6 (17) |
Government | 2 (6) |
Regulatory bodies | 1 (3) |
aIncludes studies that involved multiple databases.
bEHR: electronic health record.
cSRS: spontaneous reporting system.
dADE: adverse drug event.
In terms of AI, of the 36 studies, 34 (94%) applied only nonsymbolic AI, and 1 (3%) used only symbolic AI, while 1 (3%) study combined the symbolic and nonsymbolic AI technical paradigms. Of the 34 nonsymbolic AI articles, 29 (85%) used classification tasks, whereas 3 (9%) selected regression algorithms, 3 (9%) applied causality algorithms (causal inference: n=2, 67%; causal discovery: n=1, 33%), and only 1 (3%) applied an association rule mining technique. The association rule mining study [
] followed a mathematical framework called formal concept analysis to create association rules between drugs and phenotypes to detect possible ADRs. Moreover, of the 29 studies that used classification tasks, 6 (21%) used XAI techniques, of which 4 (67%) used Shapley additive explanations, 1 (17%) used local interpretable model-agnostic explanations, and 1 (17%) tested both approaches.

Regarding RWD (
), of the 36 articles, 28 (78%) focused on the use of EHRs (from local hospital databases), 4 (11%) used data from pharmacy dispensing records, and 3 (8%) used administrative claims data, while 2 (6%) focused on patient registries and 1 (3%) on insurance claims. In addition, a variety of other sources were used, including RWD such as drug information databases (3/36, 8%), spontaneous reports (3/36, 8%), adverse drug event databases (2/36, 6%), electronic prescription data (2/36, 6%), and genetics and biochemical databases (1/36, 3%).

Of the 36 studies, 23 (64%) used AI for ADR detection, 4 (11%) examined ADR assessment, 2 (6%) focused on ADR monitoring, 7 (19%) investigated ADR prevention, and 2 (6%) used AI to collect information about ADRs (
).Type of database | Studies, n (%) |
EHRsb | 28 (78) |
Drug information databases | 4 (11) |
Dispensing records from pharmacies | 4 (11) |
SRSsc | 3 (8) |
Administrative claims data | 3 (8) |
Patient registries | 2 (6) |
Electronic prescription data | 2 (6) |
ADEd databases | 2 (6) |
Insurance claims | 1 (3) |
aIncludes studies that examined multiple pharmacovigilance core activities.
bADR: adverse drug reaction.
Pharmacovigilance core activities | Studies, n (%) |
ADRb detection | 23 (64) |
ADR prevention | 7 (19) |
ADR assessment | 4 (11) |
ADR monitoring | 2 (6) |
ADR information collection | 2 (6) |
aIncludes studies that involved multiple AI algorithms.
bXGBoost: extreme gradient boosting.
cLASSO: least absolute shrinkage and selection operator.
dNo algorithms.
The classification studies (29/36, 81%;
) tested several AI techniques, with random forest (RF) being the most frequently used algorithm (17/29, 59%). However, the regression studies (3/36, 8%) developed AI models only with extreme gradient boosting (1/3, 33%) and logistic regression (2/3, 67%).

Finally, for the evaluation of AI models (
), most of the studies (24/36, 67%) reported area under the receiver operating characteristic curve as the primary metric.

Of the 36 studies, 32 (89%) investigated specific drug safety topics: 16 (50%) on specific adverse effects, 14 (44%) on specific class of drugs, 8 (25%) on specific (class of) diseases, 6 (19%) on signal detection, 3 (9%) on drug interactions, 2 (6%) on personalized drug safety, and 1 (3%) on vaccine safety (
).

AI models and algorithms | Studies, n (%) |
Classification (n=29)a | ||
Random forest | 17 (59) | |
XGBoostb | 10 (34) | |
Artificial neural network | 8 (28) | |
Logistic regression | 8 (28) | |
Support vector machine | 7 (24) | |
Decision tree | 5 (17) | |
K-nearest neighbor | 2 (7) | |
Gradient boost | 2 (7) | |
LASSOc | 2 (7) | |
Extra tree | 1 (3) | |
Naïve Bayes | 1 (3) | |
Random survival forest | 1 (3) | |
Linear regression | 1 (3) | |
Regularized Cox regression | 1 (3) | |
Regression (n=3) | ||
XGBoost | 1 (33) | |
Logistic regression | 2 (67) | |
Causalityd (n=3) | 3 (100) |
aIncludes studies that involved multiple model evaluation metrics.
AI model evaluation metrics | Studies, n (%) |
Area under the receiver operating characteristic curve | 24 (67) |
Accuracy | 10 (28) |
F1-score | 8 (22) |
Precision | 11 (31) |
Recall | 10 (28) |
Negative predictive value | 6 (17) |
Sensitivity | 9 (25) |
Specificity | 7 (19) |
Other metrics (each used by ≤2 studies) | 20 (56)
aIncludes studies that examined multiple pharmacovigilance topics.
Specialization of pharmacovigilance topics | Studies, n (%) |
Specific adverse effect | 16 (50) |
Specific (class of) drugs | 14 (44) |
Specific (class of) disease | 8 (25) |
Signal detection | 6 (19) |
Drug interactions | 3 (9) |
Personalized drug safety | 2 (6) |
Vaccine safety | 1 (3) |
aIncludes studies that used multiple data sources.
bSIDER: Side Effect Resource.
cFAERS: Food and Drug Administration Adverse Event Reporting System.
presents the diversity in the data sources used in the included studies. Of the 36 studies, 29 (81%) chose proprietary closed data sources (eg, specific hospital EHRs) for their experiments. Along with EHR data, other data sources were also used (eg, Food and Drug Administration Adverse Event Reporting System and Side Effect Resource). Of the 36 studies, 2 (6%) selected the Stockholm Electronic Patient Record Corpus. The remaining RWD sources (Medical Information Mart for Intensive Care and the Osteoarthritis Initiative dataset) appeared in only 2 (6%) of the 36 studies, with 1 study each.
In terms of data models, of the 36 studies, 27 (75%) used proprietary data models, 3 (8%) did not mention any data model, 5 (14%) used the Observational Medical Outcomes Partnership Common Data Model (OMOP-CDM), and 1 (3%) used the Sentinel model (
).

presents the case studies examined in the included articles. Notably, a substantial number of studies (20/36, 55%) did not focus on specific ADR case studies. Another significant outcome is the diversity of case studies; the articles do not focus on a specific drug, indication, or reaction. It can be observed that chemotherapy drugs and their associated reactions in various types of cancers emerge as slightly more prominent categories in this review ( ).
Data sources | Studies, n (%) |
Proprietary closed data sources | 29 (81) |
Other | 12 (33) |
SIDERb | 3 (8) |
FAERSc | 2 (6) |
Stockholm Electronic Patient Record Corpus | 2 (6) |
aOMOP-CDM: Observational Medical Outcomes Partnership Common Data Model.
Data models | Studies, n (%) |
Custom | 27 (75) |
OMOP-CDMa | 5 (14) |
Unknown | 3 (8) |
Sentinel | 1 (3) |
Although most of the studies (21/36, 58%) used complex AI algorithms (black boxes), such as RF (an ensemble method) and artificial neural networks (ANNs), to construct their prediction models in all ADR categories, many studies (15/36, 42%) used simple interpretable ML approaches such as logistic regression. Moreover, it is important to highlight that studies in all ADR categories worked on EHR databases, except for the adverse drug event assessment category, in which we detected a single study using a vaccine database.
RWD databases were also used alongside other types of data; for example, EHRs were mostly combined with spontaneous reporting systems and drug information databases, vaccine data with adverse drug event databases, and administrative claims data with spontaneous reporting systems. Furthermore, some of the studies (3/36, 8%) integrated different types of observational data to develop AI models, combining pharmacy dispensing records with EHRs and administrative claims data.
Evaluation Results
Only 3 (8%) of the 36 studies included in this SR openly provided their code. In addition, only 16 (44%) of the 36 studies included a detailed description of data preprocessing pipelines for RWD. Moreover, just 4 (11%) of the 36 studies evaluated their methodology within a clinical environment (
).

Evaluation metrics | Studies, n (%) |
Yes | No | |
Code availability | 3 (8) | 33 (92) |
Data preprocessing | 16 (44) | 20 (56) |
Clinical validation | 5 (14) | 31 (86) |
In terms of trustworthy AI, only 5 (14%) of the 36 studies scored <50% on the Fairness, Universality, Traceability, Usability, Robustness, and Explainability–AI (FUTURE-AI) criteria (
and ). Among the studies that achieved scores of >75% [ - ], 3 (75%) out of 4 used external data to evaluate their models, addressing the Universality criterion (Table S4 in ).
aIncludes studies that fell into multiple FUTURE-AI and general categories.
Study | Year | Code availability | Data preprocessing | Clinical use | FUTURE-AI criteria satisfaction (%) |
Anastopoulos et al [ ] | 2021 | No | Yes | No | 75
Ward et al [ ] | 2021 | No | Yes | No | 67
Zhang et al [ ] | 2020 | No | Yes | No | 50
Kim et al [ ] | 2021 | No | No | No | 67
Morel et al [ ] | 2020 | Yes | No | Yes | 67
Zou et al [ ] | 2021 | No | No | Yes | 67
Liu et al [ ] | 2018 | No | No | Yes | 42
Zhu et al [ ] | 2022 | No | Yes | No | 67
Kidwai-Khan et al [ ] | 2022 | No | Yes | No | 58
Sharma et al [ ] | 2022 | No | No | No | 58
On et al [ ] | 2022 | No | Yes | No | 50
Datta et al [ ] | 2021 | No | No | No | 58
Bagattini et al [ ] | 2019 | Yes | Yes | No | 75
Gibson et al [ ] | 2021 | No | Yes | No | 58
Jeong et al [ ] | 2018 | No | Yes | No | 67
Zhao et al [ ] | 2015 | No | Yes | No | 58
Zhao and Henriksson [ ] | 2016 | No | Yes | No | 42
Segal et al [ ] | 2019 | No | No | Yes | 75
Boland et al [ ] | 2017 | No | No | No | 67
Wang et al [ ] | 2021 | No | No | No | 58
Li et al [ ] | 2022 | No | No | No | 58
Jin et al [ ] | 2020 | No | No | No | 67
Hansen et al [ ] | 2016 | No | No | No | 50
Mosa et al [ ] | 2021 | No | No | No | 67
Herrin et al [ ] | 2021 | No | No | No | 50
Pichardo et al [ ] | 2022 | No | No | No | 67
Puzhko et al [ ] | 2021 | No | No | No | 58
Souissi et al [ ] | 2017 | No | No | Yes | 42
Personeni et al [ ] | 2017 | No | No | No | 42
Zhou et al [ ] | 2020 | Yes | No | No | 67
Goyal et al [ ] | 2023 | No | No | No | 58
Wang et al [ ] | 2023 | No | Yes | No | 58
Hughes et al [ ] | 2023 | No | Yes | No | 58
Sharma et al [ ] | 2023 | No | Yes | No | 67
Akimoto et al [ ] | 2023 | No | Yes | No | 58
Zhang et al [ ] | 2022 | No | Yes | No | 58
Risk of Bias in the Included Studies
provides an overview of the distribution of biases across the included studies. Notably, of the 36 studies, 21 (58%) included selection biases, and 17 (47%) included confounding biases. Algorithmic biases were identified in 14 (39%) of the 36 studies. Table S5 in presents the detailed categorization of the included studies in the different risk-of-bias categories.
Bias categories and subcategories | References
Data-related biases
Selection bias
Underrepresentation of certain demographic groups | [ , , , , , - , , , , , - , - ]
Overrepresentation of adverse drug events from specific health care systems or regions | [ , , , , , , - , , , , - , - ]
Measurement bias
Inconsistent adverse drug event reporting practices | [ , , , , , , , , ]
Variations in diagnostic criteria or coding practices for medical conditions | [ , , , , , , ]
Temporal bias
Changes in prescribing patterns or drug formulations over time | [ , , , , , , , , , , ]
Seasonal variations in disease prevalence or reporting behaviors | [ , , , , , ]
Algorithm-related biases
Algorithmic bias
Differential performance in adverse drug event detection across patient subgroups | [ , , , , , - , , , , - ]
Biased risk assessments for certain medications or populations | [ , , , , - , , , , - ]
Implicit bias
Overlooking potential drug interactions more common in specific ethnic groups | [ , , , , , ]
Underestimating the severity of side effects reported by certain demographics | [ , , , , , ]
Deployment and interpretation biases
Confounding bias
Failing to consider comorbidities when assessing drug safety profiles | [ , , , , , , , , , - , ]
Not accounting for polypharmacy effects in adverse drug event analysis | [ , , , , , , , , , , - , ]
Automation bias
Overlooking rare or unusual adverse drug events not flagged by AI systems | [ , , , ]
Reduced critical evaluation of AI-generated safety signals by human experts | [ , , , ]
Interesting Results of Individual Studies
It is important to mention that 6% (2/36) of the studies successfully combined AI and a self-controlled case series (SCCS) model for ADR detection. Morel et al [
] introduced the convolutional SCCS (ConvSCCS) model, in which the SCCS model is enriched with a convolutional neural network. This allows the ConvSCCS model to consider a few longitudinal data dimensions (eg, drug exposure) from observational data and predict a potential ADR without a prior definition of risk windows, which is mandatory in SCCS models. The ConvSCCS model was tested on a case study of glucose-lowering drugs and the risk of bladder cancer. Another interesting advantage shown by the results is that the ConvSCCS model is useful for analyzing high-dimensional data while requiring minimal data preprocessing. Zhang et al [ ] developed the neural SCCS (NSCCS) model to detect probable drug interactions and control for time-invariant confounders. The NSCCS model was tested on the OMOP-CDM reference dataset [ ]. Both the ConvSCCS and NSCCS models outperformed traditional SCCS statistical models in comparative analyses; the ConvSCCS model demonstrated superior precision and computational speed, while the NSCCS model achieved an area under the receiver operating characteristic curve score of 0.779 [ ].

Furthermore, Kidwai-Khan et al [
] focused on improving the prediction of preventable adverse events by integrating EHRs with genetic data (the presence or absence of genes contraindicated with a person’s medication) into an AI decision support tool. This is the only study in this review that combined EHRs with genetic data and 1 (25%) of the 4 studies that used XAI methods. In addition, all AI models achieved high evaluation scores (>95%).

A few ADR prediction studies (3/36, 8%) introduced innovative ideas on feature preprocessing. Jeong et al [
] developed an ML prediction model in which the features are calculated from algorithms such as the “comparison of extreme laboratory test” results, “comparison of extreme abnormality ratio”, and “prescription pattern around clinical events” to help determine whether a drug-laboratory event pair is associated. A different approach was proposed by Wang et al [ ], who addressed problems with low-quality observational data (eg, missing data) by creating patient embeddings and treating patients “as bags with the various number of feature-value pairs, called instances.” This method led to the development of the final AI model (AMI-Net3), which achieved exceptional performance. Chen et al [ ] also proposed an embedding methodology, called “physiological signal embeddings.” This study showed that training deep embedding models on physiological signals could lead to better forecasts of adverse outcomes. In addition, this methodology enables data transferability through the physiological signal embedding models.

Only 1 (3%) of the 36 studies developed an application for the prediction of ADRs. Mosa et al [
] leveraged the interpretability of a decision tree ML model and, based on their results, designed a rule-based mobile app to assess the risk of specific ADRs and indications.

A completely different approach to ADR prediction was introduced by Liu et al [
]. In this study, the authors applied an ML method to develop a prediction model for ADRs of analgesic drugs in patients with osteoarthritis. Afterward, the authors used explainability techniques to identify patients who might be prescribed analgesic drugs without the risk of these ADRs. What distinguishes this study is its scope: instead of predicting ADRs based on a patient’s medical history, the model focuses on identifying the characteristics that make a patient suitable for a medication, specifically by considering the presence or absence of an ADR.

Recently, the causal ML paradigm was introduced into pharmacovigilance through the studies of Wang et al [
] and Zhang et al [ ], who applied causal inference with average treatment effects and causal discovery with directed acyclic graphs, respectively. Wang et al [ ] used causal ML models to emulate a randomized clinical trial with EHR data. Their results successfully identified both well-known and new medications that could cause the suspected ADR in their case study. Furthermore, Zhang et al [ ] created a causal graph for a drug-event combination and compared the results from 2 causal discovery algorithms. Their results showcased the causal discovery algorithms’ abilities to explore the mechanisms of the suspect drug that could lead to a potential ADR, uncovering previously unknown causal links.

Only 1 (3%) study focused on the use of symbolic AI [
] compared to those investigating the use of ML (35/36, 97%), and 1 study combined symbolic and nonsymbolic AI. Notably, Pichardo et al [ ] stand out for integrating ontologies and ML, namely combining symbolic and nonsymbolic AI. The objective of this study was to examine the performance of a clinically informed framework for the prediction of short-term ADRs.

Furthermore, very few studies (5/36, 14%) focused on the clinical evaluation of the proposed ML approach. Segal et al [
] presented a clinical decision support system designed to provide medication error alerts to prevent ADRs, demonstrating significant results: 40% of the prescriptions were altered based on these alerts. Herrin et al [ ] compared the effectiveness of their proposed ML scheme to that of an established clinical practice, specifically the HAS-BLED approach, to evaluate a patient’s risk of gastrointestinal bleeding.

Synthesized Findings
Most of the included studies followed widely used approaches in AI and pharmacovigilance, primarily predicting ADR outcomes from EHR data with well-known ML model architectures. In terms of data, several interesting processes were described, such as the creation of patient embeddings and the application of the “comparison of extreme laboratory test” results, “comparison of extreme abnormality ratio”, and “prescription pattern around clinical event” algorithms for calculating input data for the ML model. Furthermore, a variety of innovative AI algorithms were described, such as SCCS models, causal ML, and symbolic AI. Finally, some out-of-the-box pharmacovigilance approaches were followed, such as identifying patients suitable for specific treatments based on their ADR profiles.
Discussion
The findings of this study enabled us to identify innovative ideas, spot existing limitations, and propose potential directions for future work in this field.
Principal Findings
In summary, the number of studies on AI methodologies applied to RWD for pharmacovigilance purposes has significantly increased in the last 5 years—most of the included studies (28/36, 78%) were published after 2019, with the United States contributing the most publications in this field (19/36, 53%).
Comparing this review study with 3 recent reviews from 2022 to 2023 in the same field, we conclude that only the study by Kaas-Hansen et al [
] could be considered a study with the same focus as our review. Their review included only 7 scientific papers because they restricted their selection to studies published between 2015 and 2021 and involving >1000 patient records. Their findings are similar to ours in terms of dominant AI solutions (classification) and the type of RWD used (EHR). Finally, it is essential to note that their review, like ours, highlights the limited adoption of widely used common data models, such as the OMOP-CDM.

In terms of risk of bias, selection biases were due to the fact that most of the studies (15/36, 42%) did not include patients from >1 data source (regional hospital [
] or insurance claims database [ ]) in their models. Regarding confounding, while high-dimensional RWD offer a significant amount of information, they also contain substantial noise and null values. Consequently, features that could be potential confounders are eliminated from the final dataset.

Pharmacovigilance
The reviewed papers focused on using ML to detect ADRs to confirm whether previously known ADRs could have been identified using RWD. Another major theme identified was the prediction of ≥1 ADRs based on the classification of patients with different characteristics, ultimately aiming to support personalized ADR prevention. However, there was a lack of studies investigating new, potential pharmacovigilance signals. Regarding the investigated ADRs, there was a slightly higher interest in chemotherapy drugs for different types of cancers due to the high incidence of serious reactions associated with this treatment.
Finally, it is important to highlight that only 4 (11%) of the 36 studies in this review were tested in real-world clinical environments, which leads us to conclude that AI models may lack generalizability or that health care professionals may lack trust in AI models. By contrast, the trustworthy AI evaluation based on the FUTURE-AI guidelines showed that only a few studies (4/36, 11%) failed to satisfy half of the criteria, indicating relatively high research quality.
RWD Preprocessing
In terms of RWD, EHRs were the most commonly used data source. EHRs are multidimensional, offering data that could be crucial for detecting postmarketing ADRs. At least in principle, EHRs could serve as an invaluable data source for investigating potential drug synergies or interactions across diverse populations. Furthermore, the variety of information in patient records could be an advantage in the creation of multimodal datasets, for example, by integrating biological, signaling pathway, and drug information databases.
However, the use of EHRs comes with significant burdens because they contain sensitive personal data, leading to limited access. The Medical Information Mart for Intensive Care is one of the few openly available EHR datasets for researchers, but it is not commonly used in pharmacovigilance (only 2/36, 6% of the studies used it).
Moreover, it is important to mention that RWD preprocessing is challenging because of the data's complexity and real-world nature (biases, errors, gaps, noise, etc). Consequently, fewer than half of the articles (17/36, 47%) described the data preprocessing step of their pipelines in detail.
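As an illustration of what such a description could cover, the following minimal sketch (column names and values are hypothetical, not taken from any included study) assembles the preprocessing steps catalogued in this review, namely null imputation, one-hot encoding, and normalization, into a single scikit-learn pipeline.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical structured-RWD extract with missing values
ehr = pd.DataFrame({
    "age": [71, 64, None, 58],
    "creatinine": [1.1, None, 0.9, 1.4],
    "daily_dose_mg": [20, 40, 20, None],
    "sex": ["F", "M", None, "F"],
    "atc_code": ["C09AA02", "N02BE01", "C09AA02", "B01AA03"],
    "admission_type": ["emergency", "elective", "emergency", "emergency"],
})

numeric_cols = ["age", "creatinine", "daily_dose_mg"]
categorical_cols = ["sex", "atc_code", "admission_type"]

numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),        # null imputation
    ("scale", StandardScaler()),                         # normalization
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),  # one-hot encoding
])
preprocess = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

X = preprocess.fit_transform(ehr)                        # model-ready feature matrix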
Another noteworthy outcome is that widely adopted data models such as the OMOP-CDM, Informatics for Integrating Biology & the Bedside, and Sentinel appeared sporadically in the studies. This could be attributed to the fact that the use of EHR data and AI models is relatively new. However, it should be noted that initiatives in this direction are emerging (eg, the Assessment of Pre-trained Observational Large Longitudinal models in Observational Health Data Sciences and Informatics initiative [
]).

Finally, it should be noted that RWD have a very substantial longitudinal dimension and questionable quality (due to gaps, errors, etc). As such, leveraging RWD for pharmacovigilance purposes requires the development of new approaches that focus on using time-related sequential information. While several attempts have been made to exploit this temporal aspect of RWD [
, - ], validating AI and ML algorithms focusing on time-series rationale for pharmacovigilance signal detection remains a critical issue.

AI Models
The detection of potential pharmacovigilance signals is a challenging procedure. As a result, the development of ML models to support the detection of ADR signals could have a significant impact. We can outline 2 major approaches for ADR signal detection. The first focuses on creating an AI tool that could discover unknown relationships between drugs and conditions, highlighting potential causal associations. The second approach emphasizes AI pipelines tailored to specific ADRs, where the input data for the final ML model are preprocessed based on prior medical knowledge of the drug-event combination.
The AI models identified in this review are generally complex, with ensemble methods such as RF being the most commonly used. A significant number of studies also applied ANNs. As RWD contain a substantial amount of diverse information, the relationships between different features may not be linear. Hence, the use of black box models (eg, ensemble methods and ANNs) is essential for discovering more complicated associations in a dataset beyond linear relationships.
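For concreteness, the sketch below (synthetic data only; it does not reproduce any reviewed study) shows the pattern that dominates the included studies: a random forest classifier for ADR detection trained on tabular features and evaluated with the area under the receiver operating characteristic curve.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced tabular data standing in for patient-level features
X, y = make_classification(n_samples=2000, n_features=30, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

rf = RandomForestClassifier(n_estimators=300, class_weight="balanced", random_state=0)
rf.fit(X_train, y_train)

# AUROC, the metric most frequently reported by the included studies
auroc = roc_auc_score(y_test, rf.predict_proba(X_test)[:, 1])
print(f"AUROC: {auroc:.3f}")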
On the basis of the review papers, there is a noticeable lack of use of XAI models (ie, local interpretable model-agnostic explanations and Shapley additive explanations models). Health care professionals highlight the need to understand the rationale behind AI models’ outputs in order to accept the decisions made by the algorithms. This could not only lead to the biological translation of the results based on existing knowledge but also reveal new information about a disease, a medication, and so on. In terms of pharmacovigilance, XAI models applied to RWD could bring evidence about unknown confounders in an ADR and provide more informative results for pharmacovigilance experts regarding the causes of a potential pharmacovigilance signal. Although XAI methods have been tested extensively in the health care domain, we found that only 6 recent studies (4/36, 11% are included in this review) had applied them in the pharmacovigilance domain [
, , , , , ] (4 in 2021, 2 in 2022). Another novel approach discussed extensively in the explainability field is the newly introduced causal ML or causal deep learning algorithm, which combines AI and causal inference to uncover underlying cause-and-effect relationships between variables. The complexity of RWD presents a challenge that causal ML could potentially solve more efficiently by providing meaningful explanations of the causal relationships between variables [ ]. These innovative AI models are a promising direction for future work because they seek and present the relationships between different variables in RWD sources in a more informative structure than traditional AI models. They have already been applied efficiently in pharmacological treatment patterns [ ]. In this review, only 3 (8%) of the 36 studies applied causal deep learning to EHR data [ , , ].

Finally, a major problem identified based on the SR findings is the lack of code availability. This issue hinders the reproducibility of the models, preventing further testing on different datasets and raising questions about the developed AI models’ robustness and generalizability.
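Returning to the XAI methods discussed above, the following minimal sketch (assuming the shap package is available and reusing the random forest and test set from the earlier sketch; exact output shapes vary across shap versions) illustrates how Shapley additive explanations could surface the features driving ADR predictions.

import shap  # assumes the shap package is installed

explainer = shap.TreeExplainer(rf)            # rf fitted in the earlier sketch
shap_values = explainer.shap_values(X_test)   # per-feature contributions per prediction
shap.summary_plot(shap_values, X_test)        # global overview of feature influence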
Strengths and Limitations
The strengths of this review include the use of a considerable number of studies (n=36), providing a thorough overview of this specific scientific field. In addition, we compared our findings with those of the most recent review in the same field and analyzed the differences.
Nevertheless, this SR has several limitations. First, we only included articles from the MEDLINE database. As such, we may have excluded other existing AI approaches to structured RWD in the field of pharmacovigilance that could be available in AI databases. Second, because of the variation in the articles’ methodologies, we were unable to conduct a meta-analysis of the quantitative results. Finally, it is important to mention the limited number of symbolic AI studies in this review (4/36, 11%) [
, , , ]. The construction of knowledge graphs (KGs) usually requires text mining procedures such as NLP and focuses on free text such as clinical notes. As we excluded NLP studies from our query, we assume that this contributed to the small number of symbolic AI articles included in our review.

Current Gaps and Potential Future Work Paths
Detecting new pharmacovigilance signals using ML approaches requires evidence of a causal association between the suspect drug and the reaction. XAI models can assist pharmacovigilance professionals in this process. To this end, further investigation into causal ML and causal deep learning approaches could be a highly impactful line of research for identifying pharmacovigilance signals from RWD.
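As one simple illustration of the causal reasoning involved (a hedged sketch with synthetic inputs; real causal ML and causal deep learning pipelines are considerably more involved), inverse probability weighting uses a propensity model for drug exposure to estimate the average treatment effect of the drug on an adverse event while adjusting for measured confounders.

import numpy as np
from sklearn.linear_model import LogisticRegression

def ipw_ate(confounders, exposed, outcome):
    """Average treatment effect of a drug exposure on an adverse event,
    estimated via inverse probability weighting (all inputs are NumPy arrays)."""
    ps_model = LogisticRegression(max_iter=1000).fit(confounders, exposed)
    ps = np.clip(ps_model.predict_proba(confounders)[:, 1], 0.01, 0.99)
    w_exposed = exposed / ps                 # weights for exposed patients
    w_control = (1 - exposed) / (1 - ps)     # weights for unexposed patients
    return np.average(outcome, weights=w_exposed) - np.average(outcome, weights=w_control)

# Synthetic example: one measured confounder influences both exposure and outcome
rng = np.random.default_rng(0)
confounders = rng.normal(size=(5000, 1))
exposed = rng.binomial(1, 1 / (1 + np.exp(-confounders[:, 0])))
outcome = rng.binomial(1, 0.05 + 0.10 * exposed + 0.03 * (confounders[:, 0] > 0))
print(f"Estimated ATE: {ipw_ate(confounders, exposed, outcome):.3f}")  # close to the true 0.10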
Another gap identified in this SR, pointing to a potential future work path, is the limited use of multitask learning approaches. Multitask learning is an ML methodology that uses a single dataset to learn multiple prediction tasks. RWD, such as EHR data, are rich data sources that could support >1 task (eg, pharmacovigilance and pharmacoepidemiology); for instance, a multitask learning model could predict an adverse drug event, the severity of an adverse drug event, and the likelihood of the same adverse drug event occurring with other drugs in a patient.
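A minimal sketch of this idea (an assumed architecture in PyTorch, not taken from any included study) is a shared encoder over structured RWD features with two task-specific heads, one predicting whether an adverse drug event occurs and one predicting its severity grade.

import torch
import torch.nn as nn

class MultiTaskADEModel(nn.Module):
    """Shared encoder with two heads: ADE occurrence (binary) and severity (multiclass)."""
    def __init__(self, n_features: int, n_severity_grades: int = 3):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
        )
        self.ade_head = nn.Linear(32, 1)
        self.severity_head = nn.Linear(32, n_severity_grades)

    def forward(self, x):
        z = self.encoder(x)
        return self.ade_head(z).squeeze(-1), self.severity_head(z)

model = MultiTaskADEModel(n_features=20)
bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One training step on a synthetic batch; both task losses share the encoder gradients
x = torch.randn(128, 20)
y_ade = torch.randint(0, 2, (128,)).float()
y_severity = torch.randint(0, 3, (128,))
optimizer.zero_grad()
ade_logits, severity_logits = model(x)
loss = bce(ade_logits, y_ade) + ce(severity_logits, y_severity)
loss.backward()
optimizer.step()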
Furthermore, combining ML approaches with symbolic AI is a line of work that offers further potential for exploration. Combining ML with ontologies and automatic reasoning upon KGs could enable new AI approaches (eg, neurosymbolic AI) and provide new insights based on well-established expert knowledge formed as a KG. Moreover, using ontologies and KGs could support integration with other kinds of data sources (eg, data sources containing low-level biochemical or pharmacokinetics and pharmacodynamics information and signaling pathway information).
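As a small hedged illustration of the symbolic side (using the rdflib package with a made-up namespace and predicates), expert knowledge about a drug and a suspected reaction can be encoded as triples and then queried; this is the kind of structure with which ML outputs could be integrated.

from rdflib import Graph, Namespace, RDF

PV = Namespace("http://example.org/pharmacovigilance#")  # hypothetical namespace
g = Graph()
g.bind("pv", PV)

# Hypothetical expert knowledge encoded as triples
g.add((PV.warfarin, RDF.type, PV.Drug))
g.add((PV.gastrointestinal_bleeding, RDF.type, PV.AdverseReaction))
g.add((PV.warfarin, PV.hasSuspectedReaction, PV.gastrointestinal_bleeding))

# SPARQL query retrieving drug-reaction pairs from the knowledge graph
results = g.query("""
    PREFIX pv: <http://example.org/pharmacovigilance#>
    SELECT ?drug ?reaction
    WHERE { ?drug pv:hasSuspectedReaction ?reaction . }
""")
for drug, reaction in results:
    print(drug, reaction)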
Finally, exploiting the currently formed federated data networks could also be an interesting area for future research; for example, the European Health Data & Evidence Network is currently setting up a network of >180 data partners across Europe, using the OMOP-CDM as the main data model [
]. The adoption of the OMOP-CDM and the potential exploitation of such data networks would significantly enhance the prospects of potential AI models used for pharmacovigilance.

Conclusions
In this paper, we reviewed scientific papers focusing on AI approaches to structured RWD for pharmacovigilance purposes. It should be noted, as a key finding, that most models are designed not for pharmacovigilance signal detection but for personalized ADR prediction. Furthermore, XAI methods as well as causal ML and causal deep learning have not been investigated in depth. Moreover, there are no identified gold standard methodologies for the preprocessing of structured RWD for pharmacovigilance. Finally, evaluating already developed AI models on external data is difficult because of code unavailability and a lack of data access.
Therefore, there is an essential need for more informative XAI models that can be validated on external datasets, as well as for more detailed descriptions of RWD preprocessing pipelines and methods to examine potential pharmacovigilance signals in clinical practice. Implementing AI approaches in RWD analysis could help tackle the problem of underreporting in pharmacovigilance and support the vision of personalized ADR management.
Acknowledgments
This work was funded by the Agence nationale de sécurité du médicament et des produits de santé (2016s076) and was supported by a PhD contract with Sorbonne University.
Data Availability
All data generated or analyzed during this study are included in this published paper and its supplementary information files.
Conflicts of Interest
None declared.
Supplementary tables.
XLSX File (Microsoft Excel File), 5587 KB
PRISMA-ScR checklist.
DOCX File, 84 KB
References
- Pharmacovigilance: ensuring the safe use of medicines. World Health Organization. URL: https://www.who.int/publications/i/item/WHOEDM2004.8 [accessed 2024-04-29]
- Just KS, Dormann H, Böhme M, Schurig M, Schneider KL, Steffens M, et al. Personalising drug safety-results from the multi-centre prospective observational study on Adverse Drug Reactions in Emergency Departments (ADRED). Eur J Clin Pharmacol. Mar 12, 2020;76(3):439-448. [CrossRef] [Medline]
- Formica D, Sultana J, Cutroneo PM, Lucchesi S, Angelica R, Crisafulli S, et al. The economic burden of preventable adverse drug reactions: a systematic review of observational studies. Expert Opin Drug Saf. Jul 2018;17(7):681-695. [CrossRef] [Medline]
- Alatawi YM, Hansen RA. Empirical estimation of under-reporting in the U.S. Food and Drug Administration Adverse Event Reporting System (FAERS). Expert Opin Drug Saf. Jul 09, 2017;16(7):761-767. [CrossRef] [Medline]
- Patadia VK, Schuemie MJ, Coloma PM, Herings R, van der Lei J, Sturkenboom M, et al. Can electronic health records databases complement spontaneous reporting system databases? A historical-reconstruction of the association of rofecoxib and acute myocardial infarction. Front Pharmacol. 2018;9:594. [FREE Full text] [CrossRef] [Medline]
- Gauffin O, Brand JS, Vidlin SH, Sartori D, Asikainen S, Català M, et al. Supporting pharmacovigilance signal validation and prioritization with analyses of routinely collected health data: lessons learned from an EHDEN network study. Drug Saf. Dec 07, 2023;46(12):1335-1352. [FREE Full text] [CrossRef] [Medline]
- Data analysis and real world interrogation network (DARWIN EU). European Medicines Agency. URL: https://www.ema.europa.eu/en/about-us/how-we-work/big-data/data-analysis-real-world-interrogation-network-darwin-eu [accessed 2024-04-29]
- The FDA's sentinel initiative. HealthAffairs. URL: https://www.healthaffairs.org/content/briefs/fda-s-sentinel-initiative#:~:text=As%20of%20February%202015%2C%20queries,communication%20in%20just%20four%20cases [accessed 2024-04-29]
- Pfaff ER, Girvin AT, Bennett TD, Bhatia A, Brooks IM, Deer RR, et al. N3C Consortium. Identifying who has long COVID in the USA: a machine learning approach using N3C data. Lancet Digit Health. Jul 2022;4(7):e532-e541. [FREE Full text] [CrossRef] [Medline]
- Yang Y, Yuan Y, Zhang G, Wang H, Chen YC, Liu Y, et al. Artificial intelligence-enabled detection and assessment of Parkinson's disease using nocturnal breathing signals. Nat Med. Oct 22, 2022;28(10):2207-2215. [FREE Full text] [CrossRef] [Medline]
- Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA. The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med. Nov 22, 2018;24(11):1716-1720. [FREE Full text] [CrossRef] [Medline]
- Page MJ, Moher D, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, et al. PRISMA 2020 explanation and elaboration: updated guidance and exemplars for reporting systematic reviews. BMJ. Mar 29, 2021;372:n160. [FREE Full text] [CrossRef] [Medline]
- Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev. Dec 05, 2016;5(1):210. [FREE Full text] [CrossRef] [Medline]
- Natsiavas P, Malousi A, Bousquet C, Jaulent MC, Koutkias V. Computational advances in drug safety: systematic and mapping review of knowledge engineering based approaches. Front Pharmacol. May 17, 2019;10:415. [FREE Full text] [CrossRef] [Medline]
- Lekadir K, Feragen A, Fofanah AJ, Frangi AF, Buyx A, Emelie A, et al. FUTURE-AI: international consensus guideline for trustworthy and deployable artificial intelligence in healthcare. arXiv. Preprint posted online August 11, 2023. [FREE Full text]
- Lavin A, Gilligan-Lee CM, Visnjic A, Ganju S, Newman D, Ganguly S, et al. Technology readiness levels for machine learning systems. Nat Commun. Oct 20, 2022;13(1):6039. [FREE Full text] [CrossRef] [Medline]
- Personeni G, Bresso E, Devignes MD, Dumontier M, Smaïl-Tabbone M, Coulet A. Discovering associations between adverse drug events using pattern structures and ontologies. J Biomed Semantics. Aug 22, 2017;8(1):29. [FREE Full text] [CrossRef] [Medline]
- Anastopoulos IN, Herczeg CK, Davis KN, Dixit AC. Multi-drug featurization and deep learning improve patient-specific predictions of adverse events. Int J Environ Res Public Health. Mar 05, 2021;18(5):2600. [FREE Full text] [CrossRef] [Medline]
- Bagattini F, Karlsson I, Rebane J, Papapetrou P. A classification framework for exploiting sparse multi-variate temporal features with application to adverse drug event detection in medical records. BMC Med Inform Decis Mak. Jan 10, 2019;19(1):7. [FREE Full text] [CrossRef] [Medline]
- Segal G, Segev A, Brom A, Lifshitz Y, Wasserstrum Y, Zimlichman E. Reducing drug prescription errors and adverse drug events by application of a probabilistic, machine-learning based clinical decision support system in an inpatient setting. J Am Med Inform Assoc. Dec 01, 2019;26(12):1560-1565. [FREE Full text] [CrossRef] [Medline]
- Ward IR, Wang L, Lu J, Bennamoun M, Dwivedi G, Sanfilippo FM. Explainable artificial intelligence for pharmacovigilance: what features are important when predicting adverse outcomes? Comput Methods Programs Biomed. Nov 2021;212:106415. [CrossRef] [Medline]
- Zhang W, Peissig P, Kuang Z, Page D. Adverse drug reaction discovery from electronic health records with deep neural networks. Proc ACM Conf Health Inference Learn (2020). Apr 2020;2020:30-39. [FREE Full text] [CrossRef] [Medline]
- Kim Y, Jang JH, Park N, Jeong NY, Lim E, Kim S, et al. Machine learning approach for active vaccine safety monitoring. J Korean Med Sci. Aug 09, 2021;36(31):e198. [FREE Full text] [CrossRef] [Medline]
- Morel M, Bacry E, Gaïffas S, Guilloux A, Leroy F. ConvSCCS: convolutional self-controlled case series model for lagged adverse event detection. Biostatistics. Oct 01, 2020;21(4):758-774. [CrossRef] [Medline]
- Zou B, Mi X, Tighe PJ, Koch GG, Zou F. On kernel machine learning for propensity score estimation under complex confounding structures. Pharm Stat. Jul 22, 2021;20(4):752-764. [FREE Full text] [CrossRef] [Medline]
- Liu L, Yu Y, Fei Z, Li M, Wu FX, Li HD, et al. An interpretable boosting model to predict side effects of analgesics for osteoarthritis. BMC Syst Biol. Nov 22, 2018;12(Suppl 6):105. [FREE Full text] [CrossRef] [Medline]
- Zhu X, Hu J, Xiao T, Huang S, Shang D, Wen Y. Integrating machine learning with electronic health record data to facilitate detection of prolactin level and pharmacovigilance signals in olanzapine-treated patients. Front Endocrinol (Lausanne). Oct 13, 2022;13:1011492. [FREE Full text] [CrossRef] [Medline]
- Kidwai-Khan F, Rentsch CT, Pulk R, Alcorn C, Brandt CA, Justice AC. Pharmacogenomics driven decision support prototype with machine learning: a framework for improving patient care. Front Big Data. Nov 15, 2022;5:1059088. [FREE Full text] [CrossRef] [Medline]
- Sharma V, Kulkarni V, Jess E, Gilani F, Eurich D, Simpson SH, et al. Development and validation of a machine learning model to estimate risk of adverse outcomes within 30 days of opioid dispensation. JAMA Netw Open. Dec 01, 2022;5(12):e2248559. [FREE Full text] [CrossRef] [Medline]
- On J, Park HA, Yoo S. Development of a prediction models for chemotherapy-induced adverse drug reactions: a retrospective observational study using electronic health records. Eur J Oncol Nurs. Feb 2022;56:102066. [CrossRef] [Medline]
- Datta A, Flynn NR, Barnette DA, Woeltje KF, Miller GP, Swamidass SJ. Machine learning liver-injuring drug interactions with non-steroidal anti-inflammatory drugs (NSAIDs) from a retrospective electronic health record (EHR) cohort. PLoS Comput Biol. Jul 6, 2021;17(7):e1009053. [FREE Full text] [CrossRef] [Medline]
- Gibson TB, Nguyen MD, Burrell T, Yoon F, Wong J, Dharmarajan S, et al. Electronic phenotyping of health outcomes of interest using a linked claims-electronic health record database: findings from a machine learning pilot project. J Am Med Inform Assoc. Jul 14, 2021;28(7):1507-1517. [FREE Full text] [CrossRef] [Medline]
- Jeong E, Park N, Choi Y, Park RW, Yoon D. Machine learning model combining features from algorithms with different analytical methodologies to detect laboratory-event-related adverse drug reaction signals. PLoS One. 2018;13(11):e0207749. [FREE Full text] [CrossRef] [Medline]
- Zhao J, Henriksson A, Kvist M, Asker L, Boström H. Handling temporality of clinical events for drug safety surveillance. AMIA Annu Symp Proc. 2015;2015:1371-1380. [FREE Full text] [Medline]
- Zhao J, Henriksson A. Learning temporal weights of clinical events using variable importance. BMC Med Inform Decis Mak. Jul 21, 2016;16 Suppl 2(Suppl 2):71. [FREE Full text] [CrossRef] [Medline]
- Boland MR, Polubriaginof F, Tatonetti NP. Development of a machine learning algorithm to classify drugs of unknown fetal effect. Sci Rep. Oct 09, 2017;7(1):12839. [FREE Full text] [CrossRef] [Medline]
- Wang Z, Poon J, Wang S, Sun S, Poon S. A novel method for clinical risk prediction with low-quality data. Artif Intell Med. Apr 2021;114:102052. [CrossRef] [Medline]
- Li C, Chen L, Chou C, Ngorsuraches S, Qian J. Using machine learning approaches to predict short-term risk of cardiotoxicity among patients with colorectal cancer after starting fluoropyrimidine-based chemotherapy. Cardiovasc Toxicol. Feb 18, 2022;22(2):130-140. [CrossRef] [Medline]
- Jin S, Kostka K, Posada JD, Kim Y, Seo SI, Lee DY, et al. Prediction of major depressive disorder following beta-blocker therapy in patients with cardiovascular diseases. J Pers Med. Dec 18, 2020;10(4):288. [FREE Full text] [CrossRef] [Medline]
- Hansen PW, Clemmensen L, Sehested TS, Fosbøl EL, Torp-Pedersen C, Køber L, et al. Identifying drug-drug interactions by data mining: a pilot study of warfarin-associated drug interactions. Circ Cardiovasc Qual Outcomes. Nov 2016;9(6):621-628. [CrossRef] [Medline]
- Mosa AS, Rana MK, Islam H, Hossain AK, Yoo I. A smartphone-based decision support tool for predicting patients at risk of chemotherapy-induced nausea and vomiting: retrospective study on app development using decision tree induction. JMIR Mhealth Uhealth. Dec 02, 2021;9(12):e27024. [FREE Full text] [CrossRef] [Medline]
- Herrin J, Abraham NS, Yao X, Noseworthy PA, Inselman J, Shah ND, et al. Comparative effectiveness of machine learning approaches for predicting gastrointestinal bleeds in patients receiving antithrombotic treatment. JAMA Netw Open. May 03, 2021;4(5):e2110703. [FREE Full text] [CrossRef] [Medline]
- Pichardo D, Michael R, Mercer M, Korina N, Onukwugha E. Utility of a clinically guided data-driven approach for predicting breast cancer complications: an application using a population-based claims data set. JCO Clin Cancer Inform. Nov 2022;6:e2100191. [FREE Full text] [CrossRef] [Medline]
- Puzhko S, Schuster T, Barnett TA, Renoux C, Munro K, Barber D, et al. Difference in patterns of prescribing antidepressants known for their weight-modulating and cardiovascular side effects for patients with obesity compared to patients with normal weight. J Affect Disord. Dec 01, 2021;295:1310-1318. [CrossRef] [Medline]
- Souissi SB, Abed M, Elhiki L, Fortemps P, Pirlot M. Reducing the toxicity risk in antibiotic prescriptions by combining ontologies with a multiple criteria decision model. AMIA Annu Symp Proc. 2017;2017:1625-1634. [FREE Full text] [Medline]
- Zhou Y, Hou Y, Hussain M, Brown SA, Budd T, Tang WH, et al. Machine learning-based risk assessment for cancer therapy-related cardiac dysfunction in 4300 longitudinal oncology patients. J Am Heart Assoc. Dec 2020;9(23):e019628. [FREE Full text] [CrossRef] [Medline]
- Goyal J, Ng DQ, Zhang K, Chan A, Lee J, Zheng K, et al. Using machine learning to develop a clinical prediction model for SSRI-associated bleeding: a feasibility study. BMC Med Inform Decis Mak. Jun 11, 2023;23(1):105. [FREE Full text] [CrossRef] [Medline]
- Wang Y, Ma J, Ma S, Wang J, Li J. Causal evaluation of post-marketing drugs for drug-induced liver injury from electronic health records. Annu Int Conf IEEE Eng Med Biol Soc. Jul 2023;2023:1-4. [CrossRef] [Medline]
- Hughes JH, Tong DM, Burns V, Daly B, Razavi P, Boelens JJ, et al. Clinical decision support for chemotherapy-induced neutropenia using a hybrid pharmacodynamic/machine learning model. CPT Pharmacometrics Syst Pharmacol. Nov 10, 2023;12(11):1764-1776. [FREE Full text] [CrossRef] [Medline]
- Sharma V, Joon T, Kulkarni V, Samanani S, Simpson SH, Voaklander D, et al. Predicting 30-day risk from benzodiazepine/Z-drug dispensations in older adults using administrative data: a prognostic machine learning approach. Int J Med Inform. Oct 2023;178:105177. [CrossRef] [Medline]
- Akimoto H, Hayakawa T, Nagashima T, Minagawa K, Takahashi Y, Asai S. Detection of potential drug-drug interactions for risk of acute kidney injury: a population-based case-control study using interpretable machine-learning models. Front Pharmacol. May 23, 2023;14:1176096. [FREE Full text] [CrossRef] [Medline]
- Zhang J, Kummerfield E, Hultman G, Drawz PE, Adam TJ, Simon G, et al. Application of causal discovery algorithms in studying the nephrotoxicity of remdesivir using longitudinal data from the EHR. AMIA Annu Symp Proc. 2022;2022:1227-1236. [FREE Full text] [Medline]
- Stang PE, Ryan PB, Racoosin JA, Overhage JM, Hartzema AG, Reich C, et al. Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Ann Intern Med. Nov 02, 2010;153(9):600-606. [CrossRef] [Medline]
- Chen H, Lundberg SM, Erion G, Kim JH, Lee SI. Forecasting adverse surgical events using self-supervised transfer learning for physiological signals. NPJ Digit Med. Dec 08, 2021;4(1):167. [FREE Full text] [CrossRef] [Medline]
- Kaas-Hansen BS, Gentile S, Caioli A, Andersen SE. Exploratory pharmacovigilance with machine learning in big patient data: a focused scoping review. Basic Clin Pharmacol Toxicol. Mar 03, 2023;132(3):233-241. [FREE Full text] [CrossRef] [Medline]
- Schuemie M, Chen Y, Fridgeirsson E, Kim C, Reps J, Suchard M, et al. Assessment of pre-trained observational large longitudinal models in OHDSI (APOLLO). Observational Health Data Sciences and Informatics. URL: https://www.ohdsi.org/wp-content/uploads/2023/10/110_Schuemie-BriefReport.pdf [accessed 2024-04-29]
- Sharma V, Kulkarni V, Eurich DT, Kumar L, Samanani S. Safe opioid prescribing: a prognostic machine learning approach to predicting 30-day risk after an opioid dispensation in Alberta, Canada. BMJ Open. May 26, 2021;11(5):e043964. [FREE Full text] [CrossRef] [Medline]
- Prosperi M, Guo Y, Sperrin M, Koopman JS, Min JS, He X, et al. Causal inference and counterfactual prediction in machine learning for actionable healthcare. Nat Mach Intell. Jul 13, 2020;2(7):369-375. [CrossRef]
- Belthangady C, Giampanis S, Jankovic I, Stedden W, Alves P, Chong S, et al. Causal deep learning reveals the comparative effectiveness of antihyperglycemic treatments in poorly controlled diabetes. Nat Commun. Nov 14, 2022;13(1):6921. [FREE Full text] [CrossRef] [Medline]
- Jiang G, Liu H, Solbrig HR, Chute CG. Mining severe drug-drug interaction adverse events using semantic web technologies: a case study. BioData Min. Mar 25, 2015;8(1):12. [FREE Full text] [CrossRef] [Medline]
- Pacaci A, Gonul S, Sinaci AA, Yuksel M, Laleci Erturkmen GB. A semantic transformation methodology for the secondary use of observational healthcare data in postmarketing safety studies. Front Pharmacol. Apr 30, 2018;9:435. [FREE Full text] [CrossRef] [Medline]
- Home page. The European Health Data & Evidence Network. URL: https://www.ehden.eu/ [accessed 2024-04-29]
Abbreviations
ADR: adverse drug reaction
AI: artificial intelligence
ANN: artificial neural network
ConvSCCS: convolutional self-controlled case series
DARWIN: Data Analysis and Real World Interrogation Network
EHR: electronic health record
FUTURE-AI: Fairness, Universality, Traceability, Usability, Robustness, and Explainability–Artificial Intelligence
ICSR: individual case safety report
KG: knowledge graph
ML: machine learning
NLP: natural language processing
NSCCS: neural self-controlled case series
OMOP-CDM: Observational Medical Outcomes Partnership Common Data Model
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews
RF: random forest
RWD: real-world data
SCCS: self-controlled case series
XAI: explainable artificial intelligence
Edited by A Coristine; submitted 28.02.24; peer-reviewed by J Fossouo Tagne, T Goto, C Hohl; comments to author 15.08.24; revised version received 03.10.24; accepted 27.10.24; published 30.12.24.
Copyright©Stella Dimitsaki, Pantelis Natsiavas, Marie-Christine Jaulent. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 30.12.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.