Original Paper
Abstract
Background: Sensitive and interpretable machine learning (ML) models can provide valuable assistance to clinicians in managing patients with heart failure (HF) at discharge by identifying individual factors associated with a high risk of readmission. In this cohort study, we examine the factors that determine the potential utility of classification models as decision support tools for predicting readmissions in patients with HF.
Objective: The primary objective of this study is to assess the trade-off between using deep learning (DL) and traditional ML models to identify the risk of 100-day readmissions in patients with HF. Additionally, the study aims to provide explanations for the model predictions by highlighting important features both on a global scale across the patient cohort and on a local level for individual patients.
Methods: The retrospective data for this study were obtained from the Regional Health Care Information Platform in Region Halland, Sweden. The study cohort consisted of patients diagnosed with HF who were over 40 years old and had been hospitalized at least once between 2017 and 2019. Data analysis encompassed the period from January 1, 2017, to December 31, 2019. Two ML models were developed and validated to predict 100-day readmissions, with a focus on the explainability of the model’s decisions. These models were built on decision trees and a recurrent neural architecture, respectively. Model explainability was obtained using an ML explainer. The predictive performance of these models was compared against 2 risk assessment tools using multiple performance metrics.
Results: The retrospective data set included a total of 15,612 admissions, and within these admissions, readmission occurred in 5597 cases, representing a readmission rate of 35.85%. It is noteworthy that a traditional and explainable model, informed by clinical knowledge, exhibited performance comparable to the DL model and surpassed conventional scoring methods in predicting readmission among patients with HF. The evaluation of predictive model performance was based on commonly used metrics, with an area under the precision-recall curve of 66% for the deep model and 68% for the traditional model on the holdout data set. Importantly, the explanations provided by the traditional model offer actionable insights that have the potential to enhance care planning.
Conclusions: This study found that a widely used deep prediction model did not outperform an explainable ML model when predicting readmissions among patients with HF. The results suggest that model transparency does not necessarily compromise performance, which could facilitate the clinical adoption of such models.
doi:10.2196/46934
Introduction
Unscheduled rehospitalization has garnered significant research attention due to its high cost and its association with unfavorable patient outcomes [
, ]. In particular, the rehospitalization of patients with heart failure (HF) has been a focal point of investigation, given the alarmingly high unscheduled readmission rates, which recent studies have reported to be around 30% within 3 months of hospital discharge [ - ]. In the context of value-based care, health care providers are actively working to minimize readmission rates and enhance patient care outcomes. Reducing readmissions involves implementing measures both during a patient’s hospital stay and in the postdischarge phase to ensure adherence to care plans and optimize treatment [ ].

Numerous risk factors have been identified as being related to high-risk readmissions in patients with HF. Specifically, there are correlations with serum potassium levels, N-terminal prohormone of brain natriuretic peptide (NT-proBNP), and suboptimal medication adherence [
, , ]. Moreover, HF is accompanied by several significant comorbidities, including hypertension, diabetes, chronic kidney disease, and atrial fibrillation, all of which impact both the management and prognosis of the condition [ , ].

Machine learning (ML) techniques, especially deep learning (DL), have proven to be highly effective for readmission prediction tasks. This is because the data inputs involved are often intricate and may not be readily discernible by physicians [
- ]. DL methods, in particular, require substantial amounts of data and computational power to uncover latent patterns, eliminating the necessity for domain-specific knowledge. Nevertheless, DL models are often associated with limited explainability owing to their sheer complexity [ ]. Conversely, the abundance of variables within electronic health care records (EHRs) poses a challenge for traditional ML methods, often referred to as shallow ML techniques. Consequently, the feature engineering step becomes essential for extracting meaningful features from raw data, often with the assistance of domain-specific knowledge.

Recently, shallow ML has exhibited the potential to surpass state-of-the-art DL techniques when accompanied by effective feature engineering [
- ]. Consequently, the shallow ML approach not only requires less training time but also yields discernible features that are more straightforward to grasp, offering greater opportunities to interpret the outcomes. Nevertheless, it is crucial that the feature engineering step is performed in collaboration with continuous feedback from clinical experts, ensuring the selection of pertinent variables for modeling the prediction task. Clinicians provide essential guidance on which data hold clinical significance and how to classify those data. This guidance is crucial for decisions such as determining which laboratory values to prioritize and whether transforming continuous values into categories, such as high, normal, or low, can enhance the performance of the ML model.

ML has the potential to aid clinicians in the management of patients with HF at the point of discharge by predicting those at an elevated risk of readmission [
]. However, a lingering question is how to quantify the practical utility of these models in actual clinical practice. Previous studies have demonstrated the potential applicability of their developed models in achieving cost savings [ ]. Nonetheless, when it comes to integrating a model into the clinical workflow, it is not just the model’s discrimination ability that holds significance, but also its applicability in real clinical scenarios. To instill confidence in clinicians, it is essential to provide interpretability of the model results, elucidating the individual variables that influence the model’s decisions. This enables clinicians to verify the clinical implications of patient conditions [ ].

This study aims to explore the potential of using ML to forecast the risk of readmission due to HF deterioration within 100 days after discharge. This investigation involves a comparison between shallow and deep models applied to real-world health care data. Furthermore, our objective is to enhance the interpretability of these models to ensure they are clinically reliable and actionable.
Methods
Data Source
This study is based on retrospective data from the Regional Health Care Information Platform in Region Halland, an integrated care system located in southwestern Sweden [
]. The system includes data from both the primary and secondary health care levels, including all prescribed medications, clinical examination results (eg, laboratory assessments and radiological examinations), and care delivery resources. Diagnoses (including procedures) and medications were presented according to standard schemas: International Classification of Diseases, revision 10, Sweden (ICD-10-SE) and Anatomical Therapeutic Chemical (ATC) classification system codes, respectively.

Ethics Considerations
The study was approved by the Swedish Ethical Review Board, Stockholm Department 2 Medicine (registration number 2020-00455). Informed consent was waived by the Swedish Ethical Review Board because this was a retrospective study. All methods in this study were performed in accordance with relevant guidelines and regulations.
Study Population
The cohort for this study comprised individuals who had been diagnosed with HF based on ICD-10-SE codes (I11.0, I42, I43, and I50; see Table S4 in
), were residents receiving care within Region Halland, and met specific inclusion criteria (ie, aged ≥40 years and had experienced at least one hospital admission after their HF diagnosis between January 1, 2017, and December 31, 2019). We considered all-cause hospitalization. For each admission within the cohort, we took into account all of the patient’s prior admissions within a 5-year period preceding the current admission (referred to as the lookback period). It is important to note that these previous admissions were not treated as events within our study; instead, they were exclusively used as historical data.

We excluded the following from our analysis: hospitalizations that occurred before the patient’s initial HF diagnosis, hospitalizations of patients younger than 40 years of age at the index admission, hospitalizations in which patients died before discharge, and hospitalizations in which patients died within 100 days following discharge. Additionally, hospitalizations with a length of stay exceeding 31 days were excluded, as they exhibited a higher likelihood of being influenced by multiple coexisting diseases, as well as factors such as frailty and other medical conditions. The study cohort consisted of 6040 patients, with 2728 (45.02%) being women. These patients collectively contributed a total of 20,598 hospital admissions. However, after implementing our exclusion criteria, as outlined in Figure S1 in
, we considered 15,612 (75.79%) of these admissions for our analysis.

Patient Variables and Data Definitions
We collected variables from the Regional Health Care Information Platform that can have a significant influence on readmission prediction as shown in
- . The variables considered were patient demographics, comorbidities, medications, and laboratory results. The choice of variables was made through a collaborative process involving both clinical expertise and data science knowledge. For example, we selected comorbidities including hypertension, chronic kidney disease, and atrial fibrillation. We also applied the Charlson Comorbidity Index (CCI) to compute a score associated with different medical comorbid conditions, such as liver disease, diabetes, and obstructive pulmonary disease [ , ]. These comorbidity variables consider the complete patient history, including all visits documented in the EHR system, encompassing specialist and primary care visits. We used ICD-10-SE codes to calculate these comorbidity features, and additional information can be found in Table S2 in .

Variables | Total (N=15,612) | Not readmitted (n=10,015) | Readmitted (n=5597) | P value
Demographics at index
Sex: Female, n (%) | 6905 (44.23) | 4472 (44.65) | 2433 (43.47) | .34
Age (years), mean (SD) | 79.1 (10.4) | 79 (10.5) | 79.3 (10.3) | .13
Duration of heart failure when admitted (days), mean (SD) | 1149.9 (1036.6) | 1133.2 (1032.3) | 1179.9 (1043.5) | <.001
Comorbidities, n (%)
Hypertension | 12,948 (82.94) | 8219 (82.07) | 4729 (84.49) | .11
Ischemic heart disease | 8275 (53.00) | 5132 (51.24) | 3143 (56.16) | <.001
Cerebrovascular insult | 3878 (24.84) | 2397 (23.93) | 1481 (26.46) | .002
Valvular heart disease | 4303 (27.56) | 2563 (25.59) | 1740 (31.09) | <.001
Peripheral artery disease | 1470 (9.42) | 863 (8.62) | 607 (10.85) | <.001
Atrial fibrillation | 10,001 (64.06) | 6047 (60.38) | 3954 (70.64) | <.001
Diabetes mellitus | 4828 (30.92) | 2944 (29.40) | 1884 (33.66) | <.001
Chronic obstructive pulmonary disease | 3727 (23.87) | 2088 (20.85) | 1639 (29.28) | <.001
Chronic kidney disease | 6007 (38.48) | 3429 (34.24) | 2578 (46.06) | <.001
Charlson Comorbidity Index, mean (SD) | 3.0 (1.9) | 2.7 (1.9) | 3.5 (1.8) | <.001
Variables | Total (N=15,612) | Not readmitted (n=10,015) | Readmitted (n=5597) | P value
N-terminal prohormone of brain natriuretic peptide, n (%)
HFb likely | 4439 (28.43) | 2494 (24.90) | 1945 (34.75) | <.001
Gray zone for HF | 1261 (8.08) | 839 (8.38) | 422 (7.54) | .08
HF unlikely | 222 (1.42) | 169 (1.69) | 53 (0.95) | <.001
Sodium (mmol/L), n (%)
High (value>145) | 359 (2.30) | 190 (1.90) | 169 (3.02) | <.001
Normal (137≤value≤145) | 8730 (55.92) | 5539 (55.31) | 3191 (57.01) | .17
Low (value<137) | 2171 (13.91) | 1297 (12.95) | 874 (15.62) | <.001
Potassiumc (mmol/L), n (%)
High (value>4.7) | 940 (6.02) | 549 (5.48) | 391 (6.99) | <.001
Normal (3.5≤value≤4.7) | 11,796 (75.56) | 7497 (74.86) | 4299 (76.81) | .18
Low (value<3.5) | 1236 (7.92) | 773 (7.72) | 463 (8.27) | .24
Ferritin <100 ng/mL, n (%)
Abnormal | 606 (3.88) | 349 (3.48) | 257 (4.59) | <.001
Normal | 1109 (7.10) | 668 (6.67) | 441 (7.88) | .007
Estimated glomerular filtration rate, n (%)
Extreme (value<30) | 2437 (15.61) | 1305 (13.03) | 1132 (20.23) | <.001
Abnormal (30≤value<60) | 6946 (44.49) | 4444 (44.37) | 2502 (44.70) | .77
Normal (value≥60) | 4358 (27.91) | 2937 (29.33) | 1421 (25.39) | <.001
aUnreported laboratory values were not included in the statistics presented.
bHF: heart failure.
cFor some cases during 2017, 3.2 was used as the lower limit of the normal range; this applies to the low range as well.
Variables | Total (N=15,612) | Not readmitted (n=10,015) | Readmitted (n=5597) | P value
 | Discharge | Medication history within 1 year | Discharge | Medication history within 1 year | Discharge | Medication history within 1 year | Discharge | Medication history within 1 year
β-blockers, n (%) | 7915 (50.70) | 9063 (58.05) | 5228 (52.20) | 5755 (57.46) | 2687 (48.01) | 3308 (59.10) | <.001 | .20
Angiotensin-converting enzyme inhibitors, n (%) | 3270 (20.95) | 4291 (27.49) | 2297 (22.94) | 2784 (27.80) | 973 (17.38) | 1507 (26.93) | <.001 | .32
Angiotensin receptor blockers, n (%) | 1979 (12.68) | 2857 (18.30) | 1392 (13.90) | 1877 (18.74) | 587 (10.49) | 980 (17.51) | <.001 | .08
Angiotensin receptor neprilysin inhibitors, n (%) | 302 (1.93) | 390 (2.50) | 204 (2.04) | 241 (2.41) | 98 (1.75) | 149 (2.66) | .22 | .33
Mineralocorticoid receptor antagonists, n (%) | 3764 (24.11) | 4408 (28.23) | 2401 (23.97) | 2544 (25.40) | 1363 (24.35) | 1864 (33.30) | .64 | <.001
Loop diuretics, n (%) | 7264 (46.53) | 8411 (53.88) | 4588 (45.81) | 5080 (50.72) | 2676 (47.81) | 3331 (59.51) | .08 | <.001
Digoxin, n (%) | 1110 (7.11) | 1491 (9.55) | 733 (7.32) | 901 (9.00) | 377 (6.74) | 590 (10.54) | .19 | .003
SGLT-2b, n (%) | 127 (0.81) | 182 (1.17) | 102 (1.02) | 134 (1.34) | 25 (0.45) | 48 (0.86) | <.001 | .008
aThe statistics are based on prescriptions documented in the system. There is a possibility that patients already have the medication and have no need to get new prescriptions at discharge (ie, medication history within 1 year>discharge).
bSGLT-2: sodium-glucose cotransporter-2.
Variables | Total (N=15,612) | Not readmitted (n=10,015) | Readmitted (n=5597) | P value
Administration features (lookback period=5 years)
Number of all-cause hospitalizations, mean (SD) | 5.7 (7.8) | 4.4 (5.5) | 8.1 (10.3) | <.001
Number of emergency visits (in the last 6 months), mean (SD) | 2.3 (2.3) | 1.9 (1.8) | 3.1 (3.0) | <.001
Number of procedures per admission, mean (SD) | 2.3 (2.7) | 2.4 (2.7) | 2.3 (2.6) | .02
Total length of stay during the lookback period per patient (days), mean (SD) | 26.0 (32.7) | 20.0 (24.5) | 36.8 (41.6) | <.001
Hospitalization type (index admission)
Length of stay for index admission (days), mean (SD) | 5.2 (5.2) | 5.0 (5.1) | 5.5 (5.2) | <.001
We collected the most recent results of specific laboratory tests conducted before discharge, including NT-proBNP [
], sodium, potassium, ferritin, and estimated glomerular filtration rate. Notably, NT-proBNP values were categorized into normal, gray zone, and elevated ranges [ ]. The rest of the laboratory values were categorized into normal and abnormal ranges according to international clinical recommendations (eg, extreme, low, and high), focusing on extracting the correlations between abnormal laboratory values and readmission risk. For missing laboratory values, we adopted the assumption that they were within the normal range, a methodology consistent with previous studies [ , ]. Furthermore, we gathered information about medications prescribed at the time of discharge, including β-blockers, renin-angiotensin-aldosterone system inhibitors, mineralocorticoid receptor antagonists, loop diuretics, and sodium-glucose cotransporter-2 (SGLT-2) inhibitors. Detailed specifications and ATC codes can be found in Table S3 in . In addition, we considered the patient’s medication history by introducing variables to indicate whether each medication had been prescribed within the year preceding each admission.

We also incorporated a few administrative features related to hospital utilization during the lookback period. These included the total length of stay for inpatient care, the count of outpatient visits (including hospital-associated and primary health care visits), and the number of all-cause hospitalizations.
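The categorization step described above can be sketched as a simple threshold mapping. This is a minimal illustration with a hypothetical function name; the thresholds follow the ranges reported in the tables, and a missing value defaults to normal, as assumed in the study.

```python
def categorize_lab(name, value):
    """Map a raw laboratory value to a clinical category.

    Thresholds follow the ranges reported in the baseline tables;
    a missing value (None) is assumed to be within the normal range.
    """
    if value is None:
        return "normal"  # missing values assumed normal, as in the study
    if name == "sodium":  # mmol/L
        if value > 145:
            return "high"
        if value < 137:
            return "low"
        return "normal"
    if name == "potassium":  # mmol/L
        if value > 4.7:
            return "high"
        if value < 3.5:
            return "low"
        return "normal"
    if name == "egfr":  # estimated glomerular filtration rate
        if value < 30:
            return "extreme"
        if value < 60:
            return "abnormal"
        return "normal"
    raise ValueError(f"unknown laboratory test: {name}")
```

Discretizing the values this way trades numeric precision for categories that clinicians already reason with, which is the point of the clinician-guided feature engineering described above.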
- display the baseline characteristics of the HF cohort and the engineered features used for training the ML models. It is worth noting that the same patient cohort has been previously investigated, and the clinical variables used in this study have been identified as being linked to the estimation of 100-day readmission risk [ ].

The target label, termed “readmission risk,” is defined as a binary variable. A default value of 0 signifies low risk. However, this label is set to 1 if a patient experiences an unplanned admission within 100 days following their discharge. It is important to note that cases in which patients were discharged to another health care facility or in-hospital transfers were not regarded as unplanned readmissions.
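The target-label definition above can be sketched as follows; this is a minimal illustration with hypothetical data structures, assuming each subsequent admission carries flags marking planned admissions and facility transfers.

```python
from datetime import date, timedelta

def label_readmission(discharge_date, later_admissions, window_days=100):
    """Return 1 if any unplanned admission starts within `window_days`
    of discharge, else 0 (the default, low-risk label).

    Each later admission is a (start_date, planned, transfer) tuple;
    planned admissions and transfers to another facility do not count
    as unplanned readmissions, mirroring the study definition.
    """
    horizon = discharge_date + timedelta(days=window_days)
    for start, planned, transfer in later_admissions:
        if planned or transfer:
            continue  # not an unplanned readmission
        if discharge_date < start <= horizon:
            return 1
    return 0
```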
Modeling Strategies
We conducted a comparison between the 2 modeling approaches. Our initial model was constructed using CatBoost (Yandex), representing a shallow model developed through traditional ML techniques. CatBoost uses gradient boosting decision trees, renowned for their capacity to handle both categorical and continuous features [
]. The CatBoost model generates predictions by using a series of decision trees, rendering it a more explainable model. By contrast, deep models, specifically recurrent neural networks such as long short-term memory (LSTM) networks [ ], have garnered significant attention due to their ability to model the sequential nature inherent in EHRs [ , , ]. Hence, our second model is constructed as an LSTM network, as depicted in Figure S2 in . The LSTM model is designed to learn intricate nonlinear associations between the model variables and outcomes, necessitating the establishment of a surrogate model for providing explanations. In this research, we opted for the LSTM network architecture for DL. Nonetheless, it is worth noting that sequential information regarding patient visits can also be effectively modeled using other recurrent neural network architectures [ ].

illustrates the various levels of analysis conducted in this study. As we used 2 distinct data modeling strategies, we organized the data into 2 different formats. For the shallow model, the input data were structured as a single record containing a list of engineered features, computed based on predefined variables outlined in - . Conversely, the input for the DL model comprised raw structured EHR-based data for a sequence of 5 admissions, consisting of the index admission and the 4 prior ones. For each of these admissions, we treated the associated diagnoses as a list of ICD-10-SE codes. Similarly, we regarded procedures, medications, and abnormal laboratory test results as lists in the DL model. However, the shallow model involved an additional preliminary step, the feature engineering phase, in which we computed the values of engineered features. The primary objective of this phase is to convert raw data into a collection of computed features, which are used as input for the CatBoost model.
Subsequently, we progress through the next 3 phases for both models: input provision, model calibration, and the extraction of explanations for model decisions.
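The 2 input formats can be contrasted with a small sketch. The field names are hypothetical, and only a simplified subset of the engineered features is shown; the real pipeline computes many more features with clinical guidance.

```python
def shallow_input(admissions, index):
    """Single engineered-feature record for the tree-based model.

    `admissions` is a chronologically ordered list of dicts; only a
    simplified subset of engineered features is computed here.
    """
    history = admissions[:index]
    current = admissions[index]
    return {
        "n_prior_hospitalizations": len(history),
        "total_prior_los_days": sum(a["los_days"] for a in history),
        "index_los_days": current["los_days"],
        "n_diagnoses_at_index": len(current["icd10_codes"]),
    }

def deep_input(admissions, index, seq_len=5):
    """Sequence input for the LSTM: the index admission plus up to 4
    prior ones, each kept as raw lists of codes (no feature engineering)."""
    window = admissions[max(0, index - seq_len + 1):index + 1]
    return [
        {
            "diagnoses": a["icd10_codes"],
            "medications": a["atc_codes"],
            "abnormal_labs": a["abnormal_labs"],
        }
        for a in window
    ]
```

The contrast captures the trade-off discussed above: the shallow model consumes aggregates that encode domain knowledge, whereas the deep model receives near-raw code lists and must learn the relevant patterns itself.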
Apart from the models we developed, we also assessed the LACE (Length of stay, Acuity of index admission, CCI, and number of emergency department [ED] visits in the last 6 months) index as an evaluation tool. The LACE index is a tool used for predicting unscheduled readmission, utilizing readily available clinical predictors that can be calculated before a patient’s discharge [
]. The LACE index is formulated using 4 variables to estimate the risk of either death or non-elective readmission following a patient’s discharge from the hospital. These variables are the length of stay, acuity of the index admission, the CCI, and the count of ED visits within the last 6 months [ ].

Model Explainability
We used the Shapley Additive Explanations (SHAP) technique to offer insights into model decisions by revealing the most important features for readmission prediction. SHAP is a model-agnostic explanation tool that uses coalitional game theory to compute both the overall (global) behavior of the model and local explanations for specific readmissions [
]. The global explanations furnish a list of critical features that either contribute to or detract from the risk of readmission. The technique also assigns a score to each prediction through the calculation of additive feature importance. This score can highlight the risk factors that are either positively or negatively influencing the readmission risk for a specific admission. Additionally, clinicians evaluated the explanations offered by each model from a clinical perspective. In particular, clinicians provided feedback on the most critical features identified by SHAP in the context of their clinical assessment of patient conditions. They assessed whether these features could serve as indicators or warnings of undesirable outcomes.

Statistical Analysis
We divided the data into 2 segments. The first subset, spanning from January 1, 2017, to August 31, 2019, and consisting of 13,800/15,612 admissions (88.39%), was used for model training. The second, more recent subset, spanning from September 1, 2019, to December 31, 2019, and containing 1812/15,612 admissions (11.61%), was reserved for holdout testing. We fine-tuned the hyperparameters of each model using the grid search method, which conducts a thorough exploration of a predefined set of hyperparameter combinations. To train the models, we implemented a stratified 10-fold cross-validation approach, where the training data were subdivided into 10 segments. In each iteration, 9 folds comprising 90% (12,420/13,800) of the training admissions were used for model training, while the remaining fold (1380/13,800, 10%) was used for validation. This process was repeated 10 times, with a different 10% of the data reserved for validation in each iteration. We used a sensitivity-based training strategy to optimize the sensitivity of the trained models, making them particularly relevant in clinical operations. From the 10-fold cross-validation training process, we selected the model that achieved the highest sensitivity while maintaining a minimum specificity of 50%.
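The sensitivity-based selection rule can be sketched as follows. The helper names are hypothetical; in practice, the fold results would come from the cross-validation loop described above.

```python
def sens_spec(y_true, y_pred):
    """Sensitivity and specificity from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def select_model(fold_results, min_specificity=0.50):
    """Pick the cross-validation model with the highest sensitivity
    among those meeting the specificity floor.

    `fold_results` is a list of (model_id, sensitivity, specificity)
    tuples, one per fold.
    """
    eligible = [r for r in fold_results if r[2] >= min_specificity]
    return max(eligible, key=lambda r: r[1])[0]
```

Note that a model with the best raw sensitivity can still be rejected if its specificity falls below the 0.50 floor, which is exactly the trade-off the study accepts to keep false positives manageable.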
The assessment of predictive model performance is founded on widely used performance metrics, including sensitivity, specificity, F1-score, receiver operating characteristic curves, the area under the receiver operating characteristic curve (AUC), and the area under the precision-recall curve, all of which are reported with 95% CIs. Additionally, we calculated the number needed to screen, the number needed to diagnose, and the number needed to predict [
].

Results
Patient Variables and Data Definitions
The features of admissions, encompassing both unscheduled readmissions and planned admissions within the following 100 days after discharge, are detailed in
- . As indicated, the admissions of readmitted patients exhibited a notably higher prevalence of comorbidities in comparison with those that did not result in readmission.

Model Evaluation
presents the performance achieved by the models, with the characteristics of the training and holdout test cohorts detailed in Table S1 in . It is worth noting that CatBoost outperformed LSTM in both the validation and holdout test sets, achieving superior sensitivity and specificity. The sensitivity-based training approach led to an increased number of false positives (ie, admissions misclassified as unplanned readmissions) in both models, consequently resulting in lower specificity. Concerning the other metrics, the performance of both models is nearly identical. In both cases, the numbers needed to screen/diagnose/predict were approximately 4-5. Notably, the models we developed exhibited superior performance when compared with the LACE index. We used a threshold of 10 to identify high-risk readmissions based on the LACE index scores [ ].
Metric | Deep (LSTMb) model | Shallow (CatBoost) model | LACEc score
Validationd
Sensitivity, mean (SD) | 0.83 (0.019) | 0.84 (0.016) | 0.38 (0.026)
Specificitye, mean (SD) | 0.51 (0.022) | 0.52 (0.025) | 0.77 (0.017)
AUCf, mean (SD) | 0.67 (0.008) | 0.68 (0.014) | 0.57 (0.015)
F1-score, mean (SD) | 0.61 (0.017) | 0.62 (0.021) | 0.42 (0.026)
AUPRCg, mean (SD) | 0.69 (0.014) | 0.70 (0.016) | 0.54 (0.026)
Holdouth
Sensitivity (95% CI) | 0.78 (0.72-0.78) | 0.83 (0.79-0.86) | 0.35 (0.31-0.38)
Specificitye (95% CI) | 0.53 (0.53-0.58) | 0.50 (0.47-0.52) | 0.78 (0.76-0.80)
AUC (95% CI) | 0.65 (0.63-0.67) | 0.66 (0.64-0.68) | 0.56 (0.54-0.58)
F1-score (95% CI) | 0.58 (0.55-0.61) | 0.60 (0.57-0.63) | 0.39 (0.36-0.42)
AUPRC (95% CI) | 0.66 (0.63-0.68) | 0.68 (0.66-0.70) | 0.51 (0.48-0.54)
Number needed to screen (95% CI) | 4.33 (3.43-4.95) | 4.49 (3.81-5.68) | 3.84 (3.27-4.59)
Number needed to diagnose (95% CI) | 3.24 (2.81-3.77) | 3.06 (2.77-3.54) | 7.90 (5.90-12.14)
Number needed to predict (95% CI) | 3.50 (3.04-4.15) | 3.20 (2.89-3.62) | 6.79 (5.08-10.65)
aItalicized values represent the best model over that metric.
bLSTM: long short-term memory.
cLACE: Length of stay, Acuity of index admission, CCI (Charlson Comorbidity Index), and number of ED (emergency department) visits in the last 6 months.
dThe mean and SD of different metrics obtained during 10-fold cross-validation training are presented.
eThe lower specificity for both LSTM and CatBoost models is due to sensitivity-based training; a lower specificity (minimum of 0.50) is accepted to reach higher sensitivity.
fAUC: area under the receiver operating characteristic curve.
gAUPRC: area under the precision-recall curve.
hThe performance on the holdout test cohort for the model that achieved the highest sensitivity with a minimum specificity of 50% after 10-fold cross-validation training is reported.
As aforementioned, we opted for sensitivity over specificity to be relevant in clinical operations; thus, our models generate more false positives than the LACE index. Unlike the proposed CatBoost and LSTM models, the LACE index is not trainable, yet it is commonly used to estimate readmission risks in general. However, a study on patients with HF showed that high LACE scores were associated with higher rates of ED revisits after hospital discharge but were less accurate in predicting readmissions [ ].

The lower specificity ratios observed for both LSTM and CatBoost models are attributed to the sensitivity-based training approach we used. This approach prioritizes higher sensitivity; accepting a lower specificity (with a minimum of 0.50) is part of the strategy to achieve this goal.
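For reference, the LACE baseline can be computed from its 4 components. The point values below follow the original LACE derivation by van Walraven et al (an assumption, since the scoring table is not restated in this paper), with the threshold of 10 used in this study.

```python
def lace_score(los_days, emergent, charlson, ed_visits_6mo):
    """LACE index: Length of stay, Acuity of admission, Comorbidity
    (Charlson), and Emergency department visits in the prior 6 months.

    Point values are assumed from the original LACE derivation
    (van Walraven et al); they are not restated in this paper.
    """
    if los_days < 1:
        l = 0
    elif los_days <= 3:
        l = los_days          # 1, 2, or 3 points
    elif los_days <= 6:
        l = 4
    elif los_days <= 13:
        l = 5
    else:
        l = 7
    a = 3 if emergent else 0            # acute/emergent admission
    c = charlson if charlson <= 3 else 5  # Charlson Comorbidity Index
    e = min(ed_visits_6mo, 4)           # ED visits, capped at 4
    return l + a + c + e

def high_risk(score, threshold=10):
    """Flag a high-risk readmission using the threshold of 10 applied
    in this study."""
    return score >= threshold
```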
Model Explainability
provides an overview of the variables of importance identified by the CatBoost and LSTM models, categorized into several main groups. Given that we used 2 distinct input formats for these models, our interest lies in understanding the commonalities and disparities in what they have identified. Both models concurred in identifying the most critical variables for predicting readmission. These variables included a higher number of previous readmissions, emergency admissions, longer lengths of stay, and age. By contrast, the history of medications was assigned a lower level of importance by both models. Furthermore, both models highlighted extreme values of NT-proBNP as an influential variable pushing toward readmission. Additionally, both models indicated that patients prescribed loop diuretics at discharge were at a higher risk of readmission. However, CatBoost identified angiotensin-converting enzyme inhibitors, mineralocorticoid receptor antagonists, β-blockers, and angiotensin receptor neprilysin inhibitor treatments at discharge as factors associated with a lower risk of readmission.
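The additive attributions behind these feature rankings can be illustrated with an exact Shapley computation on a toy model. The feature names and contributions below are hypothetical, and the value function is additive, a simplification for which each feature's Shapley value recovers exactly its own contribution; SHAP approximates the same quantities for real models.

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values for a small feature set.

    Each feature's value is its average marginal contribution to
    `value_fn` over all coalitions; `value_fn` maps a frozenset of
    feature names to the model output with only those features present.
    Exponential in len(features), so only suitable for tiny examples.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(n):
            for coal in combinations(others, k):
                s = frozenset(coal)
                # Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi
```

By the efficiency property, the attributions sum to the difference between the prediction with all features present and the baseline with none, which is what makes a SHAP plot for a single admission add up to that admission's predicted risk.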
For the CatBoost model, the CCI was identified as the most critical feature. This feature, however, was not available to the LSTM model. Conversely, the LSTM model placed greater importance on diagnosis codes compared with other clinical information such as abnormal laboratory test results and medication codes. Interestingly, in both models, patient age was associated with an increased risk of readmission, while the duration of HF appeared to reduce this risk. The top 28 features highlighted by both models can be found in Figures S3 and S4 in
.

Local explanations are valuable for enabling health care professionals to comprehend the rationale behind model predictions and make informed decisions. In
, local explanations are provided for a particular admission that both models correctly classified as a high-risk readmission. In the case presented, the 2 most important features were identical in both models: the number of previous unscheduled readmissions and the number of ED visits. Additionally, both models found that the presence of chronic kidney disease and abnormal laboratory test values of NT-proBNP (indicative of HF) contributed to an increased risk of readmission. Conversely, a short length of stay, specifically 1 day in this instance, pushed the prediction toward a lower risk of readmission. Additional examples, where the models disagree on the target label, can be found in Figures S5 and S6 in .

Discussion
Principal Findings
Our primary motivation for this study was to develop models that can effectively adapt to patterns within data from patients with HF. Given the absence of prior research demonstrating that ML models can substantially reduce readmission, our study focuses on the comparison of shallow and DL models in the context of HF data. This study encompassed a large HF cohort, with a readmission rate of 5597/15,612 (35.85%) per year, which aligns with findings from comparative studies [
, , ]. Notably, the more interpretable shallow model exhibited superior performance compared with the deep model, and both models significantly outperformed conventional scores in terms of the numbers needed to screen for identifying readmission within 100 days. Consequently, this study does not suggest that achieving enhanced prediction performance necessitates sacrificing model transparency.

The results indicate that the shallow model, while more interpretable, performs as effectively as the deep model. However, it is important to note that the shallow model necessitates a feature engineering process to derive useful features from raw EHR data. By contrast, the input to the deep model closely resembles the original EHR data, enabling a comparison of the patterns learned from raw data against those generated by engineered features calculated with the guidance of expert knowledge. Interestingly, the explainability results reveal that there is an overlap in the most critical features identified by both models.
Good explanations can serve as an advantage in building trust among clinicians and guiding interventions. Understanding the features that influenced the prediction is crucial, as previous reports have emphasized the importance of comprehending the models to establish trust in them [
, ]. The selection of variables used for model training was made to support generalizability and the potential transfer of the model to new settings. ICD-10-SE codes, ATC codes, and standard laboratory values are expected to be available in most acute care venues, enhancing the applicability of the model.

Model Explainability and Clinical Insights
In addition to clinically informative variables, administrative features exhibit significant importance in both models, consistent with findings from prior studies [
, , ]. It appears that a history of previous hospital admissions, readmissions, and visits to the ED is associated with an elevated risk of readmission. Both models underscore the connection between patients with frequent inpatient episodes, often indicative of a high disease burden, and features that are not easily altered, such as age and comorbidity. Nevertheless, these features serve as crucial warning signals that should be considered when planning hospital discharge. As a result, the feature engineering process in this study was continually refined to encompass additional clinically relevant features. Given that previous readmissions and ED visits are robust indicators of potential new events, the factors contributing to past unplanned health care encounters could serve as key targets for action. Consequently, further investigation into these factors, with an emphasis on causal inference, is warranted to improve patient outcomes and reduce readmission rates.

Previous studies have demonstrated the positive impact of HF medication in preventing readmission [
]. In addition to medical treatment, factors such as continuity and availability of care are valuable tools for these patients. Insights revealing weak associations, such as the finding that guideline-compliant pharmaceutical treatment is only weakly linked to readmission risk, underscore the importance of pursuing nonmedical strategies to prevent readmissions alongside optimizing drug regimens, with an emphasis on drugs known to reduce readmission rates [ - ]. We should also not overlook previous findings that sodium and potassium levels, as well as NT-proBNP, are associated with the outcome, as confirmed in this report. Moreover, the use of β-blockers and other first-line treatments can have an impact on patient outcomes. These factors should continue to be considered in patient care and intervention strategies [ , , , ].

Our focus on readmission serves a dual purpose: it acts as an indicator of health outcomes and, simultaneously, as an encounter that contributes to resource utilization, potentially conflicting with other health care priorities. The need to screen (and subsequently intervene with) approximately 4.5 patients to identify and target 1 potentially preventable readmission should be recognized as a compelling reason for further interventional studies in which the cost of the intervention is assessed. One potential intervention approach could involve care planning tailored to the drivers of readmission probability identified in this study.
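The "approximately 4.5 patients screened per readmission found" figure corresponds to the number needed to screen, which — following the number-needed-to-predict framing in the Larner reference cited here — is the reciprocal of the model's positive predictive value. A small sketch, using illustrative confusion-matrix counts rather than the study's actual figures:

```python
def number_needed_to_screen(true_positives: int, false_positives: int) -> float:
    """Patients that must be flagged (and intervened with) per true
    readmission identified: the reciprocal of positive predictive value."""
    ppv = true_positives / (true_positives + false_positives)
    return 1.0 / ppv

# Illustrative counts (not the study's): a PPV of 22% corresponds
# to screening roughly 4.5 patients per true readmission found.
print(number_needed_to_screen(220, 780))  # ≈ 4.5
```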
Limitations
In this study, we did not consider aspects of care beyond hospitals and primary health care. As readmissions were only weakly associated with pharmaceutical treatment and several other predictive factors were not easily amenable to intervention, strategies to reduce readmissions may need to concentrate on home care and enhancing patient comfort. However, these aspects were not within the scope of this study.
Moreover, the number of patients on SGLT-2 inhibitors in our study was too small to draw definitive conclusions about their impact on readmissions. However, other studies suggest that this medication class warrants further investigation in real-world health care settings. It is important to emphasize that the effectiveness of drugs such as SGLT-2 inhibitors is not in question; guidelines continue to recommend them as a major tool for improving patient outcomes.
Comparison With Prior Work
A broad spectrum of predictive modeling methods, including statistical approaches and conventional ML and DL methods, have been used to tackle the readmission prediction problem [
]. There is a prevailing trend in the literature toward traditional statistical and conventional ML models. These models are favored because they offer greater transparency, are easier to understand, and have been shown to outperform DL models in certain contexts [ , , ]. Prior studies have often concentrated on comparing the predictive performance of different models while neglecting to investigate their interpretability and practical usefulness in real clinical settings. However, with the rapid advancements in explainable artificial intelligence and the growing adoption of model-agnostic interpretation techniques, even complex DL models can offer both local and global explanations, as demonstrated in this study.

Conclusions
In clinical prediction models, there can be a trade-off between precision and explainability. When models exhibit similar prediction performance, the less opaque or more interpretable model is often favored. In the presented case, the shallow ML model performs at a similar level to a deeper model and significantly outperforms traditional scoring methods. Achieving explainability requires the collaborative effort of clinicians through feature engineering. However, it is worth noting that many of the features driving the predictions in this study may not be easily actionable. Enhancing compliance with treatment guidelines can lead to improvements, but for more substantial impacts, exploring new drugs and interventions beyond the scope of this study should also be considered.
Acknowledgments
This analysis was supported by a grant from AstraZeneca (number 68196919). The funder was not involved in the research work.
Data Availability
The real-world data used in the analysis are protected under Swedish integrity-protecting regulations and thus cannot be made accessible to the public, in accordance with the requirements of the Swedish Health and Medical Services Act.
Authors' Contributions
AS, BA, KE, OH, and ML participated in the conceptualization, methodology, formal analysis, investigation, writing—original draft, and writing—review and editing. OH contributed to data curation, software analysis, and data visualization.
Conflicts of Interest
None declared.
Data analysis and classification codes.
DOCX File, 518 KB

References
- Lesyuk W, Kriza C, Kolominsky-Rabas P. Cost-of-illness studies in heart failure: a systematic review 2004-2016. BMC Cardiovasc Disord. May 02, 2018;18(1):74. [FREE Full text] [CrossRef] [Medline]
- Yasin Z, Anderson P, Lingman M, Kwatra J, Ashfaq A, Slutzman JE, et al. Receiving care according to national heart failure guidelines is associated with lower total costs: an observational study in Region Halland, Sweden. Eur Heart J Qual Care Clin Outcomes. May 03, 2021;7(3):280-286. [CrossRef] [Medline]
- Wideqvist M, Cui X, Magnusson C, Schaufelberger M, Fu M. Hospital readmissions of patients with heart failure from real world: timing and associated risk factors. ESC Heart Fail. Apr 2021;8(2):1388-1397. [FREE Full text] [CrossRef] [Medline]
- Gracia E, Singh P, Collins S, Chioncel O, Pang P, Butler J. The vulnerable phase of heart failure. Am J Ther. 2018;25(4):e456-e464. [FREE Full text] [CrossRef] [Medline]
- Greene SJ, Fonarow GC, Vaduganathan M, Khan SS, Butler J, Gheorghiade M. The vulnerable phase after hospitalization for heart failure. Nat Rev Cardiol. Apr 2015;12(4):220-229. [CrossRef] [Medline]
- Cui X, Thunström E, Dahlström U, Zhou J, Ge J, Fu M. Trends in cause-specific readmissions in heart failure with preserved vs. reduced and mid-range ejection fraction. ESC Heart Fail. Oct 2020;7(5):2894-2903. [FREE Full text] [CrossRef] [Medline]
- Adamo M, Gardner RS, McDonagh TA, Metra M. The 'Ten Commandments' of the 2021 ESC Guidelines for the diagnosis and treatment of acute and chronic heart failure. Eur Heart J. Feb 10, 2022;43(6):440-441. [CrossRef] [Medline]
- Allam A, Nagy M, Thoma G, Krauthammer M. Neural networks versus logistic regression for 30 days all-cause readmission prediction. Sci Rep. Jun 26, 2019;9(1):9277. [FREE Full text] [CrossRef] [Medline]
- Min X, Yu B, Wang F. Predictive modeling of the hospital readmission risk from patients' claims data using machine learning: a case study on COPD. Sci Rep. Feb 20, 2019;9(1):2362. [FREE Full text] [CrossRef] [Medline]
- Chen Z, Lai C, Ren J. Hospital readmission prediction based on long-term and short-term information fusion. Applied Soft Computing. Nov 2020;96:106690. [CrossRef]
- Miswan NH, Chan CS, Ng CG. Hospital readmission prediction based on improved feature selection using grey relational analysis and LASSO. GS. May 04, 2021;11(4):796-812. [CrossRef]
- Bote-Curiel L, Muñoz-Romero S, Gerrero-Curieses A, Rojo-Álvarez JL. Deep learning and big data in healthcare: a double review for critical beginners. Applied Sciences. Jun 06, 2019;9(11):2331. [CrossRef]
- Jiang W, Siddiqui S, Barnes S, Barouch LA, Korley F, Martinez DA, et al. Readmission risk trajectories for patients with heart failure using a dynamic prediction approach: retrospective study. JMIR Med Inform. Sep 16, 2019;7(4):e14756. [FREE Full text] [CrossRef] [Medline]
- Teo K, Yong CW, Chuah JH, Hum YC, Tee YK, Xia K, et al. Current trends in readmission prediction: an overview of approaches. Arab J Sci Eng. Aug 16, 2021;48(8):1-18. [FREE Full text] [CrossRef] [Medline]
- Liu C, Chen Z, Kuo S, Lin T. Does AI explainability affect physicians' intention to use AI? Int J Med Inform. Dec 2022;168:104884. [CrossRef] [Medline]
- Ashfaq A, Lönn S, Nilsson H, Eriksson JA, Kwatra J, Yasin ZM, et al. Data resource profile: regional healthcare information platform in Halland, Sweden. Int J Epidemiol. Jun 01, 2020;49(3):738-739f. [CrossRef] [Medline]
- Charlson M, Szatrowski TP, Peterson J, Gold J. Validation of a combined comorbidity index. J Clin Epidemiol. Nov 1994;47(11):1245-1251. [CrossRef] [Medline]
- Shuvy M, Zwas DR, Keren A, Gotsman I. The age-adjusted Charlson Comorbidity Index: a significant predictor of clinical outcome in patients with heart failure. Eur J Intern Med. Mar 2020;73:103-104. [CrossRef] [Medline]
- Mueller C, McDonald K, de Boer RA, Maisel A, Cleland JG, Kozhuharov N, et al. Heart Failure Association of the European Society of Cardiology. Heart Failure Association of the European Society of Cardiology practical guidance on the use of natriuretic peptide concentrations. Eur J Heart Fail. Jun 20, 2019;21(6):715-731. [FREE Full text] [CrossRef] [Medline]
- Wang D, Willis DR, Yih Y. The pneumonia severity index: assessment and comparison to popular machine learning classifiers. Int J Med Inform. Jul 2022;163:104778. [CrossRef] [Medline]
- Ashfaq A, Sant'Anna A, Lingman M, Nowaczyk S. Readmission prediction using deep learning on electronic health records. J Biomed Inform. Sep 2019;97:103256. [FREE Full text] [CrossRef] [Medline]
- Davidge J, Halling A, Ashfaq A, Etminani K, Agvall B. Clinical characteristics at hospital discharge that predict cardiovascular readmission within 100 days in heart failure patients - an observational study. Int J Cardiol Cardiovasc Risk Prev. Mar 2023;16:200176. [FREE Full text] [CrossRef] [Medline]
- Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. In: NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Red Hook, NY. Curran Associates Inc. Presented at: 32nd International Conference on Neural Information Processing Systems; December 3-8, 2018; Montréal, Canada. pp. 6639-6649.
- Hochreiter S, Schmidhuber J. Long short-term memory. Neural Comput. Nov 15, 1997;9(8):1735-1780. [CrossRef] [Medline]
- Wang H, Robinson RD, Johnson C, Zenarosa NR, Jayswal RD, Keithley J, et al. Using the LACE index to predict hospital readmissions in congestive heart failure patients. BMC Cardiovasc Disord. Aug 07, 2014;14:97. [FREE Full text] [CrossRef] [Medline]
- van Walraven C, Dhalla IA, Bell C, Etchells E, Stiell IG, Zarnke K, et al. Derivation and validation of an index to predict early death or unplanned readmission after discharge from hospital to the community. CMAJ. Apr 06, 2010;182(6):551-557. [FREE Full text] [CrossRef] [Medline]
- Lundberg SM, Erion G, Chen H, DeGrave A, Prutkin JM, Nair B, et al. From local explanations to global understanding with explainable AI for trees. Nat Mach Intell. Jan 17, 2020;2(1):56-67. [FREE Full text] [CrossRef] [Medline]
- Larner A. Number needed to diagnose, predict, or misdiagnose: useful metrics for non-canonical signs of cognitive status? Dement Geriatr Cogn Dis Extra. 2018;8(3):321-327. [CrossRef] [Medline]
- Sharma V, Kulkarni V, McAlister F, Eurich D, Keshwani S, Simpson SH, et al. Predicting 30-day readmissions in patients with heart failure using administrative data: a machine learning approach. J Card Fail. May 2022;28(5):710-722. [FREE Full text] [CrossRef] [Medline]
- Strömberg A, Mårtensson J, Fridlund B, Levin LA, Karlsson JE, Dahlström U. Nurse-led heart failure clinics improve survival and self-care behaviour in patients with heart failure: results from a prospective, randomised trial. Eur Heart J. Jun 2003;24(11):1014-1023. [CrossRef] [Medline]
- Agvall B, Alehagen U, Dahlström U. The benefits of using a heart failure management programme in Swedish primary healthcare. Eur J Heart Fail. Feb 27, 2013;15(2):228-236. [FREE Full text] [CrossRef] [Medline]
- Gandhi S, Mosleh W, Sharma UC, Demers C, Farkouh ME, Schwalm J. Multidisciplinary heart failure clinics are associated with lower heart failure hospitalization and mortality: systematic review and meta-analysis. Can J Cardiol. Oct 2017;33(10):1237-1244. [CrossRef] [Medline]
Abbreviations
ATC: Anatomical Therapeutic Chemical
AUC: area under the receiver operating characteristic curve
AUPRC: area under the precision-recall curve
CCI: Charlson Comorbidity Index
DL: deep learning
ED: emergency department
EHR: electronic health record
HF: heart failure
ICD-10-SE: International Classification of Diseases, revision 10, Sweden
LACE: Length of stay, Acuity of index admission, CCI, and number of ED visits in the last 6 months
LSTM: long short-term memory
ML: machine learning
NT-proBNP: N-terminal prohormone of brain natriuretic peptide
SGLT-2: sodium-glucose cotransporter-2
SHAP: Shapley Additive Explanations
Edited by A Mavragani; submitted 02.03.23; peer-reviewed by G Lim, Y Zhang; comments to author 01.04.23; revised version received 22.05.23; accepted 29.09.23; published 27.10.23.
Copyright©Amira Soliman, Björn Agvall, Kobra Etminani, Omar Hamed, Markus Lingman. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 27.10.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.