@Article{info:doi/10.2196/71413,
author="Liu, Xianglin
and Huang, Zhihua
and Guo, Yizhi
and Li, Yandeng
and Zhu, Jianming
and Wen, Jun
and Gao, Yunchun
and Liu, Jianyi",
title="Identification and Validation of an Explainable Prediction Model of Sepsis in Patients With Intracerebral Hemorrhage: Multicenter Retrospective Study",
journal="J Med Internet Res",
year="2025",
month="Apr",
day="28",
volume="27",
pages="e71413",
keywords="intracerebral hemorrhage; machine learning; sepsis; prediction model; SHAP; Shapley Additive Explanations",
abstract="Background: Sepsis is a life-threatening condition frequently observed in patients with intracerebral hemorrhage (ICH) who are critically ill. Early and accurate identification and prediction of sepsis are crucial. Machine learning (ML)--based predictive models exhibit promising sepsis prediction capabilities in emergency settings. However, their application in predicting sepsis among patients with ICH is still limited. Objective: The aim of the study is to develop an ML-driven risk calculator for early prediction of sepsis in patients with ICH who are critically ill and to clarify feature importance and explain the model using the Shapley Additive Explanations method. Methods: Patients with ICH admitted to the intensive care unit (ICU) from the Medical Information Mart for Intensive Care IV database between 2008 and 2022 were divided into training and internal test sets. The external test was performed using the eICU Collaborative Research Database, which includes over 200,000 ICU admissions across the United States between 2014 and 2015. Sepsis following ICU admission was identified using Sepsis-3.0 through clinical diagnosis combining elevation of the Sequential Organ Failure Assessment by ≥2 points with suspected infection. The Boruta algorithm was used for feature selection, confirming 29 features. Nine ML algorithms were used to construct the prediction models. Predictive performance was compared using several evaluation metrics, including the area under the receiver operating characteristic curve (AUC). The Shapley Additive Explanations technique was used to interpret the final model, and a web-based risk calculator was constructed for clinical practice. Results: Overall, 2414 patients with ICH were enrolled from the Medical Information Mart for Intensive Care IV database, with 1689 and 725 patients assigned to the training and internal test sets, respectively. An external test set of 2806 patients with ICH from the eICU database was used. 
Among the 9 ML models tested, the categorical boosting (CatBoost) model demonstrated the best discriminative ability. After reducing features based on their importance, an explainable final CatBoost model was developed using 8 features. The final model accurately predicted sepsis in internal (AUC=0.812) and external (AUC=0.771) tests. Conclusions: We constructed a web-based risk calculator with 8 features based on the CatBoost model to assist clinicians in identifying people at high risk for sepsis in patients with ICH who are critically ill.",
issn="1438-8871",
doi="10.2196/71413",
url="https://www.jmir.org/2025/1/e71413",
}