Using a Constraint-Based Method to Identify Chronic Disease Patients Who Are Apt to Obtain Care Mostly Within a Given Health Care System: Retrospective Cohort Study

Background: For several major chronic diseases including asthma, chronic obstructive pulmonary disease, chronic kidney disease, and diabetes, a state-of-the-art method to avert poor outcomes is to use predictive models to identify future high-cost patients for preemptive care management interventions. Frequently, an American patient obtains care from multiple health care systems, each managed by a distinct institution. As the patient’s medical data are spread across these health care systems, none has complete medical data for the patient. The task of building models to predict an individual patient’s cost is currently thought to be impractical with incomplete data, which limits the use of care management to improve outcomes. Recently, we developed a constraint-based method to identify patients who are apt to obtain care mostly within a given health care system. Our method was shown to work well for the cohort of all adult patients at the University of Washington Medicine for a 6-month follow-up period. It is unknown how well our method works for patients with various chronic diseases and over follow-up periods of different lengths, and subsequently, whether it is reasonable to perform this predictive modeling task on the subset of patients pinpointed by our method.
Objective: To understand our method’s potential to enable this predictive modeling task on incomplete medical data, this study assesses our method’s performance at the University of Washington Medicine on 5 subgroups of adult patients with major chronic diseases and over follow-up periods of 2 different lengths.
Methods: We used University of Washington Medicine data for all adult patients who obtained care at the University of Washington Medicine in 2018 and PreManage data containing usage information from all hospitals in Washington state in 2019. We evaluated our method’s performance over the follow-up periods of 6 months and 12 months on 5 patient subgroups separately: asthma, chronic kidney disease, type 1 diabetes, type 2 diabetes, and chronic obstructive pulmonary disease.
Results: Our method identified 21.81% (3194/14,644) of University of Washington Medicine adult patients with asthma. Around 66.75% (797/1194) and 67.13% (1997/2975) of their emergency department visits and inpatient stays took place within the University of Washington Medicine system in the subsequent 6 months and in the subsequent 12 months, respectively, approximately double the corresponding percentage for all University of Washington Medicine adult patients with asthma. The performance for adult patients with chronic kidney disease, adult patients with chronic obstructive pulmonary disease, adult patients with type 1 diabetes, and adult patients with type 2 diabetes was reasonably similar to that for adult patients with asthma.
Conclusions: For each of the 5 chronic diseases most relevant to care management, our method can pinpoint a reasonably large subset of patients who are apt to obtain care mostly within the University of Washington Medicine system. This opens the door to building models to predict an individual patient’s cost on incomplete data, which was formerly deemed impractical.
International Registered Report Identifier (IRRID): RR2-10.2196/13783


Introduction

Background
Care management is widely used to improve the outcomes of patients with chronic diseases [1]. Typically, we build a model to predict an individual patient's cost [1][2][3][4][5]. For a patient predicted to incur high cost in the future, we enroll the patient in a care management program for preemptive interventions. Then a care manager will call the patient regularly to check the patient's status and help arrange health and related services. Proper use of care management can cut cost by up to 15%, reduce hospital visits (emergency department visits and inpatient stays) by up to 40%, and bring many other benefits [4,6-13]. Care management is typically used for managing the chronic diseases of asthma, chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), and diabetes, as these diseases fulfill 3 conditions making a care management program economically feasible for implementation: 1) the disease has a high prevalence rate; 2) if not treated appropriately, the disease can result in expensive acute exacerbations; and 3) relatively low-cost and effective interventions within the patient's control are available for the disease [6,14].
In the United States, a patient often obtains care from several healthcare systems such as academic medical centers and private physician groups. As the patient's medical data are spread across these healthcare systems, none of them has complete medical data for the patient. Our prior work showed that <1/3 of hospital visits by University of Washington Medicine (UWM) adult patients took place within the UWM in a 6-month follow-up period from April to October 2017 [15]. Other researchers showed similar evidence of care fragmentation for adult hospital visits in Massachusetts [16] and for emergency department visits in Indiana [17]. Typical models for forecasting an individual patient's cost presume complete historical data [14,18,19]. A healthcare system with incomplete data for its patients does not use these models, resulting in many predictably costly patients being missed by care management interventions and having poor outcomes.
Recently, we developed the first constraint-based method to pinpoint a reasonably large subset of patients apt to obtain care mostly within a given healthcare system [15]. For a 6-month follow-up period from April to October 2017, we showed that our method worked well for the cohort of all adult patients at the UWM [15]. It is unknown how our method performs on patients with various chronic diseases and over follow-up periods of different lengths. If our method performs well in these cases, for the subset of patients with chronic diseases that is pinpointed by our method and for which the healthcare system has more complete data, we could build a model to predict an individual patient's cost. This would be better than the current practice of not using any cost prediction model to facilitate care management for this healthcare system at all.

Objectives
To understand our method's potential to enable building models to predict an individual patient's cost on incomplete medical data, this study assesses our method's performance at the UWM on 5 subgroups of adult patients and over follow-up periods of 2 different lengths. Each subgroup corresponds to one of the 5 major chronic diseases for which care management is used: asthma, CKD, COPD, type 1 diabetes (T1D), and type 2 diabetes (T2D).

Patient population
As the largest academic healthcare system in Washington state, the UWM provides both clinic-based and hospital-based care for adults. As shown in Figure 1, our patient cohort covered all adult patients (age≥18) who visited the UWM during 2018 and had information kept in the UWM's enterprise data warehouse. Unless explicitly specified as a particular type of visit, a visit can be of any type in this paper. Patients who died during 2018 were excluded from our cohort.

Data set
We used clinical and administrative data in the UWM's enterprise data warehouse during 2011-2018. The data set included information on demographics, visits, diagnoses, laboratory tests, medications, and primary care physicians (PCPs) for patients in our cohort. We also used 2019 PreManage data of UWM patients. As a commercial product of Collective Medical Technologies Inc., PreManage provides diagnosis and visit data of hospital visits (emergency department visits and inpatient stays) at all hospitals in Washington state as well as many hospitals in other U.S. states [20]. As shown in Figure 2, we used January 1, 2019 as the index date to separate the subsequent and prior periods for our analysis task.

Patient subgroups
We considered 5 patient subgroups, each consisting of the patients in our 2018 patient cohort who had a specific major chronic disease. One subgroup was created for each of 5 major chronic diseases: asthma, CKD, COPD, T1D, and T2D [21][22][23].

COPD case definition
By adjusting the criteria adopted by the National Quality Forum and the Centers for Medicare and Medicaid Services [27][28][29], we encompassed emergency department and outpatient visit data [30] to identify COPD patients. A patient was deemed to have COPD if the patient was aged ≥40 years and fulfilled any of these 4 conditions:

T1D and T2D case definition
We used Nichols et al.'s method [31] to identify diabetes patients. A patient was deemed to have diabetes if the patient had ≥1 inpatient stay diagnosis code of diabetes (ICD-9: 250.x, 357.2, 362.0x, 366.41; ICD-10: E10.x, E11.x) or any 2 of the following events happening within 2 years of each other: 1) hemoglobin A1c (HbA1c) ≥6.5%, 2) random plasma glucose ≥200 mg/dL, 3) fasting plasma glucose ≥126 mg/dL, 4) an outpatient visit diagnosis code of diabetes (ICD-9: 250.x, 357.2, 362.0x, 366.41; ICD-10: E10.x, E11.x), and 5) a prescription of anti-hyperglycemic medication (α-glucosidase inhibitor, amylin analogue, biguanide, dipeptidyl peptidase-4 inhibitor, incretin mimetic, insulin, meglitinide, sulfonylurea, and thiazolidinedione). Two events of the same type, like 2 events of HbA1c ≥6.5%, would qualify if they happened on 2 different days. Since metformin, a biguanide, and thiazolidinedione could be used for other diseases, we did not count 2 prescriptions of metformin or thiazolidinedione with no other manifestation of diabetes. We also excluded events occurring during pregnancy.
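The diabetes case definition above can be read as a small rule set. The following Python sketch is one possible reading, not the implementation used in this study. The `Event` record, the event kind labels, and the 730-day approximation of "2 years" are assumptions introduced for illustration. Events are assumed to have been filtered beforehand to those that meet the laboratory thresholds or carry a qualifying diagnosis code, and to exclude events occurring during pregnancy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from itertools import combinations

# Assumption: "within 2 years of each other" is approximated as 730 days.
TWO_YEARS = timedelta(days=730)

# Drugs that can be prescribed for other diseases (per the text above).
AMBIGUOUS_DRUGS = {"metformin", "thiazolidinedione"}


@dataclass(frozen=True)
class Event:
    """Hypothetical record of one qualifying event for a patient."""
    kind: str        # "inpatient_dx", "hba1c", "random_glucose",
                     # "fasting_glucose", "outpatient_dx", or "rx"
    day: date
    drug: str = ""   # only meaningful when kind == "rx"


def has_diabetes(events):
    """Return True if the events satisfy the case definition sketched above."""
    # One inpatient stay diagnosis code of diabetes is sufficient.
    if any(e.kind == "inpatient_dx" for e in events):
        return True

    # Otherwise look for any 2 events within 2 years of each other.
    for a, b in combinations(events, 2):
        if abs(a.day - b.day) > TWO_YEARS:
            continue
        # Two events of the same type must fall on 2 different days.
        if a.kind == b.kind and a.day == b.day:
            continue
        # A pair made only of metformin or thiazolidinedione
        # prescriptions is not counted.
        if (a.kind == "rx" and b.kind == "rx"
                and a.drug in AMBIGUOUS_DRUGS and b.drug in AMBIGUOUS_DRUGS):
            continue
        return True
    return False
```

In this reading, 2 metformin prescriptions qualify only when paired with some other event inside the 2-year window. The text does not say whether a manifestation outside that window should also count, so that choice is an assumption of the sketch.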

Our recently developed constraint-based method for identifying patients
We looked at 3 UWM hospitals whose clinical and administrative data are kept in the UWM's enterprise data warehouse: University of Washington Medical Center, Harborview Medical Center, and Northwest Hospital. They are all in Seattle, Washington. To identify patients apt to obtain care mostly within the UWM, we used the parameterized PCP constraint developed in our recent paper [15]: the patient resides within d miles of ≥1 of the 3 UWM hospitals and has a UWM PCP. The distance between a UWM hospital and a patient's home is the ellipsoid great circle distance computed by the distVincentyEllipsoid function contained in R's geosphere package version 1.5-5 [34]. d is a parameter. For all UWM adult patients and the follow-up period of 6 months, we showed that d's optimal value is 5 miles [15]. The UWM PCPs are inclined to refer within the UWM. Thus, intuitively, patients having a UWM PCP are apt to obtain a larger percentage of their care from the UWM than other patients. All else being equal, the UWM tends to provide a larger portion of a patient's care when the patient resides closer to UWM hospitals. Moreover, the number of UWM patients fulfilling the constraint grows with d. When d=+∞, distance no longer plays a role, and the constraint reduces to having a UWM PCP.
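The parameterized PCP constraint can be illustrated with a short Python sketch. This is not the code used in the study: the study computed ellipsoidal distances with R's distVincentyEllipsoid, whereas the sketch below substitutes the simpler spherical haversine formula, which gives slightly different distances. The function names, the use of ≤ for "within d miles", and the hospital coordinates are assumptions made for illustration. The coordinates are rough values, not taken from the study data.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius, spherical approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def fulfills_pcp_constraint(home, has_system_pcp, hospitals, d=5.0):
    """True if the patient has a PCP in the system and lives within
    d miles of at least one of the given hospitals.

    home and each hospital are (lat, lon) tuples in degrees.
    d may be math.inf, in which case only the PCP condition matters.
    """
    if not has_system_pcp:
        return False
    return any(haversine_miles(*home, *h) <= d for h in hospitals)


# Rough, illustrative coordinates for 3 hospitals in Seattle (assumption).
ILLUSTRATIVE_HOSPITALS = [
    (47.650, -122.308),
    (47.604, -122.324),
    (47.714, -122.337),
]
```

For example, `fulfills_pcp_constraint((47.62, -122.32), True, ILLUSTRATIVE_HOSPITALS, d=5)` checks a home in central Seattle against the 5-mile threshold, and passing `d=math.inf` keeps every patient who has a PCP in the system.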

Data analysis
As shown in Figure 2, we considered 2 follow-up periods starting from January 1, 2019: the subsequent 6 months (January 1, 2019, to June 30, 2019) and the subsequent 12 months (January 1, 2019, to December 31, 2019). The 6-month follow-up period was chosen to be consistent with the length of the follow-up period used in our prior paper [15]. The 12-month follow-up period was chosen because building models to predict an individual patient's cost for care management typically requires at least 1 year of historical data [14]. For each of the 5 patient subgroups and each of the 2 follow-up periods, we computed our method's performance on identifying patients apt to obtain care mostly within the UWM. We employed administrative data in the UWM's enterprise data warehouse to assess whether a patient fulfilled the parameterized PCP constraint. For each of the 5 patient subgroups, we calculated the percentage of patients fulfilling the constraint = n0/d0×100%. Here, n0 is the number of patients in the subgroup fulfilling the constraint. d0 is the number of patients in the subgroup. For all patients in the subgroup fulfilling the constraint, we used PreManage data to calculate: (1) the percentage of their hospital visits taking place within the UWM in the subsequent 6 months = n1/d1×100%. Here, n1 is the number of their hospital visits taking place within the UWM in the subsequent 6 months. d1 is the number of their hospital visits taking place anywhere in the subsequent 6 months. (2) the percentage of their hospital visits taking place within the UWM in the subsequent 12 months = n2/d2×100%. Here, n2 is the number of their hospital visits taking place within the UWM in the subsequent 12 months. d2 is the number of their hospital visits taking place anywhere in the subsequent 12 months. Since an average hospital visit costs much more than an average visit of another type, each of these 2 percentages serves as an indicator of the proportion of those patients' care obtained from the UWM.
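The percentages defined above are simple ratios, which the following Python sketch makes explicit. The function names and the representation of a hospital visit as a (date, within-system flag) pair are assumptions made for illustration. They do not reflect the actual structure of the PreManage data.

```python
from datetime import date


def percent(numerator, denominator):
    """Return numerator/denominator as a percentage, e.g. n1/d1 x 100%."""
    if denominator == 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


def share_within_system(visits, start, end):
    """Percentage of hospital visits in [start, end] that took place
    within the health care system.

    visits: iterable of (visit_date, within_system) pairs for the
    patients fulfilling the constraint (hypothetical layout).
    """
    flags = [within for day, within in visits if start <= day <= end]
    return percent(sum(flags), len(flags))


# Follow-up windows used in this study.
SIX_MONTHS = (date(2019, 1, 1), date(2019, 6, 30))
TWELVE_MONTHS = (date(2019, 1, 1), date(2019, 12, 31))
```

With the asthma counts reported in this paper, `percent(797, 1194)` rounds to 66.75 and `percent(1997, 2975)` rounds to 67.13, while `percent(3194, 14644)` rounds to 21.81 for the share of patients fulfilling the constraint.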
When deciding the optimal value of the distance threshold d to use, we struck a balance between 2 goals: (1) Goal 1: For patients fulfilling the constraint, the proportion of their hospital visits taking place within the UWM should be as large as possible. This will maximize the completeness of UWM medical data and minimize bias in the results of analyses done on those data. As these patients each have a UWM PCP, we expect most of their outpatient visits to occur within the UWM in the subsequent 12 months. (2) Goal 2: The percentage of patients fulfilling the constraint should be as large as possible. This will help maximize the impact of the application using UWM medical data. To show how our method performs for every UWM hospital, for all patients in the subgroup fulfilling the constraint, we employed PreManage data to calculate: (1) the percentage of their hospital visits taking place at the UWM hospital in the subsequent 6 months = n3/d1×100%. Here, n3 is the number of their hospital visits taking place at the UWM hospital in the subsequent 6 months. Recall d1 is the number of their hospital visits taking place anywhere in the subsequent 6 months. (2) the percentage of their hospital visits taking place at the UWM hospital in the subsequent 12 months = n4/d2×100%. Here, n4 is the number of their hospital visits taking place at the UWM hospital in the subsequent 12 months. Recall d2 is the number of their hospital visits taking place anywhere in the subsequent 12 months.
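The balance between Goal 1 and Goal 2 can be inspected by tabulating both quantities for several candidate values of d. The Python sketch below does only that and leaves the choice of d to the analyst, since no formula for the balance is given here. The patient record layout, with keys `distance_miles`, `has_pcp`, `visits_in`, and `visits_total`, is hypothetical.

```python
import math


def tradeoff_table(patients, thresholds):
    """For each candidate distance threshold d, report the percentage of
    patients fulfilling the constraint (Goal 2) and the percentage of
    their hospital visits taking place within the system (Goal 1).

    patients: list of dicts with the hypothetical keys
      'distance_miles' - distance from home to the nearest system hospital
      'has_pcp'        - whether the patient has a PCP in the system
      'visits_in'      - hospital visits within the system during follow-up
      'visits_total'   - hospital visits anywhere during follow-up
    """
    rows = []
    for d in thresholds:
        selected = [p for p in patients
                    if p["has_pcp"] and p["distance_miles"] <= d]
        total = sum(p["visits_total"] for p in selected)
        inside = sum(p["visits_in"] for p in selected)
        rows.append({
            "d": d,
            "pct_patients": 100.0 * len(selected) / len(patients),
            # None when the selected patients had no hospital visits.
            "pct_visits_within": 100.0 * inside / total if total else None,
        })
    return rows
```

Calling `tradeoff_table(patients, [2, 5, 10, math.inf])` on such records would show the pattern described above: the share of patients fulfilling the constraint grows with d, while the share of their hospital visits taking place within the system tends to shrink.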

Ethics approval
The UWM's institutional review board approved this retrospective cohort study.

Results
Table 1 shows our patient cohort's demographic and clinical characteristics. By striking a balance between the 2 goals mentioned at the end of the Methods section, we chose d=5 as the optimal value to use for each patient subgroup and each follow-up period. This value is the same as that chosen in our prior paper [15]. For d=5 and for each of the 5 patient subgroups, Table 2 shows the percentage of patients fulfilling the parameterized PCP constraint, the percentages of patient hospital visits taking place within the UWM in the subsequent 6 months and in the subsequent 12 months, and the percentages of hospital visits by patients fulfilling the constraint that took place within the UWM in the subsequent 6 months and in the subsequent 12 months.
Table 2. For d=5 and for each of the 5 patient subgroups, the percentage of patients fulfilling the parameterized primary care physician (PCP) constraint, the percentages of patient hospital visits taking place within the University of Washington Medicine (UWM) in the subsequent 6 months and in the subsequent 12 months, and the percentages of hospital visits by patients fulfilling the constraint that took place within the UWM in the subsequent 6 months and in the subsequent 12 months.

Principal results
For each of the 5 major chronic diseases most relevant to care management (asthma, COPD, CKD, T1D, and T2D), our constraint-based method can pinpoint a reasonably large subset of patients apt to obtain care mostly within the UWM. Using our method to pinpoint a subset of UWM adult patients with asthma, we roughly doubled the percentage of patient hospital visits taking place within the UWM in the subsequent 6 months from 37.11% (2,648/7,135) to 66.75% (797/1,194), and the corresponding percentage for the subsequent 12 months from 37.66% (6,857/18,206) to 67.13% (1,997/2,975). The results for adult patients with CKD, adult patients with COPD, adult patients with T1D, and adult patients with T2D are relatively similar. As the patients fulfilling the constraint all have a UWM PCP, we expect a majority of their outpatient visits to happen within the UWM in the subsequent 12 months, although we did not examine this in our study.

Comparison with our prior work
The performance numbers shown in this paper are relatively similar to those that our prior paper [15] showed for the group of all adult patients and the 6-month follow-up period from April to October 2017. There, using our constraint-based method with d=5 to pinpoint 16.01% (55,707/348,054) of UWM adult patients, we roughly doubled the percentage of patient hospital visits taking place within the UWM in the subsequent 6 months from 31.80% (39,171/123,162) to 69.38% (10,501/15,135).

Differences in the results for patients with T1D and patients with T2D
T1D tends to occur in younger people than T2D. Many young adults are students at the University of Washington and several other universities in the Seattle metropolitan area. During the summer and other university breaks, many of these students return to their hometowns outside the Seattle metropolitan area that the UWM primarily serves. The hospital visits they incur during these periods are likely to be outside of the UWM. Partly due to this, as shown in Table 2, the percentage of hospital visits by UWM adult patients with T1D taking place within the UWM in each follow-up period is about 30% less than the corresponding percentage for UWM adult patients with T2D. Looking only at patients fulfilling the parameterized PCP constraint with d=5, the percentage of hospital visits by patients with T1D taking place within the UWM in each follow-up period is about 15-30% less than the corresponding percentage for patients with T2D.

Possible use of our results
We showed that for each of 5 major chronic diseases most relevant to care management, the UWM offers most of the care and has decently complete medical data for patients fulfilling the parameterized PCP constraint with d=5. For these patients, we can build a predictive model to identify future high-cost patients among them and intervene preemptively via care management to avert poor outcomes [1][2][3][4][5]. As patients residing farther from the 3 UWM hospitals are inclined to obtain a smaller percentage of their care from the UWM, the UWM could consider adopting differing preventive interventions for patients residing at different distances from the UWM hospitals. This could help care management gain better results. For patients obtaining only a small percentage of their care from the UWM, it is hard for the UWM to adopt costly preventive interventions economically.
Like many other healthcare systems, the UWM has no complete claims data on its patients' healthcare use outside of the UWM. If a healthcare system has complete claims data on its patients' outside healthcare use, we could employ claims data instead of PreManage data to do a similar study.
A healthcare system with no access to PreManage data could also adopt our method. Without using PreManage data, one could assess our method's performance by asking some patients of the healthcare system about the care they obtained everywhere.

Limitations
This study has 2 limitations, which could be interesting topics for future work: (1) This study assesses our method at the UWM, which primarily serves an urban region. To know how our method generalizes, we need to redo our analysis at other healthcare systems, some providing care to rural regions and others primarily serving urban regions. Patients concentrate more in urban regions than in rural regions. For a healthcare system primarily serving rural regions, we expect d's optimal value to be >5. (2) For a healthcare system having incomplete medical data for its patients, we can employ our method to identify a subset of patients apt to obtain care mostly within the healthcare system and assess the degree of data incompleteness for this subset.
Analyzing incomplete data could lead to biased results, which are better than no result if the degree of bias is small. At present, we know neither the exact relation between data incompleteness and bias nor the extent of data incompleteness that can be tolerated before the results of data analysis become invalid. To assess whether our method could safely enable the data analysis task in such a healthcare system, we could obtain a more complete data set from Kaiser Permanente or any other similar healthcare system, drop differing portions of the data set, and assess the impact on the analysis results.

Conclusions
For each of the 5 major chronic diseases most relevant to care management (asthma, COPD, CKD, T1D, and T2D), our constraint-based method can pinpoint a reasonably large subset of patients apt to obtain care mostly within the UWM. This opens the door to building models to predict an individual patient's cost on incomplete data, which was formerly deemed infeasible.
COPD: chronic obstructive pulmonary disease
eGFR: estimated glomerular filtration rate
HbA1c: hemoglobin A1c
ICD-9: International Classification of Diseases, Ninth Revision
ICD-10: International Classification of Diseases, Tenth Revision
PCP: primary care physician
T1D: type 1 diabetes
T2D: type 2 diabetes
UWM: University of Washington Medicine

Figure 11. For every University of Washington Medicine (UWM) hospital and all adult patients with type 2 diabetes fulfilling the parameterized primary care physician (PCP) constraint, the percentages of their hospital visits taking place at the UWM hospital in the subsequent 6 months and in the subsequent 12 months.