Published in Vol 27 (2025)

Preprints (earlier versions) of this paper are available at https://www.medrxiv.org/content/10.1101/2024.12.16.24318586v1.
Using a Multilingual AI Care Agent to Reduce Disparities in Colorectal Cancer Screening for Higher Fecal Immunochemical Test Adoption Among Spanish-Speaking Patients: Retrospective Analysis


Original Paper

1Hippocratic AI, Palo Alto, CA, United States

2WellSpan Health, York, PA, United States

3School of Population and Public Health, University of British Columbia, Vancouver, BC, Canada

4University of California Davis Health, Davis, CA, United States

Corresponding Author:

Meenesh Bhimani, MD, MHA

Hippocratic AI

167 Hamilton Avenue

Palo Alto, CA, 94301

United States

Phone: 1 6509226682

Email: meenesh@hippocraticai.com

Abstract

Background: Colorectal cancer (CRC) screening rates remain disproportionately low among Hispanic and Latino populations compared to non-Hispanic White populations. While artificial intelligence (AI) shows promise in health care delivery, concerns exist that AI-based interventions may disadvantage non–English-speaking populations due to biases in development and deployment.

Objective: This study aimed to evaluate the effectiveness of a multilingual AI care agent in engaging Spanish-speaking patients in CRC screening compared with English-speaking patients.

Methods: This retrospective analysis examined an AI-powered outreach initiative at WellSpan Health in Pennsylvania and Maryland during September 2024. The study included 1878 patients (517 Spanish-speaking, 1361 English-speaking) eligible for CRC screening who lacked active web-based health profiles. A multilingual AI conversational agent conducted personalized telephone calls in the patient’s preferred language to provide education about CRC screening and facilitate fecal immunochemical test (FIT) kit requests. The primary outcome was the FIT test opt-in rate, with secondary outcomes including connect rates and call duration. Statistical analysis included descriptive statistics, bivariate comparisons, and multivariate logistic regression.

Results: Spanish-speaking patients demonstrated significantly higher engagement across all measures than English-speaking patients with respect to FIT test opt-in rates (18.2% vs 7.1%, P<.001), connect rates (69.6% vs 53.0%, P<.001), and call duration (6.05 vs 4.03 minutes, P<.001). Demographically, Spanish-speaking patients were younger (mean age 57 vs 61 years, P<.001) and more likely to be female (49.1% vs 38.4%, P<.001). In multivariate analysis, Spanish language preference remained an independent predictor of FIT test opt-in (adjusted odds ratio 2.012, 95% CI 1.340-3.019; P<.001) after controlling for demographic factors and call duration.

Conclusions: AI-powered outreach achieved significantly higher engagement among Spanish-speaking patients, challenging the assumption that technological interventions inherently disadvantage non–English-speaking populations. The 2.6-fold higher FIT test opt-in rate among Spanish-speaking patients represents a notable departure from historical patterns of health care disparities. These findings suggest that language-concordant AI interactions may help address longstanding disparities in preventive care access. Study limitations include its single health care system setting, short duration, and lack of follow-up data on completed screenings. Future research should assess long-term adherence and whether higher engagement translates to improved clinical outcomes.

J Med Internet Res 2025;27:e71211

doi:10.2196/71211



Introduction

Colorectal cancer (CRC) screening is crucial for reducing CRC-related mortality by detecting and removing precancerous adenomas or identifying cancer at earlier stages [1: Nicholson FB, Barro JL, Atkin W, Lilford R, Patnick J, Williams CB, et al. Review article: population screening for colorectal cancer. Aliment Pharmacol Ther. 2005;22(11-12):1069-1077; 2: Levin B, Lieberman DA, McFarland B, Smith RA, Brooks D, Andrews KS, American Cancer Society Colorectal Cancer Advisory Group, US Multi-Society Task Force, et al. American College of Radiology Colon Cancer Committee. Screening and surveillance for the early detection of colorectal cancer and adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US multi-society task force on colorectal cancer, and the American college of radiology. CA Cancer J Clin. 2008;58(3):130-160]. Various screening methods are available, including colonoscopy, fecal immunochemical tests (FIT), stool DNA tests, and computed tomographic colonography, each with different levels of effectiveness and potential harms [3: Lin JS, Perdue LA, Henrikson NB, Bean SI, Blasi PR. Screening for colorectal cancer: updated evidence report and systematic review for the US preventive services task force. JAMA. 2021;325(19):1978-1998]. Recent advances in understanding CRC pathogenesis and improvements in screening technologies have enhanced early detection capabilities [4: Li D. Recent advances in colorectal cancer screening. Chronic Dis Transl Med. 2018;4(3):139-147]. Although some concerns remain regarding the appropriate balance of benefits, risks, and costs, CRC screening is recommended in major guidelines and viewed as a critical public health intervention that can significantly reduce the burden of this prevalent cancer [2; 5: Qaseem A, Denberg TD, Hopkins RH, Humphrey LL, Levine J, Sweet DE, et al. Clinical Guidelines Committee of the American College of Physicians. Screening for colorectal cancer: a guidance statement from the American college of physicians. Ann Intern Med. 2012;156(5):378-386; 6; 7: Ness RM, Llor X, Abbass MA, Bishu S, Chen CT, Cooper G, et al. NCCN Guidelines® Insights: Colorectal Cancer Screening, Version 1.2024. J Natl Compr Canc Netw. 2024;22(7):438-446].

Despite these benefits, data consistently show lower CRC screening rates among the Hispanic population in the United States compared with non-Hispanic White populations, with studies reporting disparities ranging from 13.5% to 17% [8: Pollack LA, Blackman DK, Wilson KM, Seeff LC, Nadel MR. Colorectal cancer test use among Hispanic and non-hispanic U.S. populations. Prev Chronic Dis. 2006;3(2):A50; 9; 10: Viramontes O, Bastani R, Yang L, Glenn BA, Herrmann AK, May FP. Colorectal cancer screening among Hispanics in the United States: disparities, modalities, predictors, and regional variation. Prev Med. 2020;138:106146]. Common barriers to CRC screening among Hispanics include fear, cost, lack of awareness, low literacy and education levels, and limited English proficiency [11: Wang J, Moehring J, Stuhr S, Krug M. Barriers to colorectal cancer screening in hispanics in the United States: an integrative review. Appl Nurs Res. 2013;26(4):218-224]. Effective interventions to increase screening rates include culturally tailored patient education, navigation services, and provider training [12: Gonzalez SA, Ziebarth TH, Wang J, Noor AB, Springer DL. Interventions promoting colorectal cancer screening in the Hispanic population: a review of the literature. J Nurs Scholarsh. 2012;44(4):332-340; 13; 14: Mojica CM, Parra-Medina D, Vernon S. Interventions promoting colorectal cancer screening among latino men: A systematic review. Prev Chronic Dis. 2018;15:E31]. Nonetheless, disparities persist, with significant regional variations across the United States [10].

Previous research has demonstrated that technological innovations can improve preventive health screening rates among Hispanic populations, suggesting promising applications for artificial intelligence (AI) in this domain. Multiple digital approaches have already shown success: mHealth interventions featuring educational videos and interactive multimedia such as touchscreen tablets have increased cancer screening participation, while a combination of bilingual patient navigators, secure SMS messaging, and at-home testing achieved an increase in CRC screening rates [15: Watanabe-Galloway S, Ratnapradipa K, Subramanian R, Ramos A, Famojuro O, Schmidt C, et al. Mobile health (mHealth) interventions to increase cancer screening rates in hispanic/latinx populations: A scoping review. Health Promot Pract. 2023;24(6):1215-1229; 16: Rozario MA, Walton A, Kang M, Padilla BI. Colorectal cancer screening: A quality improvement initiative using a bilingual patient navigator, mobile technology, and fecal immunochemical testing to engage hispanic adults. Clin J Oncol Nurs. 2021;25(4):423-429].

Earlier successes with phone-based interventions in Hispanic and psychiatric populations, along with traditional “promotora” programs that improved preventive exam compliance by 35%, suggest that AI could effectively augment these established approaches [17: Villegas N, Cianelli R, de Tantillo L, Warheit M, Montano NP, Ferrer L, et al. Assessment of technology use and technology preferences for HIV prevention among hispanic women. Hisp Health Care Int. 2018;16(4):197-203; 18; 19: Hunter JB, de Zapien JG, Papenfuss M, Fernandez ML, Meister J, Giuliano AR. The impact of a promotora on increasing routine chronic disease prevention among women aged 40 and older at the U.S.-Mexico border. Health Educ Behav. 2004;31(4 Suppl):18S-28S]. Given that tailored patient-centered technologies have consistently improved health outcomes and reduced disparities, AI represents a potential next step in this technological evolution [20: Tarver WL, Haggstrom DA. The use of cancer-specific patient-centered technologies among underserved populations in the United States: systematic review. J Med Internet Res. 2019;21(4):e10256].

Nonetheless, the impact of AI on health care remains unclear, especially with respect to AI care agents. Indeed, some have suggested that AI-based interventions in health care may disadvantage non–English-speaking and minority populations due to biases in development and deployment [21: Alford J, Rathod N. AI could worsen health inequities for UK’s minority ethnic groups - new report. Imperial News. URL: https://www.imperial.ac.uk/news/230413/ai-could-worsen-health-inequities-uks/ (accessed 2024-11-19); 22: Nazer LH, Zatarah R, Waldrip S, Ke JXC, Moukheiber M, Khanna AK, et al. Bias in artificial intelligence algorithms and recommendations for mitigation. PLOS Digit Health. 2023;2(6):e0000278]. These concerns about AI bias in health care are well-documented in the literature. Multiple studies have identified systematic disparities in AI algorithm performance across racial and ethnic groups. For example, machine learning models have demonstrated higher predictive accuracy for White patients compared with minority groups across various applications [23: Barton M, Hamza M, Guevel B. Racial equity in healthcare machine learning: illustrating bias in models with minimal bias mitigation. Cureus. 2023;15(2):e35037]. In health risk prediction, algorithms have been shown to underestimate the care needs of Black patients [24: Obermeyer Z, Powers B, Vogeli C, Mullainathan S. Dissecting racial bias in an algorithm used to manage the health of populations. Science. 2019;366(6464):447-453], while in medical imaging, AI systems consistently underdiagnose conditions in underserved populations [25: Seyyed-Kalantari L, Zhang H, McDermott MBA, Chen IY, Ghassemi M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat Med. 2021;27(12):2176-2182]. These disparities stem from several mechanisms, including the underrepresentation of minorities in training datasets, biased feature selection, and implementation contexts that amplify existing health care inequities [26: Mbakwe AB, Lourentzou I, Celi LA, Wu JT. Fairness metrics for health AI: we have a long way to go. EBioMedicine. 2023;90:104525; 27: Siddique SM, Tipton K, Leas B, Jepson C, Aysola J, Cohen JB, et al. The impact of health care algorithms on racial and ethnic disparities : A systematic review. Ann Intern Med. 2024;177(4):484-496]. Given this evidence, careful evaluation of AI-powered interventions specifically targeting minority populations is essential to determine whether they reduce or potentially exacerbate existing health care disparities.

Implementation science is considered a key factor in determining the optimal outcome of AI-based initiatives [28: Longhurst CA, Singh K, Chopra A, Atreja A, Brownstein JS. A call for artificial intelligence implementation science centers to evaluate clinical effectiveness. NEJM AI. 2024;1(8)]. Implementation science frameworks addressing health equity can provide important theoretical grounding for AI-powered health care interventions. The Health Equity Implementation Framework, which integrates implementation science with health care disparities research, identifies multilevel factors affecting implementation from individual patient characteristics to societal contexts [29: Woodward EN, Matthieu MM, Uchendu US, Rogal S, Kirchner JE. The health equity implementation framework: proposal and preliminary study of hepatitis C virus treatment. Implement Sci. 2019;14(1):26]. This framework emphasizes culturally relevant factors, clinical encounters, and societal determinants that can influence health equity outcomes. Similarly, the EquIR (Equity-based Implementation Research) framework links pre- and postimplementation population health status with specific equity considerations [30: Baumann AA, Cabassa LJ. Reframing implementation science to address inequities in healthcare delivery. BMC Health Serv Res. 2020;20(1):190]. These theoretical frameworks suggest that properly designed AI-powered interventions could potentially help address disparities rather than exacerbate them, particularly when they incorporate elements of cultural specificity, language concordance, and responsiveness to community needs. However, implementation science also highlights the importance of evaluating such interventions for their actual impact on health equity outcomes rather than assuming beneficial effects [31: Woodward EN, Singh RS, Ndebele-Ngwenya P, Melgar Castillo A, Dickson KS, Kirchner JE. A more practical guide to incorporating health equity domains in implementation determinant frameworks. Implement Sci Commun. 2021;2(1):61].

This study aims to evaluate whether a multilingual AI care agent can effectively engage Spanish-speaking patients in colorectal cancer screening compared with English-speaking patients. Specifically, we examine whether AI-powered outreach results in comparable or higher FIT test opt-in rates among Spanish-speaking patients, potentially challenging the assumption that technological interventions inherently disadvantage non–English-speaking populations. We hypothesize that a language-concordant AI care agent will achieve engagement levels among Spanish-speaking patients that meet or exceed those observed in English-speaking patients, potentially helping to reduce the persistent screening disparities that conventional approaches have struggled to address.

Methods

We hypothesized that AI-powered outreach using a multilingual AI care agent would demonstrate comparable or higher engagement and FIT test opt-in rates among Spanish-speaking patients than among English-speaking patients, potentially helping to reduce existing screening disparities between these populations. Our primary aim was to evaluate the effectiveness of a multilingual AI-powered outreach system in improving CRC screening rates among Spanish-speaking patients compared to English-speaking patients.

Study Design

This retrospective observational study analyzed data from an AI-powered outreach initiative conducted from September 16 to 24, 2024, at WellSpan Health, an integrated health system serving central Pennsylvania and northern Maryland. The study focused on comparing the effectiveness of multilingual AI-powered outreach for CRC screening between Spanish- and English-speaking patients. As the study analyzed anonymized administrative data, it was exempt from ethics review, and no individual patient consent was required.

Patient Population

The study population comprised 1878 patients eligible for CRC screening who did not have active web-based health system profiles. Of these, 517/1878 (27.5%) were Spanish-speaking, and 1361/1878 (72.5%) were English-speaking patients. Patients were included if they were due for CRC screening and had a verified date of birth. The study specifically targeted patients who were not actively engaged with the health care system and were unlikely to be reached through traditional outreach methods such as email.

Intervention

The intervention used an AI-powered conversational agent named Ana, designed to conduct personalized outreach calls for CRC screening. The platform was developed to engage patients in natural, empathetic telephone conversations while providing education about CRC screening and facilitating FIT test kit requests. The AI care agent was programmed with full bilingual capabilities in English and Spanish, allowing for natural conversation in the patient’s preferred language. The system was designed to engage in culturally appropriate dialogue, explain the importance of CRC screening, address patient questions and concerns, and facilitate the scheduling of FIT test kit deliveries. Calls were initiated to patients’ registered phone numbers, with language preference determined at the start of each interaction by the AI care agent [32: Hear our genAI healthcare agents in action. Hippocratic AI. URL: https://www.hippocraticai.com/video (accessed 2024-12-04)]. We used process models to map out the processes required to integrate AI into clinical workflow, identify potential issues before they became barriers to patient outreach, and adjust strategies for better outcomes [28].

The AI care agent, Ana, is powered by Polaris 3.0 (Hippocratic AI), a constellation system designed for real-time multilingual phone conversations. The system incorporates language-specific preprocessing to ensure correct pronunciation of medical terminology and medication dosages, robust error handling for speech recognition challenges (particularly when patients code-switch between languages), and culturally appropriate dialogue frameworks. The AI agent was trained to detect when patients needed clarification about medical concepts and could adjust its communication style based on patient responses. When discussing colorectal cancer screening, Ana could explain the FIT test procedure, address common concerns, and capture patient preferences while maintaining a natural conversation flow in both English and Spanish.

Outcome Measures

Data were collected automatically through the AI platform’s integrated tracking system. The primary outcome measure was the FIT test opt-in rate, defined as the percentage of patients who agreed to receive an FIT test kit during the AI-powered conversation. Secondary outcomes included connect rate (percentage of successful connections with patients) and call duration.
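To make the outcome definitions concrete, the following minimal Python sketch computes the three measures from a list of call records. The `CallRecord` structure and its field names are hypothetical and are not taken from the AI platform. The sketch assumes that both rates use every patient called as the denominator, which is consistent with the counts reported in the Discussion (eg, 94/517 for 18.2%), and that mean call duration is taken over connected calls only, which the paper does not state.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class CallRecord:
    """One outreach call. All field names are hypothetical."""
    connected: bool      # the patient answered and a conversation took place
    opted_in: bool       # the patient agreed to receive a FIT kit
    duration_min: float  # conversation length in minutes (0.0 if not connected)


def summarize_outcomes(calls: Iterable[CallRecord]) -> Tuple[float, float, float]:
    """Return (opt-in rate %, connect rate %, mean call duration in minutes).

    Assumptions: both rates use every patient called as the denominator,
    and the mean duration is taken over connected calls only.
    """
    records = list(calls)
    if not records:
        raise ValueError("at least one call record is required")
    connected = [c for c in records if c.connected]
    opt_in_rate = 100.0 * sum(c.opted_in for c in records) / len(records)
    connect_rate = 100.0 * len(connected) / len(records)
    mean_duration = (
        sum(c.duration_min for c in connected) / len(connected) if connected else 0.0
    )
    return opt_in_rate, connect_rate, mean_duration


# Toy data, not study data: four calls, two connected, one opt-in.
example_calls = [
    CallRecord(connected=True, opted_in=True, duration_min=6.0),
    CallRecord(connected=True, opted_in=False, duration_min=4.0),
    CallRecord(connected=False, opted_in=False, duration_min=0.0),
    CallRecord(connected=False, opted_in=False, duration_min=0.0),
]
print(summarize_outcomes(example_calls))
```

Keeping the denominator explicit matters here, because an opt-in rate computed only among connected patients would be higher than one computed among all patients called.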

Statistical Analysis

Analyses were performed using SPSS (IBM Corp) version 28 [33: IBM SPSS Statistics. IBM Corp. 2021. URL: https://www.ibm.com/products/spss-statistics (accessed 2025-04-26)]. Descriptive statistics were calculated for demographic characteristics and outcome measures. Differences between Spanish- and English-speaking groups were assessed using independent samples t tests for continuous variables and chi-square tests for categorical variables. A logistic regression analysis was conducted to evaluate factors associated with FIT test opt-in rates, controlling for age, gender, geographic location, and call duration. The model’s goodness of fit was assessed using the Hosmer-Lemeshow test, and model performance was evaluated using Nagelkerke R². Statistical significance was set at P<.001. No missing data were present in the analyzed dataset.
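As an illustration of the bivariate comparison for a categorical outcome, the sketch below computes a Pearson chi-square test for a 2x2 table using only the Python standard library. It is not the SPSS procedure used in the study. It assumes no continuity correction (the paper does not say which variant was reported) and uses the opt-in counts given in the Discussion (94/517 and 97/1361). The function name is illustrative.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Opt-in counts reported in the Discussion: 94 of 517 vs 97 of 1361.
spanish_yes, spanish_n = 94, 517
english_yes, english_n = 97, 1361
statistic, p_value = chi_square_2x2(
    spanish_yes, spanish_n - spanish_yes,
    english_yes, english_n - english_yes,
)
print(f"chi-square = {statistic:.2f}, P = {p_value:.2e}")
```

The closed-form p-value holds only for 1 degree of freedom; tables larger than 2x2 would need a general chi-square distribution function.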

Ethical Considerations

This study used fully anonymized and de-identified datasets, from which all personally identifiable information was removed prior to analysis. According to the U.S. Department of Health and Human Services, under the Common Rule (45 CFR 46.102(l)), research involving only de-identified data does not constitute human subjects research, as there is no intervention or interaction with living individuals and no access to identifiable private information. Specifically, our data met the "safe harbor" standard of de-identification outlined in HIPAA Privacy Rule § 164.514(b)(2), with all 18 types of identifying information removed [34: Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. U.S. Department of Health and Human Services. URL: https://www.hhs.gov/hipaa/for-professionals/special-topics/de-identification/index.html#top]. Therefore, while we carefully considered ethical implications, formal IRB (institutional review board) review was not required for this analysis. Nevertheless, we followed strict data security protocols and adhered to all relevant institutional policies regarding responsible data management and research integrity.

Results

Baseline Characteristics

A total of 1878 patients were included in the analysis, comprising 517 (27.5%) Spanish-speaking and 1361 (72.5%) English-speaking individuals (Table 1). Significant demographic differences were observed between the language groups. Spanish-speaking patients were younger (mean age 57, SD 8.5 years vs 61, SD 8.7 years; P<.001) and more likely to be female (49.1% vs 38.4%, P<.001) compared with English-speaking patients. The majority of patients in both groups resided in Pennsylvania, though this proportion was higher among Spanish-speaking patients (99.3% vs 95.6%, P<.001).

Table 1. Baseline characteristics by language group.

| Characteristics | Spanish-speaking patients (n=517) | English-speaking patients (n=1361) | P value |
|---|---|---|---|
| Age (years), mean (SD) | 57 (8.5) | 61 (8.7) | <.001ᵃ |
| Sex, n (%) | | | <.001ᵇ |
| Female | 254 (49.1) | 523 (38.4) | |
| Male | 263 (50.9) | 838 (61.6) | |
| Geographic location, n (%) | | | <.001ᵇ |
| Pennsylvania | 513 (99.3) | 1301 (95.6) | |
| Maryland | 3 (0.6) | 48 (3.5) | |
| Other | 1 (0.2) | 12 (0.9) | |

ᵃ Independent samples t test.

ᵇ Chi-square test.
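The age comparison can be approximately reproduced from the summary statistics in Table 1. The sketch below computes Welch's unequal-variance t statistic. This choice is an assumption, since the paper does not say whether the pooled or unequal-variance form was reported, and the inputs are the rounded means and SDs, so the result is only approximate. The p-value uses a normal approximation, which is reasonable only because the degrees of freedom are large.

```python
import math


def welch_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and approximate degrees of freedom from summary data."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def two_sided_p_normal(t):
    """Two-sided p-value from the standard normal; adequate only for large df."""
    return math.erfc(abs(t) / math.sqrt(2.0))


# Rounded values from Table 1: Spanish-speaking vs English-speaking patients.
t_stat, df = welch_t_from_summary(57, 8.5, 517, 61, 8.7, 1361)
print(f"t = {t_stat:.2f}, df = {df:.0f}, P = {two_sided_p_normal(t_stat):.1e}")
```

With group sizes this large, the pooled and unequal-variance forms give very similar results, so the choice between them is unlikely to change the conclusion that the age difference is significant.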

Primary and Secondary Outcomes

Spanish-speaking patients demonstrated significantly higher engagement with the AI-powered outreach than English-speaking patients across all measured outcomes (Table 2). The FIT test opt-in rate was more than twice as high among Spanish-speaking patients (18.2% vs 7.1%, P<.001). Similarly, connect rates were substantially higher in the Spanish-speaking group (69.6% vs 53.0%, P<.001). In addition, Spanish-speaking patients engaged in longer conversations with the AI care agent (6.05 minutes vs 4.03 minutes, P<.001).

Table 2. Logistic regression analysis of factors associated with FIT test opt-in. The dependent variable was fecal immunochemical test opt-in (yes or no). Model fit statistics: Hosmer-Lemeshow χ²(8)=30.486, P=.001; Nagelkerke R²=0.445; model χ²(5)=452.681, P<.001.

| Variable | β coefficient (SE) | Adjusted ORᵃ (95% CI) | P value |
|---|---|---|---|
| Age (per year) | –0.010 (0.011) | 0.990 (0.968-1.012) | N/Sᵇ |
| State | | | |
| Pennsylvania | Reference | 1.00 | N/Aᶜ |
| Maryland or other | –1.527 (0.908) | 0.217 (0.037-1.288) | N/S |
| Sex | | | |
| Male | Reference | 1.00 | |
| Female | –0.296 (0.200) | 0.744 (0.503-1.101) | N/S |
| Call duration (per minute) | 0.520 (0.031) | 1.682 (1.583-1.787) | <.001 |
| Language | | | |
| English | Reference | 1.00 | |
| Spanish | 0.699 (0.207) | 2.012 (1.340-3.019) | <.001 |
| Constant | –3.195 (0.715) | | <.001 |

ᵃ OR: odds ratio.

ᵇ N/S: not significant.

ᶜ N/A: not applicable.

Multivariate Analysis

A logistic regression analysis revealed that Spanish language preference remained an independent predictor of FIT test opt-in after adjusting for age, gender, geographic location, and call duration (Table 2). Spanish-speaking patients had approximately twice the odds of opting in for FIT testing compared with English-speaking patients (adjusted odds ratio [OR] 2.012, 95% CI 1.340-3.019; P<.001). Call duration was also significantly associated with opt-in likelihood (adjusted OR 1.682 per minute, 95% CI 1.583-1.787; P<.001). Age, gender, and geographic location were not significantly associated with FIT test opt-in rates.
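The adjusted ORs and confidence intervals above follow directly from the β coefficients and standard errors in Table 2: the OR is exp(β), and a Wald 95% CI is exp(β ± 1.96 × SE). The short sketch below recomputes them from the published values. Because the published coefficients are rounded to three decimals, the results should match the table only to within rounding. The function name is illustrative.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def odds_ratio_ci(beta: float, se: float, z: float = Z_95):
    """Return (odds ratio, lower bound, upper bound) of a Wald confidence interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# (beta, SE) pairs as published in Table 2.
coefficients = {
    "Spanish (vs English)": (0.699, 0.207),
    "Call duration (per minute)": (0.520, 0.031),
    "Female (vs male)": (-0.296, 0.200),
}

for name, (beta, se) in coefficients.items():
    odds_ratio, lower, upper = odds_ratio_ci(beta, se)
    print(f"{name}: OR {odds_ratio:.3f} (95% CI {lower:.3f}-{upper:.3f})")
```

Because the model is linear on the log-odds scale, the per-minute OR for call duration compounds multiplicatively: under the model, 2 additional minutes correspond to an OR of about 1.682² ≈ 2.8.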

The regression model explained approximately 44.5% of the observed variation in FIT test opt-ins (Nagelkerke R²=0.445) and was significant overall (model χ²(5)=452.681, P<.001), although the Hosmer-Lemeshow test suggested potential issues with model fit (χ²(8)=30.486, P=.001). These findings indicate that language preference was a robust predictor of engagement with CRC screening outreach, independent of the demographic factors and call duration included in the model.
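To show how the fitted model maps patient characteristics to an opt-in probability, the sketch below applies the logistic function to the Table 2 coefficients. It is an illustration only: the variable coding (1 for Spanish, female, and non-Pennsylvania residence; 0 otherwise) is an assumption, the coefficients are rounded, and the Hosmer-Lemeshow result suggests that predicted probabilities may be poorly calibrated. Call duration is only known after the call, so this is not a pre-call prediction tool.

```python
import math

# Rounded coefficients from Table 2; the 0/1 coding of each indicator is assumed.
COEF = {
    "constant": -3.195,
    "age_years": -0.010,
    "non_pennsylvania": -1.527,
    "female": -0.296,
    "call_minutes": 0.520,
    "spanish": 0.699,
}


def predicted_opt_in_probability(
    age_years: float,
    call_minutes: float,
    spanish: bool,
    female: bool = False,
    non_pennsylvania: bool = False,
) -> float:
    """Predicted probability of FIT opt-in under the published model."""
    logit = (
        COEF["constant"]
        + COEF["age_years"] * age_years
        + COEF["call_minutes"] * call_minutes
        + COEF["spanish"] * spanish
        + COEF["female"] * female
        + COEF["non_pennsylvania"] * non_pennsylvania
    )
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient: 60-year-old man in Pennsylvania with a 6-minute call.
p_spanish = predicted_opt_in_probability(60, 6.0, spanish=True)
p_english = predicted_opt_in_probability(60, 6.0, spanish=False)
print(f"Spanish: {p_spanish:.3f}, English: {p_english:.3f}")
```

For any fixed values of the other covariates, the ratio of the two predicted odds equals exp(0.699), which is the adjusted OR for language reported in Table 2.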

Discussion

Principal Findings

Our study suggests that AI-powered outreach can effectively engage Spanish-speaking populations in CRC screening, with outcomes exceeding those observed in English-speaking patients. The 2.6-fold higher FIT test opt-in rate among Spanish-speaking patients (94/517, 18.2% vs 97/1361, 7.1%) represents a departure from historical patterns of health care disparities. This finding is noteworthy given that previous studies have documented consistently lower CRC screening rates among Hispanic and Latino populations (53.4%) than in non-Hispanic White populations (70.4%) [10]. Moreover, higher engagement levels observed among Spanish-speaking patients, evidenced by superior connect rates (69.6% vs 53.0%) and longer call durations (6.05 vs 4.03 minutes), suggest that AI outreach may help overcome traditional barriers to health care access and engagement.
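The fold difference quoted above can be checked from the reported counts. The sketch below computes the two opt-in rates, their ratio, the absolute difference in percentage points, and the unadjusted odds ratio. The function and key names are illustrative. The rate ratio is about 2.55, which is reported as 2.6-fold, and the unadjusted odds ratio is about 2.9, compared with the adjusted OR of 2.012 after controlling for demographics and call duration.

```python
def compare_rates(events1: int, n1: int, events2: int, n2: int) -> dict:
    """Compare two proportions given event counts and group sizes."""
    if not (0 < events1 < n1 and 0 < events2 < n2):
        raise ValueError("each group needs at least one event and one non-event")
    p1, p2 = events1 / n1, events2 / n2
    return {
        "rate1_pct": 100.0 * p1,
        "rate2_pct": 100.0 * p2,
        "rate_ratio": p1 / p2,
        "difference_pct_points": 100.0 * (p1 - p2),
        "unadjusted_odds_ratio": (p1 / (1.0 - p1)) / (p2 / (1.0 - p2)),
    }


# Reported counts: 94/517 Spanish-speaking vs 97/1361 English-speaking opt-ins.
result = compare_rates(94, 517, 97, 1361)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

Distinguishing the rate ratio from the odds ratio is useful here because the two measures answer slightly different questions and diverge as the outcome becomes more common.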

Our multivariate analysis revealed that Spanish language preference remained a significant and independent predictor of FIT test opt-in even after controlling for demographic factors and call duration, with Spanish-speaking patients having approximately twice the odds of opting in to screening compared with their English-speaking counterparts. Our finding that AI-powered outreach achieved higher engagement among Spanish-speaking patients appears to run counter to the substantial body of evidence documenting AI bias in health care. Multiple studies have demonstrated that AI applications can disadvantage minority populations through various mechanisms. Obermeyer et al [24] found that health risk prediction algorithms significantly underestimated the care needs of Black patients, potentially reducing their access to additional care support. Similarly, Embi [35: Embi PJ. Algorithmovigilance-advancing methods to analyze and monitor artificial intelligence-driven health care for effectiveness and equity. JAMA Netw Open. 2021;4(4):e214622] reported that AI models predicting postpartum depression assigned White patients twice the likelihood of diagnosis as Black patients. In medical imaging, Seyyed-Kalantari et al [25] documented that chest x-ray classifiers consistently underdiagnosed conditions in underserved patient populations.

The contrast between these documented disparities and our results warrants careful consideration. Two factors may explain this divergence. First, the nature of our intervention (direct patient outreach and education) differs fundamentally from clinical decision support or diagnostic applications where algorithmic biases have been most extensively documented. Second, the apparent success in Spanish-speaking engagement may reflect the particular receptiveness of this population to personalized outreach addressing a known screening disparity, rather than inherent advantages in the AI methodology itself.

The study benefits from several strengths, including a large sample size with a substantial representation of Spanish-speaking patients and complete data capture through automated systems. The real-world implementation in a diverse health care setting, coupled with statistical analysis controlling for potential confounders, provides evidence for the effectiveness of the intervention. The direct comparison of language groups within the same intervention period further strengthens our findings.

However, important limitations must be considered. The study’s confinement to a single health care system in central Pennsylvania and northern Maryland represents a significant limitation that may affect generalizability to other geographic regions, patient populations, and health care delivery models. Health care systems differ considerably in their organizational structures, patient demographics, and existing approaches to preventive care, which may influence the effectiveness of AI-powered interventions across different settings. In addition, the short study duration may not capture longer-term engagement patterns. A critical limitation is the absence of follow-up data on FIT test completion rates, which prevents assessment of whether higher engagement and opt-in rates actually translated to completed screening tests. This gap leaves an important open question about the intervention’s true impact on screening compliance and its potential to reduce disparities in completed screenings. Our study specifically targeted patients without active web-based health system profiles, which intentionally focused on less digitally engaged individuals but consequently may have excluded patients who regularly interact with health care services through digital platforms. This methodological choice could introduce selection bias. We were also unable to assess socioeconomic, medical, and other factors that might influence engagement, further limiting our ability to fully contextualize our findings. For example, other unaccounted-for factors, such as cultural or social factors among the Spanish-speaking population, might have contributed to a higher response rate among this population relative to the English-speaking population.

Our findings suggest that AI-powered outreach can effectively complement existing care delivery systems, particularly for traditionally underserved populations. The higher engagement rates among Spanish-speaking patients indicate that language-concordant AI interactions may help address longstanding disparities in preventive care access and usage. Health care systems seeking to improve screening rates among diverse populations should consider implementing multilingual AI outreach as part of their comprehensive screening strategy.

The promising results of this intervention suggest that technological solutions may not inherently exacerbate health care disparities as some have feared. While our findings show higher engagement and FIT test opt-in rates among Spanish-speaking patients, these are preliminary outcomes that represent only the initial steps in the screening process. Nevertheless, these early results have potentially important implications for health care policy and resource allocation for technological innovations in health care delivery, warranting further investigation of AI-powered interventions across the complete screening continuum.

The markedly higher engagement among Spanish-speaking patients suggests that AI-powered outreach may be particularly effective in reaching traditionally underserved populations. This finding is especially relevant given the historical challenges in engaging Hispanic and Latino communities in preventive care programs. The success of this intervention demonstrates that AI, when properly implemented, can serve as a tool for promoting health equity rather than perpetuating disparities. This approach could be further adapted for other minority populations by incorporating additional languages (such as Mandarin, Vietnamese, or Arabic) and culturally specific communication patterns. Adaptation would require not only linguistic translation but also cultural tailoring of messaging, addressing population-specific barriers to screening, and incorporating community input into AI design. Similar to how the Spanish-language AI agent effectively engaged Hispanic patients, culturally responsive AI systems could potentially bridge engagement gaps for other underserved groups.

Future research should prioritize the assessment of long-term patient adherence to screening recommendations following AI-powered outreach, tracking not only initial engagement but also subsequent screening behaviors over multiple years. Studies should also evaluate the cost-effectiveness of AI interventions compared with traditional outreach methods, including analyses of implementation costs, health care resource usage, and potential cost savings from earlier cancer detection. Most critically, research must determine whether the higher engagement observed with AI-driven interventions ultimately translates to improved clinical outcomes, including increased rates of early-stage cancer detection, reduced cancer mortality, and narrowed disparities in health outcomes. In addition, future studies should explore how AI outreach models can be specifically tailored to address the unique cultural, linguistic, and health-belief characteristics of diverse ethnic groups, particularly focusing on populations whose primary language is not English. Such culturally adaptive AI approaches might incorporate cultural nuances, dialect variations, and community-specific health concerns to enhance relevance and effectiveness across different populations. Investigation of AI outreach effectiveness in other languages and cultural contexts would provide valuable insights for expansion, while studies examining patient experience with AI-powered interactions would further inform implementation strategies.

Conclusion

This study demonstrates that carefully designed AI-powered outreach through multilingual AI care agents can effectively engage diverse populations in preventive care, particularly benefiting traditionally underserved Spanish-speaking communities. The significantly higher engagement and FIT test opt-in rates among Spanish-speaking patients challenge previous assumptions about technological interventions potentially disadvantaging non–English-speaking populations. These findings suggest that language-concordant AI interactions may help address longstanding disparities in preventive care access.

In the near term, future research should track FIT test completion rates following AI outreach, expand evaluation to diverse health care settings beyond a single system, and conduct qualitative research with Spanish-speaking patients to understand engagement factors. In the mid-term, researchers should evaluate the cost-effectiveness of multilingual AI interventions compared with traditional methods, including applications beyond CRC screening; extend linguistic capabilities to additional languages with appropriate cultural adaptations; and develop standardized implementation frameworks for equitable deployment. Long-term research priorities should focus on tracking clinical outcomes, including early cancer detection rates and mortality, to determine whether AI outreach ultimately reduces disparities. Two further directions may prove particularly fruitful: investigating how AI care agents can address the intersecting social determinants of health that contribute to screening disparities, and developing integrated models that combine AI outreach with human navigation for complex cases.

As health care systems increasingly adopt technological solutions, this research agenda can help ensure that AI applications are developed, deployed, and evaluated with explicit attention to health equity principles, including cultural competency, language access, and community engagement. By following this roadmap, future innovations can build on our findings to promote rather than hinder health equity in preventive care and beyond.

Acknowledgments

This research was supported by Hippocratic AI. AI tools were used in the preparation of this manuscript for initial draft assistance with manuscript language and structure, grammar and style refinement, and reference formatting. All AI-generated content was thoroughly reviewed, verified, and edited by the authors. All statistical analyses, interpretations, and conclusions were conducted and validated by the human authors. The authors take full responsibility for the final content of this manuscript.

Data Availability

The datasets analyzed during this study are available from the corresponding author on reasonable request.

Conflicts of Interest

MB, MSA, GM, RL, MRD, AM, SM, SG, and AC are employees of Hippocratic AI, which provided funding for this study. RHB is an employee of WellSpan, which provided data for this study. JDA is an Adjunct Professor at the University of British Columbia and received compensation for work performed on this project. AA is an employee of UC Davis Health and received compensation for work performed on this project. All authors have reviewed and approved the manuscript and materials included in this submission.

Abbreviations

AI: artificial intelligence
CRC: colorectal cancer
EquIR: Equity-based Implementation Research
FIT: fecal immunochemical test
OR: odds ratio


Edited by J Sarvestan; submitted 28.01.25; peer-reviewed by S Mohamed Shaffi, S Mohanadas, O Ibikunle, OS Igunma; comments to author 24.02.25; revised version received 17.03.25; accepted 05.04.25; published 25.06.25.

Copyright

©Meenesh Bhimani, R Hal Baker, Markel Sanz Ausin, Gerald Meixiong, Rae Lasko, Mariska Raglow-Defranco, Alex Miller, Subhabrata Mukherjee, Saad Godil, Anderson Cook, Jonathan D Agnew, Ashish Atreja. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.06.2025.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.