Research Letter
Abstract
The study investigated gender bias in GPT-4’s assessment of coronary artery disease risk by presenting identical clinical vignettes of men and women with and without psychiatric comorbidities. Results suggest that psychiatric conditions may influence GPT-4’s coronary artery disease risk assessment among men and women.
J Med Internet Res 2024;26:e54242 doi:10.2196/54242
Introduction
With the emergence of large language models (LLMs), the use of artificial intelligence (AI) in health care settings is growing, but this is not without risks. While these algorithms offer advancements in diagnostics and clinical decision-making by analyzing vast amounts of data, they may unintentionally reinforce preexisting biases and inequities because they are trained on potentially biased data.
Gender bias refers to the systematic and often unconscious differential or inappropriate treatment and consideration of patients based on their gender. This phenomenon has been extensively reported in the management of cardiovascular disease, with evidence showing that women are inadequately represented in research, underdiagnosed, and subject to treatment disparities. Mental illness stigma represents another bias, leading to suboptimal care that may also impact women more significantly, with research suggesting that physicians are more likely to view men’s symptoms as physical and women’s as psychosocial.
This study investigated whether LLMs reproduce existing gender biases when presented with clinical vignettes of chronic coronary syndrome with and without psychiatric comorbidities using GPT-4 (OpenAI), an LLM trained on a diverse range of internet text, whose performance in the cardiovascular field has previously been validated.
Methods
We generated hypothetical clinical scenarios by matching men and women with identical chest pain types (typical, atypical, and nonanginal) using the same symptom descriptions. We adjusted their ages to achieve nearly identical pretest probabilities (with a maximal pretest probability gap of 5%) of obstructive coronary artery disease (CAD) based on the latest chronic coronary syndrome guidelines. We submitted these cases to GPT-4, asking which patient, woman or man, had a higher risk for obstructive CAD. We then added psychiatric comorbidities to the original scenarios (generalized anxiety disorder, borderline personality disorder, major depressive disorder, schizophrenia, and bipolar I disorder). These cases were submitted individually to GPT-4 to ensure no carryover effects from previous discussions. Overall, 30 cases were presented to GPT-4, with 5 having no psychiatric comorbidities and 25 having psychiatric comorbidities. The mean age was 65.8 (SD 6.24) years for women and 52.8 (SD 4.96) years for men. The cases were presented with typical chest pain (n=12), atypical chest pain (n=12), and nonanginal symptoms (n=6).
Here is a multiple-choice question:
I am a general practitioner in a city in Europe and I have seen 2 patients with chest pain.
a) [age]-year-old female patient, without cardiovascular risk factor, with [typical/atypical/non-anginal] chest pain for one year.
b) [age]-year-old male patient, without cardiovascular risk factor, with [typical/atypical/non-anginal] chest pain for one year.
Which of these patients (a or b) is at a higher risk of having obstructive coronary artery disease?
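The template above lends itself to programmatic filling. The following Python sketch is illustrative only and is not the authors' code: the function names, the wording used to insert a comorbidity, and the example ages (65 and 52) are assumptions, and the 2/2/1 split of base scenarios across pain types is inferred from the reported case counts (12, 12, and 6).

```python
from collections import Counter
from itertools import product

# Pain type of each of the 5 base scenarios. The 2/2/1 split is inferred
# from the reported counts (12, 12, and 6 cases, with 6 variants per scenario).
BASE_PAIN_TYPES = ["typical", "typical", "atypical", "atypical", "non-anginal"]

# None stands for the original vignette without a psychiatric comorbidity.
COMORBIDITIES = [
    None,
    "generalized anxiety disorder",
    "borderline personality disorder",
    "major depressive disorder",
    "schizophrenia",
    "bipolar I disorder",
]


def patient_line(label, age, sex, pain_type, comorbidity=None):
    """Format one patient line of the multiple-choice template."""
    # Assumption: the source does not give the exact wording used to add a
    # comorbidity, so a short "with ..." clause is used here as a placeholder.
    extra = f", with {comorbidity}" if comorbidity else ""
    return (
        f"{label}) {age}-year-old {sex} patient, without cardiovascular "
        f"risk factor{extra}, with {pain_type} chest pain for one year."
    )


def build_prompt(age_female, age_male, pain_type, comorbidity=None):
    """Assemble the full question for one matched pair of patients."""
    return "\n".join([
        "Here is a multiple-choice question:",
        "I am a general practitioner in a city in Europe and I have seen "
        "2 patients with chest pain.",
        patient_line("a", age_female, "female", pain_type, comorbidity),
        patient_line("b", age_male, "male", pain_type, comorbidity),
        "Which of these patients (a or b) is at a higher risk of having "
        "obstructive coronary artery disease?",
    ])


# Every base scenario is paired with every comorbidity variant.
cases = list(product(range(len(BASE_PAIN_TYPES)), COMORBIDITIES))
pain_counts = Counter(BASE_PAIN_TYPES[i] for i, _ in cases)

# Hypothetical ages, chosen only to show the output format.
example_prompt = build_prompt(65, 52, "typical")
print(example_prompt)
```

Under these assumptions, `cases` holds the 30 combinations described in the Methods, and `pain_counts` reproduces the reported distribution of chest pain types.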
Results
Among the 5 scenarios without psychiatric comorbidities, GPT-4 indicated that women had a higher risk of obstructive CAD in 100% (5/5) of the cases, arguing that women’s higher age was a decisive factor. When psychiatric conditions were added, GPT-4’s responses changed substantially, indicating that men had a higher risk of CAD in 56% (14/25) of the cases.
Patient group and gender | Assessed by GPT-4 as higher risk, n (%)
Apparently healthy patient (n=5) |
Women | 5 (100)
Men | 0 (0)
Same patient with psychiatric disorder (n=25) |
Women | 11 (44)
Men | 14 (56)
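The percentages reported in the Results follow directly from the raw counts. As a minimal sketch (the dictionary layout and function name are illustrative assumptions, and no statistical test is implied), they can be recomputed as follows:

```python
# Number of cases in which GPT-4 rated each gender as the higher-risk patient,
# taken from the counts reported in the Results.
counts = {
    "no psychiatric comorbidity": {"women": 5, "men": 0},
    "psychiatric comorbidity": {"women": 11, "men": 14},
}


def percentages(group):
    """Convert per-gender counts into whole-number percentages of the group."""
    total = sum(group.values())
    return {gender: round(100 * n / total) for gender, n in group.items()}


summary = {name: percentages(group) for name, group in counts.items()}
for name, shares in summary.items():
    print(name, shares)
```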
Discussion
This study’s primary finding is a substantial shift in the perception of risk between men and women when a psychiatric comorbidity is added to the vignette. Although the complaints were identical to those in the scenarios without psychiatric comorbidities, in which the woman was always assessed as being at higher risk, the woman was assessed as being at lower risk than the man in more than half of the cases (14/25, 56%) once a psychiatric condition was added.
This suggests that the inclusion of a psychiatric comorbidity could alter the algorithm’s assessment of CAD risk among men and women. This shift could be interpreted as a sign of mental illness stigma, affecting the risk assessment of patients with psychiatric comorbidities. Although these findings should be confirmed with more cases, it is notable that the shift was observable even in this small sample. A complementary interpretation is that women’s chest pain symptoms may be more frequently undervalued as being psychological compared to men’s, as reported by Colameco et al. In that study, 120 physicians assessed the management of headache or abdominal pain based on gender and reported that women were perceived as being more emotional. Moreover, no scientific evidence suggests that any of these psychiatric conditions disproportionately increase the risk of CAD in men over women.
We acknowledge the low number of cases tested due to constraints in combining possibilities with the pretest probability table. Moreover, our study only assessed the answer to the first prompt and did not evaluate the potential variability in GPT-4’s responses. Consequently, the low number of cases limits the statistical power of the analysis. However, this choice of analysis is also a strength, as the clinical cases were based on real pretest probabilities and 1 case per scenario was used to mimic real-life scenarios.
Concerns have already been raised regarding GPT-4’s decision-making mechanism, often perceived as a “black box.” These preliminary data suggest that GPT-4 may underperform in marginalized groups and corroborate the need for explainable models and the integration of bias detection systems. These results warrant further investigation with different LLMs and clinical scenarios investigating other diseases.
Data Availability
The raw data from this study, as well as the prompts used, are available to readers by contacting the principal investigator, upon reasonable request.
Conflicts of Interest
OM has received honoraria and/or research grants from Edwards Lifesciences and Abbott.
References
- Khera R, Butte AJ, Berkwits M, Hswen Y, Flanagin A, Park H, et al. AI in medicine-JAMA's focus on clinical outcomes, patient-centered care, quality, and equity. JAMA. Sep 05, 2023;330(9):818-820.
- Eaneff S, Obermeyer Z, Butte AJ. The case for algorithmic stewardship for artificial intelligence and machine learning technologies. JAMA. Oct 13, 2020;324(14):1397-1398.
- Cirillo D, Catuara-Solarz S, Morey C, Guney E, Subirats L, Mellino S, et al. Sex and gender differences and biases in artificial intelligence for biomedicine and healthcare. NPJ Digit Med. Jun 1, 2020;3:81.
- Hamberg K. Gender bias in medicine. Womens Health (Lond). May 2008;4(3):237-243.
- Vogel B, Acevedo M, Appelman Y, Bairey Merz CN, Chieffo A, Figtree GA, et al. The Lancet Women and Cardiovascular Disease Commission: reducing the global burden by 2030. Lancet. Jun 19, 2021;397(10292):2385-2438.
- Huber E, Le Pogam MA, Clair C. Sex related inequalities in the management and prognosis of acute coronary syndrome in Switzerland: cross sectional study. BMJ Med. Nov 17, 2022;1(1):e000300.
- Colameco S, Becker L, Simpson M. Sex bias in the assessment of patient complaints. J Fam Pract. Jun 1983;16(6):1117-1121.
- Subu MA, Wati DF, Netrida N, Priscilla V, Dias JM, Abraham MS, et al. Types of stigma experienced by patients with mental illness and mental health nurses in Indonesia: a qualitative content analysis. Int J Ment Health Syst. Oct 18, 2021;15(1):77.
- GPT-4. OpenAI. URL: https://openai.com/index/gpt-4/ [accessed 2024-10-14]
- Skalidis I, Cagnina A, Luangphiphat W, Mahendiran T, Muller O, Abbe E, et al. ChatGPT takes on the European Exam in Core Cardiology: an artificial intelligence success story? Eur Heart J Digit Health. May 2023;4(3):279-281.
- Knuuti J, Wijns W, Saraste A, Capodanno D, Barbato E, Funck-Brentano C, et al. 2019 ESC Guidelines for the diagnosis and management of chronic coronary syndromes. Eur Heart J. Jan 14, 2020;41(3):407-477.
Abbreviations
AI: artificial intelligence
CAD: coronary artery disease
LLM: large language model
Edited by A Mavragani; submitted 02.11.23; peer-reviewed by K Jordan, A Bashir, J Walsh; comments to author 13.03.24; revised version received 11.06.24; accepted 10.09.24; published 22.10.24.
Copyright © Margaux Achtari, Adil Salihu, Olivier Muller, Emmanuel Abbé, Carole Clair, Joëlle Schwarz, Stephane Fournier. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 22.10.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.