Original Paper
Abstract
Background: Doctor review websites have become increasingly popular as a source of information for patients looking to select a primary care provider. Zocdoc is one such platform that allows patients to not only rate and review their experiences with doctors but also directly schedule appointments. This study examines how several physician characteristics including gender, age, race, languages spoken in a physician’s office, education, and facial attractiveness impact the average numerical rating of primary care doctors on Zocdoc.
Objective: The aim of this study was to investigate the association between physician characteristics and patient satisfaction ratings on Zocdoc.
Methods: A data set of 1455 primary care doctor profiles across 30 cities was scraped from Zocdoc. The profiles contained information on the physician’s gender, education, and languages spoken in their office. Age, facial attractiveness, and race were imputed from profile pictures using commercial facial analysis software. Each doctor profile listed an average overall satisfaction rating, bedside manner rating, and wait time rating from verified patients. Descriptive statistics, the Wilcoxon rank sum test, and multivariate logistic regression were used to analyze the data.
Results: The average overall rating on Zocdoc was highly positive, with older age, lower facial attractiveness, foreign degrees, allopathic degrees, and speaking more languages negatively associated with the average rating. However, the effect sizes of these factors were relatively small. For example, graduates of Latin American medical schools had a mean overall rating of 4.63 compared to a 4.77 rating for US graduates (P<.001), a difference roughly equivalent to a 2.8% decrease in appointments. On multivariate analysis, being Asian and having a doctor of osteopathic medicine degree were positively associated with higher overall ratings, while attending a South Asian medical school and speaking more European and Middle Eastern languages in the office were negatively associated with higher overall ratings.
Conclusions: Overall, the findings suggest that age, facial attractiveness, education, and multilingualism do have some impact on web-based doctor reviews, but the numerical effect is small. Notably, bias may play out in many forms. For example, a physician's appearance or accent may impact a patient's trust, confidence, or satisfaction with their physician, which could in turn influence their take-up of preventative services and lead to either better or worse health outcomes. The study highlights the need for further research in how physician characteristics influence patient ratings of care.
doi:10.2196/51672
Introduction
There has been a growing interest in understanding the factors that may influence a patient's perception of their physician and how these perceptions might in turn impact quality of care. Research has shown that patients often rely on nonverbal cues, such as a physician's attire, to form their opinions about their health care provider [ - ]. With the advent of various web-based review platforms, such as Healthgrades and Zocdoc, patients have easy access to a wealth of information about their health care providers, as well as a relatively straightforward means to express their satisfaction. Research suggests that these web-based ratings do influence patients' choices of medical providers [ - ]. On the web-based physician review platform Zocdoc, a half-star improvement in ratings on a scale of 1-5 stars leads to a 10% increase in appointments [ , ].
Past research is mixed on the factors influencing these web-based reviews. Some research suggests that patient experience and clinical quality directly affect web-based ratings [ , ]. Other research suggests that there exists minimal correlation between the quality and value of care or peer-assessed performance with web-based ratings, and that nonphysician characteristics such as staff friendliness and appointment wait times are instead the key determinants [ - ].
What remains clear, however, is that many different factors play a role [ , ]. Web-based reviews for female physicians, for example, have been found to be more emotional and informal compared to those for male physicians [ ]. Moreover, female physicians receive lower web-based ratings than their male counterparts on some platforms, though this trend is not consistent across all platforms [ - ]. Other research finds that patient-physician racial concordance is associated with higher scores on internal patient satisfaction surveys [ ]. One study of a direct-to-consumer telemedicine platform shows that, on average, patients report higher rates of dissatisfaction with Black and Asian physicians compared to White physicians [ ].
However, the influence of physician factors on physician ratings is likely not limited to gender and race alone. Despite the growing influence of web-based review sites, much remains unknown about the various factors that may influence physician ratings. For instance, many studies have examined the association between patient attractiveness and physician care practices [ - ]. However, to our knowledge, no study has examined how physician attractiveness impacts patient satisfaction in real-world medical settings.
Other factors such as school ranking and multilingualism are similarly understudied. Existing research finds little correlation between medical school ranking and performance scores [ ]. In addition, language-concordant care is associated with increased patient satisfaction [ , ]. However, the roles of ranking and multilingualism have not been examined in web-based review settings.
In this paper, we aim to explore the role of various physician characteristics, including gender, age, race, number of languages spoken in a physician’s office, education, and facial attractiveness, in shaping patient perceptions of their physicians through ratings on a web-based doctor review site.
Methods
We collected primary care doctor profile data from Zocdoc, a platform that allows patients to rate and review their experiences with doctors. Physician variables included facial attractiveness, race, age, gender, language, and education. Outcome variables included overall satisfaction, bedside manner, and wait time scores from patient reviews. Descriptive statistics, the Wilcoxon rank sum test, and multivariate logistic regression were used to analyze the data.
Data Collection
We collected a data set of primary care doctor profiles from Zocdoc, a web-based platform that allows patients to search for and schedule telemedicine and in-person appointments with doctors, and also to rate and review their experiences. Notably, unlike many other sites that host physician reviews, Zocdoc only posts reviews from patients who have attended an appointment [ ]. After appointments, patients receive emails from Zocdoc asking for feedback. Zocdoc is free for patients to use, and makes money by charging health care providers subscription or booking fees [ ]. Health care providers may come from both independent practices and integrated networks. Patients can search for a physician using criteria such as condition, specialty, city, state, ZIP code, insurance carrier and plan, or a specific doctor's name. They can then filter by date, time of day, distance, gender, in-person or video consultations, hospital affiliation, and languages spoken by the physician.
Zocdoc does not maintain a public, centralized list of all providers on the platform. In order to systematically collect a large sample of providers, we used Zocdoc’s search engine to identify primary care doctors within 50 miles of each of the 30 largest cities in the United States [ ]. Specifying “primary care doctor” in the search bar yields a variety of specialties, including internists, nurse practitioners, and physician assistants. We filtered these results specifically to profiles that listed their specialty as primary care doctor. We collected these profiles using a browser-based web scraping tool [ ].
Physician Variables
From each profile, we collected the listed gender, languages spoken in the physician’s office, type of medical school degree (allopathic or osteopathic), and medical school institution, and we downloaded the physician’s profile picture. All physicians were either graduates of allopathic medical schools, which grant doctor of medicine degrees, or osteopathic medical schools, which grant doctor of osteopathic medicine degrees. We determined the geographical location of each medical school (Africa, the Caribbean, East Asia, Europe, Latin America, the Middle East, South Asia, the mainland United States or Canada, other) by searching for each medical school on Google. The US medical schools were additionally coded for whether or not they were ranked in the top 30 of the US News & World Report’s 2023-2024 Medical School Research rankings [ ]. For analysis purposes, we divided the number of languages spoken into buckets (1, 2, 3, and ≥4). We additionally broke down the number of languages into 7 categories: European, East and Southeast Asian, South Asian, Middle Eastern, African, Caribbean, and Creole (Table S1 in ).
Through Face++, a commercial facial analysis software commonly used to infer demographic factors, we imputed a facial attractiveness rating for each profile [ - ]. Face++’s estimation of facial attractiveness has been found to correlate well (r=0.72) with human raters [ ]. We used a second commonly used commercial facial analysis software, Kairos, to infer race and age; Kairos has been found to outperform Face++’s race and age models [ - ]. One study from 2019 found the mean absolute error for Kairos’ age model to be 3.30 (SD 2.64) years when compared to human raters’ estimates as ground truth [ ]. Accuracy for Kairos’ race model was 95.06% (95% CI 94.08%-95.93%) when compared to human raters [ ]. Similar methods have been used to understand the diversity in hospital system faculty and medical editorial boards [ , ]. To examine potential nonlinearities and for interpretability purposes, we divided facial attractiveness and age into quartiles. For robustness, we compared Face++’s age prediction results with those of Kairos (Tables S2-S3 in and ).
Outcome Variables
Each doctor profile was associated with 3 average scores from patient reviews: overall satisfaction, bedside manner, and wait time. Each score is rated on a scale of 1-5 stars. We additionally collected the total number of reviews. Profiles with 0 reviews were removed from the sample.
Statistical Analysis
The data were analyzed at the physician level. We used descriptive statistics to summarize the data and the Wilcoxon rank sum test to determine if there were significant differences in average review scores across gender, age quartile, facial attractiveness quartile, education, and number of languages spoken. Multivariable logistic regression was used to assess the association between these factors and average review scores. The outcome for each regression was a binary variable indicating whether or not the physician's rating was above the 25th percentile of ratings, which allows us to examine factors associated with very low ratings. All statistical analyses were conducted using R (version 4.1.2; The R Foundation).
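The two analysis steps described above can be illustrated with a minimal sketch. The authors worked in R and do not report their code, so this is a standard-library Python illustration. It assumes a normal approximation with tie correction for the rank sum test and R-style (inclusive) quartiles for the cut point. The function names and the toy ratings are hypothetical, not taken from the study data.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist, quantiles


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation, tie-corrected,
    no continuity correction). Returns (U statistic for x, z score, P value)."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


def above_25th_percentile(ratings):
    """Return the 25th-percentile cut point and a 0/1 flag per rating
    (1 = strictly above the cut point), mirroring the regression outcome."""
    cutoff = quantiles(ratings, n=4, method="inclusive")[0]
    return cutoff, [int(r > cutoff) for r in ratings]


# Toy ratings for two hypothetical groups of physicians (not study data).
group_a = [4.9, 4.8, 4.95, 4.85]
group_b = [4.5, 4.6, 4.7, 4.55]
u_stat, z_score, p_value = rank_sum_test(group_a, group_b)
cut, outcome = above_25th_percentile(group_a + group_b)
```

Whether the published analysis used an exact test, a continuity correction, or a strict versus non-strict comparison at the cut point is not stated in the paper, so those choices here are assumptions.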
Ethical Considerations
This study used publicly available data posted for public use by the providers and therefore did not require institutional review board approval.
Results
A total of 1521 primary care doctor profiles from Zocdoc were collected. Of these, 66 were missing an overall satisfaction rating and excluded, leaving a sample of 1455 primary care doctor profiles for analysis ( ). One profile lacked information on the medical school attended, one profile contained a profile picture from which facial attractiveness, age, and race could not be estimated, and one profile contained a profile picture for which only facial attractiveness could not be assessed. These 3 profiles were still included for analysis. Three cities (Houston, Chicago, and New York City) represented 49% of all primary care doctors in the sample (Table S4 in ). The majority of physicians were men (54%) with no nonbinary individuals represented in the study. The sample included speakers of 66 languages, with a mean of 1.73 languages spoken (Table S1 in ).
Characteristics | Values
Overall satisfaction rating, median (IQR) | 4.82 (4.67-4.93)
Wait time rating, median (IQR) | 4.67 (4.45-4.83)
Bedside manner rating, median (IQR) | 4.88 (4.75-4.98)
Number of reviews, median (IQR) | 57 (19-230)
Gender, n (%)
  Female | 675 (46)
  Male | 780 (54)
Age (in years), mean (SD) | 31 (8)
Race, n (%)
  Asian | 598 (41)
  Black | 107 (7.4)
  Hispanic | 149 (10)
  White | 600 (41)
Facial attractiveness, mean (SD) | 56 (12)
Top 30 medical school, n (%) | 106 (7.3)
Degree, n (%)
  DO^b | 279 (19)
  MD^c | 1176 (81)
Medical school location, n (%)
  Africa | 28 (1.9)
  Caribbean | 141 (9.7)
  East and Southeast Asia | 40 (2.8)
  Europe | 68 (4.7)
  Latin America | 59 (4.1)
  Middle East | 38 (2.6)
  Other | 3 (0.2)
  South Asia | 180 (12)
  United States or Canada | 897 (62)
Number of languages spoken in office, n (%)
  1 | 858 (59)
  2 | 326 (22)
  3 | 165 (11)
  ≥4 | 106 (7.3)
Number of European languages, mean (SD) | 1.29 (0.60)
Number of South Asian languages, mean (SD) | 0.26 (0.73)
Number of East or Southeast Asian languages, mean (SD) | 0.07 (0.33)
Number of Middle Eastern languages, mean (SD) | 0.06 (0.26)
Number of African languages, mean (SD) | 0.0158 (0.1543)
Number of Creole languages, mean (SD) | 0.0034 (0.0585)
^a Bedside rating and wait time rating were missing from 35 profiles; age, race, attractiveness, and medical school were missing from 1-2 profiles.
^b DO: doctor of osteopathic medicine.
^c MD: doctor of medicine.
Univariate Analysis
Average overall ratings on Zocdoc are typically highly positive with a median of 4.82 (IQR 4.67-4.93). We find significant but small differences in average overall rating by physician characteristics including age quartile, facial attractiveness quartile, degree country, degree type, and number of languages spoken ( , Table S5 in ). For example, the median overall rating for primary care doctors in the first quartile of age is 4.83 (IQR 4.69-4.92) but 4.78 (IQR 4.61-4.93) for doctors in the fourth quartile (Wilcoxon rank sum test, P=.02). We find similar results when using Face++’s age estimates (Table S2 in ). The median overall rating for primary care doctors in the first quartile of facial attractiveness is 4.81 (IQR 4.66-4.93) but 4.84 (IQR 4.72-4.93) for doctors in the fourth quartile (P=.03). Osteopathic physicians have a median overall rating of 4.86 (IQR 4.74-4.96) compared to 4.81 (IQR 4.66-4.92) for allopathic physicians (P<.001). The median overall rating for primary care doctors with a US or Canadian degree is 4.85 (IQR 4.70-4.94) compared to 4.79 (IQR 4.61-4.90) for those with a foreign degree (P<.001). However, not all foreign-educated physicians have significantly lower scores. For example, Caribbean-educated physicians have a median rating of 4.84 (IQR 4.73-4.93). Physicians educated in Latin America, however, have a median overall rating of 4.71 (IQR 4.54-4.88), a gap of 0.14 in median overall ratings compared with the US- or Canadian-educated physicians. Doctors with 1 language spoken in their office have a median overall rating of 4.84 (IQR 4.69-4.95) while doctors with ≥4 languages have a median overall rating of 4.76 (IQR 4.52-4.85; P<.001). We observe no significant differential effects in overall ratings by gender across the other variables.
Average bedside manner rating is similarly highly positive with a median of 4.88 (IQR 4.75-4.98). We find significant but small differences by physician characteristics including age quartile, degree type, medical school location, and number of languages spoken (Table S5 in ). For example, the median bedside manner rating for primary care doctors in the first quartile of age is 4.89 (IQR 4.77-4.97) but 4.84 (IQR 4.70-4.96) for doctors in the fourth quartile (P<.001). Osteopathic physicians have a median bedside manner rating of 4.91 (IQR 4.81-5.00) compared to 4.87 (IQR 4.73-4.97) for allopathic physicians (P<.001). The median bedside manner rating for primary care doctors with a US or Canadian degree is 4.90 (IQR 4.77-5.00) compared to 4.85 (IQR 4.70-4.95) for those with a foreign degree (P<.001). Doctors with 1 language spoken in their office have a median bedside manner rating of 4.90 (IQR 4.77-5.00) while doctors with ≥4 languages have a median bedside manner rating of 4.81 (IQR 4.58-4.91; P<.001).
The average wait time rating is similarly highly positive with a median of 4.67 (IQR 4.45-4.83). We find significant but small differences by physician characteristics including age quartile, facial attractiveness quartile, degree type, medical school location, and number of languages spoken (Table S5 in ). For example, the median wait time rating for primary care doctors in the first quartile of age is 4.72 (IQR 4.50-4.86) but 4.62 (IQR 4.38-4.79) for doctors in the fourth quartile (P<.001). The median wait time rating for primary care doctors in the first quartile of facial attractiveness is 4.64 (IQR 4.40-4.81) but 4.71 (IQR 4.53-4.85) for doctors in the fourth quartile (P=.003). Osteopathic physicians have a median wait time rating of 4.74 (IQR 4.54-4.88) compared to 4.65 (IQR 4.42-4.82) for allopathic physicians (P<.001). The median wait time rating for primary care doctors with a US degree is 4.69 (IQR 4.50-4.84) compared to 4.62 (IQR 4.40-4.81) for those with a foreign degree (P=.002). Doctors with 1 language spoken in their office have a median wait time rating of 4.70 (IQR 4.50-4.85) while doctors who speak ≥4 languages have a median wait time rating of 4.46 (IQR 4.24-4.67; P<.001).
Having excluded 66 profiles with zero reviews, the vast majority of profiles have multiple reviews with a median of 57 (IQR 19-230). We find significant differences in the number of reviews by facial attractiveness quartile, number of languages spoken, and degree type. For example, the median number of reviews for doctors below the 25th percentile of facial attractiveness is 50, but 73 for doctors above the 75th percentile (P=.009). The median number of reviews for doctors with 1 language is 42, while doctors with ≥4 languages have a median number of reviews of 244 (P<.001). The median number of reviews for osteopathic physicians is 44 compared to 61 for allopathic physicians (P=.01). Linear regression analysis finds no association between average overall rating and number of reviews.
Multivariate Analysis
On multivariate analysis ( ), being older, attending a South Asian medical school, and speaking more European and Middle Eastern languages are associated with lower odds of higher overall ratings. For example, attending a South Asian medical school is associated with 0.32 times the odds (95% CI 0.20-0.51) of being above the bottom quartile of overall ratings relative to US or Canadian graduates. Variance inflation factor analysis suggests that there does not exist a problematic amount of collinearity, and all variables have a variance inflation factor below 5. Additionally, we do not find any strong influential outliers on binned residual analysis. We find similar results when using Face++'s age predictions (Table S3 in ) and when treating age and facial attractiveness as discrete variables (Table S6 in ). We present results analyzing overall ratings using cut points at the 75th percentile and the median in Table S7 in .
Characteristics | Overall^a, OR^b (95% CI) | Bedside manner^c, OR (95% CI) | Wait time^d, OR (95% CI)
Gender
  Female | Reference level | Reference level | Reference level
  Male | 1.38 (0.97-1.97) | 1.25 (0.88-1.79) | 1.13 (0.79-1.61)
Age quartile
  Q1 (0-25) | Reference level | Reference level | Reference level
  Q2 (25-30) | 1.12 (0.76-1.64) | 0.99 (0.67-1.45) | 0.99 (0.67-1.47)
  Q3 (30-37) | 0.69 (0.46-1.04) | 0.66 (0.44-0.99)^e | 0.71 (0.46-1.07)
  Q4 (37-65) | 0.49 (0.30-0.81)^f | 0.51 (0.31-0.84)^f | 0.68 (0.41-1.12)
Race
  White | Reference level | Reference level | Reference level
  Asian | 1.14 (0.82-1.59) | 1.22 (0.88-1.70) | 1.12 (0.80-1.57)
  Black | 0.98 (0.58-1.69) | 1.08 (0.64-1.88) | 0.67 (0.41-1.12)
  Hispanic | 0.88 (0.57-1.38) | 0.94 (0.61-1.47) | 1.22 (0.77-1.99)
Facial attractiveness quartile
  Q1 (0-47) | Reference level | Reference level | Reference level
  Q2 (47-55) | 0.81 (0.57-1.15) | 0.87 (0.61-1.23) | 0.87 (0.61-1.23)
  Q3 (55-64) | 0.98 (0.68-1.42) | 0.98 (0.67-1.42) | 1.07 (0.73-1.55)
  Q4 (64-90) | 1.09 (0.73-1.62) | 0.87 (0.59-1.29) | 1.18 (0.79-1.77)
Top 30 ranking | 0.68 (0.42-1.11) | 0.56 (0.35-0.91)^e | 0.87 (0.53-1.43)
Region
  United States or Canada | Reference level | Reference level | Reference level
  Africa | 0.96 (0.39-2.55) | 0.76 (0.31-1.94) | 1.48 (0.60-4.00)
  Caribbean | 0.93 (0.59-1.51) | 0.93 (0.59-1.51) | 1.78 (1.06-3.10)^e
  East or Southeast Asia | 0.83 (0.37-1.95) | 1.02 (0.45-2.50) | 0.78 (0.36-1.76)
  Europe | 1.01 (0.54-1.97) | 0.83 (0.45-1.58) | 0.73 (0.40-1.36)
  Latin America | 0.56 (0.31-1.02) | 0.63 (0.34-1.19) | 0.73 (0.40-1.37)
  Middle East | 0.69 (0.32-1.58) | 0.74 (0.34-1.68) | Reference level
  Other | 0.80 (0.07-17.7) | 0.19 (0.01-2.03) | 0.87 (0.08-19.3)
  South Asia | 0.32 (0.20-0.51)^g | 0.30 (0.19-0.48)^g | Reference level
Degree
  DO^h | Reference level | Reference level | Reference level
  MD^i | 0.88 (0.60-1.29) | 0.79 (0.53-1.16) | 0.77 (0.52-1.13)
Number of European languages | 0.78 (0.63-0.96)^e | 0.77 (0.62-0.95)^e | 0.62 (0.50-0.77)^g
Number of East or Southeast Asian languages | 0.88 (0.59-1.36) | 0.77 (0.52-1.17) | 0.78 (0.53-1.18)
Number of South Asian languages | 1.05 (0.87-1.29) | 1.02 (0.84-1.24) | 1.00 (0.83-1.22)
Number of Middle Eastern languages | 0.58 (0.36-0.94)^e | 0.58 (0.36-0.94)^e | 0.50 (0.30-0.84)^f
Number of African languages | 0.73 (0.34-1.64) | 0.84 (0.40-2.02) | 0.71 (0.33-1.53)
Number of Creole languages | 0.14 (0.01-1.02) | Reference level | 0.21 (0.01-1.57)
^a Akaike information criterion (AIC)=1606; Bayesian information criterion (BIC)=1749; deviance=1552; and area under the receiver operating characteristic curve (AUROC)=0.654.
^b OR: odds ratio.
^c AIC=1597; BIC=1739; deviance=1543; and AUROC=0.655.
^d AIC=1542; BIC=1684; deviance=1488; and AUROC=0.666.
^e P<.05.
^f P<.01.
^g P<.001.
^h DO: doctor of osteopathic medicine.
^i MD: doctor of medicine.
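As a reading aid for the odds ratios in the table above, the following Python sketch shows how a reported OR and its 95% CI relate to a logistic regression coefficient, and what an OR implies for a probability. It assumes Wald-type intervals, which the paper does not state (R can also produce profile-likelihood intervals, and the asymmetric wide intervals in some rows suggest that may be the case here). The 0.75 baseline probability and all function names are hypothetical.

```python
from math import exp, log

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def coefficient_from_or(or_point, ci_low, ci_high):
    """Recover the log-odds coefficient and an approximate standard error
    from a reported OR and 95% CI, assuming a Wald interval."""
    beta = log(or_point)
    se = (log(ci_high) - log(ci_low)) / (2 * Z_95)
    return beta, se


def or_with_ci(beta, se):
    """Convert a log-odds coefficient and SE into (OR, lower, upper)."""
    return exp(beta), exp(beta - Z_95 * se), exp(beta + Z_95 * se)


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability implied by multiplying the baseline odds by an OR."""
    odds = baseline_prob / (1 - baseline_prob) * odds_ratio
    return odds / (1 + odds)


# South Asian medical school, overall rating: OR 0.32 (95% CI 0.20-0.51).
beta, se = coefficient_from_or(0.32, 0.20, 0.51)
point, low, high = or_with_ci(beta, se)

# Hypothetical baseline: if a reference-group physician had a 0.75 chance of
# being above the 25th percentile, an OR of 0.32 would imply this probability.
implied = apply_odds_ratio(0.75, 0.32)
```

Because the outcome is being above the 25th percentile, an OR below 1 means lower odds of escaping the bottom quartile. The baseline probability for any specific reference group is not reported, so the implied probability is illustrative only.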
Discussion
This study aimed to examine the impact of physician characteristics, specifically gender, age, facial attractiveness, medical school ranking, foreign degree status, and number of languages spoken, on patients' ratings of primary care physicians on Zocdoc, a platform that only publishes reviews from verified patients. Our univariate findings show that older age, lower facial attractiveness, foreign degrees, allopathic degrees, and more languages spoken in office are negatively associated with the overall rating of primary care doctors. However, the effect sizes of these factors are small and may not be clinically significant. For example, we find a 0.14 gap in median overall rating between doctors with a US degree and doctors with a Latin American degree. Prior research identifies that a half-star improvement in ratings leads to a 10% increase in the likelihood that a physician will fill an appointment [ ]. A back-of-the-envelope calculation suggests that this 0.14 gap corresponds to a difference of roughly 2.8% in appointments.
To our knowledge, this paper is the first to examine the association of the number of languages, age, facial attractiveness, degree type, foreign graduate status, school ranking, and race with web-based physician reviews, and the first to examine the association of the number of languages and school ranking with patient satisfaction in general.
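The back-of-the-envelope calculation in the first paragraph of this Discussion can be written out explicitly. The sketch below assumes the relationship between rating and appointments is linear, so that the 10% per half-star estimate scales proportionally to smaller gaps. That linearity is an assumption of the sketch, and the function name is hypothetical.

```python
def appointment_change_pct(rating_gap, pct_per_half_star=10.0):
    """Linearly scale the reported effect of a half-star rating change
    (10% change in appointments) to an arbitrary rating gap."""
    return rating_gap / 0.5 * pct_per_half_star


# Median overall ratings reported in the Results: 4.85 for US or Canadian
# graduates versus 4.71 for Latin American graduates, a gap of 0.14 stars.
gap = 4.85 - 4.71
estimated_pct = appointment_change_pct(gap)  # about 2.8 (percent)
```

The same scaling applied to the smaller gaps reported elsewhere in the Results would give proportionally smaller percentages.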
Our findings on the role of degree type and school ranking are generally consistent with past research. For example, a national telephone survey found that patients of osteopathic physicians generally reported higher rates of satisfaction than patients of allopathic physicians [ ]. Past research has found that osteopathic physicians are more likely than allopathic physicians to call patients by their first name, provide information on the underlying causes of their illnesses, and have conversations with them about the social, family, and emotional implications of their medical conditions, all of which may contribute to higher satisfaction rates [ ]. In addition, past research finds little correlation between medical school ranking and patient mortality and readmission rates, suggesting an elite ranking has negligible impact on patient care [ ].
While it may be easy to hypothesize reasons for differences based on age, facial attractiveness, or foreign degree status, it is not entirely clear what could drive the differences we observe with the number of languages. We find that more languages spoken is associated with a decrease in rating. The number of languages may serve as a proxy for foreign-born status, but we cannot be certain. It is possible that physicians who speak 2 languages are more likely to be bilingual US natives whereas those who speak more than 2 languages are more likely to be immigrants. Survey data have found that patients report lower satisfaction with international medical graduates [ , ]. Notably, however, older Medicare patients have lower mortality rates when treated by international graduates compared to US graduates [ ].
In addition, we did not find any numerical rating differences by race. This is inconsistent with past findings on a direct-to-consumer telemedicine platform in which patients report lower satisfaction with Asian and Black physicians [ ]. Moreover, we did not find any numerical rating differences by gender, which is inconsistent with some previous papers on the topic [ , , , , ]. Zocdoc, however, differs from many other physician review platforms in that reviews can only come from patients after they have received care from their respective physician [ ]. On third-party independent platforms like RateMDs, Google Reviews, or Healthgrades, anyone may post a review regardless of whether they have actually seen the provider. Additionally, Zocdoc solicits reviews after each appointment, which may counter the typical bias of only extremely satisfied or unsatisfied consumers leaving reviews, leading to larger, more representative samples. Lastly, Zocdoc only publishes patient reviews that do not violate its community standards, which bar reviews that include personal information, pricing specifics, profanity, claims about the accuracy of a provider’s treatment or diagnosis, or promotional content.
Several limitations must be considered in interpreting the study results. First, our study only examined patient reviews of physicians on one web-based review platform and results may not generalize to other platforms. For instance, millennial women, New Yorkers, and residents of urban areas are disproportionately represented as patients on Zocdoc [ ]. Moreover, while we find no differences by gender, female physicians on review platforms such as RateMDs and Google Reviews have been found to have lower numerical ratings than men [ - ]. This may be driven by a lack of verification in review postings or a lack of review moderation. Because Zocdoc moderates reviews, we cannot determine how results might change if reviews that violated community standards were included in the analysis. For example, it is possible that immigrants or women are more likely to receive lower but also more profane reviews. If Zocdoc removes these profane reviews, any disparities we are able to observe may be attenuated. In addition, because most physicians have a very high mean rating, there may be a ceiling effect that makes it difficult to discern relationships between ratings and physician factors. Second, we restrict the study to primary care physicians and the results may differ for other specialties. For example, one small study of 271 sports medicine surgeons found gender differences in ratings on 1 of 3 platforms studied [ ]. Third, we limit our analysis to differences in numerical ratings. However, linguistic analyses may yield different types of bias; research on Zocdoc does find that text reviews of women physicians are more informal and emotional than reviews of men [ ]. Fourth, we rely on automated face classification software to classify facial attractiveness, race, and age. However, while past social science work may have relied on human raters to code unstructured information, many social scientists have fully moved toward using automated algorithmic procedures [ - , , , - ]. Moreover, while we do not have access to the ground truth data on race and age, patients most likely do not have access to this information and instead are influenced by perceived race and age.
Fifth, the number of languages spoken in a provider’s office may correspond to the number of languages spoken by the physicians themselves or by their staff, and what this variable proxies is unclear. Sixth, the overall explanatory power of our multivariable models is fair and may be limited due to the many factors that play key roles in patient satisfaction that we are unable to observe. Seventh, our data do not permit any mechanistic or causal interpretations. Despite these limitations, our study represents an important step in understanding the potential biases in web-based doctor reviews and highlights the need for further research in this area.
Although our study suggests that physician factors have a real but limited impact on numerical ratings, it is important to note that bias may play out in many forms. For example, a physician's appearance or accent may impact a patient's trust, confidence, or satisfaction with their physician, which could in turn influence their take-up of preventative services and lead to either better or worse health outcomes [ ]. Such a phenomenon would not be captured in our analysis of numerical ratings, but our results open the door to investigating such phenomena across facial attractiveness, multilingualism, education, and age more deeply. In conclusion, this study provides insights into the association between physician characteristics and patients' web-based ratings of primary care physicians. Future research should consider textual analyses of reviews, investigate how factors like facial attractiveness interact with patient outcomes, and explore whether the findings of this study generalize to other medical specialties, review platforms, or patient populations. Ultimately, our findings underscore the need for greater awareness of potential biases in web-based doctor reviews and the importance of considering a range of factors in evaluating health care providers.
Data Availability
The data sets generated during or analyzed during this study are available in the Open Science Framework repository.
Conflicts of Interest
None declared.
Languages.
DOCX File, 17 KB
Comparison of Kairos and Face++ on Age.
DOCX File, 14 KB
Multivariate Results using Face++ Age Predictions.
DOCX File, 17 KB
Locations of Primary Care Doctors.
DOCX File, 15 KB
Median Outcome by Physician Demographic Factor.
DOCX File, 20 KB
Higher resolution version of . Overall primary care physician Zocdoc ratings by physician demographic factor.
PDF File (Adobe PDF File), 1367 KB
Multivariate Logistic Regression Results with Discrete Age and Facial Attractiveness.
DOCX File, 16 KB
Multivariate Logistic Regression Results with Alternate Overall Rating Cut Point.
DOCX File, 15 KB
References
- Hribar CA, Chandran A, Piazza M, Quinsey CS. Association between patient perception of surgeons and color of scrub attire. JAMA Surg. 2023;158(4):421-423.
- Rehman SU, Nietert PJ, Cope DW, Kilpatrick AO. What to wear today? Effect of doctor's attire on the trust and confidence of patients. Am J Med. 2005;118(11):1279-1286.
- Petrilli CM, Saint S, Jennings JJ, Caruso A, Kuhn L, Snyder A, et al. Understanding patient preference for physician attire: a cross-sectional observational study of 10 academic medical centres in the USA. BMJ Open. 2018;8(5):e021239.
- Jennings JD, Ciaravino SG, Ramsey FV, Haydel CH. Physicians' attire influences patients' perceptions in the urban outpatient orthopaedic surgery setting. Clin Orthop Relat Res. 2016;474(9):1908-1918.
- Clark M, Shuja A, Thomas A, Steinberg S, Geffen J, Malespin M, et al. Patients' perceptions of gastroenterologists' attire in the clinic and endoscopy suite. Ann Gastroenterol. 2018;31(2):237-240.
- Au S, Khandwala F, Stelfox HT. Physician attire in the intensive care unit and patient family perceptions of physician professional characteristics. JAMA Intern Med. 2013;173(6):465-467.
- Burkle CM, Keegan MT. Popularity of internet physician rating sites and their apparent influence on patients' choices of physicians. BMC Health Serv Res. 2015;15:416.
- Hanauer DA, Zheng K, Singer DC, Gebremariam A, Davis MM. Public awareness, perception, and use of online physician rating sites. JAMA. 2014;311(7):734-735.
- Yaraghi N, Wang W, Gao GG, Agarwal R. How online quality ratings influence patients' choice of medical providers: controlled experimental survey study. J Med Internet Res. 2018;20(3):e99.
- Xu Y, Armony M, Ghose A. The Interplay between Online Reviews and Physician Demand: An Empirical Investigation. Rochester, NY. SSRN Scholarly Paper; 2016.
- Luca M, Vats S. Digitizing Doctor Demand: The Impact of Online Reviews on Doctor Choice. 2013. URL: https://www.aeaweb.org/conference/2014/retrieve.php?pdfid=55 [accessed 2024-06-25]
- McGrath RJ, Priestley JL, Zhou Y, Culligan PJ. The validity of online patient ratings of physicians: analysis of physician peer reviews and patient ratings. Interact J Med Res. 2018;7(1):e8.
- Lu SF, Rui H. Can We Trust Online Physician Ratings? Evidence from Cardiac Surgeons in Florida. Rochester, NY. SSRN Scholarly Paper; 2014.
- Daskivich TJ, Houman J, Fuller G, Black JT, Kim HL, Spiegel B. Online physician ratings fail to predict actual performance on measures of quality, value, and peer review. J Am Med Inform Assoc. 2018;25(4):401-407.
- Chen J, Presson A, Zhang C, Ray D, Finlayson S, Glasgow R. Online physician review websites poorly correlate to a validated metric of patient satisfaction. J Surg Res. 2018;227:1-6.
- Widmer RJ, Maurer MJ, Nayar VR, Aase LA, Wald JT, Kotsenas AL, et al. Online physician reviews do not reflect patient satisfaction survey responses. Mayo Clin Proc. 2018;93(4):453-457.
- Greaves F, Pape UJ, Lee H, Smith DM, Darzi A, Majeed A, et al. Patients' ratings of family physician practices on the internet: usage and associations with conventional measures of quality in the English National Health Service. J Med Internet Res. 2012;14(5):e146.
- Gao GG, McCullough JS, Agarwal R, Jha AK. A changing landscape of physician quality reporting: analysis of patients' online ratings of their physicians over a 5-year period. J Med Internet Res. 2012;14(1):e38.
- Gupta S, Kayla J. Understanding gender bias toward physicians using online doctor reviews. Psychol Lang Commun. 2022;26(1):18-41.
- Thawani A, Paul MJ, Sarkar U, Wallace BC. Are online reviews of physicians biased against female providers? 2019. Presented at: Proceedings of the 4th Machine Learning for Healthcare Conference; 2019:406-423; Durham, NC, USA. URL: https://proceedings.mlr.press/v106/thawani19a.html
- Barnett J, Bjarnadóttir MV, Anderson D, Chen C. Understanding gender biases and differences in web-based reviews of sanctioned physicians through a machine learning approach: mixed methods study. JMIR Form Res. 2022;6(9):e34902.
- Dunivin Z, Zadunayski L, Baskota U, Siek K, Mankoff J. Gender, soft skills, and patient experience in online physician reviews: a large-scale text analysis. J Med Internet Res. 2020;22(7):e14455.
- Nwachukwu BU, Adjei J, Trehan SK, Chang B, Amoo-Achampong K, Nguyen JT, et al. Rating a sports medicine surgeon's "quality" in the modern era: an analysis of popular physician online rating websites. HSS J. 2016;12(3):272-277.
- Takeshita J, Wang S, Loren AW, Mitra N, Shults J, Shin DB, et al. Association of racial/ethnic and gender concordance between patients and physicians with patient experience ratings. JAMA Netw Open. 2020;3(11):e2024583.
- Martinez KA, Keenan K, Rastogi R, Roufael J, Fletcher A, Rood MN, et al. Association of racial/ethnic and gender concordance between patients and physicians with patient experience ratings. J Gen Intern Med. 2020;35(9):2600-2606.
- Young JW. Symptom disclosure to male and female physicians: effects of sex, physical attractiveness, and symptom type. J Behav Med. 1979;2(2):159-169.
- Hall JA, Ruben M, Swatantra. First impressions of physicians according to their physical and social group characteristics. J Nonverbal Behav. 2020;44(2):279-299.
- Hadjistavropoulos HD, Ross MA, von Baeyer CL. Are physicians' ratings of pain affected by patients' physical attractiveness? Soc Sci Med. 1990;31(1):69-72.
- Bordieri JE, Solodky ML, Mikos KA. Physical attractiveness and nurses' perceptions of pediatric patients. Nurs Res. 1985;34(1):24-26.
- Rao AR, Clarke D. Exploring relationships between medical college rankings and performance with big data. Big Data Anal. 2019;4(1).
- Eskes C, Salisbury H, Johannsson M, Chene Y. Patient satisfaction with language-concordant care. J Physician Assist Educ. 2013;24(3):14-22.
- Lopez Vera A, Thomas K, Trinh C, Nausheen F. A case study of the impact of language concordance on patient care, satisfaction, and comfort with sharing sensitive information during medical care. J Immigr Minor Health. 2023;25(6):1261-1269.
- Luca. Verified Reviews. Zocdoc. 2013. URL: https://www.zocdoc.com/about/verifiedreviews/ [accessed 2023-04-25]
- How Does Zocdoc Make Money. Zocdoc. URL: https://www.zocdoc.com/about/question/how-does-zocdoc-make-money/ [accessed 2023-04-25]
- City and Town Population Totals: 2020-2021. United States Census Bureau. Washington, DC. URL: https://www.census.gov/data/tables/time-series/demo/popest/2020s-total-cities-and-towns.html [accessed 2024-06-25]
- Powerful Web Scraper for Regular and Professional use. 2013. URL: https://webscraper.io/ [accessed 2024-06-25]
- The Best Medical Schools for Research, Ranked. U.S. News & World Report. URL: https://www.usnews.com/best-graduate-schools/top-medical-schools/research-rankings [accessed 2023-04-25]
- Benjamin E, Luca M, Svirsky D. Racial discrimination in the sharing economy: evidence from a field experiment. Am Econ J: Appl Econ. 2017:1-22.
- Ouyang P, Wang J. Physician's online image and patient's choice in the online health community. Internet Res. 2022;32(6):1952-1977.
- Kosinski M. Facial width-to-height ratio does not predict self-reported behavioral tendencies. Psychol Sci. 2017;28(11):1675-1682.
- Jaeger B, Sleegers WW, Evans AM, Stel M, van Beest I. The effects of facial attractiveness and trustworthiness in online peer-to-peer markets. J Econ Psychol. 2019;75:102125.
- Troncoso I, Luo L. Look the part? the role of profile pictures in online labor markets. Mark Sci. 2023;42(6):1080-1100.
- Jaeger B, Sleegers WWA, Evans AM. Automated classification of demographics from face images: a tutorial and validation. Social & Personality Psych. 2020;14(3):e12520.
- Morgan A, Shah K, Tran K, Chino F. Racial, ethnic, and gender representation in leadership positions at national cancer institute-designated cancer centers. JAMA Netw Open. 2021;4(6):e2112807.
- Alharbi M, Shihong H. A survey of incorporating affective computing for human-system co-adaptation. 2020. Presented at: Proceedings of the 2nd World Symposium on Software Engineering; 2020 September:72-79; New York, NY, USA. URL: https://doi.org/10.1145/3425329.3425343
- Goel N, Rutagarama M, Faltings B. Tackling peer-to-peer discrimination in the sharing economy. 2020. Presented at: Proceedings of the 12th ACM Conference on Web Science; 2020 July:365-361; New York, NY, USA. URL: https://doi.org/10.1145/3394231.3397926
- Mathis MS, Badewa TE, Obiarinze RN, Wilkinson LT, Martin CA. A novel use of artificial intelligence to examine diversity and hospital performance. J Surg Res. 2021;260:377-382.
- Toney C, Shroyer Mathis M, Martin C. The use of facial recognition software and published manuscripts to examine trends in surgical editorial board diversity. J Surg Res. 2023;286:104-109.
- Licciardone JC, Herron KM. Characteristics, satisfaction, and perceptions of patients receiving ambulatory healthcare from osteopathic physicians: a comparative national survey. J Am Osteopath Assoc. 2001;101(7):374-385.
- Carey TS, Motyka TM, Garrett JM, Keller RB. Do osteopathic physicians differ in patient interaction from allopathic physicians? An empirically derived approach. J Am Osteopath Assoc. 2003;103(7):313-318.
- Tsugawa Y, Blumenthal DM, Jha AK, Orav EJ, Jena AB. Association between physician medical school ranking and patient outcomes and costs of care: observational study. BMJ. 2018;362:k3640.
- Engelhardt KE, Matulewicz RS, DeLancey JO, Merkow RP, Quinn CM, Kreutzer L, et al. Physician characteristics associated with patient experience scores: implications for adjusting public reporting of individual physician scores. BMJ Qual Saf. 2019;28(5):412-415.
- Tsugawa Y, Jena AB, Orav EJ, Jha AK. Quality of care delivered by general internists in US hospitals who graduated from foreign versus US medical schools: observational study. BMJ. 2017;356:j273.
- Kauff M, Anslinger J, Christ O, Niemann M, Geierhos M, Huster L. Ethnic and gender-based prejudice towards medical doctors? The relationship between physicians' ethnicity, gender, and ratings on a physician rating website. J Soc Psychol. 2022;162(5):540-548.
- Fisher T. Do women prefer female doctors? it depends. In: The Paper Gown. New York. Zocdoc; 2018.
- Sehgal NKR, Brownstein JS, Majumder MS, Tuli G. US COVID-19 clinical trial leadership gender disparities. Lancet Digit Health. 2023;5(3):e109-e111.
- Peng L, Cui G, Chung Y, Zheng W. The faces of success: beauty and ugliness premiums in e-commerce platforms. J Mark. 2020;84(4):67-85.
- Dietl H, Özdemir A, Rendall A. The role of facial attractiveness in tennis TV-viewership. Sport Manag Rev. 2020;23(3):521-535.
- Hyunkyu J. Judging an airbnb booking by its cover: how profile photos affect guest ratings. JCM. 2022;39(4):371-382.
- Frakes MD, Gruber J. Racial concordance and the quality of medical care: evidence from the military. In: Nber working paper series. Cambridge, MA. National Bureau of Economic Research; 2022.
Edited by A Mavragani; submitted 07.08.23; peer-reviewed by K Jordan, K Martinez; comments to author 11.11.23; revised version received 17.11.23; accepted 12.06.24; published 29.07.24.
Copyright
©Neil K R Sehgal, Benjamin Rader, John S Brownstein. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 29.07.2024.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.