Health Equity in Artificial Intelligence and Primary Care Research: Protocol for a Scoping Review

Background: Though artificial intelligence (AI) has the potential to augment the patient-physician relationship in primary care, bias in intelligent health care systems has the potential to differentially impact vulnerable patient populations.

Objective: The purpose of this scoping review is to summarize the extent to which AI systems in primary care examine the inherent bias toward or against vulnerable populations and to appraise how these systems have mitigated the impact of such biases during their development.

Methods: We will conduct a search update from an existing scoping review to identify studies on AI and primary care in the following databases: Medline-OVID, Embase, CINAHL, Cochrane Library, Web of Science, Scopus, IEEE Xplore, ACM Digital Library, MathSciNet, AAAI, and arXiv. Two screeners will independently review all titles, abstracts, and full-text articles. The team will extract data using a structured data extraction form and synthesize the results in accordance with PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews) guidelines.

Results: This review will provide an assessment of the current state of health care equity within AI for primary care. Specifically, we will identify the degree to which vulnerable patients have been included, assess how bias is interpreted and documented, and understand the extent to which harmful biases are addressed. As of October 2020, the scoping review is in the title- and abstract-screening stage. The results are expected to be submitted for publication in fall 2021.

Conclusions: AI applications in primary care are becoming an increasingly common tool in health care delivery and in preventative care efforts for underserved populations. This scoping review is expected to show the extent to which studies on AI in primary care employ a health equity lens and take steps to mitigate bias.

International Registered Report Identifier (IRRID): PRR1-10.2196/27799


SECTION (ITEM): PRISMA-ScR CHECKLIST ITEM; REPORTED ON PAGE #

Rationale (Item 3): Describe the rationale for the review in the context of what is already known. Explain why the review questions/objectives lend themselves to a scoping review approach. Reported on pages 1-2.

Objectives (Item 4): Provide an explicit statement of the questions and objectives being addressed with reference to their key elements (e.g., population or participants, concepts, and context) or other relevant key elements used to conceptualize the review questions and/or objectives.

Protocol and registration (Item 5): Indicate whether a review protocol exists; state if and where it can be accessed (e.g., a Web address); and if available, provide registration information, including the registration number. Reported on page 2.

Eligibility criteria (Item 6): Specify characteristics of the sources of evidence used as eligibility criteria (e.g., years considered, language, and publication status), and provide a rationale. Reported on pages 2-3.

Information sources* (Item 7): Describe all information sources in the search (e.g., databases with dates of coverage and contact with authors to identify additional sources), as well as the date the most recent search was executed. Reported on page 2.

Search (Item 8): Present the full electronic search strategy for at least 1 database, including any limits used, such that it could be repeated. Reported in Supplemental Appendix 2.

Selection of sources of evidence† (Item 9): State the process for selecting sources of evidence (i.e., screening and eligibility) included in the scoping review. Reported on pages 2-3.

Data charting process‡ (Item 10): Describe the methods of charting data from the included sources of evidence (e.g., calibrated forms or forms that have been tested by the team before their use, and whether data charting was done independently or in duplicate) and any processes for obtaining and confirming data from investigators. Reported on page 3.

Data items (Item 11): List and define all variables for which data were sought and any assumptions and simplifications made. Reported on page 3 and Supplemental Appendix 3.

Critical appraisal of individual sources of evidence§ (Item 12): If done, provide a rationale for conducting a critical appraisal of included sources of evidence; describe the methods used and how this information was used in any data synthesis (if appropriate). Not applicable.

Synthesis of results (Item 13): Describe the methods of handling and summarizing the data that were charted. Reported on page 3.

Selection of sources of evidence (Item 14): Give numbers of sources of evidence screened, assessed for eligibility, and included in the review, with reasons for exclusions at each stage, ideally using a flow diagram. Reported on page 3 and Figure 1.

Characteristics of sources of evidence (Item 15): For each source of evidence, present characteristics for which data were charted and provide the citations. Reported on pages 3-5.

Summary of evidence (Item 19): Summarize the main results (including an overview of concepts, themes, and types of evidence available), link to the review questions and objectives, and consider the relevance to key groups. Reported on pages 5-6.

Limitations (Item 20): Discuss the limitations of the scoping review process. Reported on page 6.

Conclusions (Item 21): Provide a general interpretation of the results with respect to the review questions and objectives, as well as potential implications and/or next steps. Reported on pages 5-7.

Funding (Item 22): Describe sources of funding for the included sources of evidence, as well as sources of funding for the scoping review. Describe the role of the funders of the scoping review. Reported on page 7.

JBI = Joanna Briggs Institute; PRISMA-ScR = Preferred Reporting Items for Systematic reviews and Meta-Analyses extension for Scoping Reviews.

* Where sources of evidence (see second footnote) are compiled from, such as bibliographic databases, social media platforms, and Web sites.

† A more inclusive/heterogeneous term used to account for the different types of evidence or data sources (e.g., quantitative and/or qualitative research, expert opinion, and policy documents) that may be eligible in a scoping review as opposed to only studies. This is not to be confused with information sources (see first footnote).

‡ The frameworks by Arksey and O'Malley (6) and Levac and colleagues (7) and the JBI guidance (4, 5) refer to the process of data extraction in a scoping review as data charting.

§ The process of systematically examining research evidence to assess its validity, results, and relevance before using it to inform a decision. This term is used for items 12 and 19 instead of "risk of bias" (which is more applicable to systematic reviews of interventions) to include and acknowledge the various sources of evidence that may be used in a scoping review (e.g., quantitative and/or qualitative research, expert opinion, and policy documents).

Develop initial search strategy
• Use harvested terms to inform comprehensive search strategy in conjunction with topic area knowledge and discussion amongst reviewers.

Pilot test search strategy in health and computer science databases
• Revise strategy in an iterative fashion to balance comprehensiveness with feasibility.

SUPPLEMENTAL APPENDIX 3: ADDITIONAL METHODS & RESULTS
1. Diagnostic Decision Support: AI provided information to inform diagnosis, such as the probability that a patient has a particular condition.
2. Treatment Decision Support: AI provided information to inform treatment decisions, whereby treatment was interpreted broadly to include any management or care provided (or absence of unnecessary actions) to someone with the health condition(s) or symptom(s) of interest.
3. Referral Support: AI provided information to support decisions about referring patients to specialist services or AI assisted with technical aspects of the referral process.
4. Future State Prediction: AI provided predictions towards future events, for example utilization of emergency department, development of a health condition, or prognosis for an existing condition.

5. Health Care Utilization Analyses: AI provided information about interactions with or processes within health care systems, for example the frequency or quantity of patient visits.

6. Knowledge Base and Ontology Construction or Use: Construction or use of knowledge bases or ontologies including PC concepts.
7. Information Extraction: AI used to extract knowledge from structured or unstructured data (e.g., electronic medical records) for further use.
8. Descriptive Information Provision: AI used to summarize data in a meaningful way for human interpretation, for example prevalence of a condition or patterns of patient profiles.
9. Other (specified): The PC function was not represented by the above categories; specifics were recorded.

Author Reported Intended End-User(s)
The people for whom the research or research end-product was stated to be intended, regardless of whether those intended end users were involved with the research or how close the research was to being applicable for those users in a practice setting: Patient, Physician, Nurse, Nurse Practitioner, Administrator, Researcher, Other (specified), or Unknown. If the study was developing a deployable AI method or tool (broadly defined) but more research was needed before the AI method of interest would be ready for implementation or use by its intended end user, Researcher was included as a target end user.

Target Health Condition(s)
The health condition of interest as stated by the study authors or inferred by reviewers, or Unknown if no condition was stated or inferable. Conditions were extracted in full form, and MZ later organized them into 27-category and 10-category formats. When a study intended for the AI to be applicable to all health conditions, "General" was used; specifics about any test conditions were also extracted.

Location of Data Source(s) or intended location of implementation
Country, or the next level of granularity, where data were collected, or the geographical location where the study stated implementation would occur. Unknown was used when the location of the data source was not stated or when all data were simulated.

Subfield(s) of Artificial Intelligence
Artificial intelligence methods were organized according to 10 subfields; a single study may include one or more subfields.

1. Bayesian Network: Graphical models (directed acyclic graphs) used to describe dependency relationships among variables, enabling the efficient representation of multivariate probability distributions. The resulting distributions can be queried to find the probability of an event occurring given a particular set of evidence. Bayesian networks can be developed manually, such as from physician input, learned from data, or created using a combination of the two.

3. Data Mining: The process of eliciting information from collections of data, such as by finding and counting pattern occurrences using inferential algorithms; humans may then interpret these patterns. For example, Soler et al. (2015) used data mining on electronic medical records to identify relationships between reasons for encounter and the diagnoses recorded for the corresponding visit (83). We did not consider extracting information in a structured way, such as using a database query to get a basic count of disease X diagnoses, to be the type of data mining that falls under the umbrella of artificial intelligence.
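To make the Bayesian network definition concrete, the sketch below hand-codes the smallest possible diagnostic network (a single Disease node with a Test node depending on it) and queries it with Bayes' rule, as described above. All probability values and variable names are invented for illustration and do not come from any reviewed study.

```python
# Minimal sketch of querying a two-node Bayesian network (Disease -> Test).
# Prevalence and test characteristics below are illustrative assumptions.

def posterior_disease_given_positive(p_disease, p_pos_given_disease, p_pos_given_healthy):
    """Bayes' rule: P(D=1 | T=+) for the Disease -> Test network."""
    joint_pos_disease = p_disease * p_pos_given_disease        # P(D=1, T=+)
    joint_pos_healthy = (1 - p_disease) * p_pos_given_healthy  # P(D=0, T=+)
    return joint_pos_disease / (joint_pos_disease + joint_pos_healthy)

# Example query: rare condition (1% prevalence), sensitive but imperfect test.
p = posterior_disease_given_positive(0.01, 0.90, 0.05)
print(round(p, 3))  # a positive test raises the probability from 1% to roughly 15%
```

Larger networks follow the same principle: the directed acyclic graph factorizes the joint distribution, and a query conditions that distribution on the observed evidence.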

Expert System: Consists of two parts: 1) a knowledge base that contains facts and rules, such as if-then statements derived from medical guidelines, and 2) an inference engine that uses the knowledge base to arrive at conclusions or answers to questions.

Note: Each study fulfills one or more appointment type categories; each category is counted a maximum of one time for any given study.
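The two-part expert-system structure described above (a knowledge base of if-then rules plus an inference engine) can be sketched as a small forward-chaining loop. The rules and symptom names below are invented placeholders, not content drawn from any medical guideline or reviewed study.

```python
# Hedged sketch of a tiny rule-based expert system.
# Knowledge base: each rule maps a set of required facts to one conclusion.
RULES = [
    ({"fever", "cough"}, "possible_respiratory_infection"),
    ({"possible_respiratory_infection", "shortness_of_breath"}, "refer_to_physician"),
]

def infer(facts, rules):
    """Inference engine: repeatedly fire any rule whose conditions all hold,
    adding its conclusion to the fact set, until no rule adds anything new."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = infer({"fever", "cough", "shortness_of_breath"}, RULES)
print("refer_to_physician" in derived)  # the two rules chain to a referral
```

The separation matters in practice: clinicians can audit and update the rule list without touching the inference loop, which is one reason expert systems remain interpretable relative to learned models.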