Original Paper
Abstract
Background: The COVID-19 pandemic has imposed a large, initially uncontrollable, public health crisis both in the United States and across the world, with experts looking to vaccines as the ultimate mechanism of defense. The development and deployment of COVID-19 vaccines have been rapidly advancing via global efforts. Hence, it is crucial for governments, public health officials, and policy makers to understand public attitudes and opinions towards vaccines, such that effective interventions and educational campaigns can be designed to promote vaccine acceptance.
Objective: The aim of this study was to investigate public opinion and perception on COVID-19 vaccines in the United States. We investigated the spatiotemporal trends of public sentiment and emotion towards COVID-19 vaccines and analyzed how such trends relate to popular topics found on Twitter.
Methods: We collected over 300,000 geotagged tweets in the United States from March 1, 2020 to February 28, 2021. We examined the spatiotemporal patterns of public sentiment and emotion over time at both national and state scales and identified 3 phases along the pandemic timeline with sharp changes in public sentiment and emotion. Using sentiment analysis, emotion analysis (with cloud mapping of keywords), and topic modeling, we further identified 11 key events and major topics as the potential drivers to such changes.
Results: An increasing trend in positive sentiment in conjunction with a decrease in negative sentiment was generally observed in most states, reflecting the rising confidence and anticipation of the public towards vaccines. The overall tendency of the 8 types of emotion implies that the public trusts and anticipates the vaccine. This is accompanied by a mixture of fear, sadness, and anger. Critical social or international events or announcements by political leaders and authorities may have potential impacts on public opinion towards vaccines. These factors help identify underlying themes and validate insights from the analysis.
Conclusions: The analyses of near real-time social media big data benefit public health authorities by enabling them to monitor public attitudes and opinions towards vaccine-related information in a geo-aware manner, address the concerns of vaccine skeptics, and promote the confidence that individuals within a certain region or community have towards vaccines.
doi:10.2196/30854
Introduction
As of May 21, 2021, the COVID-19 pandemic had led to more than 160 million confirmed cases and more than 3 million deaths worldwide [
]. COVID-19 has continued to spread worldwide due to its highly contagious nature, diverse variants, and the public’s inconsistent adherence to effective public health measures, such as wearing masks and maintaining social distance [ ]. Meanwhile, the emergence of asymptomatic cases (which are difficult to detect) has become more frequent, potentially leading to a substantial accumulation in the number of infections over time [ ]. As such, it is important to keep COVID-19 vaccines widely available and accessible [ ].
Since January 2020, scientists and medical experts around the world have been developing and testing COVID-19 vaccines; 16 vaccines have been approved for emergency use around the world so far, but the progress of vaccination has been subject to hesitancy, distrust, and debate. Vaccine hesitancy was identified by the World Health Organization as one of the top 10 global health threats in 2019 [
]. In many countries, such hesitancy, along with vaccine misinformation, has presented substantial obstacles towards vaccinating a sufficient proportion of the population in order to establish herd immunity [ , ].
Therefore, it is crucial for governments, public health officials, and policy makers to understand the potential drivers that affect public opinion towards COVID-19 vaccines [
]. A number of campaigns against antivaccination activists have been made through multiple channels since January 2020. Notably, the accelerated pace of vaccine development has further heightened public anxieties and could compromise the public’s acceptance of the vaccine [ ]. However, this acceptance varies across geographic contexts and the pandemic timeline. As governments put more effort into developing strategies for promoting vaccine acceptance and uptake, the key questions regarding the willingness to be vaccinated persist: what are the public’s opinions and perceptions towards COVID-19 vaccines and what are the potential drivers that affect such opinions?
The internet and social media have provided rich user-generated data sources, in the form of infodemiology studies [
], in real time for performing public health surveillance [ ]. Social media, especially Twitter, have been considered as major channels for the distribution of health information and opinion exchange, helping people to make intelligent decisions [ , ]. The analysis of big data derived from Twitter has been an emerging trend in recent COVID-19 vaccine–related studies. Geotagged tweets (hereinafter termed as geotweets) provide a rich volume of cost-effective content, including news, events, public comments, and the locational information of Twitter users. Through sentiment analysis and topic modeling methods that have been widely used in existing studies, qualitative tweet contents can be retrieved to reflect public opinions and attitudes towards COVID-19 vaccines. Additionally, users’ location information enables researchers to investigate the spatiotemporal patterns of the public’s opinions and attitudes. In general, existing studies have investigated people’s reactions towards COVID-19 vaccines, with a geographical emphasis on the United States [ - ]. Some papers have also studied other countries in the world, including China [ ], South Africa [ ], Australia [ ], the United Kingdom [ , ], Canada [ ], and Africa [ ], and to a global scale [ ]. However, the study period of these works is relatively limited to or predominantly focused on the early stage of the pandemic or up to the end of 2020. None of these studies cover early 2021, the period of implementing mass systemic vaccine distribution. Furthermore, although sentiment analysis and topic modeling have been broadly applied, what remains less explored are the potential drivers that induce a change in public sentiment and opinion on vaccines, such as important events and announcements by political leaders (eg, the propaganda of vaccine success or vaccine conspiracy theories). 
There is a pressing need to investigate public opinion towards COVID-19 vaccines across a longer timeline and to explore the potential drivers that influence the change in such opinion over time.
To address these knowledge gaps, this study aimed to analyze the spatiotemporal patterns of public sentiment and emotion and explore the keywords and major topics of tweets regarding COVID-19 vaccines. Drawing on more than 300,000 geotweets from March 1, 2020 to February 28, 2021 in the United States, we employed sentiment and emotion analysis at both the national and state levels. We identified 3 phases along the pandemic timeline that display sharp changes in public sentiment and emotion. Using cloud mapping of keywords and topic modeling, we identified 11 key events and major topics as the potential drivers that induced such changes. Findings from this study can help governments, policymakers, and public health officials understand factors that motivate and cause hesitance in the public towards vaccination. With this understanding, these entities can better design potential interventions during their vaccination campaigns.
Methods
Data
Using the Twitter streaming Application Programming Interface (API), the Harvard Center for Geographic Analysis collected geotweets from March 1, 2020 to February 28, 2021. Geotweets provide the location information of user-defined places. If users activate the GPS function in Twitter, their longitude and latitude are provided. We used the keyword “vaccin*” to query vaccine-related tweets, generating a total of 308,755 geotweets. In the results, 14.29% (44,118/308,755) of geotweets’ geographic locations are at a state level (eg, Massachusetts, United States), and others are geocoded at a city level (eg, Cambridge, MA) or at a finer geographical level (eg, Uptown Coffee, Oxford, MS). We then conducted a series of preprocessing steps on the geotweets’ contents. First, we generalized the variations of COVID-related terms to “COVID-19,” including “corona,” “covid,” “covid19,” and “coronavirus”; second, we removed unrelated website links from the search results, including links starting with the fragment of “https”; third, we removed punctuation (eg, period, question mark, comma, colon, and ellipsis) and other key symbols (eg, bracket, single and double quotes) and converted capital letters into lower-case letters; fourth, we removed inflectional endings (eg, “ly”) and reverted words to their root or dictionary form (eg, “peopl” from people, “dai” from daily, and “viru” from virus), by employing the word lemmatization function provided in the Python package Natural Language Toolkit 3.6.2 [
].
Methodology
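The first three preprocessing steps described in the Data section can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the regular expressions, the function names `is_vaccine_related` and `preprocess`, and the inclusion of "!" and ";" in the punctuation set are all assumptions, and the fourth step (reducing words to their root form with the Natural Language Toolkit) is left out so that the sketch needs only the standard library.

```python
import re

# Assumed patterns: the text lists the term variants but not the exact rules.
COVID_TERMS = re.compile(r"\b(?:coronavirus|corona|covid[- ]?19|covid)\b", re.IGNORECASE)
URL = re.compile(r"https?://\S+")
# Punctuation and symbols named in the text, plus "!" and ";" as an assumption.
PUNCTUATION = re.compile(r"[.?,:;!…()\[\]{}\"'“”‘’]")
# Approximation of the wildcard query "vaccin*".
VACCINE_QUERY = re.compile(r"\bvaccin\w*", re.IGNORECASE)


def is_vaccine_related(text: str) -> bool:
    """Return True if the text contains a word starting with "vaccin"."""
    return VACCINE_QUERY.search(text) is not None


def preprocess(text: str) -> str:
    """Apply cleaning steps 1 to 3 in the order given in the text."""
    text = COVID_TERMS.sub("COVID-19", text)   # step 1: unify COVID-related terms
    text = URL.sub(" ", text)                  # step 2: drop web links
    text = PUNCTUATION.sub(" ", text).lower()  # step 3: strip punctuation, lowercase
    return " ".join(text.split())              # collapse leftover whitespace


print(preprocess("Corona vaccine news: https://t.co/abc GREAT!"))
# prints: covid-19 vaccine news great
```

Hyphens are deliberately kept so that the unified term survives as "covid-19" after punctuation removal; a pipeline that strips every punctuation mark would instead yield "covid19".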
To explore the spatiotemporal patterns of public sentiment and emotion towards COVID-19 vaccines, we conducted 4 sets of analyses, including sentiment analysis, emotion analysis, topic modeling, and word cloud mapping. For the sentiment analysis, we applied the Valence Aware Dictionary and sEntiment Reasoner (VADER), a well-known rule-based model, to estimate sentiment compound scores [
]. The sentiment compound score is computed by summing the valence scores of each word found in the lexicon, adjusted according to the rules. The rules embody grammatical and syntactical conventions for expressing and emphasizing sentiment intensity. Then, the score is normalized to be between –1 (most extreme negative) and +1 (most extreme positive). To classify tweets as positive, neutral, or negative sentiment, threshold values are set as follows: A tweet with a compound score larger than 0.05 is classified as positive sentiment; a tweet with a compound score smaller than –0.05 is classified as negative sentiment; otherwise, it is classified as neutral sentiment [ ]. We then cross-tabulated the 3 types of sentiment on daily and weekly bases with the number of geotweets. We generated line graphs at the national level and in the top 10 states with the largest number of geotweets.
Different from sentiment analysis, which detects positive, neutral, or negative feelings from tweet contents, emotion analysis aims to recognize the types of feelings more specifically through the content expression, such as anger, fear, and happiness. The emotion analysis of this study was performed based on the National Research Council Canada Lexicon (NRCLex) [
]. NRCLex examines 4 pairs of primary bipolar emotions: joy (feeling happy) versus sadness (feeling sad); anger (feeling angry) versus fear (feeling of being afraid); trust (stronger admiration and weaker acceptance) versus disgust (feeling something is wrong or nasty); and surprise (being unprepared for something) versus anticipation (looking forward positively to something). We then examined the temporal patterns of these 8 types of emotion at both national and state levels.
In order to investigate the potential drivers of such changes, we applied the Latent Dirichlet Allocation (LDA) model [
] to detect popular topics on key dates that marked turning points in sentiment scores or sharp changes in the number of geotweets. The LDA model generates automatic summaries of topics in terms of a discrete probability distribution over words for each topic and further infers per-document discrete distributions over topics [ ]. Each topic is treated as a cluster, and each document is assigned to a cluster that represents its dominant topic. LDA is an unsupervised algorithm [ ]; however, users need to predefine the number of topics before running the model. To estimate the optimal number of topics, we used the Python package [ ] and pyLDAvis [ ] to compare the results with topic numbers from 3 to 10 and found that the smallest overlap among topics occurs when the topic number is 3. We further visualized the topic modeling results in bar graphs, in which the Y-axis lists the top 10 keywords associated with each topic and the X-axis shows the weight of each keyword (to reveal the extent to which a certain keyword contributes to that topic). Based on the top 10 most relevant keywords to each topic, we generalized and presented the name of each topic at the bottom of each graph.
We then categorized the study period into 3 phases based on 2 iconic events: the results of Phase 1 clinical trials by Moderna that were published in The New England Journal of Medicine on July 14, 2020 [
] and the first COVID-19 vaccine shots that were given in the United States on December 14, 2020 [ ]. Phase 1, dating from March 1, 2020 to July 13, 2020, is the stage in which the public was waiting for official announcements regarding the effectiveness of COVID-19 vaccines; Phase 2, ranging from July 14, 2020 to December 13, 2020, is when the positive news of COVID-19 vaccine development began to arrive; and Phase 3 starts from December 14, 2020, when the first vaccine shots were given in the United States. We then aggregated sentiment scores at the state level and analyzed the changes in sentiment over the 3 phases in the top 10 states. Finally, we produced word cloud maps over the 3 predefined phases based on the frequency of keywords appearing in tweet contents, with the size of a keyword reflecting its frequency and popularity.
Results
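As a rough illustration of the Methodology, the sentiment thresholds and phase boundaries can be expressed in a few lines. This is a sketch under stated assumptions rather than the code used in the study: the function names are hypothetical, and the normalization constant `alpha = 15` is assumed from the reference VADER implementation rather than taken from the text, which only states that scores are normalized to lie between –1 and +1.

```python
import math
from collections import Counter
from datetime import date

PHASE_2_START = date(2020, 7, 14)   # Moderna Phase 1 trial results published
PHASE_3_START = date(2020, 12, 14)  # first vaccine shots given in the United States


def normalize(raw_score: float, alpha: float = 15.0) -> float:
    """Squash a summed lexicon score into (-1, 1); alpha = 15 is an assumption."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)


def classify(compound: float) -> str:
    """Label a compound score using the +/-0.05 thresholds given in the text."""
    if compound > 0.05:
        return "positive"
    if compound < -0.05:
        return "negative"
    return "neutral"


def phase_of(day: date) -> int:
    """Assign a tweet date to Phase 1, 2, or 3 of the study period."""
    if day < PHASE_2_START:
        return 1
    if day < PHASE_3_START:
        return 2
    return 3


def tally(records):
    """Count (phase, label) pairs for an iterable of (date, compound score)."""
    return Counter((phase_of(day), classify(score)) for day, score in records)


# Made-up example scores, not data from the study.
sample = [(date(2020, 6, 21), -0.6), (date(2020, 7, 15), 0.7), (date(2020, 12, 14), 0.02)]
print(tally(sample))
```

Grouping the same counts by week or by state instead of by phase would give the kind of cross-tabulation described above.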
Sentiment Analysis and Topic Modeling
shows the overall trends in the weekly sentiment scores, unveiling the increased positive attitude towards COVID-19 vaccines within the study period. We identified 11 key dates as turning points in sentiment scores or in the number of geotweets. Correspondingly, a total of 33 topics on these 11 key dates are summarized and presented in - . In Phase 1, changes in the sentiment score were relatively stable, except for a sharp drop on June 21, 2020. This drop could have resulted from the misinformation and conspiracy theories related to Bill Gates. Vaccine-adverse conspiracy related to Gates claimed that the pandemic is a cover for his plan to implant trackable microchips made by Microsoft [ ]. Topic modeling suggests that Gates was referred to as “satan,” “terrorist,” and “evil” on that day ( ).
In Phase 2, the first stimulus was observed on July 14, 2020, when the results of Phase 1 clinical trials by Moderna were published [
]. However, we did not observe a dramatic change in sentiment score until July 15, 2020, when Donald Trump tweeted “Great News on Vaccines!” [ ]. Topic modeling suggests that keywords related to “good,” “trial,” “promis,” and “test” were widely discussed on July 15, 2020 ( ). We speculate that, compared to key events in the development of COVID-19 vaccines, comments from public figures on vaccination could trigger bigger changes in public sentiment.
Another sharp increase in sentiment score was observed on July 22, 2020, when the partnership between Pfizer and the US government accelerated the production and delivery of 100 million doses of COVID-19 vaccines [
]. The keywords “pfizer,” “govern,” and “million” were widely discussed and identified through topic modeling ( ). On August 20, 2020, the sentiment score dropped dramatically after Kamala Harris formally accepted the Democrats’ vice-presidential nomination at the 2020 Democratic National Convention. Harris advocated, “There is no vaccine for racism,” mentioning the context of the racism protests for George Floyd and Breonna Taylor [ ]. Of the keywords, “racism” and “kamala” were observed through topic modeling. Another increase in sentiment score appeared on November 9, 2020, when Pfizer announced that its vaccine is 90% effective ( ) [ ]. On the same day, Trump tweeted “STOCK MARKET UP BIG, VACCINE COMING SOON. REPORT 90% EFFECTIVE. SUCH GREAT NEWS!” Amid positive news from Pfizer, people questioned whether Pfizer purposefully released study results after Election Day, though Pfizer’s CEO claimed that the release timing had nothing to do with politics [ ]. On that day, widely discussed keywords included “trump,” “pfizer,” and “elect” ( ).
In Phase 3 on December 14, 2020, an increased sentiment score was observed when an intensive care unit nurse received the first COVID-19 vaccine in New York. On the same day, the Electoral College voted to cement Biden’s victory over Trump. Discussion regarding COVID-19 vaccines (“pfizer,” “nurs,” “receive”) quickly increased on Twitter, while other related discussions regarding mask wearing (“wear” and “mask”) and the presidential election (“house,” “trump,” “biden”) remained popular (
). By December 18, 2020, the sentiment score remained high as both Pfizer and Moderna were authorized for emergency use by the US Food and Drug Administration [ ]. Trump tweeted “Moderna vaccine overwhelmingly approved. Distribution to start immediately.” Additionally, the fact that former Vice President Pence and second lady Karen Pence received a COVID-19 vaccine [ ] was widely discussed (“penc” and “receiv”). Expectations for the COVID-19 vaccines were also discussed (“need” and “want”; ). On January 30, 2021, the Department of Defense paused a plan to give COVID-19 vaccines to detainees in the Guantanamo Bay prison camp [ ], which raised questions about COVID-19 vaccine delivery, leading to a moderate decrease in the sentiment score. Keywords including “terrorist” and “distribut” were observed through topic modeling ( ). On February 12, 2021, an increased sentiment score was observed after the Biden administration announced the purchase of 200 million COVID-19 vaccine doses from Pfizer and Moderna [ ]. Discussion surrounding the administration of COVID-19 vaccines was extensive (“wait,” “get,” “need”; ). Topic modeling also suggests that complaints were pervasive (“teacher,” “school,” and “get”; ) because teachers were not prioritized for vaccination in states despite the Centers for Disease Control and Prevention’s recommendation.
We then broke down the sentiment scores by state along the pandemic timeline. We present the results in the top 10 states with the largest number of geotweets (
), including California, New York, Texas, Florida, Illinois, Ohio, North Carolina, Pennsylvania, Georgia, and Virginia. The temporal patterns in sentiment scores vary across states, with more obvious fluctuations before November 2020 in Illinois, Ohio, North Carolina, Georgia, Pennsylvania, and Virginia. A number of sharp decreases in sentiment scores were observed in June 2020 in Illinois, North Carolina, Ohio, Pennsylvania, Georgia, and Virginia, in line with the tendency of sentiment drops at the national level. The states with relatively larger numbers of geotweets (ie, California, New York, Texas, and Florida) had more stable temporal trends and sentiment scores compared with the states with relatively smaller numbers of geotweets (eg, Ohio, North Carolina, Pennsylvania, Georgia, and Virginia).
We further examined the absolute values of the average positive and negative sentiment scores by states in
. In the majority of the states, the absolute positive sentiment score was larger than that of the negative sentiment score. The difference between the positive and negative sentiment scores was relatively more obvious in the mainland states of Alabama, Utah, Nebraska, Minnesota, and West Virginia (highlighted in dark grey in ), as well as in Hawaii and Alaska; the potential drivers triggering such differences across states may either relate to information or news spreading locally or be subject to the variations caused by the different sample sizes in each state.
The changes in positive and negative sentiment scores over 2 periods of time (Phase 1 to Phase 2; Phase 2 to Phase 3) were compared and are presented in
. From Phase 1 to Phase 2, an increase in positive sentiment scores (orange bars) appeared in most states, most obviously in South Dakota, followed by North Dakota and Arkansas; meanwhile, a decrease in negative sentiment scores (dark blue bars) was also observed in the majority of states, most obviously in South Dakota and Rhode Island, followed by Montana, North Dakota, and Arkansas. From Phase 2 to Phase 3, the decrease in negative sentiment scores (light blue bars) appeared in most states, most obviously in Idaho and Rhode Island, followed by North Dakota, Vermont, and New Hampshire. However, the change in positive sentiment scores (red bars) from Phase 2 to Phase 3 varied across states, with a slight increase that is more obviously observed in Idaho, North Dakota, and New Mexico, while a slight decrease is more obviously observed in South Dakota, Rhode Island, and Connecticut. In addition, the magnitude of change in both positive and negative sentiment scores from Phase 1 to Phase 2 (the height of dark blue and orange bars) was more obvious in most states than that from Phase 2 to Phase 3 (the height of light blue and red bars). This indicates that the fluctuation in people’s opinions towards vaccines became less obvious with the gradual development of vaccines and more encouraging news.
Emotion Analysis
shows the temporal patterns in the 8 types of emotion, including joy, trust, anticipation, anger, surprise, disgust, sadness, and fear. Through the vertical comparison of the weekly average trend lines (dashed lines), we found that the emotion with the highest weekly average scores along the majority of the timeline was trust (blue dashed line), followed by fear, anticipation, sadness, anger, joy, disgust, and surprise. It is worth noting that the weekly average emotion score of fear was higher than that of trust before mid-April 2020, possibly due to rapid COVID-19 infection and ineffective control of viral spread at the early stage of the pandemic. These events may have caused fear, uncertainty, or even feelings of panic [ ]. Although fluctuations in emotion scores (eg, local peaks and valleys) can be found within each of the 8 emotions, the general trend implies that the public’s trust in and anticipation towards vaccination were accompanied by a mixture of fear, sadness, and anger.
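The emotion scores discussed above rest on lexicon lookups. One plausible way to turn such lookups into per-emotion shares is sketched below. The four-word lexicon and the function name `emotion_shares` are hypothetical and serve illustration only; the actual NRC lexicon is far larger, and the exact scoring used in the study is not specified in the text.

```python
from collections import Counter

EMOTIONS = ("joy", "sadness", "anger", "fear",
            "trust", "disgust", "surprise", "anticipation")

# Tiny hypothetical lexicon for illustration only; not the NRC lexicon.
TOY_LEXICON = {
    "hope": {"anticipation", "joy", "trust"},
    "safe": {"trust"},
    "death": {"fear", "sadness"},
    "lie": {"anger", "disgust"},
}


def emotion_shares(tokens, lexicon=TOY_LEXICON):
    """Return each emotion's share of all emotion hits among the tokens."""
    hits = Counter()
    for token in tokens:
        for emotion in lexicon.get(token, ()):
            hits[emotion] += 1
    total = sum(hits.values())
    if total == 0:
        # No lexicon word found: every share is zero.
        return {emotion: 0.0 for emotion in EMOTIONS}
    return {emotion: hits[emotion] / total for emotion in EMOTIONS}


shares = emotion_shares(["hope", "safe", "death"])
print(round(shares["trust"], 3))
# prints: 0.333
```

Averaging such shares over all tweets in a week, or in a state, would yield trend lines and state-level percentages comparable in form to those reported here.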
We further investigated the relative distributions of 8 emotions in each state, as indicated by the percentage of emotion scores for each type with different colors (
). The overall patterns of the 8 emotions are consistent across most states. Throughout the entire timeline and in each of the 3 phases of the pandemic, trust was the dominant emotion towards vaccination. It was followed by anticipation, fear, sadness, anger, joy, disgust, and surprise. The state-level patterns largely align with the national pattern as depicted in , although there are some exceptions, such as fear outweighing anticipation, joy, and trust (eg, Washington) and fear, anger, and sadness outweighing other emotions (eg, Maine). As shown in and , the emotion of trust stayed consistent over time, while the changes in trends for other types of emotion were distinct across phases and by state.
We further compared the change in the percentage of emotions over 2 periods of time (Phase 1 to Phase 2; Phase 2 to Phase 3). From Phase 1 to Phase 2 (
), a decrease in fear (dark blue bars) was observed in most states, though its magnitude varied across states. This decrease was most obvious in South Dakota, followed by North Dakota, Arkansas, Mississippi, North Carolina, and South Carolina. The changes in anger, sadness, and disgust varied across states, with a general decrease in most states but sporadic increases in others (eg, Idaho, New Mexico, and New Hampshire). Furthermore, the combination of a decrease in fear and an increase in joy, trust, and anticipation was observed in most states except South Dakota. Throughout the period from Phase 2 to Phase 3 ( ), it is difficult to generalize the pattern of emotion change across states in terms of type and magnitude. An increase in joy, trust, anticipation, and surprise along with a decrease in fear, anger, sadness, and disgust was the most notable (high bars) in Idaho and Rhode Island, followed by Missouri, Vermont, and New Hampshire. On the contrary, some states encountered a decrease in trust and anticipation in tandem with an increase in anger and sadness, including South Dakota, North Dakota, Montana, Kansas, Indiana, Maine, and Delaware. The complexity of emotion changes from Phase 2 to Phase 3 varied across states, reflecting the diversity in people’s opinions and psychological reactions to vaccination, which should be subject to an in-depth investigation of causality.
Word Cloud Visualization
We produced word cloud mappings of 50 popular words associated with positive and negative sentiments over the 3 phases (
). The size of a word represents its popularity and the frequency with which it appears in tweets. Among the words associated with positive sentiment, the popular ones were “hope,” “help,” “thank,” “love,” “safe,” “cure,” and “free,” although the word “peopl,” with a more neutral nature, appears to be the most popular. Throughout the 3 phases, “hope,” “safe,” and “thank” grew larger from Phase 1 to Phase 3; in particular, “thank” became the most popular word in Phase 3. On the contrary, “flu,” “death,” “trump,” “fuck,” “lie,” “die,” “kill,” “shit,” and “stupid” were popular words associated with negative sentiment. Over the 3 phases, “flu” became smaller from Phase 1 to Phase 3 whereas “die,” “fuck,” “shit,” and “trump” evolved to be larger from Phase 1 to Phase 3; in particular, “trump” became predominant in Phase 2 possibly due to Trump’s increasing popularity caused by the 2020 Presidential Election. More specifically, while people were waiting for the news of COVID-19 vaccine development during Phase 1, their uncertainties on potential vaccines were reflected in the included keywords, which were related to the coronavirus and public’s frustration of the pandemic (eg, “viru,” “death,” “cure,” and “test”). Some keywords related to the COVID-19 vaccine were also observed, including “hope” and “develop.” Positive news about the development of COVID-19 vaccines appeared in Phase 2, which brought hope as well as misinformation regarding the vaccines to the public. At this stage, more specific information about COVID-19 vaccines was discussed (eg, “Pfizer,” “effect,” “risk,” “develop,” and “approve”), as compared to Phase 1. With Pfizer and Moderna vaccines approved during Phase 3, the public’s attention moved from vaccine development towards vaccine distribution (“distribution,” “wait,” and “free”), effectiveness (“safe” and “risk”), and priority (“teacher”). 
In all 3 phases, public figures (eg, “Trump,” “Biden,” and “Bill Gates”) contributed to hot topics with impacts on both positive and negative sentiments.
Discussion
Principal Findings
Drawing on geotweets from March 1, 2020 to February 28, 2021, this study examined public opinion on COVID-19 vaccines in the United States, by unveiling the spatiotemporal patterns of public sentiment and emotion over time, modeling the popular keywords and topics of Twitter contents, and analyzing the potential drivers of public opinion on vaccines. Our findings indicate that critical social or international events or announcements by political leaders and authorities may have potential impacts on public opinion towards COVID-19 vaccines. Such examples include the vaccine-adverse conspiracy related to Bill Gates on June 21, 2020, the tweet by Donald Trump of “Great News on Vaccines!” on July 15, 2020, Kamala Harris’s advocacy of “There is no vaccine for racism” on August 20, 2020, Biden’s victory in the presidential election over Trump on December 14, 2020, and the emergency use authorization of the Pfizer and Moderna vaccines on December 18, 2020. In the proposed 3 phases over the study timeframe, changes in public opinions on vaccines varied across space and time. More specifically, the fluctuation in people’s sentiment towards the vaccine during the earlier stage of the pandemic was more obvious compared to that in the later stage of the pandemic. However, an increase in positive sentiment in parallel with a decrease in negative sentiment was generally observed in most states, reflecting the rising confidence and anticipation of the public towards COVID-19 vaccines. Furthermore, the public’s 8 types of emotion towards the COVID-19 vaccine displayed a general trend of a combination of trust and anticipation with a mixture of fear, sadness, and anger. Moreover, the word cloud mapping showed that positive keywords including “hope,” “safe,” and “thank” grew larger from Phase 1 to Phase 3; in particular, “thank” became the most popular word in Phase 3, indicating the public’s increasingly positive response towards vaccination. 
In all 3 phases, public figures (eg, “Trump,” “Biden,” and “Bill Gates”) contributed to the most popular topics, impacting both positive and negative sentiments. The aforementioned findings reveal the diversity and complexity of people’s perception on and their psychological reaction towards COVID-19 vaccines, which indicates a further need to be cautious in the interpretation of analytical outcomes and to initiate an additional in-depth investigation of the causality.
Our findings are partially supported by the current literature. Hussain et al [
] observed a marked increase in positive sentiment toward COVID-19 vaccines in the United States from March 1, 2020 to November 22, 2020. Guntuku et al [ ] and Roy and Ghosh [ ] found that Republican legislators became more engaged in public discussion on vaccine progress, which may have implications for COVID-19 vaccine uptake among their followers. Germani and Biller-Andorno [ ] revealed that antivaccination supporters have been heavily engaged in discussions and dissemination of misinformation and conspiracy theories. Considering the limitations (ie, random sample) inherent in Twitter data, it is important to propose alternative data that provide a complementary understanding of public opinions towards the COVID-19 vaccine to promote vaccination in the United States.
Implications and Recommendations
The emergence of the internet and social media has provided new platforms for persuasion and the rapid spread of (mis)information, which leads to new opportunities for and challenges to the communication of vaccine information [
]. There are over 4.3 billion people using the internet nowadays, with 3.8 billion of these individuals as social media users [ ]. The popularity of social media platforms coupled with the advent of digital detection strategies benefits public health authorities by enabling the monitoring of public sentiment towards vaccine-relevant information in a geo-aware, (near) real-time manner. This can inform more effective policymaking and promote participatory dialogue to establish confidence towards vaccines, in order to maximize vaccine uptake. Some of our findings add new value to the current scholarship and also provide new insights and suggestions for policy implications with regard to safeguarding societal and economic health.
First, our findings indicate that public figures, especially politicians, play a crucial role in impacting the public’s opinions on vaccination. Negative opinions expressed by public figures about a vaccine could impact a large population of people, especially those who do not hold an unswayable opinion [
]. People tend to believe public figures’ opinions, as they are elected officials who can influence health care systems and are perceived to have more information about a vaccine [ , ]. Thus, public figures have a responsibility to disseminate accurate health information and should be cautious in expressing their opinions in public. This also highlights the necessity of considering the impact that public figures within vaccine campaigns have on upholding the public’s confidence towards the concept of vaccination.
Second, our study reveals that vaccine-adverse conspiracy theories led to a sharp decline in sentiment scores. We need to be aware of the fact that social media platforms with a massive number of users, to some degree, “disrupted” traditional vaccine information communication [
], allowing antivaccination advocates to disseminate misleading messages to a certain audience, whose views on vaccination could be susceptible to change. However, it also means that governmental officials should consider using these platforms to communicate with individuals directly about vaccination via geotailored messages to address concerns specific to a certain region.Third, different states demonstrated various trends in sentimental and emotional scores. Our geospatial analysis and map visualization [
] better portray more aspects of users’ attitudes towards COVID-19 vaccines. This helps identify the areas with high negative sentimental and emotional scores that require further research to understand the public's underlying fears and concerns about COVID-19 vaccines. We also recommend government and public health agencies conduct COVID-19 vaccine campaigns in these areas to address people’s fears and concerns about COVID-19 vaccines and provide guidance to access available vaccines.Limitations and Future Work
Our study has several limitations that can be addressed in future studies. First, Twitter users are typically younger and are avid users of mobile phone apps and the internet, so they may not reflect the opinions and perceptions of the general public across varying demographics and socioeconomic statuses [ , ]. In addition, the representativeness of Twitter users is not stationary but varies geographically [ , ]. As with other studies that rely on digital devices, the “digital divide” [ ] issue needs to be acknowledged. This study only accounts for the reactions of Twitter users to vaccines, which, to some degree, neglects the underprivileged members of society (especially the poor and elderly), inhabitants of rural areas (who may lack access to digital devices), and those who are not willing to share their thoughts on social media platforms.
Additionally, the Twitter API that we used allows access to only approximately 1% of the total records [ ]. As Padilla et al [ ] demonstrated, tweet sentiment can vary with attraction visits over the course of a day. Hence, future work needs to increase the sample size to reduce the uncertainties and fluctuations of sentiment scores and emotions. Efforts are also needed to distinguish between local residents and visitors and to conduct investigations at finer temporal scales. In early 2021, Twitter released a new Twitter API (academic research product track) that grants free access to a full-archive search with enhanced features and functionality for researchers to obtain more precise, complete, and unbiased data for analyzing the public conversation [ ]. Further efforts can be made to explore the potential of this new API in mining public opinions towards COVID-19 vaccines at a more granular scale.
Since emotion is a complex and integrated product of human feelings [ ], future research can explore more diverse dimensions of emotion beyond the 8 primary types. Moreover, disaster and crisis management includes 4 phases, namely prevention (capacity building), preparation (early warning), response (search, rescue, and emergency relief), and recovery (rehabilitation) [ ]. Management of the COVID-19 pandemic is still in the response phase. For policy and decision-making endeavors pertinent to COVID-19 crisis management, it would be highly beneficial for researchers and practitioners to continuously monitor emotional and perspective variations throughout the response and to extend the study timeline to the recovery phase or massive vaccination phase in the post-pandemic years. More importantly, to understand the impact of vaccination across countries, the workflow and methodology used in this study can be applied in multiple languages to global-scale geotweets.
Acknowledgments
This research was partially funded by the National University of Singapore Start-up Grant under WBS R-109-000-270-133 awarded to WL and by NSF under Grants 1841403, 2027540, and 2028791. This research has also been partially supported by Ball State University Digital Fellowship Funding and the Faculty of Arts & Social Sciences Staff Research Support Scheme FY2021 of National University of Singapore (WBS: C-109-000-222-091).
Conflicts of Interest
None declared.
References
- Dong E, Du H, Gardner L. An interactive web-based dashboard to track COVID-19 in real time. The Lancet Infectious Diseases 2020 May 19;20(5):533-534. [CrossRef] [Medline]
- Brown TS, Walensky RP. Serosurveillance and the COVID-19 Epidemic in the US: Undetected, Uncertain, and Out of Control. JAMA 2020 Aug 25;324(8):749-751. [CrossRef] [Medline]
- Nishiura H, Kobayashi T, Miyama T, Suzuki A, Jung S, Hayashi K, et al. Estimation of the asymptomatic ratio of novel coronavirus infections (COVID-19). Int J Infect Dis 2020 May;94:154-155 [FREE Full text] [CrossRef] [Medline]
- Haynes BF. A New Vaccine to Battle Covid-19. N Engl J Med 2021 Feb 04;384(5):470-471 [FREE Full text] [CrossRef] [Medline]
- Ten threats to global health in 2019. World Health Organization. URL: https://www.who.int/news-room/spotlight/ten-threats-to-global-health-in-2019 [accessed 2021-05-30]
- Lazarus JV, Ratzan SC, Palayew A, Gostin LO, Larson HJ, Rabin K, et al. A global survey of potential acceptance of a COVID-19 vaccine. Nat Med 2021 Feb;27(2):225-228 [FREE Full text] [CrossRef] [Medline]
- Lane S, MacDonald NE, Marti M, Dumolard L. Vaccine hesitancy around the globe: Analysis of three years of WHO/UNICEF Joint Reporting Form data-2015-2017. Vaccine 2018 Jun 18;36(26):3861-3867 [FREE Full text] [CrossRef] [Medline]
- Cornwall W. Officials gird for a war on vaccine misinformation. Science 2020 Jul 03;369(6499):14-15. [CrossRef] [Medline]
- Fadda M, Albanese E, Suggs L. When a COVID-19 vaccine is ready, will we all be ready for it? Int J Public Health 2020 Jul;65(6):711-712 [FREE Full text] [CrossRef] [Medline]
- Zeraatkar K, Ahmadi M. Trends of infodemiology studies: a scoping review. Health Info Libr J 2018 Jun 04;35(2):91-120 [FREE Full text] [CrossRef] [Medline]
- Luo W, MacEachren AM. Geo-social visual analytics. JOSIS 2014 Jun 20;8:27-66. [CrossRef]
- Kwak H, Lee C, Park H, Moon S. What is Twitter, a social network or a news media? 2010 Presented at: 19th international conference on World wide web; April 26-30, 2010; Raleigh, NC. [CrossRef]
- Paul M, Dredze M. You Are What You Tweet: Analyzing Twitter for Public Health. 2011 Presented at: Fifth International AAAI Conference on Weblogs and Social Media; July 17–21, 2011; Barcelona, Catalonia, Spain.
- Hussain A, Tahir A, Hussain Z, Sheikh Z, Gogate M, Dashtipour K, et al. Artificial Intelligence-Enabled Analysis of Public Attitudes on Facebook and Twitter Toward COVID-19 Vaccines in the United Kingdom and the United States: Observational Study. J Med Internet Res 2021 Apr 05;23(4):e26627 [FREE Full text] [CrossRef] [Medline]
- Jang H, Rempel E, Roth D, Carenini G, Janjua NZ. Tracking COVID-19 Discourse on Twitter in North America: Infodemiology Study Using Topic Modeling and Aspect-Based Sentiment Analysis. J Med Internet Res 2021 Feb 10;23(2):e25431 [FREE Full text] [CrossRef] [Medline]
- Engel-Rebitzer E, Stokes DC, Buttenheim A, Purtle J, Meisel ZF. Changes in legislator vaccine-engagement on Twitter before and after the arrival of the COVID-19 pandemic. Hum Vaccin Immunother 2021 May 10:1-5. [CrossRef] [Medline]
- Germani F, Biller-Andorno N. The anti-vaccination infodemic on social media: A behavioral analysis. PLoS One 2021;16(3):e0247642 [FREE Full text] [CrossRef] [Medline]
- Guntuku SC, Purtle J, Meisel ZF, Merchant RM, Agarwal A. Partisan Differences in Twitter Language Among US Legislators During the COVID-19 Pandemic: Cross-sectional Study. J Med Internet Res 2021 Jun 03;23(6):e27300 [FREE Full text] [CrossRef] [Medline]
- Roy KC, Hasan S. Modeling the dynamics of hurricane evacuation decisions from twitter data: An input output hidden markov modeling approach. Transportation Research Part C: Emerging Technologies 2021 Feb;123:102976. [CrossRef]
- Wang J, Zhou Y, Zhang W, Evans R, Zhu C. Concerns Expressed by Chinese Social Media Users During the COVID-19 Pandemic: Content Analysis of Sina Weibo Microblogging Data. J Med Internet Res 2020 Nov 26;22(11):e22152 [FREE Full text] [CrossRef] [Medline]
- Mutanga MB, Abayomi A. Tweeting on COVID-19 pandemic in South Africa: LDA-based topic modelling approach. African Journal of Science, Technology, Innovation and Development 2020 Oct 08:1-10. [CrossRef]
- Kwok SWH, Vadde SK, Wang G. Tweet Topics and Sentiments Relating to COVID-19 Vaccination Among Australian Twitter Users: Machine Learning Analysis. J Med Internet Res 2021 May 19;23(5):e26953 [FREE Full text] [CrossRef] [Medline]
- Cotfas L, Delcea C, Roxin I, Ioanas C, Gherai DS, Tajariol F. The Longest Month: Analyzing COVID-19 Vaccination Opinions Dynamics From Tweets in the Month Following the First Vaccine Announcement. IEEE Access 2021;9:33203-33223. [CrossRef]
- Griffith J, Marani H, Monkman H. COVID-19 Vaccine Hesitancy in Canada: Content Analysis of Tweets Using the Theoretical Domains Framework. J Med Internet Res 2021 Apr 13;23(4):e26874 [FREE Full text] [CrossRef] [Medline]
- Gbashi S, Adebo OA, Doorsamy W, Njobeh PB. Systematic Delineation of Media Polarity on COVID-19 Vaccines in Africa: Computational Linguistic Modeling Study. JMIR Med Inform 2021 Mar 16;9(3):e22916 [FREE Full text] [CrossRef] [Medline]
- Chang C, Monselise M, Yang CC. What Are People Concerned About During the Pandemic? Detecting Evolving Topics about COVID-19 from Twitter. J Healthc Inform Res 2021 Jan 17:1-28 [FREE Full text] [CrossRef] [Medline]
- Bird S, Klein E, Loper E. Natural language processing with Python: analyzing text with the natural language toolkit. 2019. URL: https://www.nltk.org/book/ [accessed 2021-08-14]
- Hutto CJ, Gilbert E. VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. 2014 Presented at: Eighth International AAAI Conference on Weblogs and Social Media; June 1–4, 2014; Ann Arbor, MI.
- Mohammad S, Turney P. NRC emotion lexicon. Saif Mohammad. 2013 Nov 15. URL: http://www.saifmohammad.com/WebDocs/NRCemotionlexicon.pdf [accessed 2021-08-14]
- Blei DM, Ng AY, Jordan MI. Latent dirichlet allocation. The Journal of Machine Learning Research 2003 Mar 01;3:993-1022. [CrossRef]
- Hu T, She B, Duan L, Yue H, Clunis J. A Systematic Spatial and Temporal Sentiment Analysis on Geo-Tweets. IEEE Access 2020;8:8658-8667. [CrossRef]
- Yan Y, Feng C, Huang W, Fan H, Wang Y, Zipf A. Volunteered geographic information research in the first decade: a narrative review of selected journal articles in GIScience. International Journal of Geographical Information Science 2020 Feb 26;34(9):1765-1791. [CrossRef]
- Řehůřek R. GENSIM: topic modelling for humans. URL: https://radimrehurek.com/gensim/ [accessed 2021-05-30]
- bmabey / pyLDAvis. GitHub. 2014 Mar 24. URL: https://github.com/bmabey/pyLDAvis [accessed 2021-05-30]
- Jackson LA, Anderson EJ, Rouphael NG, Roberts PC, Makhene M, Coler RN, et al. An mRNA Vaccine against SARS-CoV-2 — Preliminary Report. N Engl J Med 2020 Nov 12;383(20):1920-1931. [CrossRef]
- Guarino B, Cha AE, Wood J, Witte G. ‘The weapon that will end the war’: First coronavirus vaccine shots given outside trials in U.S. The Washington Post. URL: https://www.washingtonpost.com/nation/2020/12/14/first-covid-vaccines-new-york/ [accessed 2021-05-30]
- Goodman J, Carmichael F. Coronavirus: Bill Gates ‘microchip’ conspiracy theory and other vaccine claims fact-checked. BBC News. 2020 May 30. URL: https://www.bbc.com/news/52847648 [accessed 2021-08-14]
- U.S. Government Engages Pfizer to Produce Millions of Doses of COVID-19 Vaccine. U.S. Department of Health & Human Services. 2020 Jul 22. URL: https://www.hhs.gov/about/news/2020/07/22/us-government-engages-pfizer-produce-millions-doses-covid-19-vaccine.html [accessed 2021-05-30]
- Nagourney A, Glueck K. Kamala Harris Takes the Spotlight, a Moment for Her and History. New York Times. 2020 Aug 19. URL: https://www.nytimes.com/2020/08/19/us/politics/kamala-harris-dnc.html [accessed 2021-05-30]
- Callaway E. What Pfizer’s landmark COVID vaccine results mean for the pandemic. Nature. 2020 Nov 9. URL: https://www.nature.com/articles/d41586-020-03166-8 [accessed 2021-05-30]
- Benveniste A. Pfizer CEO: Our vaccine timing had nothing to do with politics. CNN. 2020 Nov 09. URL: https://www.cnn.com/2020/11/09/business/pfizer-covid-vaccine/index.html [accessed 2021-05-30]
- Pfizer-BioNTech COVID-19 Vaccine. U.S. Food & Drug Administration. 2020 Dec 11. URL: https://www.fda.gov/emergency-preparedness-and-response/coronavirus-disease-2019-covid-19/pfizer-biontech-covid-19-vaccine [accessed 2021-05-30]
- Shabad R. Pence receives Covid vaccine in televised appearance, hails "medical miracle". NBC News. 2020 Dec 18. URL: https://www.nbcnews.com/politics/white-house/pence-set-receive-covid-vaccine-televised-appearance-n1251655 [accessed 2021-05-30]
- Herridge C, McDonald C. Department of Defense pauses plan to give COVID-19 vaccine to Guantanamo detainees. CBS News. 2021 Jan 30. URL: https://www.cbsnews.com/news/dod-pauses-plan-to-give-covid-vaccine-to-guantanamo-detainees/ [accessed 2021-05-30]
- Biden Administration purchases additional doses of COVID-19 vaccines from Pfizer and Moderna. U.S. Department of Health & Human Services. 2021 Feb 11. URL: https://www.hhs.gov/about/news/2021/02/11/biden-administration-purchases-additional-doses-covid-19-vaccines-from-pfizer-and-moderna.html [accessed 2021-05-30]
- Wang S, Liu Y, Hu T. Examining the Change of Human Mobility Adherent to Social Restriction Policies and Its Effect on COVID-19 Cases in Australia. Int J Environ Res Public Health 2020 Oct 29;17(21):1 [FREE Full text] [CrossRef] [Medline]
- Roy S, Ghosh P. A Comparative Study on Distancing, Mask and Vaccine Adoption Rates from Global Twitter Trends. Healthcare (Basel) 2021 Apr 21;9(5):1 [FREE Full text] [CrossRef] [Medline]
- Kang GJ, Ewing-Nelson SR, Mackey L, Schlitt JT, Marathe A, Abbas KM, et al. Semantic network analysis of vaccine sentiment in online social media. Vaccine 2017 Jun 22;35(29):3621-3638 [FREE Full text] [CrossRef] [Medline]
- Digital in 2020. We Are Social. URL: https://wearesocial.com/digital-2020 [accessed 2021-05-30]
- Zhang EJ, Chughtai AA, Heywood A, MacIntyre CR. Influence of political and medical leaders on parental perception of vaccination: a cross-sectional survey in Australia. BMJ Open 2019 Mar 26;9(3):e025866 [FREE Full text] [CrossRef] [Medline]
- Abu-Akel A, Spitz A, West R. The effect of spokesperson attribution on public health message sharing during the COVID-19 pandemic. PLoS One 2021 Feb 3;16(2):e0245100 [FREE Full text] [CrossRef] [Medline]
- Puri N, Coomes EA, Haghbayan H, Gunaratne K. Social media and vaccine hesitancy: new updates for the era of COVID-19 and globalized infectious diseases. Hum Vaccin Immunother 2020 Nov 01;16(11):2586-2593 [FREE Full text] [CrossRef] [Medline]
- Mocnik F, Raposo P, Feringa W, Kraak M, Köbben B. Epidemics and pandemics in maps – the case of COVID-19. Journal of Maps 2020 Jun 18;16(1):144-152. [CrossRef]
- Wilson K, Atkinson K, Deeks S. Opportunities for utilizing new technologies to increase vaccine confidence. Expert Rev Vaccines 2014 Aug;13(8):969-977. [CrossRef] [Medline]
- Karami A, Kadari RR, Panati L, Nooli SP, Bheemreddy H, Bozorgi P. Analysis of Geotagging Behavior: Do Geotagged Users Represent the Twitter Population? IJGI 2021 Jun 02;10(6):373. [CrossRef]
- Gore RJ, Diallo S, Padilla J. You Are What You Tweet: Connecting the Geographic Variation in America's Obesity Rate to Twitter Content. PLoS One 2015 Sep 2;10(9):e0133505 [FREE Full text] [CrossRef] [Medline]
- Jiang Y, Li Z, Ye X. Understanding demographic and socioeconomic biases of geotagged Twitter users at the county level. Cartography and Geographic Information Science 2018 Feb 09;46(3):228-242. [CrossRef]
- Huang X, Li Z, Jiang Y, Li X, Porter D. Twitter reveals human mobility dynamics during the COVID-19 pandemic. PLoS One 2020 Nov 10;15(11):e0241957 [FREE Full text] [CrossRef] [Medline]
- Yan Y, Chen J, Wang Z. Mining public sentiments and perspectives from geotagged social media data for appraising the post-earthquake recovery of tourism destinations. Applied Geography 2020 Oct;123:102306. [CrossRef]
- Padilla JJ, Kavak H, Lynch CJ, Gore RJ, Diallo SY. Temporal and spatiotemporal investigation of tourist attraction visit sentiment on Twitter. PLoS One 2018 Jun 14;13(6):e0198857 [FREE Full text] [CrossRef] [Medline]
- Yan Y, Kuo C, Feng C, Huang W, Fan H, Zipf A. Coupling maximum entropy modeling with geotagged social media data to determine the geographic distribution of tourists. International Journal of Geographical Information Science 2018 Apr 20;32(9):1699-1736. [CrossRef]
- Twitter API: Academic Research product track. Twitter Developer. URL: https://developer.twitter.com/en/products/twitter-api/academic-research [accessed 2021-05-30]
- Plutchik R. The Nature of Emotions. Am. Sci 2001;89(4):344. [CrossRef]
Abbreviations
API: application programming interface
LDA: Latent Dirichlet Allocation
NRCLex: National Research Council Canada Lexicon
VADER: Valence Aware Dictionary for Sentiment Reasoning
Edited by C Basch; submitted 01.06.21; peer-reviewed by FB Mocnik, P Luz, ECY Su, R Zhang, C Lynch; comments to author 20.06.21; revised version received 12.07.21; accepted 26.07.21; published 10.09.21
Copyright © Tao Hu, Siqin Wang, Wei Luo, Mengxi Zhang, Xiao Huang, Yingwei Yan, Regina Liu, Kelly Ly, Viraj Kacker, Bing She, Zhenlong Li. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 10.09.2021.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.