@Article{info:doi/10.2196/20550, author="Xue, Jia and Chen, Junxiang and Hu, Ran and Chen, Chen and Zheng, Chengda and Su, Yue and Zhu, Tingshao", title="Twitter Discussions and Emotions About the COVID-19 Pandemic: Machine Learning Approach", journal="J Med Internet Res", year="2020", month="Nov", day="25", volume="22", number="11", pages="e20550", keywords="machine learning; Twitter data; COVID-19; infodemic; infodemiology; infoveillance; public discussion; public sentiment; Twitter; social media; virus", abstract="Background: It is important to measure the public response to the COVID-19 pandemic. Twitter is an important data source for infodemiology studies involving public response monitoring. Objective: The objective of this study is to examine COVID-19--related discussions, concerns, and sentiments using tweets posted by Twitter users. Methods: We analyzed 4 million Twitter messages related to the COVID-19 pandemic using a list of 20 hashtags (eg, ``coronavirus,'' ``COVID-19,'' ``quarantine'') from March 7 to April 21, 2020. We used a machine learning approach, Latent Dirichlet Allocation (LDA), to identify popular unigrams and bigrams, salient topics and themes, and sentiments in the collected tweets. Results: Popular unigrams included ``virus,'' ``lockdown,'' and ``quarantine.'' Popular bigrams included ``COVID-19,'' ``stay home,'' ``corona virus,'' ``social distancing,'' and ``new cases.'' We identified 13 discussion topics and categorized them into 5 different themes: (1) public health measures to slow the spread of COVID-19, (2) social stigma associated with COVID-19, (3) COVID-19 news, cases, and deaths, (4) COVID-19 in the United States, and (5) COVID-19 in the rest of the world. Across all identified topics, the dominant sentiments for the spread of COVID-19 were anticipation that measures can be taken, followed by mixed feelings of trust, anger, and fear related to different topics. 
The public tweets revealed a significant feeling of fear when people discussed new COVID-19 cases and deaths compared to other topics. Conclusions: This study showed that Twitter data and machine learning approaches can be leveraged for an infodemiology study, enabling research into evolving public discussions and sentiments during the COVID-19 pandemic. As the situation rapidly evolves, several topics are consistently dominant on Twitter, such as confirmed cases and death rates, preventive measures, health authorities and government policies, COVID-19 stigma, and negative psychological reactions (eg, fear). Real-time monitoring and assessment of Twitter discussions and concerns could provide useful data for public health emergency responses and planning. Pandemic-related fear, stigma, and mental health concerns are already evident and may continue to influence public trust when a second wave of COVID-19 occurs or there is a new surge of the current pandemic.", issn="1438-8871", doi="10.2196/20550", url="http://www.jmir.org/2020/11/e20550/", pmid="33119535" }