Health, Psychosocial, and Social Issues Emanating From the COVID-19 Pandemic Based on Social Media Comments: Text Mining and Thematic Analysis Approach

Background The COVID-19 pandemic has caused a global health crisis that affects many aspects of human lives. In the absence of vaccines and antivirals, several behavioral change and policy initiatives such as physical distancing have been implemented to control the spread of COVID-19. Social media data can reveal public perceptions toward how governments and health agencies worldwide are handling the pandemic, and the impact of the disease on people regardless of their geographic locations in line with various factors that hinder or facilitate the efforts to control the spread of the pandemic globally. Objective This paper aims to investigate the impact of the COVID-19 pandemic on people worldwide using social media data. Methods We applied natural language processing (NLP) and thematic analysis to understand public opinions, experiences, and issues with respect to the COVID-19 pandemic using social media data. First, we collected over 47 million COVID-19–related comments from Twitter, Facebook, YouTube, and three online discussion forums. Second, we performed data preprocessing, which involved applying NLP techniques to clean and prepare the data for automated key phrase extraction. Third, we applied the NLP approach to extract meaningful key phrases from over 1 million randomly selected comments and computed sentiment score for each key phrase and assigned sentiment polarity (ie, positive, negative, or neutral) based on the score using a lexicon-based technique. Fourth, we grouped related negative and positive key phrases into categories or broad themes. Results A total of 34 negative themes emerged, out of which 15 were health-related issues, psychosocial issues, and social issues related to the COVID-19 pandemic from the public perspective. 
Some of the health-related issues were increased mortality, health concerns, struggling health systems, and fitness issues; while some of the psychosocial issues were frustrations due to life disruptions, panic shopping, and expression of fear. Social issues were harassment, domestic violence, and wrong societal attitude. In addition, 20 positive themes emerged from our results. Some of the positive themes were public awareness, encouragement, gratitude, cleaner environment, online learning, charity, spiritual support, and innovative research. Conclusions We uncovered various negative and positive themes representing public perceptions toward the COVID-19 pandemic and recommended interventions that can help address the health, psychosocial, and social issues based on the positive themes and other research evidence. These interventions will help governments, health professionals and agencies, institutions, and individuals in their efforts to curb the spread of COVID-19 and minimize its impact, and in reacting to any future pandemics.


Background
Infectious diseases have occurred in the past and continue to emerge. Infectious diseases are termed "emerging" if they newly appear in a population or have existed but are increasing rapidly in incidence or geographic range [1]. Examples of emerging infectious diseases include acquired immunodeficiency syndrome, Ebola, dengue hemorrhagic fever, Lassa fever, severe acute respiratory syndrome (SARS), H1N1 flu, Zika, etc [2]. Evidence shows that emerging infectious diseases are among the leading causes of death and disability globally [3]. For instance, a 1-year estimate of the 2009 H1N1 flu pandemic shows that 43-89 million people were infected [4], and 201,200 respiratory deaths and 83,300 cardiovascular deaths were linked to the disease [5] worldwide. In addition, 770,000 HIV deaths were recorded in 2018 alone, with approximately 37.9 million people already infected with the virus globally [6]. Ebola is another deadly infectious disease that has an average case-fatality rate of about 50%, with a range of 25%-90% case-fatality rates in past outbreaks [2,7].
In December 2019, COVID-19, caused by the novel coronavirus, emerged and soon became the latest deadly infectious disease [8,9] worldwide, with more than 9.4 million confirmed cases and over 482,800 deaths in 188 countries and regions as of June 25, 2020 [10]. Hence, it was declared a pandemic by the World Health Organization. The COVID-19 pandemic has strained the global health systems and caused socioeconomic challenges due to job losses and lockdowns (and other restrictive measures) imposed by governments and public health agencies to curtail the spread of the virus. Evidence has already shown that emerging infectious diseases impose a significant burden on global economies and public health [3,11-13]. To understand public concern, personal experiences, and factors that hinder or facilitate the efforts to control the spread of the COVID-19 pandemic, social media data can produce rich and useful insights that were previously impossible in both scale and extent [14].
Over the years, social media has witnessed a surge in active users to more than 3.8 billion worldwide [15], making it a rich source of data for research in diverse domains. In the health domain, social media data (ie, user comments or posts on Twitter, Facebook, YouTube, Instagram, online forums, blogs, etc) have been used to investigate mental health issues [16,17], maternal health issues [18,19], diseases [20][21][22][23][24], substance use [25,26], and other health-related issues [27,28]. Other domains (eg, politics, commerce, marketing, or banking) have also witnessed widespread use of social media data to uncover new insights related to election results [29][30][31][32], election campaigns [33], customer behavior and engagement [34,35], etc. Regarding the COVID-19 crisis, social media data can reveal public perceptions toward how governments and health agencies worldwide are handling the pandemic and the social, economic, psychological, and health impacts of the disease on people regardless of their geographic locations in line with various factors that hinder or facilitate the efforts to control the spread of the COVID-19 pandemic globally.
In this paper, we apply natural language processing (NLP) to understand public opinions, experiences, and issues with respect to the COVID-19 pandemic using data from Twitter, Facebook, YouTube, and three online discussion forums (ie, Archinect [36,37], LiveScience [38], and PushSquare [39]). NLP is a well-established method that has been applied in many JMIR papers and other health informatics papers to understand various health-related issues. For example, Abdalla et al [40] studied the privacy implications of word embeddings trained on clinical data containing personal health information, while Bekhuis et al [41] applied NLP to extract clinical phrases and keywords from a corpus of messages posted to an internet mailing list. Specifically, we aim to answer the following research questions (RQs):
• RQ1: What are the negative issues (health, psychosocial, and social issues) shared by people on social media with respect to the COVID-19 pandemic?
• RQ2: What are the positive opinions or perceptions of people with respect to COVID-19 and how it is being handled?
• RQ3: How can the negative issues be addressed using insights from the positive opinions and other research evidence?
The methodological approach used in answering our RQs is as follows:
• We apply an NLP approach for extracting opinionated key phrases from COVID-19-related social media comments.
• We uncover various negative and positive themes representing public perceptions toward the COVID-19 pandemic after categorizing the key phrases. Our results revealed 34 negative themes, out of which 15 were health-related issues, psychosocial issues, and social issues related to the pandemic from the public perspective. In addition, 20 positive themes emerged from our results.
• We recommend interventions that can help address the health, psychosocial, and social issues based on the positive themes and other research evidence. These interventions will help governments, health professionals and agencies, institutions, and individuals in their efforts to curb the spread of COVID-19 and minimize its impact, as well as in reacting to any future pandemics.

Relevant Literature
Social media has been a rich source of data for research in many domains, including health [42]. Research that uses social media in conjunction with NLP within the health domain continues to grow and cover broad application areas such as health surveillance (eg, mental health, substance use, diseases, and pharmacovigilance), health communication, sentiment analysis, and so on [43]. For example, Park and Conway [44] used the lexicon-based approach to track prevalence of keywords indicating public interest in four health issues (Ebola, e-cigarettes, marijuana, and influenza) based on social media data. Afterward, they generated topics that explain changes in discussion volume over time using the latent Dirichlet allocation (LDA) algorithm. Similarly, Jelodar et al [45] applied LDA to extract latent topics in COVID-19-related comments and used the long short-term memory recurrent neural network technique for sentiment classification. Furthermore, Nobles et al [46] used social media data to examine the needs (including seeking health information) of the reportable sexually transmitted diseases community. Their NLP approach involves extracting the top 50 unigrams from the posts based on frequency and then generating topics using the nonnegative matrix factorization technique instead of LDA. Paul et al [47] applied the Ailment Topic Aspect Model to generate latent topics from Twitter data with the aim of detecting mentions of specific ailments including allergies, obesity, and insomnia. They used a list of key phrases to automatically identify possible symptoms and treatments.
McNeill et al [48] investigated how the dissemination of H1N1-related advice in the United Kingdom encourages or discourages vaccine and antiviral uptake using Twitter data. They conducted an automated content analysis using the KH Coder tool (Koichi Higuchi) to explore potential topics based on frequency of occurrence and then performed a more detailed or conversational analysis to understand skepticism over economic beneficiaries of vaccination and the risks and benefits of medication based on public opinion. On the other hand, Oyebode et al [49] performed sentiment analysis on user reviews of mental health apps using the machine learning approach. They compared five classifiers (based on five different machine learning algorithms) and used the best performing classifier to predict the sentiment polarity of reviews. However, none of the aforementioned approaches considers the context in which words appear in unstructured texts, which instinctively plays a substantial role in conveying meaning.
To investigate the significance of contextual text analysis, Dave and Varma [50] compared the noncontextual n-gram chunking approach and the contextual part-of-speech (POS) chunking approach in their experimental research in the field of advertising. Although the n-gram chunking method simply extracts words of varying lengths within a sentence boundary as candidate key phrases, the POS chunking method infers the context of words using POS patterns such as one or more noun tags (NN, NNP, NNS, and NNPS) along with adjective tags (JJ) and optional cardinal tags (CD) and determiners (DT). They focused on key phrases up to a length of 6 for their experiments.
Their initial assessment showed that the majority of the key phrases generated using the n-gram chunking method are not meaningful within the advertising context, hence not useful. Furthermore, they observed the impact of key phrases from both methods on the performance of classification systems based on naive Bayes, logistic regression, and bagging machine learning algorithms. Their findings revealed that systems using the POS chunking method outperformed those using the n-gram chunking method for feature extraction. We leveraged Dave and Varma's [50] contextual method in this study and extended it to capture additional POS patterns, NLP preprocessing techniques, and sentiment scoring using a lexicon-based technique.
Finally, to uncover insights about the type of information shared on Twitter during the peak of the H1N1 (swine flu) pandemic in 2009, Ahmed et al [51] generated 8 broad themes using a coding method involving expert reviewers. Similarly, Bekhuis et al [41] involved two dentists to manually and iteratively classify clinical phrases into categories and subcategories. We also used this method in the key phrase categorization stage of our study to group related key phrases into categories or broad themes.

Overview
The main goal of this paper is to understand and reflect on people's personal experiences and opinions with respect to the COVID-19 pandemic using social media data. To achieve this, we applied various standard and well-known computational techniques that are highlighted in the following section and summarized in Figure 1.

• We collected COVID-19-related comments or posts from Twitter, Facebook, YouTube, and three online discussion forums using programming languages (Python and C#) and relevant application programming interfaces (APIs).
• We performed preprocessing tasks that involve applying NLP techniques to clean the data and prepare them for the key phrase extraction phase.
• We applied the NLP approach to extract meaningful key phrases, which are words or phrases that convey the topical content of the comments. This approach is in seven stages: grammar definition, sentence breaking and tokenization, POS tagging, lemmatization, syntactic parsing, transformation and filtering, and sentiment scoring.
• Based on the sentiment scores associated with the candidate key phrases, we automatically assigned sentiment polarity to the key phrases and then grouped negative and positive key phrases into categories or broad themes using the thematic analysis method. This helps to answer our RQs.

Data Collection
We used various automated techniques to collect 47,410,795 COVID-19-related or coronavirus-related comments from six social media platforms: Twitter, YouTube, Facebook, Archinect, LiveScience, and PushSquare. The following describes the techniques and the breakdown of the data collected from each platform: 1. Twitter: We developed a tool using C# programming language to automatically extract tweets containing relevant hashtags in real time through the Twitter Streaming API [52]. To determine trending Twitter hashtags, we searched for "Trending Twitter hashtags on COVID-19" using the Google search engine and retrieved various popular hashtags from several websites including RiteTag [53] and Insider [54]. In addition, we checked a sample of top tweets on Twitter to see other common COVID-19-related hashtags they contained.

Data Preprocessing
Next, we applied the following NLP techniques to clean and prepare data for analysis using Python: • Convert slangs to their equivalent English words using online slang dictionaries [56,57], which contain 5434 entries in total

• Remove numeric words
After the preprocessing tasks were completed, non-English and duplicated comments were removed, thereby reducing the total number of comments to 8,021,341.
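The two cleaning steps named above (slang conversion and removal of numeric words), together with duplicate removal, can be illustrated with a minimal sketch. The slang entries, the function names, and the interpretation of "numeric words" as digit-only tokens are assumptions made for illustration; they are not taken from the dictionaries [56,57] or from the actual implementation, and non-English filtering is not covered here.

```python
import re

# Hypothetical slang entries for illustration only; the dictionaries
# cited in the paper contain 5434 entries in total.
SLANG = {"u": "you", "gr8": "great", "pls": "please"}

# Assumption: a "numeric word" is a token made only of digits,
# optionally grouped with "." or "," (eg, 14, 3.5, 1,000).
NUMERIC = re.compile(r"\d+(?:[.,]\d+)*")


def preprocess(comment: str) -> str:
    """Replace slang tokens and drop numeric tokens from one comment."""
    tokens = comment.split()
    tokens = [SLANG.get(tok.lower(), tok) for tok in tokens]
    tokens = [tok for tok in tokens if not NUMERIC.fullmatch(tok)]
    return " ".join(tokens)


def deduplicate(comments):
    """Keep the first occurrence of each comment, ignoring letter case."""
    seen = set()
    unique = []
    for comment in comments:
        key = comment.lower()
        if key not in seen:
            seen.add(key)
            unique.append(comment)
    return unique


cleaned = preprocess("pls stay home 14 days u all")
unique = deduplicate(["Stay safe", "stay safe", "Wash hands"])
```

Whitespace tokenization is used here only to keep the sketch short; a real pipeline would likely handle punctuation attached to tokens before the dictionary lookup.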

Key Phrase Extraction
Next, we randomly selected 1,051,616 comments (representing approximately 13% of the entire data set) and then extracted meaningful key phrases that conveyed the topical content of the comments. We refer to the data set containing the comments as corpus and each comment as document in the remaining parts of this paper. We focused on key phrases that are opinionated (ie, express or imply positive or negative sentiment [58]) since our goal was to determine public opinions and impact with respect to the COVID-19 pandemic. We extracted candidate key phrases from our corpus using a seven-stage NLP approach, shown in Figure 2. We implemented our approach using the Python programming language. To derive meaningful key phrases, we defined the following regular grammar: <DT>? <JJ.*>* <NN.*>* <VB.*>? (<IN>? <DT>? <JJ.*>* <NN.*>*)? which specifies a meaningful POS pattern that the syntactic parser uses to deconstruct each sentence in the documents into their constituents [59]. Table 1 shows the various parts of speech (or syntactic categories) captured in the grammar. These syntactic categories are based on well-established POS tagging guidelines for English [60].
In the aforementioned regular grammar, the "?" and "*" characters represent "optional" and "zero or more occurrences," respectively. Our regular grammar is aimed at generating key phrases that capture both context and sentiment of a conversation using nouns, adjectives, and verbs. Research has shown that nouns are most useful in knowing the context of a conversation (ie, what is being discussed) [61], while verbs and adjectives are important for sentiment detection [62]. Determiners and prepositions are also captured by the grammar since they usually co-occur with noun or adjective phrases (eg, a meal for six people or a hospital on the hilltop). Next, each document is split into sentences, and then each sentence is split into tokens or words. The sentence breaking task is achieved using an unsupervised algorithm that considers abbreviations, collocations, capitalizations, and punctuations to detect sentence boundaries [63]. The tagging module associates each token with its POS. The POS tags are based on the Penn Treebank tagset [60,64], some of which are shown in Table 1. Each token is reduced to its root form, depending on its POS. This activity is called lemmatization. For example, worse and better, which are both adjectives, will become bad and good, respectively. Prior to lemmatization, each token is converted to lowercase. Although Witten et al [65] applied stemming for its tokens, we chose lemmatization over stemming since lemmatization returns root words that are always meaningful and exist in the English dictionary. Stemming, on the other hand, may return root words that have no meaning at all since it merely removes prefixes or suffixes based on a rule-based method [66].
Furthermore, the syntactic parsing module deconstructs each sentence into a parse tree and then creates chunks or phrases based on the regular grammar or POS pattern defined in the first step. In other words, the parser's chunking process involves matching phrases composed of an optional determiner, zero or more of any type of adjective, zero or more of any type of noun, any type of verb (but optional), and an optional component. This component consists of an optional preposition, an optional determiner, zero or more of any type of adjective, and zero or more of any type of noun. The output of this stage is the candidate key phrases.
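The chunking step can be sketched with the Python standard library alone. This is an illustrative approximation, not the parser used in this study: it assumes the sentence has already been tokenized and POS tagged (the tags in the example are written by hand), encodes the tag sequence as a string, and translates the regular grammar into an ordinary regular expression. The names PATTERN and chunk are hypothetical.

```python
import re

# The grammar <DT>? <JJ.*>* <NN.*>* <VB.*>? (<IN>? <DT>? <JJ.*>* <NN.*>*)?
# rewritten as a regular expression over a string of "<TAG>" items.
PATTERN = re.compile(
    r"(?:<DT>)?(?:<JJ[^>]*>)*(?:<NN[^>]*>)*(?:<VB[^>]*>)?"
    r"(?:(?:<IN>)?(?:<DT>)?(?:<JJ[^>]*>)*(?:<NN[^>]*>)*)?"
)


def chunk(tagged):
    """Return candidate key phrases from a list of (word, POS tag) pairs."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged)
    phrases = []
    for match in PATTERN.finditer(tag_string):
        if not match.group():
            continue  # every part of the grammar is optional, so skip empty matches
        # Each token contributes exactly one "<", so counting them
        # converts character offsets back into token positions.
        start = tag_string.count("<", 0, match.start())
        length = match.group().count("<")
        phrases.append(" ".join(word for word, _ in tagged[start:start + length]))
    return phrases


# Hand-tagged example sentence (Penn Treebank tags).
sentence = [
    ("the", "DT"), ("hospital", "NN"), ("on", "IN"), ("the", "DT"),
    ("hilltop", "NN"), ("is", "VBZ"), ("very", "RB"), ("crowded", "JJ"),
]
candidates = chunk(sentence)
```

In this example the adverb tag (RB) is not part of the grammar, so it ends one chunk and the adjective that follows starts another. Single-word chunks such as "is" are expected at this point; they are handled by the stop word filtering described next.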
In the transformation and filtering stage, key phrases that are stop words (ie, words that are commonly used, such as the, a, an, with, in, and that) are removed from candidate key phrases using a predefined list L_stopwords compiled from multiple sources (eg, [67]). We excluded negation words, which are necessary for sentiment detection, such as not, from the list of stop words. In addition, a subset of L_stopwords was removed from the start and end of (and from within) the remaining key phrases in the candidate key phrases such that the meaning of the key phrases is preserved. Afterward, duplicates were removed from the candidate key phrases. Although previous research excluded key phrases above length 6 [50], we included key phrases up to length 10 in our analysis to avoid losing important key phrases that would have enriched insights from this paper. Hence, key phrases containing more than 10 words were removed from the candidate key phrases. Since our focus is on opinionated key phrases (ie, positive and negative key phrases), we applied a filtering technique that involves computing sentiment score for each key phrase and discarding nonopinionated key phrases.
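A minimal sketch of this stage is shown below. The stop word and negation lists are tiny stand-ins chosen for illustration, not the list compiled in this study, and the sketch trims stop words only from the start and end of a phrase; the meaning-preserving removal from within a phrase is not reproduced. The function name clean_phrases is hypothetical.

```python
# Tiny stand-in lists; the stop word list used in the study was
# compiled from multiple sources and is much larger.
STOPWORDS = {"the", "a", "an", "with", "in", "that", "is", "of", "not"}
NEGATIONS = {"not", "no", "never"}  # kept because they carry sentiment
MAX_LENGTH = 10  # key phrases with more than 10 words are dropped


def clean_phrases(candidates):
    """Trim stop words, drop empty or overlong phrases, and deduplicate."""
    stop = STOPWORDS - NEGATIONS
    cleaned = []
    seen = set()
    for phrase in candidates:
        words = phrase.lower().split()
        while words and words[0] in stop:
            words.pop(0)
        while words and words[-1] in stop:
            words.pop()
        if not words or len(words) > MAX_LENGTH:
            continue  # phrase was only stop words, or is too long
        key = " ".join(words)
        if key not in seen:
            seen.add(key)
            cleaned.append(key)
    return cleaned


result = clean_phrases([
    "the hospital on the hilltop",
    "is",
    "The hospital on the hilltop",
    "not good",
    " ".join(["word"] * 11),
])
```

Subtracting the negation set from the stop word set mirrors the exclusion described above, so a phrase such as "not good" keeps the word that reverses its sentiment.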
Finally, to identify negative and positive key phrases in the candidate key phrases, the scoring module computes a sentiment score, S_score, ranging from -1 to 1 for each key phrase using the Valence Aware Dictionary and sEntiment Reasoner (VADER) lexicon-based algorithm [68]. Afterward, each key phrase is assigned a polarity (negative or positive) based on S_score using the criteria recommended by Hutto and Gilbert [68]. Specifically, a key phrase is negative if S_score < -0.05, while a key phrase is positive if S_score > 0.05. A neutral key phrase (with S_score between -0.05 and 0.05) was removed from the candidate key phrases since it is not opinionated.
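The polarity rule can be expressed in a few lines. In the sketch below the sentiment scores are made-up values used only to exercise the thresholds; in practice they would come from a lexicon-based scorer. The function names polarity and keep_opinionated are hypothetical.

```python
def polarity(score: float, threshold: float = 0.05) -> str:
    """Map a sentiment score in [-1, 1] to a polarity label."""
    if score < -threshold:
        return "negative"
    if score > threshold:
        return "positive"
    return "neutral"


def keep_opinionated(scored):
    """Label each key phrase and discard the neutral ones."""
    labelled = {phrase: polarity(score) for phrase, score in scored.items()}
    return {p: label for p, label in labelled.items() if label != "neutral"}


# Made-up scores for illustration; they are not output from the study.
example_scores = {"stay safe": 0.44, "death toll": -0.6, "hospital": 0.0}
opinionated = keep_opinionated(example_scores)
```

A score exactly equal to 0.05 or -0.05 is treated as neutral here, which follows the strict inequalities stated above.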

Key Phrase Categorization
To answer our RQs, we categorized the final candidate key phrases into categories or broad themes using a thematic analysis approach used by Bekhuis et al [41] to classify clinical phrases into categories. In this approach, expert reviewers manually examine the key phrases and then assign them to appropriate categories. We recruited four reviewers to perform our key phrase categorization task. Specifically, we assigned the negative key phrases to a group of two reviewers (G1) and the positive key phrases to a second group of two reviewers (G2). Each reviewer independently examined the key phrases iteratively and continued to categorize related key phrases until a saturation level was reached (ie, no new categories were emerging from the key phrases). Reviewers used coding sheets in which they indicated the category each key phrase belonged to after examining it. Category names were decided by each reviewer such that a new category was created if none of the existing categories matched the key phrase being reviewed. Since key phrases are more specific than comments, the reviewers assigned each key phrase to only one category. In other words, reviewers assign a key phrase to the most appropriate category or to a new category if none of the existing categories was suitable. After categorizing the key phrases, the reviewers in each team validated each other's work and agreed or disagreed with the category assigned to each key phrase, and offered suggestions to address every disagreement. The reviewers came together after completing their validations to apply the suggestions and ensure all category names were distinct while harmonizing names that are similar. We measured interrater reliability using the percentage agreement metric [69]. The percentage agreement score for G1 was 98.0%, while the score for G2 was 99.3%. We refer to the categories as themes and the various key phrases under each category as subthemes in the remaining part of this paper.
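For readers unfamiliar with the percentage agreement metric, a minimal sketch follows. It assumes each reviewer's decisions are stored as a list of category labels in the same key phrase order; the theme labels and the function name are illustrative and do not reproduce the coding sheets used in this study.

```python
def percentage_agreement(labels_a, labels_b):
    """Percentage of items to which two reviewers assigned the same category."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both reviewers must rate the same nonempty item list")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return 100.0 * matches / len(labels_a)


# Illustrative labels for four key phrases rated by two reviewers.
reviewer_1 = ["fear", "harassment", "fitness issues", "fear"]
reviewer_2 = ["fear", "harassment", "health concerns", "fear"]
score = percentage_agreement(reviewer_1, reviewer_2)
```

Percentage agreement does not correct for agreement expected by chance, which is worth keeping in mind when comparing it with chance-corrected statistics.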

Key Phrase Extraction
In this section, we discuss the results of our experiments and key phrase categorization. From the large corpus used for the experiment, 427,875 negative and 520,685 positive key phrases were automatically generated. However, the majority of these key phrases were similar; hence, the reviewers reached a saturation point (during key phrase categorization) where no new categories were emerging. In total, 18,694 negative and 19,841 positive key phrases were categorized.

Positive Key Phrases
Multimedia Appendix 3 illustrates the top 130 positive key phrases and their dominance in terms of frequency of occurrence (larger size of the gray oval represents more dominance in the figure in Multimedia Appendix 3). Our results revealed that help (n=18,498) was the dominant key phrase, followed by hope (n=7708), protect (n=7130), love (n=6895), support (n=6198), good (n=5740), share (n=5187), care (n=4917), stay safe (n=4917), and so on. Multimedia Appendix 4 shows more positive key phrases, such as keep everyone safe, clean environment, trust scientific data, create cure, economic relief, encourage business, remain strong, good mask, social distancing best way, generous, respect human right, help prevent further spread, pray for health, social solidarity, support relief effort, protect health worker, good immune system, practice good hand hygiene, speak truth, expand testing, protect vulnerable people, free treatment, and ease anxiety.

Key Phrase Categorization
Overall, 34 negative and 20 positive themes emerged after the key phrase categorization phase discussed in the Methods section. Out of the 34 negative themes, 15 were health-related, psychosocial, and social issues (which were the main focus of this paper and are shown in Tables 2-4). Table 5 shows the 15 negative themes and the corresponding number of key phrases under each theme, while Table 6 shows the negative themes and the total number of comments for each theme. Frustration due to life disruptions emerged as the top negative theme with the highest number of comments, followed by increased mortality, comparison with other diseases or incidents, nature of the disease, and harassment. On the other hand, Table 7 shows the 20 positive themes, description, and sample comments. Table 8 shows the corresponding number of key phrases under each positive theme, while Table 9 shows the total number of comments for each theme. Public awareness emerged as the top positive theme based on the number of comments, followed by spiritual support, encouragement, and charity. By identifying negative and positive themes from COVID-19-related comments, we have answered RQ1 and RQ2, respectively.

Principal Results
In this paper, we analyzed social media comments to uncover insights regarding people's opinions and perceptions toward the COVID-19 pandemic using an NLP approach. Our empirical findings revealed negative and positive themes (see Tables 2-4 and Table 7) representing negative and positive impacts of the COVID-19 pandemic and coping mechanisms on the world population. To answer RQ3, we first discuss each of the negative issues (supported by research evidence) in this section and then suggest interventions to address the issues in a later section. Tables 2-4 show the negative themes grouped under health-related issues, psychosocial issues, and social issues from our results. The health-related issues included health concerns, increased mortality, struggling health systems, fitness issues, nature of disease, rising number of cases, and comparison with other diseases or incidents. The psychosocial issues were expression of fear, panic shopping, retrospection, work-from-home issues, and frustration due to life disruptions. The social issues were wrong societal attitude, domestic violence, and harassment.

Health-Related Issues
Evidence shows a rapid increase in the number of COVID-19 cases and a high case-fatality rate of 7.2% [70]. In addition, a substantial number of patients who are infected had severe pneumonia or were critically ill [70]. Another study revealed the mental health issues experienced by people and health professionals directly impacted by the COVID-19 pandemic [71], and the global health care systems' inability to deal with the outbreak [72]. The themes under this category are discussed in the following subsections. They align with existing research and uncovered additional insights with respect to the health-related issues caused by COVID-19 and witnessed by people worldwide.

Health Concerns
Based on our findings, people experienced various mental health issues (eg, anxiety, depression, stress, or obsessive compulsive disorder) during the pandemic. This is possibly due to the length of time spent staying at home (which may be traumatic for some people while causing loneliness for others), worrying about being infected with the disease and difficult living conditions, as well as guilt on the part of health care workers who feel responsible for being unable to save their patients from death. Research confirms that worry is associated with anxiety and depression [73]. Cases of mental health disorders linked to COVID-19 have also been reported [74]. Furthermore, people expressed other concerns like excessive drinking, migraines, chest pains, mild to severe fatigue, nasal mucosal ulcers, sleep disorders, and others. The following are sample comments:

Increased Mortality
People attested to an increase in death rates in many countries across different continents including North America, Europe, Asia, the Middle East, and Australia, as shown in the following sample comments. Many countries, especially those in Africa, started reporting deaths from COVID-19 (see C1264). Our findings also revealed people of varying demographics died from the disease, including teenagers, adults, and older adults, as well as those with or without underlying health conditions (see C8837 and C940).

This is why America leads the world in the death toll already, and the pace still is not showing any signs of slackening. [C3399]
UK coronavirus death rate DOUBLES as 381 die in 24 hours and boy, 13,

Struggling Health Systems
Health systems worldwide are struggling to cope with the surge in the number of patients with COVID-19 and in most cases are unable to admit patients due to limited resources [75]. Research has shown that health care burden due to COVID-19 is associated with the increase in mortality rate [76]. As revealed in the following sample comments, our findings corroborated evidence of overstretched global health systems during this pandemic.

Fitness Issues
Evidence shows that physical inactivity is prevalent worldwide due to nationwide quarantines or lockdowns [77]. This was confirmed in our findings, which showed that people have trouble staying fit due to an inability to control eating habits or urges while at home and have a personal dislike for indoor-only workouts, as shown in the following comments. Physical inactivity has been linked to coronary heart disease, diabetes, stroke, and mental health issues [78][79][80], which, in turn, are risk factors for mortality in COVID-19 adult inpatients [81].

Nature of Disease
People expressed their opinion about the nature of COVID-19 based on their experiences and information available to them. As shown in the following sample comments, people with underlying health conditions (eg, diabetes or heart disease) are at higher risk of developing severe complications from the disease. In addition, people discussed the asymptomatic nature of COVID-19 and the possibility of the virus infecting some critical immune cells, which may lead to the failure of sensitive organs like the lungs. People also perceived the disease as race- or nationality-independent but as seemingly posing more risk to men than women. The disease is also seen as highly contagious and shows symptoms such as cough, fever, fatigue, loss of smell, muscle aches, and respiratory-related symptoms (eg, shortness of breath). These findings align with clinical evidence regarding COVID-19 [82][83][84][85][86][87].

Rising Number of Cases
Our findings show that more people are getting infected with COVID-19 in many parts of the world, as shown in the following sample comments. Evidence confirms increasing numbers of COVID-19 cases in North America [88,89] and Europe [90], as well as a growing concern for vulnerable continents such as Africa [91].

Comparison With Other Diseases or Incidents
Our findings revealed that people compare COVID-19 with other diseases such as the flu (eg, Spanish flu and H1N1 swine flu) and SARS, and with more extreme incidents such as war. However, although some people tend to downplay the severity of COVID-19 (see C647), others think it is dangerous or frightening (see C922 and C45). Research has shown that COVID-19 has a higher transmissibility rate than SARS [92] and has killed more people than SARS and Middle East respiratory syndrome combined [93], thereby making it a highly contagious and lethal disease.

Expression of Fear and Panic Shopping
Based on our findings, people are fearful or scared about COVID-19, and although many expressed genuine fear (including those who had lost loved ones to the disease, contracted the disease, or had an infected family member), others attributed it to fear mongering that is further amplified by the media. As a result of this fear, many people engaged in panic buying to stockpile essential items so they can stay indoors and limit movements for some days or weeks to keep themselves and their families safe. The following are sample comments:

Work From Home Complaints
Furthermore, the pandemic triggered work from home (or remote work) measures to promote continuity of businesses during lockdown [94], but this may have negative implications for people's lifestyle and well-being. For example, people found consistently working from home exhausting, boring, and, with kids at home, distracting. In addition, people living in countries without stable electricity and strong internet found it difficult and more costly to work from home, as they have to fuel their generators and pay more for reasonably good internet connectivity. Evidence has shown that people work longer hours at home than on site due to difficulty in maintaining clear delineation between work and nonwork domains [95], thereby leading to work-family conflict and strain [96]. The following are sample comments:

Frustration Due to Life Disruptions and Retrospection
Finally, people are generally frustrated about life disruptions caused by COVID-19 (which is the top issue based on our empirical findings as shown in Table 6). Based on our findings, this frustration is mostly due to decreased leisure and interaction with friends and family, authorities' actions and inactions, and uncertainty about upcoming situations, which leads to cognitive dissonance [97], insecurity, and mental discomfort [98]. People expressed their frustrations using words reflecting anger and unhappiness or sadness, as shown in the following sample comments. Research has shown that positive emotions (eg, happiness) and life satisfaction decreased during the COVID-19 pandemic [99]. Therefore, it is unsurprising that people, in retrospection, miss (and crave) their prepandemic lives (see C377).

Wrong Societal Attitude
Our findings revealed disapproval and concerns about people's defiance of precautionary measures or guidelines (eg, social distancing and travel guidelines) to curb the spread of COVID-19 (see C7218 and C1444) and some people's habit of eating animals assumed to be carriers of viruses (see C4013). Research has highlighted certain factors responsible for reduced compliance with public health guidelines, such as poverty, economic dislocation, lack of compensation, and mistrust of science [100][101][102].
The only good thing about Coronavirus is that it will cull the stupid people from amongst us -those that do not take it seriously and continue to gather in public, those that go overseas to attend weddings and other events when they know the risk... [C7218]
The public response to this crisis in the UK has been absolutely pathetic. Showing

Harassment
Our findings also uncovered undue harassment of people from certain cultures, races, or religious backgrounds, who were accused of spreading COVID-19. The following sample comments reveal public intimidation and racist attacks toward Chinese and Asians as well as certain Indian tribes. This aligns with evidence of widespread anti-Chinese and anti-Asian xenophobic or racist attacks, especially in the United States, both physically and on social media [108][109][110][111][112][113].

Recommended Interventions for Addressing the Negative Issues
In this section, we suggest interventions that can help address the negative issues while drawing insights from the positive themes and relevant research evidence. This answers our RQ3.
As lockdown and physical distancing persist, people with health concerns should be able to receive medical attention without visiting a hospital. Considering the proliferation of smartphones and the current wave of global digitization, digital interventions using mobile, artificial intelligence (AI), internet of things (IoT), and virtual reality technologies have been shown to be effective for delivering remote health care (or telehealth) to patients [114][115][116][117][118][119]. This is based on our findings under the innovative research positive theme (see Table 7), which revealed global research efforts to create digital interventions using emerging technologies to address the health crisis caused by COVID-19. For example, mobile apps that detect mental health issues (eg, depression and anxiety) from phone sensor (or wearable sensor) data and self-reports using machine learning and deep learning models, and then guide users through therapeutic procedures or treatments, will be useful tools during and after the pandemic. In addition, these apps should allow users to book appointments with doctors, clinicians, or therapists and access remote medical advice, diagnosis, and treatments when necessary.
In addition, data-driven surveillance systems based on AI that predict the location of the next COVID-19 outbreak can enhance the effectiveness of containment efforts, thereby slowing the spread of the disease and reducing the case-fatality rate. Furthermore, the development of curative solutions or treatments (see Table 7) can be accelerated by leveraging machine learning and deep learning algorithms. For example, deep learning models can be used to predict chemical compounds that can halt viral replication and to suggest drugs that can be effective against the virus.
To address fitness issues during lockdown, physical activity (which is one of the positive themes in our results) programs or sessions with personalized feedback delivered through mobile apps would be helpful. Research has shown that smartphone-based health programs yield significant weight loss and increase physical activity [120]. There is also an urgent need to strengthen the global health care systems to cope with current and future pandemics through public and private investments in the health sector on an ongoing basis, such as provision of public health infrastructure that is robust and adequate for the target population and easily accessible and the provision of health insurance for everyone irrespective of financial status.
Public awareness (which emerged as the top positive theme in our findings) is also crucial for addressing negative issues arising from COVID-19 by providing timely and accurate information to people, which can be lifesaving. To reach a wider audience efficiently and at lower cost, public awareness can be delivered through mobile technologies. For example, mobile-driven and voice-enabled conversational AI agents (or chatbots) with access to evidence-based and clinically validated resources (eg, precautionary or safety measures approved by public health agencies and organizations as well as government-approved policies or guidelines) can deliver accurate information regarding COVID-19 to people in their own native language (and in an interactive fashion) through their smartphones. These chatbots can also be made to route difficult questions to health experts for real-time feedback within the same chat session. This will help to improve people's understanding of the disease, including how it differs from other infectious diseases, and how to protect themselves and their families from getting infected with COVID-19. In addition, people will be empowered with the information required to effectively respond to fear mongering, domestic violence, and harassment. Evidence already shows the deployment of multilingual chatbots for public health awareness on COVID-19 symptoms, diagnosis, and precautionary measures [121]. Furthermore, chatbots can also respond to emergencies by contacting appropriate security agencies and emergency response teams on behalf of the users. Moreover, chatbots can deliver evidence-based therapeutic interventions to people while coordinating with specialists behind the scenes where necessary.
For people with nonsmartphone devices, public health agencies can partner with telecommunication companies to deliver COVID-19-related information directly to their phones as text messages at regular intervals. Social media is another platform through which evidence-based information can be shared with the public but may be overshadowed by fake news or false information, which is mostly shared on social media [122]. Nevertheless, official COVID-19-related channels managed by (or in conjunction with) reputable international health organizations (eg, World Health Organization) or local health authorities within the social media platforms, many of which have already been deployed, provide accurate information or updates about COVID-19 cases, fatality rates, and safety measures and guidelines [123,124]. In addition, people receive location-based updates on these channels, including emergency alerts, in a timely and effective manner.
Finally, based on our findings (see Table 7), connection with family and friends, encouragement, spiritual support, and charity can help to ease people's frustrations, anxiety, and trauma (due to life disruptions caused by the pandemic) by addressing their emotional, physical, and spiritual needs. Evidence shows that psychological first aid and spiritual care can promote a sense of safety, calmness, self- and collective efficacy, connectedness, and hope, as well as help people confront and overcome fear [125]. Therefore, people should endeavor to frequently communicate and follow up with loved ones (through direct voice or video calls or by using social media), encourage others in distress to stay calm and remain positive, identify people's immediate needs and offer necessary assistance, help people find hope and meaning, and ensure the safety and comfort of vulnerable populations.
Mobile technology can play a key role in facilitating easy access to relief packages. For instance, mobile apps can be deployed with geolocation and multilingual features to help people locate the nearest food banks and charity organizations offering assistance in their geographical area. In addition, charity organizations can effectively mobilize and deliver relief items to more people, including individuals who are indisposed, based on data collected through these apps. Older adults, the sick, and those in self-isolation can also indicate their condition when requesting relief so that their items can be delivered to their doorstep instead of having to be picked up. These apps can further integrate with other local and international charity organizations to widen the coverage of relief efforts. Recruitment of volunteers can also take place through these apps. The usage data collected can be further analyzed in real time and used to predict the communities that are in dire need of assistance using machine learning or deep learning techniques.
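As an illustration of the geolocation feature described above, the following minimal sketch ranks relief sites by great-circle (haversine) distance from a user's position. It is not part of our study or of any deployed app: the `Site` record, the function names, and the use of a mean Earth radius of 6371 km are assumptions made only for this example, and a real app would obtain coordinates from the device and site data from partner organizations.

```python
import math
from typing import List, NamedTuple, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


class Site(NamedTuple):
    """Hypothetical record for a food bank or charity location."""
    name: str
    lat: float  # latitude in decimal degrees
    lon: float  # longitude in decimal degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def nearest_sites(user_lat: float, user_lon: float,
                  sites: List[Site], k: int = 3) -> List[Tuple[float, Site]]:
    """Return up to k (distance_km, site) pairs, closest first."""
    ranked = sorted(
        ((haversine_km(user_lat, user_lon, s.lat, s.lon), s) for s in sites),
        key=lambda pair: pair[0],
    )
    return ranked[:k]
```

A linear scan such as this is adequate for the few hundred sites a city-level deployment might list; a larger catalogue would likely call for a spatial index instead.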

Limitations
In this study, we analyzed data from Twitter, Facebook, YouTube, and three discussion forums. However, people may have used other social media platforms such as Instagram and other discussion forums not covered in this study to disseminate information related to the COVID-19 pandemic. Therefore, our findings may not fully reflect the entire public's opinion on social media with respect to the pandemic. Nevertheless, to have a reasonably broad understanding of public opinions, we analyzed over 1 million social media comments compared to only a few thousand commonly analyzed in many related studies. In addition, the thematic analysis used for theme categorization may be more robust; however, the large number of key phrases rendered this process time-consuming despite filtering out many irrelevant key phrases during experimentation. Accordingly, the saturation level and subsequent review and confirmation of the theme categories from a second reviewer and coder were introduced as an acceptable compromise.

Conclusions
In this paper, we explored the impact of the COVID-19 pandemic on people worldwide using social media data. We analyzed over 1 million comments obtained from six social media platforms using a seven-stage NLP approach to extract candidate key phrases, which we further categorized into broad themes using thematic analysis. Our results revealed 34 negative themes, out of which 15 were health-related issues, psychosocial issues, and social issues related to the COVID-19 pandemic from the public perspective. The top health-related issues were increased mortality, comparison with other diseases or incidents, nature of disease, and health concerns, while the top psychosocial issues were frustrations due to life disruptions, panic shopping, and expression of fear. The top social issues were harassment and domestic violence. Besides the negative themes, 20 positive themes emerged from our results. Some of the positive themes were public awareness, encouragement, gratitude, cleaner environment, online learning, charity, spiritual support, and innovative research. We reflected on our findings and recommended interventions that can help address the health, psychosocial, and social issues based on the positive themes and other research evidence.
Digital interventions using emerging technologies such as mobile apps, AI, IoT, and virtual reality will play a major role in delivering remote health care (ie, telemedicine or telehealth) to people in the comfort of their homes, including empowering them to self-manage their health and wellness. This will help to curb the spread of COVID-19 and future infectious diseases, since many people will no longer need to visit hospitals (or clinics) to book appointments or see doctors (or other health care professionals) unless it is absolutely necessary, thereby keeping health workers and patients safe. These technologies are also useful in providing timely and accurate information about COVID-19 symptoms, diagnosis, treatment, precautionary and safety measures and guidelines, and other relevant information to target audiences worldwide. Finally, digital interventions and other interventions discussed in this paper can help address the emotional, physical, and spiritual needs of people who are traumatized or frustrated by the disruptions caused by the pandemic. They also inform governments, health professionals and agencies, and institutions on how to react to the current COVID-19 pandemic and future pandemics.