Review
Abstract
Background: Medicinal cannabis is increasingly being used for a variety of physical and mental health conditions. Social media and web-based health platforms provide valuable, real-time, and cost-effective surveillance resources for gleaning insights regarding individuals who use cannabis for medicinal purposes. This is particularly important considering that the evidence for the optimal use of medicinal cannabis is still emerging. Despite the web-based marketing of medicinal cannabis to consumers, currently, there is no robust regulatory framework to measure clinical health benefits or individual experiences of adverse events. In a previous study, we conducted a systematic scoping review of studies that contained themes of the medicinal use of cannabis and used data from social media and search engine results. This study analyzed the methodological approaches and limitations of these studies.
Objective: We aimed to examine research approaches and study methodologies that use web-based user-generated text to study the use of cannabis as a medicine.
Methods: We searched MEDLINE, Scopus, Web of Science, and Embase databases for primary studies in the English language from January 1974 to April 2022. Studies were included if they aimed to understand web-based user-generated text related to health conditions where cannabis is used as a medicine or where health was mentioned in general cannabis-related conversations.
Results: We included 42 articles in this review. In these articles, Twitter was used approximately 3 times as often as the next most common data sources; other sources included Reddit, web-based forums, GoFundMe, YouTube, and Google Trends. Analytical methods included sentiment assessment, thematic analysis (manual and automatic), social network analysis, and geographic analysis.
Conclusions: This study is the first to review techniques used by research on consumer-generated text for understanding cannabis as a medicine. It is increasingly evident that consumer-generated data offer opportunities for a greater understanding of individual behavior and population health outcomes. However, research using these data has some limitations that include difficulties in establishing sample representativeness and a lack of methodological best practices. To address these limitations, deidentified annotated data sources should be made publicly available, researchers should determine the origins of posts (organizations, bots, power users, or ordinary individuals), and powerful analytical techniques should be used.
doi:10.2196/35974
Introduction
Medicinal Cannabis Pharmacovigilance
Cannabis has been widely used for a variety of purposes, including medicinal applications, throughout human history. Over the last century, its use has been prohibited in Europe, North America, and Australasia [ ]. Since 2016, these jurisdictions have incrementally authorized the use of medicinal cannabis for certain conditions [ ]. Given the substantial public interest in cannabis as medicine, there is a pressing need to better understand its safety and efficacy.
However, aside from clinical trials, there are scant data regarding the efficacy and side effects of medicinal cannabis [ - ]. One of the main methods for postmarketing safety surveillance of medications is the use of established pharmacovigilance reporting systems, which rely on individuals reporting adverse events [ - ]. Cannabis users are often unaware of these systems or of the importance of reporting. They may find the systems too difficult to use or may not want to divulge personal details if these are required [ ]. Users may not even think of reporting their side effects because they consider them an inherent part of cannabis consumption, especially if they are not using an approved medical cannabis product.
Increasing the understanding of the efficacy and safety of cannabis as medicine is warranted because cannabis is a nonstandardized product, given the wide variation in growing conditions and production specifications [ ]. This includes variations in climate, soil (or other growth media), water, light, and other factors that affect plant growth. Even if cannabis medicines in a country or state must adhere to mandatory standards (good manufacturing practice), some cannabis users prefer to grow or import their own cannabis [ ]. These factors make the systematic assessment of the effectiveness and side effects of medical cannabis difficult.
Social Media as a Pharmacovigilance Data Source
To gain additional insights into cannabis use and its effects, researchers are now turning to social media and web-based health forums. These platforms are places where both patients and the general population freely express and exchange their experiences, and they therefore provide a valuable additional data source for monitoring public health [ ]. Unlike highly curated data collection methods, such as surveys or interviews, social media provides an organic view of people's everyday thoughts, behaviors, and activities. Therefore, social media has the potential to provide insights beyond the boundaries of targeted investigations, including emergent events, observations of behavioral phenomena and subcultures, and insights for the social sciences [ ].
The information contained in social media conversations is voluminous and potentially rich in content, but it is also complex and varied. Because social media is an unstructured raw data source, credible information may be sparse and difficult to identify, and there may be uncertainty about the origin of the data or the population they represent [ ]. Furthermore, the informal language and structure of social media posts are difficult to interpret, and posts are confounded by many competing sources, such as promotional posts, hashtags, and social media bots [ , ]. Social media bots automatically create content and interact with social media platform users [ ]. One study estimated that between 9% and 15% of Twitter accounts are bots [ ]. Notwithstanding these limitations, if these complexities can be successfully navigated, social media has the potential to be a valuable asset for increasing the understanding of cannabis as a medicine.
Our previous systematic scoping review [ ] used the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines [ ] to understand the utility of web-based user-generated text in providing insight into the use of cannabis as a medicine. This paper examines the techniques, analyses, and limitations of those studies.
The objective of this research was to review studies that have used user-generated data in conjunction with computational methods to understand the medicinal use of cannabis in a population. We addressed the following research questions (RQs):
- RQ1: What consumer-generated data sources are used for studying cannabis?
- RQ2: What common techniques for collection and analysis of data are used?
- RQ3: What are the common limitations and challenges faced by the studies?
Methods
We searched for English-language studies that were indexed in MEDLINE, Embase, Web of Science, and Scopus databases and published between January 2010 and March 2022. Literature database queries were developed for these 4 databases. See Table S1 in [ - ] for the details of search terms used and Table S2 for the inclusion and exclusion criteria of the selected articles. A summary of the PRISMA flowchart is shown in [ ].
Results
Overview
The table below provides a summary of each article, including author names, publication year, data source and duration of data collection, analysis, and number of items analyzed.
The year with the highest number of publications was 2020 (11/42, 26%), followed by 2017 and 2021 (6/42, 14% each). Five studies (5/42, 12%) were published in each of 2015 and 2019. The number of publications per year is tabulated below.
Regarding data sources, Twitter was used in 40% (17/42) of the reviewed studies, around 3 times the number of studies using either Reddit or web-based forums (6/42, 14% each). GoFundMe, YouTube, and Google Trends each accounted for 7% (3/42) of the studies. Text was the focus of 83% (35/42) of the studies, whereas the others analyzed trends, videos, search logs, and images. The distribution of the selected publications by data source is also tabulated below.
Study | Source (duration) | Analysis | Number of items analyzed |
McGregor et al [ ], 2014 | Web-based forums, Facebook, Twitter, and YouTube (not available) | | 3785 items |
Cavazos-Rehg et al [ ], 2015 | Twitter (February to March 2014) | | 7000 tweets |
Daniulaityte et al [ ], 2015 | Twitter (October to December 2014) | | 125,255 tweets (27,018 geolocated tweets) |
Gonzalez-Estrada et al [ ], 2015 | YouTube (June 4-8, 2014) | | 200 most viewed videos |
Krauss et al [ ], 2015 | YouTube (January 22, 2015) | | 116 videos |
Thompson et al [ ], 2015 | Twitter (March 2012 to July 2013) | | 36,939 original tweets and 10,000 retweets |
Cavazos-Rehg et al [ ], 2016 | Twitter (January 2015) | | 5000 tweets |
Lamy et al [ ], 2016 | Twitter (May to July 2015) | | 3000 tweets |
Mitchell et al [ ], 2016 | Web-based forums (October 2014) | | 268 threads |
Andersson et al [ ], 2017 | Web-based forums (April 18-19, 2016) | | 32 topics |
Dai and Hao [ ], 2017 | Twitter (August 2015 to April 2016) | | 66,000 cannabis-related and 31,184 geolocated tweets |
Greiner et al [ ], 2017 | Web-based forums (November 2014 to March 2015) | | 717 posts |
Turner and Kantardzic [ ], 2017 | Twitter (August 2015 to April 2016) | | 40,509 geolocated tweets |
Westmaas et al [ ], 2017 | Web-based forums (January 2000 to December 2013) | | 468,000 posts |
Yom Tov and Lev Ran [ ], 2017 | Bing logs (November 2016 to April 2017) | | Not available |
Cavazos-Rehg et al [ ], 2018 | YouTube (June 10-11, 2015) | | 83 videos |
Glowacki et al [ ], 2018 | Twitter (August to October 2016) | | 73,235 tweets |
Meacham et al [ ], 2018 | Reddit (January 2010 to December 2016) | | 400,000 posts |
Leas et al [ ], 2019 | Google Trends (January 2004 to April 2019) | | Not available |
Meacham et al [ ], 2019 | Reddit (January 2017 to December 2019) | | 193 questions |
Nasralah et al [ ], 2019 | Twitter (January 2015 to February 2019) | | 20,609 tweets |
Pérez-Pérez et al [ ], 2019 | Twitter (February to August 2018) | | 24,634 tweets |
Shi et al [ ], 2019 | Google Trends and Buzzsumo (January 2011 to July 2018) | | Not available |
Allem et al [ ], 2020 | Twitter (May to December 2018) | | 60,861 nonbot and 8874 bot tweets |
Janmohamed et al [ ], 2020 | Blogs, news, forums, and <1% other (August 2019 to April 2021) | | 4,027,172 documents or blogs |
Jia et al [ ], 2020 | Google, Facebook, and YouTube (September 2019) | | 51 Google websites, 126 Facebook posts, and 37 YouTube videos |
Leas et al [ ], 2020 | Reddit (January 2014 to August 2019) | | 104,917 posts |
Merten et al [ ], 2020 | Pinterest (July 31, August 18, and September 1, 2018) | | 1280 pins |
Mullins et al [ ], 2020 | Twitter (June to July 2017) | | 941 tweets |
Saposnik and Huber [ ], 2020 | Google Trends (January 2004 to December 2019) | | Not available |
Song et al [ ], 2020 | GoFundMe (January 2012 to December 2019) | | 1474 campaigns |
Tran and Kavuluru [ ], 2020 | Reddit and FDA comments (January to April 2019) | | 64,099 Reddit and 3832 FDA comments |
van Draanen et al [ ], 2020 | Twitter (January 2017 to June 2019) | | 1,200,127 tweets |
Zenone et al [ ], 2020 | GoFundMe (January 2017 to March 2019) | | 155 campaigns |
Pang et al [ ], 2021 | Twitter (December 2019 to December 2020) | | 17,238 tweets |
Rhidenour et al [ ], 2021 | Reddit (January 2008 to December 2018) | | 974 posts |
Smolev et al [ ], 2021 | Facebook (November 2018 to November 2019) | | 7694 posts |
Soleymanpour et al [ ], 2021 | Twitter (July 2019) | | 2,200,000 tweets |
Zenone et al [ ], 2021 | GoFundMe (June 2017 to May 2019) | | 164 campaigns |
Turner et al [ ], 2021 | Twitter (October 2019 to January 2020) | | 167,755 personal and 143,322 commercial tweets |
Allem et al [ ], 2022 | Twitter (January to September 2020) | | 353,353 tweets |
Meacham et al [ ], 2022 | Reddit (December 2015 to August 2019) | | 908 posts from opioid recovery subreddits and 4224 posts from opioid use subreddits |
aADHD: attention-deficit hyperactivity disorder.
bPTSD: posttraumatic stress disorder.
cCBD: cannabidiol.
dFDA: Food and Drug Administration.
Year | Count, n (%) |
2014 | 1 (2) |
2015 | 5 (12) |
2016 | 3 (7) |
2017 | 6 (14) |
2018 | 3 (7) |
2019 | 5 (12) |
2020 | 11 (26) |
2021 | 6 (14) |
2022 | 2 (5) |
Source | Count, n (%) |
Twitter | 17 (40) |
Reddit | 6 (14) |
Web-based forums | 6 (14) |
GoFundMe | 3 (7) |
YouTube | 3 (7) |
Google Trends | 3 (7) |
Google, Facebook, and YouTube | 1 (2) |
Bing Search Engine | 1 (2) |
Pinterest | 1 (2) |
Facebook | 1 (2) |
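Each "Count, n (%)" entry above is a simple proportion of the 42 included studies, rounded to the nearest whole percent. The following minimal Python sketch reproduces that arithmetic from the year counts and from the source counts stated in the text; the dictionary layout and the `share_table` helper are our own illustration, not part of the review's workflow.

```python
# Recompute "Count, n (%)" columns from raw counts (illustrative helper).
TOTAL_STUDIES = 42

YEAR_COUNTS = {2014: 1, 2015: 5, 2016: 3, 2017: 6, 2018: 3,
               2019: 5, 2020: 11, 2021: 6, 2022: 2}

# Only the data sources whose counts are stated in the text.
SOURCE_COUNTS = {"Twitter": 17, "Reddit": 6, "Web-based forums": 6,
                 "GoFundMe": 3, "YouTube": 3, "Google Trends": 3}


def share_table(counts, total):
    """Map each label to (n, n/total expressed as a rounded whole percentage).

    Python's round() rounds halves to even, but no share of 42 falls
    exactly on .5, so the rounding mode does not matter here.
    """
    return {label: (n, round(100 * n / total)) for label, n in counts.items()}


if __name__ == "__main__":
    # Sanity check: the year table should account for every included study.
    assert sum(YEAR_COUNTS.values()) == TOTAL_STUDIES
    for label, (n, pct) in share_table(SOURCE_COUNTS, TOTAL_STUDIES).items():
        print(f"{label} | {n} ({pct}) |")
```

Running the script prints one row per source in the same layout as the table, for example `Twitter | 17 (40) |`.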
Social Media Data Collection Strategies
Some studies obtained all their associated data from a specific subreddit [ , , ] or a web-based forum [ ] and subsequently sampled the data. One Twitter study (1/42, 2%) collected tweets within a geographic bounding box and then filtered the data for cannabis-related keywords [ ].
Keyword-based filtering was used by many studies. Terms used for filtering were either common expressions for cannabis taken from dictionaries, such as Urban Dictionary, or were based on similar research in this domain. One study (1/42, 2%) [ ] used Urban Dictionary and web forums to create a comprehensive list of 123 terms related to cannabis consumption. Another study [ ] first found all the terms related to marijuana by searching Thesaurus.com and then used the word embedding likeness perusal software [ ] to generate synonyms.
In a cannabis-related study that did not focus on medicinal use, word embeddings created from Twitter and Reddit data sets discovered synonyms and slang terms that could not be identified by other means. The authors recommend using this method to discover synonyms before any data collection based on keyword filtering [ ].
Of the 42 studies, 3 (7%) were user focused, with data derived from specific highly influential users [ ], opioid-dependent users [ ], or a US veteran-specific subreddit [ ].
The largest data set manually annotated by the researchers was collected using cannabis-related keywords and consisted of 36,939 original tweets and 10,000 retweets [ ]. Apart from that study, the average size of annotated data sets was approximately 1450 records. Of the 42 studies, 2 (5%) [ , ] used crowdsourcing services to annotate tweets, whereas the remaining studies that annotated data did so in-house. The duration of data collection ranged from 1 month to 6 years. Only 2 (5%) of the 42 studies made their annotated data available to other researchers [ , ].
Types of Analysis
Overview
The studies included in this review used a variety of analytical methods, including qualitative analysis, quantitative content analysis, machine learning, rule-based methods, and statistical analysis. The types of analysis included sentiment assessment, thematic analysis, content analysis, named entity recognition, social network analysis, and geographic analysis. Table S3 in the supporting information summarizes the analyses.
Discovering Themes
Themes were identified in 62% (26/42) of the studies. Manual coding of the themes was performed by 69% (18/26) of these studies, either by using pre-existing categories or by observing a sample of the data and generating a codebook [ , , , , , , , , , - , , - , ]. Of the 26 studies, 2 (8%) used the services of social media data analytics companies [ , ].
Of the 26 studies, 4 (15%) used topic modeling to infer themes or topics [ , , , ]. The algorithm of choice for this task was latent Dirichlet allocation [ ]. The number of topics was chosen on the basis of intrinsic evaluation metrics (eg, coherence and perplexity) and iterative qualitative analysis informed by prior experience with topic models. One study (1/26, 4%) used temporal topic modeling techniques to study changes in topics over time, with the goal of analyzing how web-based vaping narratives changed during the COVID-19 pandemic [ ].
One study (1/26, 4%) identified themes by using rule-based methods: frequency counts of the most common unigrams and bigrams were generated and formed the basis of the topics [ ]. Another study used the text topic node of the SAS Text Miner software to discover topics [ ].
Demographic Analysis
Socioeconomic and demographic analyses of the study population were performed in 26% (11/42) of the studies. Of the 11 studies, 2 (18%) used gender, age, and other user characteristics that were either provided in user profiles or inferred from users’ posts [ , ]. Of the 11 studies, 2 (18%) video-based studies used the subjects’ perceived age and gender, as judged from watching the videos [ , ].
Of the 11 studies, 2 (18%) that used social media analytics providers obtained age and gender data from the analyses supplied by those providers [ , ]. Of the 11 studies, 2 (18%) Twitter-based studies used a commercial tool called DemographicsPro, which uses proprietary algorithms to infer user demographic characteristics [ , ]. Other studies used existing census data [ ], demographic information obtained from survey data [ ], and a 2-step method based on a gender-name lexicon and a face recognition algorithm applied to users’ profile information to identify the users’ gender [ ].
Geographic Analysis
Geolocation data analysis was performed in 40% (17/42) of the studies. User profiles or message metadata were used in 53% (9/17) of these studies [ , , , , , , , , ]. Of the 17 studies, 2 (12%) used information provided by social media analytics companies [ , ]. The DemographicsPro tool was used in 6% (1/17) of the studies [ ]. Of the 17 studies, 3 (18%) used location information provided by Google Trends [ , , ]. Another study (1/17, 6%) collected geographical information from survey data [ ], and 1 (6%) video-based study used the geographic location of video channels [ ].
Sentiment Assessment
An individual’s perception of a topic can be characterized as having a positive, negative, or neutral sentiment. The analysis of these sentiments is often performed using automated language tools and is known as “sentiment analysis” [ ].
Of the 12 studies that performed sentiment analysis, 5 (42%) used automated methods. One study (1/12, 8%) trained a binary Naive Bayes classifier on a sample of 1000 “marijuana”-related tweets to classify posts into 2 opinion polarities: positive versus negative or neutral [ ]. Another study used sentiment analysis provided by a social media analytics company [ ]. Of the 12 studies, 3 (25%) used the Valence Aware Dictionary and Sentiment Reasoner (VADER) [ ], a lexicon and rule-based sentiment analysis tool [ , , ]. When VADER’s performance was compared with that of in-house machine learning classifiers trained on 3000 manually coded cannabis-related tweets, the classifiers showed a 30% performance improvement over VADER. Although VADER is widely used for general tweet sentiment analysis, its performance suffers in substance-use-related domains, where negative words are often used to convey positive sentiments. For example, in the sentence “I took CBD oil, that stuff was bad” [ ], “bad” actually means good.
User Analysis
In total, 57% (24/42) of the studies conducted user analysis, examining either who authored the posts (individuals or others, ie, self, retail, media, or professionals) or whom the posts were about (self, others, or general) [ , , - , , , - , , - , , , , , - ].
When manual data labeling was performed, determining both the poster and the subject of each post was part of the labeling process. Self-reporting and self-use were easily determined by watching videos and, in most texts, from the structure of the language. For example, one study [ ] first identified whether a tweet was about the self, others, or people in general and then identified whether the tweet was about actual cannabis use. This study included further categories of tone, related behavior, perceived impact, and social context. Automated labeling approaches look for phrases that indicate self-reporting. For example, a study on opioid addiction [ ] looked for phrases such as “I am addicted” and “I have been addicted” in the context of opioid mentions. Classifiers were used in another study [ ] to separate marketing tweets from nonmarketing tweets; however, that study focused on the marketing tweets.
None of the studies used advanced natural language processing techniques to establish subjects and personal mentions. Social media bots are automated accounts that generate artificial activity on social media platforms [ ]. Bot detection was used in only 1 of these 24 studies (4%), which used Twitter as its data source [ ].
Other Analyses
Of the 42 studies, 2 (5%) examined the social networks of contributors to conversations, which allowed the identification of target communities and user interactions [ , ]. Of the 42 studies, 3 (7%) examined the impact of governmental cannabis legalization policies on people’s sentiments and opinions or on the volume of social media posts [ , , ]. Term frequency and count analysis of words and phrases was performed in 12% (5/42) of the studies [ , , , , ].
Ethical Considerations
Institutional review boards (or their equivalents) ensure that research involving human participants is conducted in an ethical manner [ ]. Approval and oversight of a study by an institutional review board help ensure that researchers adopt an ethically appropriate research protocol that respects the rights and interests of social media users. Of the 42 studies, 26 (62%) mentioned that ethics approval had been sought or that the study was exempt from ethics requirements; the remaining 16 (38%) did not mention ethics approval.
External Validity
The use of standard reporting systems, such as US Food and Drug Administration reports, helps to assess whether social media research findings can be generalized to real-world data. When a suitable ground-truth data set is not available, validating results against >1 social media platform improves the generalizability and validity of the results. Only a few studies used >1 social media data source or validated their findings against other data sources. Of the 42 studies, 2 (5%) used Food and Drug Administration data as an external ground-truth data source to validate their results [ , ]. One study (1/42, 2%) analyzed several web-based forums [ ], and 2 (5%) other studies used several social media platforms as their data sources [ , ].
Discussion
In this study, we reviewed the technical aspects of peer-reviewed published works that used social media and other forms of user-generated data to understand the medicinal use of cannabis. All the studies concluded that these consumer-generated data sources are useful and provide a complementary resource for studying cannabis and medical conditions for which cannabis is used.
Principal Findings
The findings of this study are presented by answering the RQs.
RQ1: What Consumer-Generated Data Sources Are Used for Studying Cannabis?
Sources of consumer-generated data for cannabis research used by the reviewed studies include social media platforms, such as Twitter, Reddit, and YouTube; search queries, including Google Trends and Bing query logs; and web-based forums, crowdfunding platforms, blogs, and websites. Twitter was the most frequently used source (17/42, 40%). One of the studies concluded that, compared with unmoderated platforms, moderated sites focused more on evidence-based information and controlled misleading content [ ].
RQ2: What Common Techniques for Collection and Analysis of Data Are Used?
Some studies used social media analytics companies for some or all of their data collection and processing tasks. Other studies used application programming interfaces (APIs) to interact with Twitter and Reddit. Although Facebook allows researchers to access public posts from public pages through a dedicated platform [ ], 1 study (1/42, 2%) [ ] analyzed private Facebook posts, and the method used to obtain these data was not reported.
Approximately half of the studies used data sets of <8000 records, and many of them used 1000 records. These studies either focused on understanding the characteristics and needs of users or the quality of information on the web, or they were directed by an RQ such as “Are individuals using CBD for diagnoseable conditions which have evidence-based therapies?” These analyses play a critical role in understanding the domain but are difficult to replicate and generalize.
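The collection and rule-based analysis steps described in the Results (restricting tweets to a geographic bounding box, filtering for cannabis-related keywords, and counting the most common unigrams and bigrams) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any reviewed study's pipeline: posts are assumed to be dictionaries with hypothetical `text`, `lat`, and `lon` fields, the 5-term seed list is a placeholder (one reviewed study used 123 terms), and all function names are our own.

```python
import re
from collections import Counter

# Hypothetical seed list; real studies built much longer lists
# (eg, from Urban Dictionary, web forums, or a thesaurus).
KEYWORDS = {"cannabis", "marijuana", "weed", "cbd", "thc"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def matches_keywords(text, keywords=KEYWORDS):
    """True if any single-word keyword appears as a whole token."""
    return any(tok in keywords for tok in tokenize(text))


def in_bounding_box(lat, lon, box):
    """box is (min_lat, min_lon, max_lat, max_lon) in decimal degrees."""
    min_lat, min_lon, max_lat, max_lon = box
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def filter_posts(posts, box, keywords=KEYWORDS):
    """Keep geotagged posts inside the box that also match a keyword."""
    kept = []
    for post in posts:
        lat, lon = post.get("lat"), post.get("lon")
        if lat is None or lon is None:
            continue  # no coordinates: the post cannot be placed in the box
        if in_bounding_box(lat, lon, box) and matches_keywords(post["text"], keywords):
            kept.append(post)
    return kept


def top_ngrams(texts, n=2, k=10):
    """Count n-grams (n=1 unigrams, n=2 bigrams) and return the k most common."""
    counts = Counter()
    for text in texts:
        toks = tokenize(text)
        # zip over shifted token lists yields consecutive n-token tuples
        counts.update(zip(*(toks[i:] for i in range(n))))
    return counts.most_common(k)
```

Only single-word keywords are matched, so multiword expressions would need phrase matching, and a bounding box that crosses the 180th meridian would need special handling. Checking location before keywords mirrors the order described for the Twitter study; for keyword-first collection, the 2 conditions can simply be swapped.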
More recent neural network–based natural language processing techniques were not used in the studies in this review. These modern machine learning methods have the advantage that they require minimal data preparation and can learn the nuances of language. However, to function effectively, they typically require high-quality annotated data, which are a scarce and expensive resource. Textual social media data are highly amenable to these techniques. Creating and sharing deidentified annotated data sets for this purpose should be encouraged within appropriate ethical, regulatory, and legal frameworks [ ].
RQ3: What Are the Common Limitations and Challenges Faced by the Studies?
These limitations are mentioned in order of frequency.
Sample Representativeness
Most research on social media uses samples of available data. However, the extent to which these samples are representative of the general population is often unclear. The limiting factors mentioned in these studies include sampling bias introduced by the choice of keywords and the data collection duration, as well as population biases.
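One mitigation for keyword-choice bias, noted in the Results, is to expand the seed keyword list with terms that lie close to it in a word embedding space. The sketch below ranks candidate terms by their cosine similarity to the nearest seed term. It assumes embeddings are available as a plain dictionary from term to vector; the 3-dimensional vectors and the 0.8 threshold are made-up placeholders, whereas in practice the vectors would come from embeddings trained on the platform's own posts and the candidates would be reviewed manually before use.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, embeddings, threshold=0.8):
    """Rank non-seed terms whose best similarity to any seed is >= threshold."""
    seed_vecs = [embeddings[s] for s in seeds if s in embeddings]
    if not seed_vecs:
        return []  # no seed term has a vector, so nothing can be compared
    scored = {}
    for term, vec in embeddings.items():
        if term in seeds:
            continue
        best = max(cosine(vec, sv) for sv in seed_vecs)
        if best >= threshold:
            scored[term] = best
    # Most similar candidates first; these are suggestions, not final keywords.
    return sorted(scored, key=scored.get, reverse=True)


# Toy vectors standing in for trained embeddings (made-up values).
TOY_EMBEDDINGS = {
    "cannabis": [0.9, 0.1, 0.0],
    "marijuana": [0.85, 0.15, 0.05],
    "ganja": [0.8, 0.2, 0.1],
    "pain": [0.1, 0.9, 0.1],
    "weather": [0.0, 0.1, 0.95],
}
```

With these toy vectors, `expand_keywords({"cannabis"}, TOY_EMBEDDINGS)` returns `["marijuana", "ganja"]`, while "pain" and "weather" fall well below the threshold.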
Population bias usually refers to differences between the demographic composition of social media users and that of the general population, compounded by the difficulty of determining users' demographic characteristics. Difficulty in obtaining accurate geographical locations was also mentioned in the reviewed studies. These data are of limited reliability because, even when users explicitly include demographic information (eg, on Facebook) or geographical information in their posts or profiles, this information may be fabricated.
The choice of platform itself also imposes limitations. For instance, platform-specific features, such as sampling strategies, limit the amount of data that can be collected, and users' behavior and conversations vary with the platform or context. One study (1/42, 2%) mentioned that the forums it investigated could be strongly procannabis and are likely inhabited by more experienced cannabis users [ ]. Another study stated that individuals posting on YouTube about cannabis are likely to be seeking social networking opportunities [ ].
Complications also arise because platform-specific algorithms detect and further promote popular themes and users to deliberately shape behavior and increase platform engagement. This needs to be mitigated by detecting and accounting for these algorithms and potentially by sampling from >1 platform.
Methodology Constraints
The use of small data sets by some of the studies limits the generalizability of their results; some of the researchers acknowledged this and indicated plans to replicate their studies with more data and automated methods. We also observed that although such studies may sample social media data for hypothesis generation, they do not leverage one of the most important features of social media data: the ability to observe the continuous generation of big data and derive long-term, data-centric insights [ ].
Biases that could have been introduced by the choice of themes were also mentioned in the studies. Most researchers attempted to mitigate them by creating annotation guidelines, having >1 person label the data, and resolving disagreements.
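When >1 person labels the same posts, agreement is often quantified before disagreements are resolved. Cohen's kappa is one common statistic for 2 annotators; the review does not state which agreement measures the studies reported, so the following is only an illustration. Kappa compares the observed agreement p_o with the agreement p_e expected by chance from each annotator's label frequencies: kappa = (p_o - p_e) / (1 - p_e). The label names in the example are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for 2 annotators who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the 2 annotators' label proportions, summed over labels.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if p_e == 1:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch returns 1.0 by choice.
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, if 2 annotators label 4 posts as `["use", "use", "no", "no"]` and `["use", "no", "no", "no"]`, they agree on 3 of 4 posts (p_o = 0.75), chance agreement is p_e = 0.5, and kappa is 0.5.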
Actual Use Detection
A limitation mentioned by several studies is that web-based search activities and social media posts containing cannabis-related keywords do not necessarily represent actual cannabis use by the poster. Depending on the context and goals of the research (for instance, if the research seeks to study a cannabis-consuming population), advanced text processing techniques are required to establish when personal cannabis use can be inferred. For such studies, establishing personal use should be a crucial first step. However, detecting personal use is challenging, especially given the informal, diverse, and specialized language used by niche communities.
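A crude rule-based baseline for flagging possible personal use, in the spirit of the phrase matching described in the Results for opioid addiction, is to require a first-person word, a use verb, and a cannabis term in the same sentence. The word lists and function names below are illustrative assumptions, not taken from any reviewed study. The approach will miss slang and misread quoted, hypothetical, or secondhand statements, which is exactly why the more advanced text processing discussed above is needed.

```python
import re

# Illustrative word lists (assumptions, not from any reviewed study).
FIRST_PERSON = {"i", "i'm", "i've", "me", "my"}
USE_VERBS = {"take", "took", "taking", "smoke", "smoked", "smoking",
             "use", "used", "using", "vape", "vaped", "vaping"}
CANNABIS_TERMS = {"cannabis", "marijuana", "weed", "cbd", "thc"}


def split_sentences(text):
    """Naive sentence split on ., !, and ?."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def mentions_personal_use(text):
    """True if one sentence contains a first-person word, a use verb, and a cannabis term."""
    for sentence in split_sentences(text):
        tokens = set(re.findall(r"[a-z']+", sentence.lower()))
        if tokens & FIRST_PERSON and tokens & USE_VERBS and tokens & CANNABIS_TERMS:
            return True
    return False
```

`mentions_personal_use("I took CBD oil, that stuff was bad")` returns True, while a news-style post such as "New study says CBD may help sleep" returns False because it contains no first-person word.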
Source Identification
Identifying the source of posts (ie, whether they were generated organically by individual users or by organizations or bots) was a commonly mentioned limitation. Content generated by health and commercial organizations, power users, and nonindividual accounts was understood to account for a considerable share of social media post volume.
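A simple descriptive check on how concentrated a data set is among a few sources is the share of posts contributed by the most active accounts. This does not by itself identify organizations, power users, or bots, but a high share signals that a handful of accounts may dominate the conversation being analyzed. The sketch assumes that each post carries an author identifier; the function name and the 1% cut-off are arbitrary illustrative choices.

```python
from collections import Counter


def top_account_share(author_ids, top_fraction=0.01):
    """Share of all posts written by the most active top_fraction of accounts.

    author_ids holds one author identifier per post. At least 1 account is
    always counted; ties at the cut-off are broken by Counter.most_common order.
    """
    if not author_ids:
        return 0.0
    posts_per_account = Counter(author_ids)
    k = max(1, int(len(posts_per_account) * top_fraction))
    top_posts = sum(count for _, count in posts_per_account.most_common(k))
    return top_posts / len(author_ids)
```

For 10 posts in which 1 account wrote 6 and 4 other accounts wrote 1 each, the most active account contributes 0.6 of the volume.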
Limitations
This review used 4 literature databases in the search process to maximize coverage of existing publications. However, we cannot be certain that we have covered all relevant publications. The choice of keywords for the literature search could also have caused relevant studies in this domain to be missed; for instance, infodemiology and infoveillance were not among the keywords. Articles included in this study were selected following a systematic approach and underwent a bias assessment for quality; however, biases could not be completely avoided. This study was also limited to English-language articles.
Conclusions
The number of studies in this field has generally increased over the last few years. Social media conversations are wide ranging and offer opportunities for insights that cannot be obtained through formal information gathering. Researchers have recognized the value of social media as a place where users can freely express their experiences and concerns without risking judgment or penalty, and as a natural forum for many users of cannabis as medicine to share insights into the benefits and issues they experience and perceive.
Manual qualitative analysis, statistical analysis, supervised and unsupervised machine learning, and rule-based methods are among the methodologies used in these studies. Analyses of social media data that are limited to small data samples, although an effective means of hypothesis generation, are difficult to reliably reproduce and generalize. Where possible, the sharing of high-quality deidentified annotated data should be encouraged to allow the use of generalizable analytical techniques and to advance this field.
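As a first step toward sharing deidentified annotated data, obvious direct identifiers can be masked before release. The following sketch replaces URLs, email addresses, and @-mentions with placeholder tokens using regular expressions of our own choosing; the patterns and placeholder tokens are assumptions, not a standard. It is deliberately minimal: verbatim post text can often be traced back to its author through a web search even after masking, so a real release would likely also need measures such as paraphrasing or controlled access within appropriate ethical, regulatory, and legal frameworks.

```python
import re

# Illustrative patterns (assumptions); they catch only obvious identifiers.
URL = re.compile(r"https?://\S+")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
MENTION = re.compile(r"(?<!\w)@\w+")  # "@user" not preceded by a word character


def deidentify(text):
    """Mask URLs, then email addresses, then @-mentions.

    URLs go first so that an "@" inside a link (eg, a profile path) is not
    treated as a mention; emails go before mentions for the same reason.
    """
    text = URL.sub("<URL>", text)
    text = EMAIL.sub("<EMAIL>", text)
    text = MENTION.sub("<USER>", text)
    return text
```

For example, `deidentify("@jane I took CBD, see https://example.com/x or mail me at a.b@example.org")` returns `"<USER> I took CBD, see <URL> or mail me at <EMAIL>"`.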
To improve their validity and generalizability, studies could draw on additional social media data sources and check their results against established reporting systems. Studies could also take advantage of emerging data analysis strategies that leverage big data, such as deep learning and transfer learning–based approaches.
Acknowledgments
This review was supported by the Australian Centre for Cannabinoid Clinical and Research Excellence, funded by the National Health and Medical Research Council through the Centre of Research Excellence scheme (NHMRC CRE APP1135054).
Conflicts of Interest
None declared.
Supporting information (review keywords, inclusion and exclusion criteria, papers summary).
References
- Li HL. An archaeological and historical account of cannabis in China. Econ Bot. Oct 1973;28(4):437-448. [FREE Full text] [CrossRef]
- Hallinan CM, Gunn JM, Bonomo YA. Implementation of medicinal cannabis in Australia: innovation or upheaval? Perspectives from physicians as key informants, a qualitative analysis. BMJ Open. Oct 22, 2021;11(10):e054044. [FREE Full text] [CrossRef] [Medline]
- Banerjee S, McCormack S. Medical Cannabis for the Treatment of Chronic Pain: A Review of Clinical Effectiveness and Guidelines. Ottawa, Canada. Canadian Agency for Drugs and Technologies in Health; 2019.
- Kleeman-Forsthuber LT, Dennis DA, Jennings JM. Medicinal cannabis in orthopaedic practice. J Am Acad Orthop Surg. May 01, 2020;28(7):268-277. [CrossRef] [Medline]
- Pawliuk C, Chau B, Rassekh SR, McKellar T, Siden HH. Efficacy and safety of paediatric medicinal cannabis use: a scoping review. Paediatr Child Health. Jul 2021;26(4):228-233. [FREE Full text] [CrossRef] [Medline]
- Pratt M, Stevens A, Thuku M, Butler C, Skidmore B, Wieland LS, et al. Benefits and harms of medical cannabis: a scoping review of systematic reviews. Syst Rev. Dec 10, 2019;8(1):320. [FREE Full text] [CrossRef] [Medline]
- Martin JH, Lucas C. Reporting adverse drug events to the Therapeutic Goods Administration. Aust Prescr. Mar 2021;44(1):2-3. [FREE Full text] [CrossRef] [Medline]
- EudraVigilance. European Medicines Agency. URL: https://www.ema.europa.eu/en/human-regulatory/research-development/pharmacovigilance/eudravigilance [accessed 2022-05-24]
- FDA Adverse Event Reporting System (FAERS) Public Dashboard. U.S. Food & Drug Administration. 2021. URL: https://tinyurl.com/yh22mc2c [accessed 2022-05-24]
- Al Dweik R, Stacey D, Kohen D, Yaya S. Factors affecting patient reporting of adverse drug reactions: a systematic review. Br J Clin Pharmacol. Apr 2017;83(4):875-883. [FREE Full text] [CrossRef] [Medline]
- Chandra S, Lata H, ElSohly MA, Walker LA, Potter D. Cannabis cultivation: methodological issues for obtaining medical-grade product. Epilepsy Behav. May 2017;70(Pt B):302-312. [CrossRef] [Medline]
- Hakkarainen P, Frank VA, Barratt MJ, Dahl HV, Decorte T, Karjalainen K, et al. Growing medicine: small-scale cannabis cultivation for medical purposes in six different countries. Int J Drug Policy. Mar 2015;26(3):250-256. [CrossRef] [Medline]
- Paul MJ, Dredze M. Social monitoring for public health. In: Marchionini G, editor. Synthesis Lectures on Information Concepts, Retrieval, and Services. San Rafael, CA, USA. Morgan and Claypool Publishers; Aug 31, 2017;1-183.
- Bode L, Davis-Kean P, Singh L, Berger-Wolf T, Budak C, Chi G, et al. Study designs for quantitative social science research using social media. PsyArXiv. 2020:1-27. [CrossRef]
- Mneimneh Z, Pasek J, Singh L, Best R, Bode L, Bruch E, et al. Data acquisition, sampling, and data preparation considerations for quantitative social science research using social media data. PsyArXiv. Mar 15, 2021:1-45. [CrossRef]
- Conway M, Hu M, Chapman WW. Recent advances in using natural language processing to address public health research questions using social media and consumer-generated data. Yearb Med Inform. Aug 2019;28(1):208-217. [FREE Full text] [CrossRef] [Medline]
- Allem JP, Ferrara E. Could social bots pose a threat to public health? Am J Public Health. Aug 2018;108(8):1005-1006. [CrossRef] [Medline]
- Ferrara E, Varol O, Davis C, Menczer F, Flammini A. The rise of social bots. Commun ACM. Jun 24, 2016;59(7):96-104. [CrossRef]
- Varol O, Ferrara E, Davis CA, Menczer F, Flammini A. Online human-bot interactions: detection, estimation, and characterization. arXiv. 2017.
- Khademi Habibabadi S, Bonomo YA, Conway M, Hallinan CM. Social media discourse and internet search queries on cannabis as a medicine: a systematic scoping review. medRxiv. 2022. [CrossRef]
- Liberati A, Altman DG, Tetzlaff J, Mulrow C, Gøtzsche PC, Ioannidis JP, et al. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate healthcare interventions: explanation and elaboration. BMJ. Jul 21, 2009;339:b2700. [FREE Full text] [CrossRef] [Medline]
- McGregor F, Somner JE, Bourne RR, Munn-Giddings C, Shah P, Cross V. Social media use by patients with glaucoma: what can we learn? Ophthalmic Physiol Opt. Jan 2014;34(1):46-52. [CrossRef] [Medline]
- Cavazos-Rehg PA, Krauss M, Fisher SL, Salyer P, Grucza RA, Bierut LJ. Twitter chatter about marijuana. J Adolesc Health. Mar 2015;56(2):139-145. [FREE Full text] [CrossRef] [Medline]
- Daniulaityte R, Nahhas RW, Wijeratne S, Carlson RG, Lamy FR, Martins SS, et al. "Time for dabs": analyzing Twitter data on marijuana concentrates across the U.S. Drug Alcohol Depend. Oct 01, 2015;155:307-311. [FREE Full text] [CrossRef] [Medline]
- Gonzalez-Estrada A, Cuervo-Pardo L, Ghosh B, Smith M, Pazheri F, Zell K, et al. Popular on YouTube: a critical appraisal of the educational quality of information regarding asthma. Allergy Asthma Proc. 2015;36(6):e121-e126. [CrossRef] [Medline]
- Krauss MJ, Sowles SJ, Mylvaganam S, Zewdie K, Bierut LJ, Cavazos-Rehg PA. Displays of dabbing marijuana extracts on YouTube. Drug Alcohol Depend. Oct 01, 2015;155:45-51. [FREE Full text] [CrossRef] [Medline]
- Thompson L, Rivara FP, Whitehill JM. Prevalence of marijuana-related traffic on Twitter, 2012-2013: a content analysis. Cyberpsychol Behav Soc Netw. Jul 2015;18(6):311-319. [FREE Full text] [CrossRef] [Medline]
- Cavazos-Rehg PA, Sowles SJ, Krauss MJ, Agbonavbare V, Grucza R, Bierut L. A content analysis of tweets about high-potency marijuana. Drug Alcohol Depend. Oct 01, 2016;166:100-108. [FREE Full text] [CrossRef] [Medline]
- Lamy FR, Daniulaityte R, Sheth A, Nahhas RW, Martins SS, Boyer EW, et al. "Those edibles hit hard": exploration of Twitter data on cannabis edibles in the U.S. Drug Alcohol Depend. Jul 01, 2016;164:64-70. [FREE Full text] [CrossRef] [Medline]
- Mitchell JT, Sweitzer MM, Tunno AM, Kollins SH, McClernon FJ. "I Use Weed for My ADHD": a qualitative analysis of online forum discussions on cannabis use and ADHD. PLoS One. May 26, 2016;11(5):e0156614. [FREE Full text] [CrossRef] [Medline]
- Andersson M, Persson M, Kjellgren A. Psychoactive substances as a last resort-a qualitative study of self-treatment of migraine and cluster headaches. Harm Reduct J. Sep 05, 2017;14(1):60. [FREE Full text] [CrossRef] [Medline]
- Dai H, Hao J. Mining social media data on marijuana use for Post Traumatic Stress Disorder. Comput Human Behav. May 2017;70(C):282-290. [CrossRef]
- Greiner C, Chatton A, Khazaal Y. Online self-help forums on cannabis: a content assessment. Patient Educ Couns. Oct 2017;100(10):1943-1950. [CrossRef] [Medline]
- Turner J, Kantardzic M. Geo-social analytics based on spatio-temporal dynamics of marijuana-related tweets. In: Proceedings of the 2017 International Conference on Information System and Data Mining. 2017. Presented at: ICISDM '17; April 1-3, 2017;28-38; Charleston, SC, USA. [CrossRef]
- Westmaas JL, McDonald BR, Portier KM. Topic modeling of smoking- and cessation-related posts to the American Cancer Society's Cancer Survivor Network (CSN): implications for cessation treatment for cancer survivors who smoke. Nicotine Tob Res. Aug 01, 2017;19(8):952-959. [CrossRef] [Medline]
- Yom-Tov E, Lev-Ran S. Adverse reactions associated with cannabis consumption as evident from search engine queries. JMIR Public Health Surveill. Oct 26, 2017;3(4):e77. [FREE Full text] [CrossRef] [Medline]
- Cavazos-Rehg PA, Krauss MJ, Sowles SJ, Murphy GM, Bierut LJ. Exposure to and content of marijuana product reviews. Prev Sci. Feb 2018;19(2):127-137. [FREE Full text] [CrossRef] [Medline]
- Glowacki EM, Glowacki JB, Wilcox GB. A text-mining analysis of the public's reactions to the opioid crisis. Subst Abus. 2018;39(2):129-133. [CrossRef] [Medline]
- Meacham MC, Paul MJ, Ramo DE. Understanding emerging forms of cannabis use through an online cannabis community: an analysis of relative post volume and subjective highness ratings. Drug Alcohol Depend. Jul 01, 2018;188:364-369. [FREE Full text] [CrossRef] [Medline]
- Leas EC, Nobles AL, Caputi TL, Dredze M, Smith DM, Ayers JW. Trends in Internet searches for cannabidiol (CBD) in the United States. JAMA Netw Open. Oct 02, 2019;2(10):e1913853. [FREE Full text] [CrossRef] [Medline]
- Meacham MC, Roh S, Chang JS, Ramo DE. Frequently asked questions about dabbing concentrates in online cannabis community discussion forums. Int J Drug Policy. Dec 2019;74:11-17. [FREE Full text] [CrossRef] [Medline]
- Nasralah T, El-Gayar O, Wang Y. What social media can tell us about opioid addicts: Twitter data case analysis. In: Proceedings of the 25th Americas' Conference on Information Systems. 2019. Presented at: AMCIS '19; August 15-17, 2019;15; Cancún, Mexico.
- Pérez-Pérez M, Pérez-Rodríguez G, Fdez-Riverola F, Lourenço A. Using Twitter to understand the human bowel disease community: exploratory analysis of key topics. J Med Internet Res. Aug 15, 2019;21(8):e12610. [FREE Full text] [CrossRef] [Medline]
- Shi S, Brant A, Sabolch A, Pollom E. False news of a cannabis cancer cure. Cureus. Jan 19, 2019;11(1):e3918. [FREE Full text] [CrossRef] [Medline]
- Allem JP, Escobedo P, Dharmapuri L. Cannabis surveillance with Twitter data: emerging topics and social bots. Am J Public Health. Mar 2020;110(3):357-362. [CrossRef] [Medline]
- Janmohamed K, Soale AN, Forastiere L, Tang W, Sha Y, Demant J, et al. Intersection of the Web-based vaping narrative with COVID-19: topic modeling study. J Med Internet Res. Oct 30, 2020;22(10):e21743. [FREE Full text] [CrossRef] [Medline]
- Jia JS, Mehran N, Purgert R, Zhang QE, Lee D, Myers JS, et al. Marijuana and glaucoma: a social media content analysis. Ophthalmol Glaucoma. 2021;4(4):400-404. [CrossRef] [Medline]
- Leas EC, Hendrickson EM, Nobles AL, Todd R, Smith DM, Dredze M, et al. Self-reported cannabidiol (CBD) use for conditions with proven therapies. JAMA Netw Open. Oct 01, 2020;3(10):e2020977. [FREE Full text] [CrossRef] [Medline]
- Merten JW, Gordon BT, King JL, Pappas C. Cannabidiol (CBD): perspectives from Pinterest. Subst Use Misuse. 2020;55(13):2213-2220. [CrossRef] [Medline]
- Mullins CF, Ffrench-O'Carroll R, Lane J, O'Connor T. Sharing the pain: an observational analysis of Twitter and pain in Ireland. Reg Anesth Pain Med. Aug 2020;45(8):597-602. [CrossRef] [Medline]
- Saposnik FE, Huber JF. Trends in Web searches about the causes and treatments of autism over the past 15 years: exploratory infodemiology study. JMIR Pediatr Parent. Dec 07, 2020;3(2):e20913. [FREE Full text] [CrossRef] [Medline]
- Song S, Cohen AJ, Lui H, Mmonu NA, Brody H, Patino G, et al. Use of GoFundMe® to crowdfund complementary and alternative medicine treatments for cancer. J Cancer Res Clin Oncol. Jul 2020;146(7):1857-1865. [CrossRef] [Medline]
- Tran T, Kavuluru R. Social media surveillance for perceived therapeutic effects of cannabidiol (CBD) products. Int J Drug Policy. Mar 2020;77:102688. [FREE Full text] [CrossRef] [Medline]
- van Draanen J, Tao H, Gupta S, Liu S. Geographic differences in cannabis conversations on Twitter: infodemiology study. JMIR Public Health Surveill. Oct 05, 2020;6(4):e18540. [FREE Full text] [CrossRef] [Medline]
- Zenone M, Snyder J, Caulfield T. Crowdfunding cannabidiol (CBD) for cancer: hype and misinformation on GoFundMe. Am J Public Health. Oct 2020;110(S3):S294-S299. [CrossRef] [Medline]
- Pang RD, Dormanesh A, Hoang Y, Chu M, Allem JP. Twitter posts about cannabis use during pregnancy and postpartum: a content analysis. Subst Use Misuse. 2021;56(7):1074-1077. [CrossRef] [Medline]
- Rhidenour KB, Blackburn K, Barrett AK, Taylor S. Mediating medical marijuana: exploring how veterans discuss their stigmatized substance use on Reddit. Health Commun. Sep 2022;37(10):1305-1315. [CrossRef] [Medline]
- Smolev ET, Rolf L, Zhu E, Buday SK, Brody M, Brogan DM, et al. "Pill pushers and CBD oil"-a thematic analysis of social media interactions about pain after traumatic brachial plexus injury. J Hand Surg Glob Online. Jan 2021;3(1):36-40. [FREE Full text] [CrossRef] [Medline]
- Soleymanpour M, Saderholm S, Kavuluru R. Therapeutic claims in cannabidiol (CBD) marketing messages on Twitter. Proceedings (IEEE Int Conf Bioinformatics Biomed). Dec 2021;2021:3083-3088. [FREE Full text] [CrossRef] [Medline]
- Zenone MA, Snyder J, Crooks VA. What are the informational pathways that shape people's use of cannabidiol for medical purposes? J Cannabis Res. May 06, 2021;3(1):13. [FREE Full text] [CrossRef] [Medline]
- Allem JP, Majmundar A, Dormanesh A, Donaldson SI. Identifying health-related discussions of cannabis use on Twitter by using a medical dictionary: content analysis of tweets. JMIR Form Res. Mar 25, 2022;6(2):e35027. [FREE Full text] [CrossRef] [Medline]
- Turner J, Kantardzic M, Vickers-Smith R. Infodemiological examination of personal and commercial tweets about cannabidiol: term and sentiment analysis. J Med Internet Res. Dec 20, 2021;23(12):e27307. [FREE Full text] [CrossRef] [Medline]
- Meacham MC, Nobles AL, Tompkins DA, Thrul J. "I got a bunch of weed to help me through the withdrawals": naturalistic cannabis use reported in online opioid and opioid recovery community discussion forums. PLoS One. Feb 8, 2022;17(2):e0263583. [FREE Full text] [CrossRef] [Medline]
- Boyd RL. WELP: Word Embedding Likeness Perusal (v1.03). Ryan Boyd. 2018. URL: https://github.com/ryanboyd/WELP [accessed 2022-10-30]
- Adams N, Artigiani EE, Wish ED. Choosing your platform for social media drug research and improving your keyword filter list. J Drug Issues. Mar 13, 2019;49(3):477-492. [CrossRef]
- Blei DM, Ng AY, Jordan MI. Latent Dirichlet allocation. J Mach Learn Res. 2003;3:993-1022. [FREE Full text] [CrossRef]
- Liu B, Zhang L. A survey of opinion mining and sentiment analysis. In: Aggarwal CC, Zhai C, editors. Mining Text Data. New York, NY, USA. Springer; 2012;415-463.
- Hutto CJ, Gilbert E. VADER: a parsimonious rule-based model for sentiment analysis of social media text. In: Proceedings of the 8th International AAAI Conference on Weblogs and Social Media. 2014. Presented at: AAAI '14; June 1-4, 2014; Ann Arbor, MI, USA.
- Daniulaityte R, Chen L, Lamy FR, Carlson RG, Thirunarayan K, Sheth A. "When 'bad' is 'good'": identifying personal communication and sentiment in drug-related tweets. JMIR Public Health Surveill. Oct 24, 2016;2(2):e162. [FREE Full text] [CrossRef] [Medline]
- Grady C. Institutional review boards: purpose and challenges. Chest. Dec 2015;148(5):1148-1155. [FREE Full text] [CrossRef] [Medline]
- CrowdTangle | Content Discovery and Social Monitoring Made Easy. URL: https://www.crowdtangle.com/ [accessed 2022-05-30]
- Gonzalez-Hernandez G, Sarker A, O'Connor K, Savova G. Capturing the patient's perspective: a review of advances in natural language processing of health-related text. Yearb Med Inform. Aug 2017;26(1):214-227. [FREE Full text] [CrossRef] [Medline]
- Sarker A, DeRoos A, Perrone J. Mining social media for prescription medication abuse monitoring: a review and proposal for a data-centric framework. J Am Med Inform Assoc. Feb 01, 2020;27(2):315-329. [FREE Full text] [CrossRef] [Medline]
Abbreviations
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RQ: research question
VADER: Valence Aware Dictionary and Sentiment Reasoner
Edited by G Eysenbach; submitted 24.12.21; peer-reviewed by A Dormanesh, K O'Connor, J Thrul; comments to author 26.03.22; revised version received 16.06.22; accepted 27.07.22; published 16.11.22.
Copyright © Sedigheh Khademi Habibabadi, Christine Hallinan, Yvonne Bonomo, Mike Conway. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 16.11.2022.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.