Published in Vol 20, No 11 (2018): November

Hookah-Related Posts to Twitter From 2017 to 2018: Thematic Analysis

Original Paper

1Keck School of Medicine of University of Southern California, Los Angeles, CA, United States

2Department of Computer Science, University of Southern California, Los Angeles, CA, United States

Corresponding Author:

Jon-Patrick Allem, PhD, MA

Keck School of Medicine of University of Southern California

2001 North Soto Street

Los Angeles, CA, 90032

United States

Phone: 1 8586030812


Background: Hookah (or tobacco waterpipe) use has recently become prevalent in the United States. The contexts and experiences associated with hookah use are unclear, yet such information is abundant in hookah users’ publicly available social media posts.

Objective: In this study, we utilized Twitter data to characterize Twitter users’ recent experiences with hookah.

Methods: Twitter posts containing the term “hookah” were obtained from April 1, 2017 to March 29, 2018. Text classifiers were used to identify clusters of topics that tended to co-occur in posts (n=176,706).

Results: The most prevalent topic cluster was Person Tagging (use of @username to tag another Twitter account in a post) at 21.58% (38,137/176,706) followed by Promotional or Social Events (eg, mentions of ladies’ nights, parties, etc) at 20.20% (35,701/176,706) and Appeal or Abuse Liability (eg, craving, enjoying hookah) at 18.12% (32,013/176,706). Additional topics included Hookah Use Behavior (eg, mentions of taking a “hit” of hookah) at 11.67% (20,603/176,706), Polysubstance Use (eg, hookah use along with other substances) at 10.95% (19,353/176,706), Buying or Selling (eg, buy, order, purchase, sell) at 9.37% (16,552/176,706), and Flavors (eg, mint, cinnamon, watermelon) at 1.66% (2927/176,706). The topic Dislike of Hookah (eg, hate, quit, dislike) was rare at 0.59% (1043/176,706).

Conclusions: Social events, appeal or abuse liability, flavors, and polysubstance use were the common contexts and experiences associated with Twitter discussions about hookah in 2017-2018. Considered in concert with traditional data sources about hookah, these results suggest that social events, appeal or abuse liability, flavors, and polysubstance use warrant consideration as targets in future surveillance, policy making, and interventions addressing hookah.

J Med Internet Res 2018;20(11):e11669



Hookah (or tobacco waterpipe) use has recently grown in popularity in the United States, especially among youth and young adults [1,2]. Although exposure to hookah smoke carries health risks similar to those of combustible cigarettes [3,4], hookah is perceived as safer than cigarettes by certain vulnerable groups [5] and is subject to fewer regulations [6]. For example, hookah is offered in many flavors, whereas flavored cigarettes (other than menthol) are banned in the United States.

Publicly accessible data from individuals who post information on social media websites (eg, Twitter, Instagram, YouTube) can be efficiently harnessed to quickly capture and describe the context of tobacco use [7-9]. Previous analyses of hookah-related posts to social media websites through the year 2017 provide some information about hookah-related contexts, including the importance and appreciation of stylized waterpipes [10,11], use of hookah in social settings [10], copromotion with alcohol [10], and primarily positive user experiences [12-16]. However, cultural trends, the tobacco product consumer marketplace, and tobacco product health policies are evolving constantly and rapidly. The contexts and experiences associated with hookah use rapidly change as well, making it important to provide up-to-date information on such issues to inform targets for surveillance, policy making, and interventions addressing hookah.

In this study, we demonstrate the utility of collecting data from Twitter to document and describe hookah-related conversations from 2017 to 2018. Our goal was to determine the public’s recent experiences with hookah, including the social and environmental contexts in which hookah use occurs. Twitter is used by 24% of US adults (23% of men, 24% of women, 24% of white individuals, 26% of African American individuals, and 20% of Hispanic individuals), with 46% of users on the platform daily [17]. Findings from this study should inform tobacco control policy and prevention efforts and demonstrate the utility of using Twitter data for rapid surveillance of health behaviors and tobacco-related products like hookah.

Data Collection

Twitter posts containing the term “hookah” (or “#hookah”) were obtained from Twitter’s Streaming Application Program Interface (API; the filtered stream using the Twitter4J library for collecting tweets with no gaps in the collection time) from April 1, 2017 to March 29, 2018. There were a total of 963,954 posts during this time.

Data Processing

We removed retweets and non-English posts, resulting in 348,834 unique posts that were used for analysis. While the word waterpipe is used in academic papers and presentations to refer to hookah, it is uncommon for individuals to use this term on social media, and it was, therefore, not included in this study [18]. To clean the data, we removed tweets from accounts identified as social bots [19,20] using Botometer (also known as Bot or Not) [21], resulting in a final analytical sample of 176,706 tweets from 90,718 unique users.
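The filtering steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Post` fields, the precomputed `bot_scores` dictionary, and the 0.5 threshold are all hypothetical, and the sketch assumes a bot-detection tool has already assigned each account a score between 0 and 1.

```python
from dataclasses import dataclass


@dataclass
class Post:
    user: str
    text: str
    lang: str          # language code attached to the post (assumed field)
    is_retweet: bool   # whether the post is a retweet (assumed field)


def filter_posts(posts, bot_scores, bot_threshold=0.5):
    """Keep original English posts from accounts not flagged as bots.

    bot_scores maps a username to a hypothetical bot likelihood in [0, 1].
    Accounts without a score are kept; that is a choice of this sketch.
    """
    kept = []
    for post in posts:
        if post.is_retweet or post.lang != "en":
            continue
        if bot_scores.get(post.user, 0.0) >= bot_threshold:
            continue
        kept.append(post)
    return kept


posts = [
    Post("alice", "hookah tonight with friends", "en", False),
    Post("bob", "RT hookah tonight with friends", "en", True),
    Post("carla", "noche de hookah", "es", False),
    Post("promo_bot", "hookah deals hookah deals", "en", False),
]
bot_scores = {"alice": 0.1, "promo_bot": 0.9}

sample = filter_posts(posts, bot_scores)
unique_users = {post.user for post in sample}
```

Counting `len(sample)` and `len(unique_users)` after such a filter corresponds to the tweet and user totals reported for the analytical sample.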

The final sample was prepared for analysis, which included the process of basic normalization (eg, remove punctuation, lowercase all text), stop word removal (eg, the words “a” and “the”), normalization of Twitter user mentions (eg, “@janedoe” is converted to “@username”), lemmatization (eg, “cat,” “cats,” “cat’s,” are all converted to “cat”), and nonprintable character removal (eg, emojis) [13]. All analyses relied on public, anonymized data; adhered to the terms and conditions, terms of use, and privacy policies of Twitter; and were performed under Institutional Review Board approval from the authors’ university. To protect privacy, no tweets were reported verbatim in this report.
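As a rough illustration of the preparation steps listed above, the sketch below uses only the Python standard library. The stop word list and the lemma lookup table are tiny hypothetical stand-ins (a real pipeline would use a full stop word list and a proper lemmatizer), and the order of operations is an assumption: mentions are replaced before punctuation is stripped so that the “@username” token survives.

```python
import re
import string

# Tiny illustrative resources; both are assumptions, not the study's lists.
STOP_WORDS = {"a", "an", "the", "and", "of", "to", "is"}
LEMMAS = {"cats": "cat", "cat's": "cat", "hookahs": "hookah"}

MENTION_RE = re.compile(r"@\w+")
# Strip all ASCII punctuation except "@", which marks the mention token.
PUNCT_TO_STRIP = string.punctuation.replace("@", "")
PUNCT_TABLE = str.maketrans("", "", PUNCT_TO_STRIP)


def normalize(text):
    """Return a list of cleaned tokens for one post."""
    text = text.lower()
    text = MENTION_RE.sub("@username", text)
    # Drop non-ASCII characters such as emojis.
    text = text.encode("ascii", "ignore").decode("ascii")
    tokens = []
    for raw in text.split():
        token = raw.strip(PUNCT_TO_STRIP)
        token = LEMMAS.get(token, token)      # toy lemmatization
        token = token.translate(PUNCT_TABLE)  # remove leftover punctuation
        if token and token not in STOP_WORDS:
            tokens.append(token)
    return tokens


example = normalize("@JaneDoe The cats LOVE hookah!!! 😍")
```

Here `example` holds the tokens `@username`, `cat`, `love`, and `hookah`.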

Topic Identification Methodology

Initially, we analyzed the tweets using word frequencies (of single words and double-word combinations, also known as one grams and bigrams) and visualized the data through word clouds to identify common topics (Multimedia Appendix 1). From this assessment, the authors came to an expert consensus on several topics including Person Tagging (eg, the use of @username to tag another Twitter account in a post), Buying or Selling (eg, words indicative of buying, selling, or purchasing hookah), Appeal or Abuse Liability (eg, words indicative of craving, wanting, needing, enjoying, and loving hookah), Hookah Use Behavior (eg, mentions of taking a “hit” of hookah or smoking hookah), Promotional or Social Events (eg, mentions of ladies’ nights, parties, etc), Polysubstance Use (eg, words indicative of alcohol, marijuana, or other substance use along with hookah), and Flavors (eg, use of the words “cinnamon,” “blueberry,” and “watermelon”; Textbox 1). In line with prior research [22,23], we looked for words and phrases that suggested Dislike of Hookah (eg, “don’t hookah” and “quit hookah”).
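The word-frequency step can be sketched as a simple count of one grams and bigrams over tokenized posts. The function name `ngram_counts` and the three toy posts are hypothetical and serve only to show the mechanics.

```python
from collections import Counter


def ngram_counts(token_lists, n):
    """Count contiguous n-grams (joined by spaces) across tokenized posts."""
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# Toy, already-normalized posts (illustrative only).
tweets = [
    ["want", "hookah", "tonight"],
    ["hookah", "lounge", "tonight"],
    ["want", "hookah", "lounge"],
]

one_grams = ngram_counts(tweets, 1)
bigrams = ngram_counts(tweets, 2)
top_bigrams = bigrams.most_common(2)
```

The most frequent entries in such counters are the candidates that would be sized largest in a word cloud.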

Next, we used Word2Vec, a language modeling technique developed by Google that allows users to learn text representations for creating text classifiers [24]. Word2Vec creates embeddings (ie, numerical representations of words that help capture meaning, semantic relationships, and context) for text by using each word in a corpus to predict the words that usually surround it. In other words, Word2Vec creates word embeddings where semantic relationships between words are preserved. One advantage of this technique is that words used in similar contexts, such as synonyms, tend to have similar embeddings, whereas words used in unrelated contexts tend to have dissimilar embeddings. Similarly, in the Word2Vec representations of words, the relationship between “king” and “queen” approximately mirrors the relationship between “man” and “woman.”
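For readers who want the underlying formulas, the prediction task described above corresponds to the skip-gram formulation in reference [24]; whether the study used this exact variant and which hyperparameters it chose are not stated, so the following is a sketch under that assumption. Here $T$ is the number of words in the corpus, $c$ is the context window size, $W$ is the vocabulary size, and $v_w$ and $v'_w$ are the input and output vectors of word $w$.

```latex
% Skip-gram objective: maximize the average log probability of context words
\frac{1}{T} \sum_{t=1}^{T} \sum_{\substack{-c \le j \le c \\ j \ne 0}} \log p(w_{t+j} \mid w_t)

% Softmax definition of the context probability
p(w_O \mid w_I) = \frac{\exp\!\left( {v'_{w_O}}^{\top} v_{w_I} \right)}{\sum_{w=1}^{W} \exp\!\left( {v'_{w}}^{\top} v_{w_I} \right)}

% The king/queen analogy expressed as approximate vector arithmetic
v_{\text{king}} - v_{\text{man}} + v_{\text{woman}} \approx v_{\text{queen}}
```

The analogy holds only approximately in practice: the vector nearest to the left-hand side is expected to be the one for “queen,” not an exact match.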

We used Word2Vec to find similar words for the one grams and bigrams that we identified per topic in the word cloud stage. This process, along with visual inspection and manual edits, allowed us to expand our word list per topic by identifying words that appeared, in posts, in a similar context as our original keywords. For example, through this process we found that the words “crave,” “love,” “enjoy,” and “need” appeared in posts that were similar to posts that contained the words “want” and “hookah”.
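The keyword expansion step can be illustrated with cosine similarity over word vectors. The three-dimensional vectors below are invented toy values, not trained embeddings, and the 0.9 similarity cutoff is a hypothetical choice; the study also relied on visual inspection and manual edits, which no script replaces.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_keywords(seeds, embeddings, min_similarity=0.9):
    """Add every vocabulary word that is close to at least one seed word."""
    expanded = set(seeds)
    for word, vector in embeddings.items():
        if word in expanded:
            continue
        if any(
            cosine(vector, embeddings[seed]) >= min_similarity
            for seed in seeds
            if seed in embeddings
        ):
            expanded.add(word)
    return sorted(expanded)


# Toy vectors for illustration only (assumed values, not Word2Vec output).
TOY_EMBEDDINGS = {
    "want": [0.9, 0.1, 0.0],
    "crave": [0.85, 0.15, 0.05],
    "need": [0.8, 0.2, 0.0],
    "lounge": [0.1, 0.9, 0.1],
    "mint": [0.0, 0.1, 0.9],
}

appeal_keywords = expand_keywords(["want"], TOY_EMBEDDINGS)
```

With these toy values, “crave” and “need” join the seed word “want,” while “lounge” and “mint” fall below the cutoff.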

Classification was done by checking for the presence of any one of the keywords (one grams and bigrams) in a tweet. If a tweet contained any of the keywords associated with a topic, the tweet was classified as part of that topic. In other words, we used a rule-based classification script written in Python where each tweet was checked for the presence of a specified set of n-grams representing a theme. For each analysis, we present findings in a confusion matrix where the diagonal line indicates the prevalence of a topic and the off-diagonal lines indicate topic overlap. For example, a hypothetical post such as “I’m craving hookah and a beer right now” could be classified under Appeal or Abuse Liability and Polysubstance Use. The number of posts classified under both topics would be found at the intersection of the matrix for these 2 topics or at 2.14% (3824/176,706).
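A minimal sketch of this kind of rule-based classifier is shown below. It is not the authors' script: the keyword sets are abbreviated, hypothetical subsets of the topics named in the text, and the function names are invented for illustration. The matrix it builds follows the description above, with topic counts on the diagonal and overlap counts off the diagonal.

```python
# Abbreviated, illustrative keyword sets (assumed, not the study's full lists).
TOPIC_KEYWORDS = {
    "Appeal or Abuse Liability": {"crave", "enjoy", "love", "need", "want"},
    "Polysubstance Use": {"beer", "vodka", "weed", "wine"},
    "Flavors": {"mint", "watermelon"},
}


def classify(tokens, topic_keywords=TOPIC_KEYWORDS):
    """Return every topic with at least one keyword (one gram or bigram) present."""
    one_grams = set(tokens)
    bigrams = {" ".join(pair) for pair in zip(tokens, tokens[1:])}
    grams = one_grams | bigrams
    return [topic for topic, keywords in topic_keywords.items() if grams & keywords]


def overlap_matrix(token_lists, topic_keywords=TOPIC_KEYWORDS):
    """Diagonal cells count posts per topic; off-diagonal cells count co-occurrence."""
    topics = list(topic_keywords)
    matrix = {a: {b: 0 for b in topics} for a in topics}
    for tokens in token_lists:
        hits = classify(tokens, topic_keywords)
        for a in hits:
            for b in hits:
                matrix[a][b] += 1
    return matrix


# Toy, already-normalized posts; the first mirrors the hypothetical example above.
tweets = [
    ["crave", "hookah", "beer", "right", "now"],
    ["mint", "hookah"],
    ["love", "mint", "hookah"],
    ["hookah", "lounge"],
]

matrix = overlap_matrix(tweets)
```

Because a post can match several topics, the diagonal counts can sum to more than the number of posts, and a post matching no keyword set (the last toy post) is left unclassified.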

Person tagging

  • @username

Promotional events

  • Bar
  • Food
  • Friday
  • Lounge
  • Night
  • Party
  • Saturday

Appeal or abuse liability

  • Crave
  • Enjoy
  • Everyday
  • Get
  • Like
  • Love
  • Need
  • Want

Hookah use behavior

  • Hit
  • Pass
  • Puff
  • Smoke
  • Used

Polysubstance use

  • Alcohol
  • Beer
  • Blunt
  • Cigs
  • Cocktails
  • Drinks
  • JUUL
  • Liquor
  • Margaritas
  • Vodka
  • Weed
  • Wine
  • Vape

Buying or Selling

  • Bought
  • Buy
  • Order
  • Paying
  • Purchase
  • Sell


Flavors

  • Mint
  • Cinnamon
  • Watermelon
  • Blueberry
  • Guava
  • Grape
  • Apple
  • Fruit
  • Peach
  • Orange
  • Mango
  • Candy
Textbox 1. Themes and common words found in posts along with the word “hookah”; these words are meant to provide further context for each theme, are not exhaustive, and are listed in alphabetical order.

The 8 topics that we identified covered 65.45% (115,658/176,706) of all tweets in the corpus (Figure 1). The remaining 34.55% (61,048/176,706) of tweets were too varied to be classified into a single topic with meaningful coverage (coverage of each subsequent topic was less than 1% of the total tweets). The most prevalent topic was Person Tagging at 21.58% (38,137/176,706), followed by Promotional or Social Events at 20.20% (35,701/176,706), Appeal or Abuse Liability at 18.12% (32,013/176,706), and Hookah Use Behavior at 11.67% (20,603/176,706).

Figure 1. Prevalence of topics.

About 10.95% (19,353/176,706) of the corpus was Polysubstance Use, while Buying or Selling comprised 9.37% (16,552/176,706) and Flavors comprised 1.66% (2927/176,706) of the tweets. The least common topic was Dislike of Hookah at 0.59% (1043/176,706). The most common topic overlap was between Person Tagging and Promotional or Social Events at 4.34% (7666/176,706), followed by Buying or Selling and Appeal or Abuse Liability at 4.12% (7276/176,706) and Promotional or Social Events and Appeal or Abuse Liability at 3.52% (6225/176,706).
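The percentages in this paragraph are each topic's tweet count divided by the size of the analytical sample. The short sketch below recomputes four of them from the counts reported above; the helper name `prevalence` is hypothetical.

```python
TOTAL_TWEETS = 176_706  # size of the final analytical sample


def prevalence(count, total=TOTAL_TWEETS):
    """Format a topic count as a percentage of all tweets, to 2 decimals."""
    return f"{100 * count / total:.2f}%"


# Counts as reported in the paragraph above.
reported_counts = {
    "Polysubstance Use": 19_353,
    "Buying or Selling": 16_552,
    "Flavors": 2_927,
    "Dislike of Hookah": 1_043,
}

table = {topic: prevalence(count) for topic, count in reported_counts.items()}
```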

Principal Findings

The topics identified in this study of hookah-related posts to Twitter from 2017 to 2018 provide several insights about the public’s recent experience with hookah. The most prevalent topic was Person Tagging or an individual Twitter user directly communicating to another user (a follower or friend) about hookah, while the most common topic overlap was Person Tagging and Promotional or Social Events. These findings demonstrate that Twitter users communicate shared values around, and experiences with, hookah. In other words, such posts may notify others about hookah-related events and align people into a community around hookah. Similarly, recent research characterizing JUUL-related posts to Twitter found instances of Person Tagging where posts suggested that people were notifying their friends of when they were using or purchasing JUUL-related products [22]. Collectively, these interpersonal communications suggest that people bond around tobacco-related products on Twitter and that there may be co-use of tobacco among many people or social influences in which one person motivates another to use tobacco.

Hookah Use Behavior and Polysubstance Use were identified as topics of discussion and may represent a syndrome of risky behavior among select Twitter users. These findings are in line with earlier research on hookah posts to Tumblr [18] and Instagram [10] as well as survey-based research that demonstrated that those who use hookah were significantly more likely to use other substances including alcohol, cigarettes, marijuana, and cocaine compared with those who refrained from hookah use [25]. Individuals who combine the use of hookah with other substances may be at risk for substance misuse; for example, hookah use facilitates greater intake of alcohol and vice versa [26].

Posts in this study reflected Twitter users’ interest in flavors, which is similar to earlier research on tobacco-related posts to Twitter [22,27]. A recent study identified that flavors were a common reason for hookah use among a nationally representative sample of young adults (aged 18-24 years) [28]. Research has also documented that flavored tobacco products like hookah are perceived to be less harmful than cigarettes [29]. Restricting flavors, such as those identified in this study (Cinnamon, Watermelon, Blueberry, etc), to reduce the appeal of hookah may be a policy consideration to explore in the future.

Many posts found in this study reflected that Twitter users craved, enjoyed, or wanted hookah; this finding, when coupled with the finding that posts indicative of disliking hookah were rare, suggests that there is a current need for targeted interventions to discourage the appeal of hookah use. The common discussions about hookah’s appeal may help normalize hookah use on Twitter, which may have consequences for offline behaviors [30].


This study focused on posts to Twitter, and findings may not generalize to other social media platforms. The posts analyzed in this study were collected from a 12-month period and may not generalize to other time periods. While only one root word “hookah” (or “#hookah”) was used in data collection, research has indicated that this is the common term to refer to waterpipe use on social media [10,13,18]. Data collection relied on Twitter’s Streaming API, which prevented collection of tweets from private Twitter accounts. As a result, findings may not represent the attitudes and behaviors of individuals with private accounts.


Social events, appeal or abuse liability, flavors, and polysubstance use were common contexts and experiences associated with Twitter discussions about hookah in 2017-2018. Considered in concert with traditional data sources about hookah, these results suggest that social events, appeal or abuse liability, flavors, and polysubstance use warrant consideration as targets in future surveillance, public policy, and interventions addressing hookah. This study also highlights a clear benefit of using social media data in public health surveillance. Data from social media can serve as an ongoing system to inform public health researchers about tobacco products or ways in which these products are used by the public in near real time.


Research reported in this publication was supported by Grant #P50CA180905 from the National Cancer Institute and the Food and Drug Administration (FDA) Center for Tobacco Products. The National Institutes of Health (NIH) or FDA had no role in study design, collection, analysis, and interpretation of data; writing the report; and the decision to submit the report for publication. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH or FDA.

Authors' Contributions

JPA and LD conceived the study and analyzed the data. JPA drafted the initial manuscript. LD, AML, TBC, and JBU revised the manuscript for important intellectual content and approved the final manuscript. JBU and TBC received funding for the study.

Conflicts of Interest

None declared.

Multimedia Appendix 1

Hookah word clouds.

PDF File (Adobe PDF File), 2MB

  1. Salloum RG, Asfar T, Maziak W. Toward a Regulatory Framework for the Waterpipe. Am J Public Health 2016 Dec;106(10):1773-1777.
  2. Allem JP, Unger JB. Emerging adulthood themes and hookah use among college students in Southern California. Addict Behav 2016 Dec;61:16-19.
  3. El-Zaatari ZM, Chami HA, Zaatari GS. Health effects associated with waterpipe smoking. Tob Control 2015 Mar;24 Suppl 1:i31-i43.
  4. Maziak W. The global epidemic of waterpipe smoking. Addict Behav 2011;36(1-2):1-5.
  5. Mohammed KA, Geneus CJ, Yadgir S, Subramaniam DS, Burroughs TE. Correlates of Hookah Pipe Awareness and Perceived Harmfulness Among U.S. Adults. Am J Prev Med 2017 Apr;52(4):513-518.
  6. Primack BA, Hopkins M, Hallett C, Carroll MV, Zeller M, Dachille K, et al. US health policy related to hookah tobacco smoking. Am J Public Health 2012 Sep;102(9):e47-e51.
  7. Ayers JW, Leas EC, Allem JP, Benton A, Dredze M, Althouse BM, et al. Why do people use electronic nicotine delivery systems (electronic cigarettes)? A content analysis of Twitter, 2012-2015. PLoS One 2017;12(3):e0170702.
  8. Allem JP, Escobedo P, Chu K, Boley CT, Unger JB. Images of Little Cigars and Cigarillos on Instagram Identified by the Hashtag #swisher: Thematic Analysis. J Med Internet Res 2017 Jul 14;19(7):e255.
  9. Allem JP, Escobedo P, Cruz TB, Unger JB. Vape pen product placement in popular music videos. Addict Behav 2017 Nov 03.
  10. Allem JP, Chu K, Cruz TB, Unger JB. Waterpipe Promotion and Use on Instagram: #Hookah. Nicotine Tob Res 2017 Oct 01;19(10):1248-1252.
  11. Guidry J, Jin Y, Haddad L, Zhang Y, Smith J. How Health Risks Are Pinpointed (or Not) on Social Media: The Portrayal of Waterpipe Smoking on Pinterest. Health Commun 2016;31(6):659-667.
  12. Ben Taleb Z, Laestadius LI, Asfar T, Primack BA, Maziak W. #Hookahlife: The Rise of Waterpipe Promotion on Instagram. Health Educ Behav 2018 Jun 01:1090198118779131.
  13. Allem JP, Ramanujam J, Lerman K, Chu K, Boley CT, Unger JB. Identifying Sentiment of Hookah-Related Posts on Twitter. JMIR Public Health Surveill 2017 Oct 18;3(4):e74.
  14. Chen A, Zhu S, Conway M. Combining Text Mining and Data Visualization Techniques to Understand Consumer Experiences of Electronic Cigarettes and Hookah in Online Forums. Online J Public Health Inform 2015;7(1):e117.
  15. Krauss MJ, Sowles SJ, Moreno M, Zewdie K, Grucza RA, Bierut LJ, et al. Hookah-Related Twitter Chatter: A Content Analysis. Prev Chronic Dis 2015 Jul 30;12:E121.
  16. Myslín M, Zhu S, Chapman W, Conway M. Using twitter to examine smoking behavior and perceptions of emerging tobacco products. J Med Internet Res 2013 Aug 29;15(8):e174.
  17. Pew Research Center. Social media fact sheet. 2017. [accessed 2018-10-29]
  18. Primack BA, Carroll MV, Shensa A, Davis W, Levine MD. Sex Differences in Hookah-Related Images Posted on Tumblr: A Content Analysis. J Health Commun 2016;21(3):366-375.
  19. Allem JP, Ferrara E. The Importance of Debiasing Social Media Data to Better Understand E-Cigarette-Related Attitudes and Behaviors. J Med Internet Res 2016 Dec 09;18(8):e219.
  20. Allem JP, Ferrara E. Could Social Bots Pose a Threat to Public Health? Am J Public Health 2018 Aug;108(8):1005-1006.
  21. Davis CA, Varol O, Ferrara E, Flammini A, Menczer F. Botornot: A system to evaluate social bots. 2016 Presented at: the 25th International Conference Companion on World Wide Web. International World Wide Web Conferences Steering Committee; 2016 Apr 11; Montreal, Canada p. 273-274.
  22. Allem JP, Dharmapuri L, Unger JB, Cruz TB. Characterizing JUUL-related posts on Twitter. Drug Alcohol Depend 2018 Dec 01;190:1-5.
  23. Allem JP, Ferrara E, Uppu SP, Cruz TB, Unger JB. E-Cigarette Surveillance With Social Media Data: Social Bots, Emerging Topics, and Trends. JMIR Public Health Surveill 2017 Dec 20;3(4):e98.
  24. Mikolov T, Sutskever I, Chen K, Corrado G, Dean J. Distributed representations of words and phrases and their compositionality. In Advances in neural information processing systems 2013:3111-3119.
  25. Goodwin RD, Grinberg A, Shapiro J, Keith D, McNeil MP, Taha F, et al. Hookah use among college students: prevalence, drug use, and mental health. Drug Alcohol Depend 2014 Aug 01;141:16-20.
  26. Jackson KM, Colby SM, Sher KJ. Daily patterns of conjoint smoking and drinking in college student smokers. Psychol Addict Behav 2010 Sep;24(3):424-435.
  27. Kavuluru R, Han S, Hahn EJ. On the popularity of the USB flash drive-shaped electronic cigarette Juul. Tob Control 2018 Apr 13.
  28. Silveira ML, Hilmi NN, Conway KP. Reasons for Young Adult Waterpipe Use in Wave 1 (2013-2014) of the Population Assessment of Tobacco and Health Study. Am J Prev Med 2018 Nov;55(5):650-655.
  29. Kowitt SD, Meernik C, Baker HM, Osman A, Huang L, Goldstein AO. Perceptions and Experiences with Flavored Non-Menthol Tobacco Products: A Systematic Review of Qualitative Studies. Int J Environ Res Public Health 2017 Dec 23;14(4).
  30. Unger JB, Urman R, Cruz TB, Majmundar A, Barrington-Trimis J, Pentz MA, et al. Talking about tobacco on Twitter is associated with tobacco product use. Prev Med 2018 Sep;114:54-56.

API: application program interface

Edited by G Eysenbach; submitted 23.07.18; peer-reviewed by L Laestadius, J Colditz; comments to author 13.09.18; revised version received 05.10.18; accepted 08.10.18; published 19.11.18


©Jon-Patrick Allem, Likhit Dharmapuri, Adam M Leventhal, Jennifer B Unger, Tess Boley Cruz. Originally published in the Journal of Medical Internet Research, 19.11.2018.

This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication, as well as this copyright and license information must be included.