Original Paper
Abstract
Background: Diarrhea, a common symptom of gastrointestinal infections, can lead to severe complications and is a major cause of emergency department (ED) visits.
Objective: This study explored the temporal association between internet search queries for diarrhea and its synonyms and ED visits for diarrhea-related symptoms.
Methods: We used data from the National Emergency Department Information System (NEDIS) and NAVER (Naver Corporation), South Korea’s leading search engine, from January 2017 to December 2021. After identifying diarrhea synonyms using ChatGPT, we compared weekly trends in relative search volumes (RSVs) for diarrhea, including its synonyms and weekly ED visits. Pearson correlation analysis and Granger causality tests were used to evaluate the relationship between RSVs and ED visits. We developed an Autoregressive Integrated Moving Average with Exogenous Variables (ARIMAX) model to further predict these associations. This study also examined the age-based distribution of search behaviors and ED visits.
Results: A significant correlation was observed between the weekly RSV for diarrhea and its synonyms and weekly ED visits for diarrhea-related symptoms (ranging from 0.14 to 0.51, P<.05). Weekly RSVs for diarrhea synonyms, such as “upset stomach,” “watery diarrhea,” and “acute enteritis,” showed stronger correlations with weekly ED visits than weekly RSVs for the general term “diarrhea” (ranging from 0.20 to 0.41, P<.05). This may be because these synonyms better reflect layperson terminology. Notably, weekly RSV for “upset stomach” was significantly correlated with weekly ED visits for diarrhea and acute diarrhea at 1 and 2 weeks before the visit (P<.05). An ARIMAX model was developed to predict weekly ED visits based on weekly RSVs for diarrhea synonyms with lagged effects to capture their temporal influence. The age group of <50 years showed the highest activity in both web-based searches and ED visits for diarrhea-related symptoms.
Conclusions: This study demonstrates that weekly RSVs for diarrhea synonyms are associated with weekly ED visits for diarrhea-related symptoms. By encompassing a nationwide scope, this study broadens the existing methodology for syndromic surveillance using ED data and provides valuable insights for clinicians.
doi:10.2196/65101
Keywords
Introduction
The primary causes of admission to emergency departments (EDs) in South Korea [
] and the United States [ ] are “gastroenteritis and colitis of infectious and unspecified origin.” Diarrhea, a common symptom of gastrointestinal infections, can lead to severe dehydration and hypovolemic shock [ ]. The Global Burden of Disease Study reports approximately 2.39 billion annual cases of diarrhea worldwide, resulting in approximately 480,000 deaths among children younger than 5 years in 2019 [ ].The internet is a critical tool for analyzing health information–seeking behaviors, including those related to diarrhea [
]. Web-based surveillance tools effectively analyze human behavior [ ], gauge disease prevalence [ - ], and forecast infectious disease outbreaks [ , ]. Previous applications of artificial intelligence (AI) models in public health surveillance used models that analyzed data from various sources, including electronic health records, social media, and travel data, to predict future incidences of respiratory infections, such as COVID-19 and influenza, but not gastrointestinal infections [ ]. Gastrointestinal symptoms, such as abdominal pain, diarrhea, and vomiting, are indicators for identifying symptom outbreaks and monitoring public health trends [ ]. However, many symptoms related to diarrhea have been overlooked, and identifying appropriate keywords for gastrointestinal symptoms remains challenging [ , ].Many studies have explored large language models and their applications in gastrointestinal diseases. However, only a limited number of studies have focused on monitoring gastrointestinal symptoms through web-based surveillance [
, ]. Advancements in AI have shown that large language models such as OpenAI’s ChatGPT can understand human language and address health-related research problems. Compared to traditional expert-driven or systematic review–based methods [ - ], ChatGPT is particularly well-suited for generating synonyms, as it is trained on large-scale natural language data and can recognize colloquial, idiomatic, and context-specific expressions commonly used by the general public [ - ]. This capacity enables the generation of search terms that more accurately reflect how individuals describe symptoms in real-world web-based searches. This study aimed to investigate the association between real-time data on diarrhea-related symptoms obtained from the National Emergency Department Information System (NEDIS) and relative search volumes (RSVs) of diarrhea and its synonyms on NAVER, South Korea’s most widely used search engine. By examining these relationships, the study seeks to enhance the potential of web-based surveillance for predicting gastrointestinal outbreaks and informing ED resource allocation.Methods
Data Source and Setting
The NEDIS was established in 2003 under the Emergency Medical Service Act to assess the quality of care delivered in EDs across South Korea [
]. Data from each patient visit is automatically transmitted from the visited EDs to a central government server within 2 to 14 days after the patient leaves the ED or hospital [ ]. The transmitted data includes patient demographics (sex, age, and insurance type), primary complaints, vital signs, triage details, ED visit outcomes, and diagnosis codes according to the Korean Standard Classification of Diseases and Causes of Death 7th edition [ , ]. Between January 2017 and December 2021, the NEDIS database included information on 420,819 ED visits for diarrhea-related symptoms. NAVER was selected as the primary search engine for this study not only because it is the most dominant platform in South Korea, but also due to its overwhelming market share relative to alternatives. NAVER accounted for approximately 70% of the domestic search engine market, far surpassing Google, Daum, and other platforms [ , ]. This high penetration ensures that NAVER-based data should reflect the majority of population-level search behaviors in Korea, thereby providing a robust foundation for syndromic surveillance. NAVER Data Lab offers a search term trend service that tracks the longitudinal trend of RSVs for various topics, starting from January 2016. RSVs were standardized, with the highest search volume for a subject term set to 100 during a specified period [ , ], resulting in RSVs represented as relative percentages. This study used weekly trend data from NAVER Data Lab, identifying 4 Korean search terms for diarrhea and their synonyms based on GPT-4 (Figure S1 and Table S1 in ). This study was granted a waiver by the Institutional Review Board of Seoul National University Hospital (IRB No E-2406-089-1544).Diarrhea-Related Symptoms From NEDIS Data
In the NEDIS, Unified Medical Language System (UMLS) codes were collected for chief complaints, with up to 3 complaints recorded per visit. Based on the chief complaint, we selected UMLS-coded symptoms indicative of diarrhea (C0011991), acute diarrhea (C0740441), watery diarrhea (C0239182), and vomiting with diarrhea (C0474496). Based on the data from the NEDIS database, 420,819 visits were for diarrhea-related symptoms between January 2017 and December 2021. The data were divided into 5 age groups to compare searching and ED visits: 0-18 years (83,425 visits), 19-29 years (67,638 visits), 30-39 years (56,513 visits), 40-49 years (44,223 visits), and >50 years (169,020 visits).
Data Acquisition of Diarrhea Synonyms From NAVER
GPT-4 was used on March 5, 2024, to identify Korean synonyms for diarrhea (Figure S1 in
). To identify synonyms related to diarrhea for use in this study, we used ChatGPT to generate an initial list of potential terms. This list was then reviewed collaboratively by all authors, including physicians, to ensure that the terms met the following criteria: (1) relevance to NEDIS data: the terms needed to correspond to chief complaints recorded in the NEDIS database; (2) RSVs viability: each term’s weekly RSV had to consistently exceed zero. Through this rigorous process, we selected 4 terms that aligned with these criteria, ensuring they were both clinically relevant and representative of real-world users’ search behavior. A total of 3 synonyms were chosen for comparison with the ED visits for diarrhea-related symptoms based on UMLS codes—upset stomach, watery diarrhea, and acute enteritis. An upset stomach involves digestive discomfort, including nausea, bloating, pain, and diarrhea. Watery diarrhea refers to loose liquid bowel movements with high water content. Acute enteritis means sudden inflammation of the small intestine, typically causing diarrhea and abdominal pain. These terms were used to collect weekly RSVs via the NAVER API from January 2017 through December 2021 (Table S1 in ). Following the guideline that acute diarrhea lasts for less than 14 days [ ], we subdivided the lag times into 1- and 2-week intervals to account for the temporal relationship between RSVs and ED visits for acute diarrhea. To validate the synonyms generated by ChatGPT, we compared the RSVs of the general term “diarrhea” with those of the ChatGPT-suggested synonyms (“upset stomach,” “watery diarrhea,” and “acute enteritis”) during the study period. We hypothesized that valid synonyms would exhibit similar search patterns, evidenced by strong correlations. The analysis statistically revealed significant correlations (ranging from 0.4 to 0.7; P<.05) using the Pearson method, supporting the validity of these synonyms in representing similar search behaviors.Statistical Analyses
To compare the trends between the mean of the total weekly RSVs, including diarrhea and its synonyms, and total weekly ED visits for diarrhea-related symptoms, we calculated total weekly diarrhea-related symptoms based on chief complaints. To calculate the mean of the total weekly RSV, we averaged weekly RSVs for diarrhea and its synonyms obtained from the NAVER API. The correlation between weekly ED visits for diarrhea-related symptoms and weekly RSVs for diarrhea and its synonyms was evaluated using Pearson correlation analysis. The Granger causality test was used to determine the temporal relationship between weekly RSVs, 1 and 2 weeks before weekly ED visits for diarrhea-related symptoms. An Autoregressive Integrated Moving Average with Exogenous Variables (ARIMAX) model was developed to predict the associations between weekly RSV for diarrhea synonyms and weekly ED visits for diarrhea-related symptoms. The ARIMAX model extends the ARIMA framework by incorporating exogenous variables to improve prediction accuracy [
]. The model is specified as ARIMAX(p, d, q)(P, D, Q)[s], where:- p and P denote the nonseasonal and seasonal autoregressive orders, respectively,
- d and D represent the nonseasonal and seasonal differencing orders, respectively,
- q and Q indicate the nonseasonal and seasonal moving average orders, respectively, and
- s corresponds to the seasonal period.
Unlike ARIMA, ARIMAX allows for the incorporation of external variables (exposures) to predict the outcome variable. In this study, weekly RSV for diarrhea synonyms, including upset stomach, watery diarrhea, and acute enteritis, were used as exposures to predict weekly ED visits for diarrhea and watery diarrhea as outcomes. The ARIMAX model was fitted automatically by selecting the optimal p, d, q, P, D, Q, and s parameters using the Hyndman-Khandakar algorithm [
]. This algorithm first determines the appropriate degree of differencing (d) by applying repeated KPSS (Kwiatkowski-Phillips- Schmidt-Shin) tests within the range 0-2 [ ]. It then uses a stepwise search to find the AR and MA orders (p and q), as well as any seasonal components (P, Q, and seasonal period s), that minimize the Corrected Akaike Information Criterion (AICc). The procedure begins by fitting several initial candidate models, selects the best-performing model (lowest AICc) as the “current model,” and iteratively refines it by adjusting model orders and the inclusion or exclusion of a constant term. This process continues until no further improvement in AICc is achieved, thus identifying the final model specification. The ARIMAX model was first fitted on each weekly RSV for diarrhea synonyms up to 2020, to establish the association between weekly RSVs for diarrhea synonyms and weekly ED visits for diarrhea and watery diarrhea. Subsequently, the model was used to forecast weekly ED visits in 2021, using RSVs for diarrhea synonyms. The predicted values were then compared with the observed ED visits for diarrhea and watery diarrhea to evaluate the model's predictive performance. Accuracy metrics, including mean absolute percentage error and symmetric mean absolute percentage error, were calculated to quantify prediction error. To explore the potential lagged effects of weekly RSV for diarrhea synonyms on ED visits for diarrhea and watery diarrhea, sensitivity analyses were conducted by introducing lag times of 0, 1, and 2 weeks. In addition, we compared the age group-based distribution between ED visits for diarrhea-related symptoms and RSVs for diarrhea and its synonyms. All reported P values were 2-sided with a type I error threshold of α<.05, and were considered statistically significant. Statistical analyses were performed using SAS version 9.4 (SAS Institute) and “forecast” R packages (R software, version 4.4.1 R Core Team) [ ].Results
In
the total weekly ED visits for diarrhea-related symptoms from the NEDIS and the mean of the total weekly RSVs, including diarrhea and its synonyms from NAVER, from January 2017 to December 2021 has been illustrated. The trend in the mean of the total weekly RSVs, including diarrhea and its synonyms, closely mirrored the trend in total weekly ED visits for diarrhea-related symptoms.Table S2 in
shows the correlation between weekly ED visits for diarrhea-related symptoms and weekly RSVs for diarrhea and its synonyms. Weekly ED visits for diarrhea had a significantly higher correlation with weekly RSVs for “upset stomach,” “watery diarrhea,” and “acute enteritis” than with those for “diarrhea” (r=0.41, P<.001; r=0.30, P<.001; r=0.22, P<.001, respectively, versus r=0.20, P<.001). The weekly RSV for “upset stomach” was significantly correlated with weekly ED visits for diarrhea at lags of 1 and 2 weeks (P<.05). For weekly RSV for “acute enteritis,” only a 2-week previous correlation was significant (P=.02). Weekly RSV for “watery diarrhea” strongly correlated with weekly ED visits for watery diarrhea (r=0.51, P<.001), with significant correlations at 1 and 2 weeks before the visits (1 week previous, P=.002; 2 weeks previous P=.009). Weekly ED visits for vomiting with diarrhea did not show a significant correlation with weekly RSVs, except for “upset stomach” (r=0.21, P<.001; 1 week previous, P=.04).Weekly RSVs for “upset stomach,” “watery diarrhea,” and “acute enteritis” appear significantly associated with weekly ED visits for diarrhea and watery diarrhea across lag times of 0 to 2 weeks in the ARIMAX models (Table S3 in
and ). The ARIMAX models with the best fit and smallest error metrics based on root-mean-square error and symmetric mean absolute percentage error vary by weekly RSVs and lag times. In weekly RSV for “upset stomach” and weekly ED visits due to diarrhea, the best-fitting model is ARIMAX(0,1,3; 0,0,0) [52] at Lag 2, with a coefficient of 16.215 (SE 4.813, P<.001; A). In weekly RSV for “watery diarrhea” and weekly ED visits due to watery diarrhea, ARIMAX(1,0,1; 0,0,0) [52] at Lag 1 demonstrates a strong association with a coefficient of 1.848 (SE 0.153, P<.001; B). Similarly, association between weekly RSV for “acute enteritis” and weekly ED visits due to watery diarrhea shows significant lag times at Lag 1 and Lag 2 using ARIMAX(3,0,2; 0,0,0) [52], with coefficients of 1.123 (SE 0.141; P<.001) and 1.034 (SE 0.200; P<.001; C).In
the age group-based distribution of RSVs for diarrhea and its synonyms from NAVER and ED visits for diarrhea-related symptoms from NEDIS were presented, from January 2017 to December 2021. The age group with the highest proportion of RSV for diarrhea and its synonyms and ED visits for diarrhea-related symptoms was over 50 years old. The RSVs for “diarrhea” and “upset stomach” each accounted for 26% of total search volume, which was higher than that of other diarrhea synonyms. In comparison, the proportions of ED visits for acute diarrhea and watery diarrhea were 46% and 40%, respectively.


Discussion
Principal Findings
Our study, conducted from January 2017 to December 2021, revealed significant correlations between the mean of the total weekly RSVs, including diarrhea and its synonyms, and total weekly ED visits for diarrhea-related symptoms. The trend in mean total weekly RSVs closely mirrored that of the total weekly ED visits. Notably, the correlations between weekly ED visits and weekly RSVs were stronger for synonyms such as “upset stomach,” “watery diarrhea,” and “acute enteritis” than for the general term “diarrhea.” The weekly RSV for “upset stomach” was significantly correlated with weekly ED visits for diarrhea at lags of 1 and 2 weeks. Similarly, the weekly RSVs for “watery diarrhea” and “acute enteritis” were significantly correlated with ED visits for watery diarrhea at lags of 1 and 2 weeks. However, weekly ED visits with vomiting and diarrhea showed no significant relationship with weekly RSVs, except those with “upset stomach.” ARIMAX models further confirmed the lag times between weekly RSVs for diarrhea synonyms and weekly ED visits, demonstrating strong predictive performance across weekly RSVs and lag times. Along with that, search engine users older than 50 years showed the highest proportion of both RSVs and ED visits for diarrhea-related symptoms. Among the search terms, “diarrhea” and “upset stomach” each accounted for 26% of the total search volume. In comparison, ED visits for acute and watery diarrhea accounted for 46% and 40%, respectively.
Many people prefer to search the internet for health information before visiting EDs [
]. A previous study has shown that the relative frequency of searches for gastrointestinal symptoms, including diarrhea, closely mirrored the changing incidences of cases in large inpatient datasets [ ]. Our findings align with those of previous studies, indicating that search trends for diarrhea symptoms reflect ED visit trends. Google Trends automatically compiles search terms related to each symptom and other related search terms; however, the exact compilation process and methodology are not publicly disclosed. Studies in Korea have shown that representative keywords for foodborne diseases, such as “Seol-sa” (diarrhea) [ ], can track population-level changes in the incidence of diarrhea through search volumes [ ]. This finding supports the idea that search engine data can be used to monitor ED visits.Our results showed that weekly ED visits for diarrhea had a higher correlation with weekly RSVs for diarrhea synonyms, including “upset stomach”, “watery diarrhea”, and “acute enteritis”, than with those for “diarrhea”. Adults suffering from diarrhea, particularly acute forms, commonly seek medical evaluation in EDs [
]. The criteria for acute diarrhea include volume depletion and 6 or more stools in 24 hours, making “watery diarrhea” a more relevant term for acute diarrhea than chronic diarrhea [ , ]. This explains why weekly RSVs for “watery diarrhea” are related to weekly ED visits for diarrhea-related symptoms, as acute diarrhea requires immediate medical evaluation [ ]. Similarly, “upset stomach” and “acute enteritis” can be the terms used for acute diarrhea, supporting the observation that searches for these keywords precede ED visits. In addition, acute diarrhea lasts for less than 14 days, while diarrhea lasting more than 14 days is termed “persistent”, and those lasting over 1 month are termed “chronic” [ ]. While vomiting episodes are typically short in duration, they do not correlate well with internet searches conducted 1–2 weeks before ED visits. Therefore, our results showed that weekly RSVs indicating acute diarrhea became notable within 1–2 weeks, whereas weekly ED visits for vomiting with diarrhea did not show a significant relationship with weekly RSVs for diarrhea and its synonyms. In addition, the weak or nonsignificant correlations between weekly ED visits for vomiting with diarrhea, and weekly RSV for diarrhea and its synonyms may reflect a discrepancy between actual symptom presentations in ED visits and real-world search behaviors. While patients who visit the ED with “vomiting with diarrhea” are experiencing both symptoms concurrently, most internet users tend to search for only one dominant symptom at a time rather than enter queries that combine multiple symptoms [ ]. As a result, compound search phrases that include both vomiting and diarrhea are rarely used, which may lead to lower relative search volumes and weaker correlations with ED visit data, despite their clinical relevance.A previous study conducted in Germany reported that individuals aged 36-55 years were the most active users of search engines for health-related information, while those older than 56 years demonstrated the lowest levels of engagement [
]. In contrast, South Korea exhibits one of the highest internet penetration rates globally, including among older adults, with more than 90% of individuals aged 50 years and older using the internet as of 2021. Recent national statistics show that 70.2% of Korean internet users search for health and medical information, with comparable rates among men (70.7%) and women (69.7%). Age-stratified data further indicate that individuals in their 30s had the highest rate of health information-seeking (80.3%), followed closely by those in their 40s (79.9%), 50s (78%), and 60s (75.7%) [ ]. This upward trend in digital engagement among older adults is supported by previous research showing a narrowing digital health divide and increasing internet use for health-related purposes in older populations [ , ]. In the context of our study, individuals older than 50 years not only exhibited the highest frequency of diarrhea-related ED visits but also demonstrated the highest levels of related web-based search activity. This may be attributable to 2 converging factors: the higher symptom burden in this age group and their growing reliance on internet-based health resources. Whereas younger adults are more inclined to seek health information through social media platforms [ ], older adults tend to prefer structured, one-way sources such as search engines, further reinforcing their visibility in search volume data.The overlap of our study period with the early phase of the COVID-19 pandemic (2020-2021) may have influenced both internet search behavior and ED usage patterns. During this time, public awareness of gastrointestinal symptoms, some of which overlap with COVID-19 manifestations, may be heightened due to widespread media coverage and health messaging. This may have led to increased search volume for symptom-related terms such as “diarrhea” or “upset stomach,” even in the absence of actual illness, thereby introducing potential noise into search trends [
]. Furthermore, studies have shown that many individuals delayed or avoided ED visits for non–COVID-19 conditions due to fear of infection or changes in health care access during the pandemic [ , ]. These behavioral shifts may have altered the typical relationship between symptom occurrence, web-based search activity, and clinical visits. As such, caution is warranted when interpreting our results from the pandemic period, as search and health care usage behaviors were likely atypical during this time.Limitations
Our study has several limitations. First, RSVs may not fully capture the motivations behind individual searches, as users may be influenced by seasonal trends, media coverage, or general health concerns rather than actual symptoms. While social media platforms like YouTube are increasingly used by younger populations for health-related information, our study could not incorporate such data due to limitations in available sources. Furthermore, potential confounding factors such as pandemics or public health campaigns were not controlled for. Nevertheless, previous research supports that individuals often turn to search engines early in the symptom experience, suggesting that, despite these limitations, RSVs remain a valuable proxy for monitoring population-level health trends when interpreted cautiously. Along with that, NAVER is the dominant search engine in South Korea; our findings may not be directly generalizable to other regions where different platforms are more commonly used. However, the underlying methodology—leveraging AI-generated symptom synonyms and analyzing their temporal relationship with health outcomes—can be adapted to platforms. Future cross-platform comparative studies are warranted to examine the consistency and reliability of search behavior across different sociotechnical contexts. In addition, the study does not account for potential misspellings or alternative phrasings, which could have resulted in underestimating actual search volumes. These factors may have affected the comprehensiveness of the data and the overall accuracy of the findings. Second, using ChatGPT carries the risk of providing incorrect information, exhibiting algorithmic bias, and perpetuating biases present in its training data [
, ]. Nevertheless, GPT-4 demonstrated higher response validity than other AI chatbots, with a correct answer rate of 82.2% in medical knowledge, indicating its potential utility in medical education [ , ]. This study used search terms suggested by ChatGPT, rather than relying solely on scientific MeSH (Medical Subject Headings) terms, to include a wide range of potential search terms. ChatGPT uses a large language model and the deep learning architecture developed by OpenAI to search across an immense collection of real-time data and is designed to respond to natural language queries with almost identical words used by general people [ ]. To address this issue, we assembled a multidisciplinary team comprising AI data scientists, epidemiologists, and a medical doctor as coauthors for our study, fostering thorough discussions to guarantee a balanced and thorough analysis of the data. Third, the NEDIS database includes data from only 40% of EDs (211 out of 522), primarily large urban hospitals [ ]. This coverage may introduce a bias toward urban populations, underrepresenting rural and smaller health care facilities. Patient populations in rural areas tend to be older, with higher rates of chronic conditions, which could result in different health-seeking behaviors and diagnostic patterns compared to urban populations [ ]. In addition, rural EDs often rely on general emergency physicians who may operate with few diagnostic resources and less specialized equipment. These disparities could influence the accuracy and completeness of data reported to NEDIS. Consequently, the findings of this study may not fully reflect health care usage trends in rural areas or smaller health care facilities. Fourth, the 4-year gap between the data collection period (2017–2021) and manuscript submission (2024) reflects the rapidly evolving nature of digital health-seeking behaviors. The emergence and widespread use of large language models, such as ChatGPT, may have altered how individuals search for health information on the web. As conversational AI tools become more prevalent, future studies should investigate how these shifts affect search engine usage patterns. While our findings remain relevant for understanding pre-AI behaviors, longitudinal comparisons with post-AI data could enhance digital surveillance approaches. Fifth, the COVID-19 pandemic has profoundly influenced health care usage and public health awareness. Behaviors such as an increased reliance on telemedicine, a heightened focus on infectious disease symptoms, and a greater public familiarity with digital health solutions have altered the generalizability of findings based on pre- and early-pandemic data. However, as the pandemic subsided, public interest in digital health solutions and symptom searching did not sustain at peak levels, potentially impacting the applicability of pre-2024 search patterns to current contexts [ , ].Despite these limitations, our study provides empirical evidence supporting the utility of ChatGPT-generated search terms in identifying colloquial synonyms that are not captured by MeSH terms. By demonstrating strong correlations between the RSVs of these terms, we highlight ChatGPT’s potential to complement traditional search term generation methodologies in health informatics. This approach addresses the limitations of traditional methods, including Google Trends studies, which often rely on static and predefined keywords. By contrast, our approach uses ChatGPT to dynamically adapt to changes in search behaviors and emerging terms. This not only reduces the reliance on manual efforts but also ensures that the selected terms are reflective of diverse and evolving search behaviors. In addition, we incorporated a validation process using RSV data to ensure the generated terms were both relevant and representative of real-world users’ search behaviors. By demonstrating significant correlations and timely associations between specific diarrhea-related search terms and ED visits, the study validates the potential of AI to enhance symptom surveillance. Moreover, our analyses of the age-based distribution of search engine users and ED visitors provide valuable demographic insights, highlighting which age groups are most likely to seek health information on the web and require medical care. Unlike global platforms such as Google, NAVER reflects local cultural and linguistic nuances, making it particularly well-suited for capturing region-specific search behaviors [
]. In addition, our study offers insights into how internet-based syndromic surveillance may help address both the digital divide and rural underrepresentation in public health data. Rural populations often face limited access to broadband, digital tools, and clinical infrastructure, which can hinder both their web-based health engagement and inclusion in clinical surveillance systems [ ]. However, symptom search data from NAVER can serve as a supplementary data stream, particularly in areas where ED data are incomplete or unavailable [ ]. In addition, by using ChatGPT to generate colloquial and culturally relevant synonyms, our study enhances the inclusivity of web-based surveillance. Previous research has shown that laypeople, especially those in older or rural populations, often use nonstandard language to describe symptoms [ ]. ChatGPT’s ability to capture such expressions increases the sensitivity of surveillance models across linguistic and educational divides.Conclusions
The study demonstrates that the RSVs for diarrhea synonyms such as “watery diarrhea,” “upset stomach,” and “acute enteritis” are significantly associated with ED visits for diarrhea-related symptoms. These findings suggest that web search trends can effectively serve as an early indicator for increased ED visits, offering a valuable tool for real-time syndromic surveillance. For clinicians, this methodology can be integrated into daily practice to anticipate surges in diarrhea-related cases. By monitoring search volumes for these key terms, health care providers can better allocate resources, prepare for potential outbreaks, and inform patient care strategies before an uptick in physical visits occurs. Furthermore, this approach is adaptable across different languages and countries by leveraging the leading search engines in respective regions, making it a promising tool for global health monitoring.
Acknowledgments
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government (MSIT; No. RS-2023-00213267).
Data Availability
The datasets generated or analyzed during this research process are not publicly available due to South Korean privacy laws. However, you can apply for access to the data via the National Emergency Department Information System (NEDIS) or the NEVER Data Lab.
Authors' Contributions
SYK performed supervision and funding acquisition. ASJ, JSJ, DKY, and SYK performed data curation. ASJ and SYK conducted formal analysis. SYK, JSK, and SJL performed investigation. JSK, ASJ, JSJ, SJL, DKY, and SYK contributed to methodology. JSK, ASJ, JSJ, SJL, and SYK handled project administration. SYK and JSJ managed resources. SYK and SJL managed software. ASJ, SJL, and SYK performed visualization. JSK, ASJ, JSJ, SJL, and SYK wrote the original draft. All authors contributed to the conceptualization, writing—review and editing.
Conflicts of Interest
None declared.
Additional information. Prompt for data acquisition of synonyms for diarrhea; the 4 Korean search terms, including 1 symptom and 3 synonyms; correlation between diarrhea-related illness (NEDIS) and diarrhea related searches (NAVER) from January 2017 to December 2021; and fitted parameters of ARIMAX models for the associations between relative search volumes of symptoms and emergency department visits across lag periods (0 to 2 weeks).
DOCX File , 110 KBReferences
- Park J, Yeo Y, Ji Y, Kim B, Han K, Cha W, et al. Factors associated with emergency department visits and consequent hospitalization and death in Korea using a population-based national health database. Healthcare (Basel). 2022;10(7):1324. [FREE Full text] [CrossRef] [Medline]
- Weiss A, Jiang H. Most frequent reasons for emergency department visits, 2018. HCUP Statistical Brief. 2021. URL: https://hcup-us.ahrq.gov/reports/statbriefs/sb277-Top-Reasons-Hospital-Stays-2018.jsp [accessed 202-07-30]
- Bellido-Blasco J, Arnedo-Pena A. Epidemiology of infectious diarrhea. Environ Health. 2011:659. [CrossRef]
- Perin J, Mulick A, Yeung D, Villavicencio F, Lopez G, Strong KL, et al. Global, regional, and national causes of under-5 mortality in 2000-19: an updated systematic analysis with implications for the sustainable development goals. Lancet Child Adolesc Health. 2022;6(2):106-115. [FREE Full text] [CrossRef] [Medline]
- Aoun L, Lakkis N, Antoun J. Prevalence and outcomes of web-based health information seeking for acute symptoms: cross-sectional study. J Med Internet Res. 2020;22(1):e15148. [FREE Full text] [CrossRef] [Medline]
- Michie S, Yardley L, West R, Patrick K, Greaves F. Developing and evaluating digital interventions to promote behavior change in health and health care: recommendations resulting from an international workshop. J Med Internet Res. 2017;19(6):e232. [FREE Full text] [CrossRef] [Medline]
- Hswen Y, Zhang A, Ventelou B. Estimation of asthma symptom onset using internet search queries: lag-time series analysis. JMIR Public Health Surveill. 2021;7(5):e18593. [FREE Full text] [CrossRef] [Medline]
- Halford EA, Lake AM, Gould MS. Google searches for suicide and suicide risk factors in the early stages of the COVID-19 pandemic. PLoS One. 2020;15(7):e0236777. [FREE Full text] [CrossRef] [Medline]
- Bahk GJ, Kim YS, Park MS. Use of internet search queries to enhance surveillance of foodborne illness. Emerg Infect Dis. 2015;21(11):1906-1912. [FREE Full text] [CrossRef] [Medline]
- Yang S, Santillana M, Kou SC. Accurate estimation of influenza epidemics using google search data via ARGO. Proc Natl Acad Sci U S A. 2015;112(47):14473-14478. [FREE Full text] [CrossRef] [Medline]
- Ginsberg J, Mohebbi MH, Patel RS, Brammer L, Smolinski MS, Brilliant L. Detecting influenza epidemics using search engine query data. Nature. 2009;457(7232):1012-1014. [CrossRef] [Medline]
- Yang L, Zhang T, Han X, Yang J, Sun Y, Ma L, et al. Influenza epidemic trend surveillance and prediction based on search engine data: deep learning model study. J Med Internet Res. 2023;25:e45085. [FREE Full text] [CrossRef] [Medline]
- Ondrikova N, Harris JP, Douglas A, Hughes HE, Iturriza-Gomara M, Vivancos R, et al. Predicting norovirus in England using existing and emerging syndromic data: infodemiology study. J Med Internet Res. 2023;25:e37540. [FREE Full text] [CrossRef] [Medline]
- Olawade DB, Wada OJ, David-Olawade AC, Kunonga E, Abaire O, Ling J. Using artificial intelligence to improve public health: a narrative review. Front Public Health. 2023;11:1196397. [FREE Full text] [CrossRef] [Medline]
- Hyllestad S, Amato E, Nygård K, Vold L, Aavitsland P. The effectiveness of syndromic surveillance for the early detection of waterborne outbreaks: a systematic review. BMC Infect Dis. 2021;21(1):696. [FREE Full text] [CrossRef] [Medline]
- Adedire O, Love NK, Hughes HE, Buchan I, Vivancos R, Elliot AJ. Early detection and monitoring of gastrointestinal infections using syndromic surveillance: a systematic review. Int J Environ Res Public Health. 2024;21(4):489. [FREE Full text] [CrossRef] [Medline]
- Hassid BG, Day LW, Awad MA, Sewell JL, Osterberg EC, Breyer BN. Using search engine query data to explore the epidemiology of common gastrointestinal symptoms. Dig Dis Sci. 2017;62(3):588-592. [FREE Full text] [CrossRef] [Medline]
- Nuti SV, Wayda B, Ranasinghe I, Wang S, Dreyer RP, Chen SI, et al. The use of google trends in health care research: a systematic review. PLoS One. 2014;9(10):e109583. [CrossRef] [Medline]
- Salaris S, Ocagli H, Casamento A, Lanera C, Gregori D. Foodborne event detection based on social media mining: a systematic review. Foods. 2025;14(2):239. [FREE Full text] [CrossRef] [Medline]
- Jeblick K, Schachtner B, Dexl J, Mittermeier A, Stüber AT, Topalis J, et al. ChatGPT makes medicine easy to swallow: an exploratory case study on simplified radiology reports. Eur Radiol. 2024;34(5):2817-2825. [FREE Full text] [CrossRef] [Medline]
- Cascella M, Montomoli J, Bellini V, Bignami E. Evaluating the feasibility of ChatGPT in healthcare: an analysis of multiple clinical and research scenarios. J Med Syst. 2023;47(1):33. [FREE Full text] [CrossRef] [Medline]
- Dwivedi YK, Kshetri N, Hughes L, Slade EL, Jeyaraj A, Kar AK, et al. et al. Opinion Paper: “So what if ChatGPT wrote it?” multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int J Inf Manag. 2023;71:102642. [CrossRef]
- Lee K, Lim D, Paik J, Choi YY, Jeon J, Sung HK. Suicide attempt-related emergency department visits among adolescents: a nationwide population-based study in Korea, 2016-2019. BMC Psychiatry. 2022;22(1):418. [FREE Full text] [CrossRef] [Medline]
- Ko Y, Kim HJ, Cha ES, Kim J, Lee WJ. Emergency department visits due to pesticide poisoning in South Korea, 2006-2009. Clin Toxicol (Phila). 2012;50(2):114-119. [CrossRef] [Medline]
- Choi DH, Jung JY, Suh D, Choi JY, Lee SU, Choi YJ, et al. Impact of the COVID-19 outbreak on trends in emergency department utilization in children: a multicenter retrospective observational study in seoul metropolitan area, Korea. J Korean Med Sci. 2021;36(5):e44. [FREE Full text] [CrossRef] [Medline]
- Yang HJ, Kim GW, Kim H, Cho JS, Rho TH, Yoon HD, et al. NEDIS-CA Consortium. Epidemiology and outcomes in out-of-hospital cardiac arrest: a report from the NEDIS-based cardiac arrest registry in Korea. J Korean Med Sci. 2015;30(1):95-103. [FREE Full text] [CrossRef] [Medline]
- Kwon C. A study on major uninsured Korean medicine treatments search trends and their meanings in an online portal: using naver data lab. J Korean Med. 2023;44(3):74-86. [CrossRef]
- Kim J, Han J, Chun BC. Trends of internet search volumes for major depressive disorder symptoms during the COVID-19 pandemic in Korea: an interrupted time-series analysis. J Korean Med Sci. 2022;37(14):e108. [CrossRef] [Medline]
- Jimenez A, Santed-Germán MA, Ramos V. Google searches and suicide rates in Spain, 2004-2013: correlation study. JMIR Public Health Surveill. 2020;6(2):e10919. [FREE Full text] [CrossRef] [Medline]
- Gore JI, Surawicz C. Severe acute diarrhea. Gastroenterol Clin North Am. 2003;32(4):1249-1267. [FREE Full text] [CrossRef] [Medline]
- Nelson BK. Statistical methodology: V. Time series analysis using autoregressive integrated moving average (ARIMA) models. Acad Emerg Med. 1998;5(7):739-744. [FREE Full text] [CrossRef] [Medline]
- Hyndman RJ, Khandakar Y. Automatic time series forecasting: the forecast package for R. J Stat Softw. 2008;27(3):1-22. [CrossRef]
- Kwiatkowski D, Phillips PC, Schmidt P, Shin Y. Testing the null hypothesis of stationarity against the alternative of a unit root. J Econom. 1992;54(1-3):159-178. [CrossRef]
- Ekström A, Kurland L, Farrokhnia N, Castrén M, Nordberg M. Forecasting emergency department visits using internet data. Ann Emerg Med. 2015;65(4):436-442.e1. [CrossRef] [Medline]
- Azqul M, Krakower D, Kalim S, Merchant RC. Emergency department visits in the United States by adults with a complaint of diarrhea (2016-2021). J Am Coll Emerg Physicians Open. 2024;5(1):e13105. [FREE Full text] [CrossRef] [Medline]
- DuPont HL. Guidelines on acute infectious diarrhea in adults. The practice parameters committee of the American college of gastroenterology. Am J Gastroenterol. 1997;92(11):1962-1975. [Medline]
- Park SI, Giannella RA. Approach to the adult patient with acute diarrhea. Gastroenterol Clin North Am. 1993;22(3):483-497. [Medline]
- Bolia R. Approach to "Upset Stomach". Indian J Pediatr. 2017;84(12):915-921. [CrossRef] [Medline]
- Eysenbach G, Köhler C. How do consumers search for and appraise health information on the world wide web? Qualitative study using focus groups, usability tests, and in-depth interviews. BMJ. 2002;324(7337):573-577. [FREE Full text] [CrossRef] [Medline]
- Bachl M, Link E, Mangold F, Stier S. Search engine use for health-related purposes: behavioral data on online health information-seeking in Germany. Health Commun. 2024;39(8):1651-1664. [FREE Full text] [CrossRef] [Medline]
- Moon Y. Survey on the internet usage in 2021, national information society agency, Daegu. NIA-VIII-RSE-C-21061. 2021. URL: https://tinyurl.com/yz5ep2nv [accessed 2025-04-12]
- Lim MSC, Molenaar A, Brennan L, Reid M, McCaffrey T. Young adults' use of different social media platforms for health information: insights from web-based conversations. J Med Internet Res. 2022;24(1):e23656. [FREE Full text] [CrossRef] [Medline]
- Hardt JH, Hollis-Sawyer L. Older adults seeking healthcare information on the internet. Educ Gerontol. 2007;33(7):561-572. [CrossRef]
- van Kessel R, Kyriopoulos I, Wong BLH, Mossialos E. The Effect of the COVID-19 Pandemic on Digital Health–Seeking Behavior: Big Data Interrupted Time-Series Analysis of Google Trendsv. J Med Internet Res. 2023;25:e42401. [CrossRef]
- Luciani LG, Mattevi D, Cai T, Giusti G, Proietti S, Malossini G. Teleurology in the time of covid-19 pandemic: here to stay? Urology. 2020;140:4-6. [FREE Full text] [CrossRef] [Medline]
- Czeisler MÉ, Marynak K, Clarke KE, Salah Z, Shakya I, Thierry JM, et al. Delay or avoidance of medical care because of COVID-19-related concerns - United States, June 2020. MMWR Morb Mortal Wkly Rep. 2020;69(36):1250-1257. [FREE Full text] [CrossRef] [Medline]
- Shen Y, Heacock L, Elias J, Hentel KD, Reig B, Shih G, et al. ChatGPT and other large language models are double-edged swords. Radiology. 2023;307(2):e230163. [CrossRef] [Medline]
- Mohammad-Rahimi H, Ourang SA, Pourhoseingholi MA, Dianat O, Dummer PMH, Nosrat A. Validity and reliability of artificial intelligence chatbots as public sources of information on endodontics. Int Endod J. 2024;57(3):305-314. [CrossRef] [Medline]
- Torres-Zegarra BC, Rios-Garcia W, Ñaña-Cordova AM, Arteaga-Cisneros KF, Chalco XCB, Ordoñez MAB, et al. Performance of chatGPT, bard, claude, and bing on the peruvian national licensing medical examination: a cross-sectional study. J Educ Eval Health Prof. 2023;20:30. [FREE Full text] [CrossRef] [Medline]
- Kim S, Kim S, Choi BY, Park B. Trends for syndromic surveillance of norovirus in emergency department data based on chief complaints. J Infect Dis. 2024;230(1):103-108. [CrossRef] [Medline]
- Moon BH, Lee SM, Oh M, Ryu HH, Heo T. Analysis of emergency department utilization rate by region, emergency medical center, and hospital type. J Korean Soc Emerg Med. 2016;27(5):442-449. [FREE Full text]
- Alanezi F. Factors influencing patients’ engagement with ChatGPT for accessing health-related information. Crit Public Health. 2024;34(1):1-20. [CrossRef]
- Konieczny P, Danabayev K, Kennedy K, Varpahovskis E. Information literacy in South Korea: similarities and differences between Korean and international students’ research trajectories. Asia Pac J Educ. 2023:1-16. [CrossRef]
- Charles-Smith LE, Reynolds TL, Cameron MA, Conway M, Lau EHY, Olsen JM, et al. Using social media for actionable disease surveillance and outbreak management: a systematic literature review. PLoS One. 2015;10(10):e0139701. [FREE Full text] [CrossRef] [Medline]
- Carneiro H, Mylonakis E. Google trends: a web-based tool for real-time surveillance of disease outbreaks. Clin Infect Dis. 2009;49(10):1557-1564. [CrossRef] [Medline]
- Zhang Y, Milinovich G, Xu Z, Bambrick H, Mengersen K, Tong S, et al. Monitoring pertussis infections using internet search queries. Sci Rep. 2017;7(1):10437. [FREE Full text] [CrossRef] [Medline]
Abbreviations
AI: artificial intelligence |
AICc: corrected Akaike Information Criterion |
ARIMAX: Autoregressive Integrated Moving Average with Exogenous Variables |
ED: emergency department |
MeSH: Medical Subject Headings |
NEDIS: National Emergency Department Information System |
RSV: relative search volume |
UMLS: Unified Medical Language System |
Edited by Q Jin; submitted 05.08.24; peer-reviewed by Z Hou, E Bai, GK Gupta; comments to author 05.10.24; revised version received 27.11.24; accepted 15.04.25; published 22.05.25.
Copyright©Jinsoo Kim, Ansun Jeong, Juseong Jin, Sangjun Lee, Do Kyoon Yoon, Soyeoun Kim. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 22.05.2025.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.