Published in Vol 18, No 7 (2016): July

Behavioral Analysis of Visitors to a Medical Institution’s Website Using Markov Chain Monte Carlo Methods

Original Paper

1Faculty of Health Sciences, Graduate School of Health Sciences, Hokkaido University, Sapporo, Japan

2Department of Medical Informatics and Hospital Management, Asahikawa Medical University, Asahikawa, Hokkaido, Japan

Corresponding Author:

Katsuhiko Ogasawara, PhD, MBA

Faculty of Health Sciences

Graduate School of Health Sciences

Hokkaido University

Kita12, Nishi 5, Kita-ku

Sapporo,

Japan

Phone: 81 11 706 3409

Fax: 81 11 706 2851

Email: oga@hs.hokudai.ac.jp


Background: Consistent with the “attention, interest, desire, memory, action” (AIDMA) model of consumer behavior, patients collect information about available medical institutions using the Internet to select information for their particular needs. Studies of consumer behavior may be found in areas other than medical institution websites. Such research uses Web access logs for visitor search behavior. At this time, research applying the patient searching behavior model to medical institution website visitors is lacking.

Objective: We have developed a hospital website search behavior model using a Bayesian approach to clarify the behavior of medical institution website visitors and determine the probability of their visits, classified by search keyword.

Methods: We used the website access log of a clinic of internal medicine and gastroenterology in the Sapporo suburbs, collecting data from January 1 through June 31, 2011. The contents of the 6 website pages included the following: home, news, content introduction for medical examinations, mammography screening, holiday person-on-duty information, and other. The search keywords we identified as best expressing website visitor needs were the top 4 entries in the access log: clinic name, clinic name + regional name, clinic name + medical examination, and mammography screening. Using the search keywords as the explanatory variable, we built a binomial probit model with the viewing of each content page as the response variable. Using this model, we estimated the coefficient (beta) and generated its posterior distribution. We performed the simulation using Markov Chain Monte Carlo methods with a noninformative prior distribution for this model and determined the visit probability classified by keyword for each category.

Results: In the case of the keyword “clinic name,” the visit probability to the website, repeated visit to the website, and contents page for medical examination was positive. In the case of the keyword “clinic name and regional name,” the probability for a repeated visit to the website and the mammography screening page was negative. In the case of the keyword “clinic name + medical examination,” the visit probability to the website was positive, and the visit probability to the information page was negative. When visitors referred to the keywords “mammography screening,” the visit probability to the mammography screening page was positive (95% highest posterior density interval = 3.38-26.66).

Conclusions: Further analysis for not only the clinic website but also various other medical institution websites is necessary to build a general inspection model for medical institution websites; we want to consider this in future research. Additionally, we hope to use the results obtained in this study as a prior distribution for future work to conduct higher-precision analysis.

J Med Internet Res 2016;18(7):e199

doi:10.2196/jmir.5139


To reduce the existing “asymmetry of information” between a patient and physician, patients routinely access Web-based information about their health problems and medical treatment options. Internet-based medical resources are constantly being developed and expanded in such an environment [1]. Consistent with the “attention, interest, desire, memory, action” (AIDMA) model of consumer behavior, patients collect information about the available medical institutions using the Internet to select information for their particular needs [2].

Because of this situation, many medical institutions have recently become intent on improving their websites. With advances in the Internet environment and devices, information on many medical institutions can now be obtained in diverse ways. Many companies achieve greater advertising effects by actively releasing information on the Internet, and the number of medical institutions actively using social media is likewise increasing. However, such initiatives and the related research are insufficiently advanced in Japan.

Market researchers and social psychologists routinely conduct various consumer behavior analyses based on the AIDMA model to predict factors influencing consumer action and purchase decisions and clarify consumer psychology and internal states [3].

Studies of consumer behavior may be found in areas other than medical institution websites. After the expansion of Web advertisements and publicity, Internet sales greatly increased the rate of Internet usage. Research on visitor search behavior can be conducted using Web access logs. To investigate product sales over the Internet, searching behavior models based on measures such as the “search keyword” and “page view” assume that searches are an expression of consumer needs. In this study, we determined the probability of visits to a certain webpage by the search keyword using the Markov Chain Monte Carlo (MCMC) methods [4,5]. Recently, marketing research applying Bayesian statistics has yielded positive results thanks to improvements in computing power, and such methods are expected to be applied to a greater degree in the future [6-8]. At this time, research applying the patient searching behavior model to medical institution website visitors is lacking.

In this study, we have developed a hospital website search behavior model using a Bayesian approach to clarify the behavior of medical institution website visitors and determine the probability of their visits classified by the search keyword.


Subject

The flowchart we propose for our research is shown in Figure 1.

We used the website access log of a clinic of internal medicine and gastroenterology in the Sapporo suburbs for our research, collecting data (336 cases) from January 1 through June 31, 2011. We used Google Analytics to analyze the access log [9]. The contents of the 6 website pages included the following: home, news, content introduction for medical examinations, mammography screening, holiday person-on-duty information, and other. We used all pages of the clinic website for this study. The “other” page introduces the communication space attached to a hospital. A return to the index page during the same visit session was distinguished from the first visit, so that index page visits would be counted more accurately, and was classified as “the website (again).” The search keywords we identified as best expressing website visitor needs were the top 4 entries in the access log: clinic name, clinic name + regional name, clinic name + medical examination, and mammography screening.

Figure 1. Flowchart.
Methods of Analysis

In this study, we applied Bayes’ theorem as the analysis method. The obtained data are denoted y, and the parameter is denoted θ. Both are treated as random variables, and their relationship is expressed using Bayes’ theorem as shown in Figure 2.

The left-hand side, f (θ|y), is called the posterior distribution. It represents the distribution of θ once the data y have been obtained. On the right-hand side, f (y|θ) is the likelihood, and f (θ) is the distribution of θ before the data are observed, which is called the prior distribution. The distribution of the data, f (y), is expressed as shown in Figure 3.
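Figures 2 and 3 are not reproduced in this text. In the notation used above, the textbook statement of Bayes’ theorem and of the distribution of the data is as follows; this is assumed to correspond to the figures and is given for the continuous case.

$$
f(\theta \mid y) = \frac{f(y \mid \theta)\, f(\theta)}{f(y)},
\qquad
f(y) = \int f(y \mid \theta)\, f(\theta)\, d\theta
$$

Because f (y) does not depend on θ, the posterior distribution is proportional to the product of the likelihood and the prior distribution.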

Using the search keyword as the explanatory variable, we built a binomial probit model with the viewing of each content page as the response variable [10]. The binomial probit model is a discrete choice model used in marketing science. In our study, this model used the formulas described in Figure 4.

The discrete choice model was formulated to describe the behavior of individuals choosing among the alternatives in their choice sets. In marketing science, this concept is applied to verify consumer selection behavior [11].

Using this model, we determined the beta value and generated a posterior distribution, showing the visit probability to each category classified by the search keywords.
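Figure 4 is not reproduced in this text. A standard latent-variable form of the binomial probit model is sketched below as an assumption about what the figure contains. The symbols are introduced here for illustration: x_i is the vector of search keyword indicators for visit i, y_i equals 1 if the page in question was visited and 0 otherwise, y_i^* is an unobserved latent variable, and Φ is the standard normal cumulative distribution function.

$$
y_i^{*} = x_i^{\top}\beta + \varepsilon_i, \quad \varepsilon_i \sim N(0, 1),
\qquad
y_i =
\begin{cases}
1 & \text{if } y_i^{*} > 0 \\
0 & \text{if } y_i^{*} \le 0
\end{cases}
\qquad\Longrightarrow\qquad
P(y_i = 1 \mid x_i) = \Phi\!\left(x_i^{\top}\beta\right)
$$

Under this form, a positive element of β raises the probability of a visit when the corresponding keyword is used, and a negative element lowers it.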

We performed the simulation using MCMC with a noninformative prior distribution for this model and determined the visit probability classified by keyword for each category. We used the Gibbs sampling method, sampling 50,000 times. We discarded the first 5000 samples as a burn-in period, during which the samples depend on the initial values [12].
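The authors' R code is provided in Multimedia Appendix 1 and is not reproduced here. As a rough illustration of the sampling procedure described above, the following is a minimal Python sketch of one well-known Gibbs scheme for a binomial probit model (data augmentation in the style of Albert and Chib) under a flat prior. It assumes a single 0/1 keyword indicator plus an intercept; the function names `draw_truncated_latent` and `gibbs_probit` are hypothetical, and only the defaults of 50,000 iterations and 5000 burn-in samples come from the text.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf and inverse cdf


def draw_truncated_latent(mean, y, rng):
    """Draw z ~ N(mean, 1) truncated to z > 0 if y == 1, otherwise to z <= 0."""
    p_nonpositive = STD_NORMAL.cdf(-mean)  # P(z <= 0)
    if y == 1:
        u = p_nonpositive + (1.0 - p_nonpositive) * rng.random()
    else:
        u = p_nonpositive * rng.random()
    # Numerical guard: inv_cdf is only defined on the open interval (0, 1).
    u = min(max(u, 1e-12), 1.0 - 1e-12)
    return mean + STD_NORMAL.inv_cdf(u)


def gibbs_probit(x, y, n_iter=50_000, burn_in=5_000, seed=0):
    """Gibbs sampler for y_i = 1[b0 + b1 * x_i + e_i > 0] with a flat prior.

    Assumption: x is a 0/1 keyword indicator that is not constant.
    Returns the list of (b0, b1) draws kept after the burn-in period.
    """
    rng = random.Random(seed)
    n = len(y)
    # X'X for design rows (1, x_i); its inverse is the conditional covariance.
    sx = sum(x)
    sxx = sum(v * v for v in x)
    det = n * sxx - sx * sx
    v00, v01, v11 = sxx / det, -sx / det, n / det
    # Cholesky factor of the 2x2 covariance matrix.
    l00 = math.sqrt(v00)
    l10 = v01 / l00
    l11 = math.sqrt(v11 - l10 * l10)

    b0, b1 = 0.0, 0.0
    draws = []
    for it in range(n_iter):
        # Step 1: latent variables given the current coefficients.
        z = [draw_truncated_latent(b0 + b1 * xi, yi, rng) for xi, yi in zip(x, y)]
        # Step 2: coefficients given the latent variables (a normal draw).
        sz = sum(z)
        sxz = sum(xi * zi for xi, zi in zip(x, z))
        m0 = v00 * sz + v01 * sxz
        m1 = v01 * sz + v11 * sxz
        e0, e1 = rng.gauss(0, 1), rng.gauss(0, 1)
        b0 = m0 + l00 * e0
        b1 = m1 + l10 * e0 + l11 * e1
        if it >= burn_in:
            draws.append((b0, b1))
    return draws
```

Calling `gibbs_probit(x, y)` with `x` as the keyword indicator and `y` as the page-visit indicator returns the retained draws, from which a posterior mean, median, and interval for the coefficient can be computed. The study's model may include several keyword indicators at once, which this two-coefficient sketch does not cover.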

The joint distribution is expressed as shown in Figure 5.
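Figure 5 is not reproduced in this text. For reference, one common way to write the joint posterior of the coefficients and the latent variables in a data-augmented probit model is given below. This is an assumption for illustration and is not necessarily the authors' exact expression. Here y_i is the observed 0/1 visit indicator, x_i is the vector of keyword indicators, y_i^* is the latent variable, π(β) is the prior, φ is the standard normal density, and 1(·) is the indicator function.

$$
\pi(\beta, y^{*} \mid y) \;\propto\; \pi(\beta) \prod_{i=1}^{n}
\Big[ \mathbf{1}(y_i = 1)\,\mathbf{1}(y_i^{*} > 0) + \mathbf{1}(y_i = 0)\,\mathbf{1}(y_i^{*} \le 0) \Big]\,
\phi\!\left(y_i^{*} - x_i^{\top}\beta\right)
$$

Gibbs sampling alternates between drawing each y_i^* from its truncated normal conditional distribution and drawing β from its normal conditional distribution.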

Generally, the autocorrelation function (ACF) is used to check the convergence of sampling. Thus, in this study, we used the ACF to check sample convergence. The ACF plot has the autocorrelation coefficient on the vertical axis; when autocorrelation is high, the accuracy of estimates from the Markov chain is low.

Judging convergence by visually inspecting the shape of the posterior distribution and the ACF poses a reproducibility problem. Therefore, following previous research, we judged the chain to have converged when the autocorrelation coefficient was 0.1 or less at lags of 30 or more. We used the statistical software R (version 2.13.0) for the simulation analysis [12,13].
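A criterion based on the autocorrelation coefficient at lags of 30 or more can be checked mechanically. The following minimal Python sketch computes the sample autocorrelation of a chain and applies a threshold of 0.1. The function names and the upper lag bound `max_lag` are hypothetical choices made for illustration.

```python
def autocorrelation(chain, lag):
    """Sample autocorrelation coefficient of a chain at a given lag."""
    n = len(chain)
    if lag < 0 or lag >= n:
        raise ValueError("lag must satisfy 0 <= lag < len(chain)")
    mean = sum(chain) / n
    var = sum((v - mean) ** 2 for v in chain)
    if var == 0:
        raise ValueError("chain is constant; autocorrelation is undefined")
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean) for t in range(n - lag))
    return cov / var


def passes_acf_check(chain, min_lag=30, max_lag=50, threshold=0.1):
    """True if |ACF| <= threshold at every lag from min_lag to max_lag.

    max_lag is an assumed cutoff; the text only specifies lags of 30 or more.
    """
    return all(
        abs(autocorrelation(chain, k)) <= threshold
        for k in range(min_lag, max_lag + 1)
    )
```

A chain of nearly independent draws passes this check, whereas a slowly drifting chain keeps a high autocorrelation at lag 30 and fails it.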

Figure 2. Bayes' theorem.
Figure 3. Distribution of the data f(y).
Figure 4. Binomial probit model.
Figure 5. Joint distribution.
Definition of Visit Probability

To evaluate the posterior probability density function estimated by MCMC, we used the highest posterior density (HPD) interval. Because a single value computed by the Bayesian approach is only a point estimate, it alone was not sufficient for evaluation purposes. Therefore, we used an interval estimate: when the entire HPD interval of the obtained posterior density was either positive or negative, we regarded the effect as significant and defined the median of the HPD as the visit probability to each category [14]. This value is not a probability in the strict sense; it is on the scale of the coefficient beta and may be larger than 1 or less than −1.
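The rule described above can be sketched in a few lines of Python. The function below finds the narrowest interval that contains 95% of the sorted posterior draws, which is a common sample-based way to approximate an HPD interval for a unimodal posterior, and then reports the median and whether the interval excludes zero. The function names are hypothetical, and this is an illustration, not the authors' R implementation.

```python
import math
from statistics import median


def hpd_interval(samples, mass=0.95):
    """Return the narrowest interval containing `mass` of the samples."""
    s = sorted(samples)
    n = len(s)
    if n == 0:
        raise ValueError("samples must not be empty")
    # Number of draws that must fall inside the interval.
    k = min(n, max(1, math.ceil(mass * n - 1e-9)))
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]


def summarize(samples, mass=0.95):
    """HPD bounds, median, and whether the whole interval is on one side of zero."""
    lower, upper = hpd_interval(samples, mass)
    return {
        "lower": lower,
        "median": median(samples),
        "upper": upper,
        "significant": lower > 0 or upper < 0,
    }
```

For example, an interval of 0.14 to 0.81 lies entirely above zero and would be treated as a positive effect, whereas an interval of −0.68 to 0.93 spans zero and would not be treated as significant.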


The statistical results for each keyword are shown in Tables 1-4 and Figures 6-9.

When a visitor referred to the keyword “clinic name,” the HPDs to the main page, the website (again), and the contents page were positive. When a visitor referred to the keyword “clinic name + regional name,” the HPDs to the website (again) and the mammography screening page were negative. When a visitor referred to the keyword “clinic name + medical examination,” the HPD to the main page was positive and that to the information page was negative. When a visitor referred to the keyword “mammography screening,” the HPD to the mammography screening was positive.

Table 1. Posterior distribution presumption result by keyword “clinic name”.
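| Contents | Posterior mean | SD^b | 2.50% (95% HPD^a interval) | Median | 97.50% (95% HPD^a interval) | Convergence |
|---|---|---|---|---|---|---|
| Top page | 0.85 | 0.29 | 0.3 | 0.85 | 1.43 | |
| Top page (again) | 0.54 | 0.17 | 0.2 | 0.54 | 0.87 | |
| News | 0.12 | 0.41 | −0.68 | 0.11 | 0.93 | |
| Contents | 0.48 | 0.17 | 0.14 | 0.47 | 0.81 | |
| Mammography screening | 0.03 | 0.15 | −0.26 | 0.04 | 0.33 | |
| Information | 0.32 | 0.17 | −0.01 | 0.32 | 0.65 | |
| Holiday duty hospital | −0.27 | 0.19 | −0.65 | −0.27 | 0.11 | |
| Others | 0.27 | 0.2 | −0.12 | 0.28 | 0.68 | |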

aHPD: highest posterior density.

bSD: standard deviation.

Table 2. Posterior distribution presumption result by keyword “clinic name + regional name”.
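| Contents | Posterior mean | SD^b | 2.50% (95% HPD^a interval) | Median | 97.50% (95% HPD^a interval) | Convergence |
|---|---|---|---|---|---|---|
| Top page | 0.31 | 0.28 | −0.23 | 0.3 | 0.85 | |
| Top page (again) | −0.48 | 0.17 | −0.81 | −0.48 | −0.15 | |
| News | 0.33 | 0.37 | −0.38 | 0.33 | 1.05 | |
| Contents | −0.23 | 0.16 | −0.55 | −0.23 | 0.1 | |
| Mammography screening | −0.5 | 0.15 | −0.79 | −0.5 | −0.22 | |
| Information | 0.01 | 0.16 | −0.31 | 0.01 | 0.32 | |
| Holiday duty hospital | 0.09 | 0.19 | −0.27 | 0.09 | 0.46 | |
| Others | −0.06 | 0.2 | −0.46 | −0.06 | 0.34 | |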

aHPD: highest posterior density.

bSD: standard deviation.

Table 3. Posterior distribution presumption result by keyword “clinic name + medical examination”.
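| Contents | Posterior mean | SD^b | 2.50% (95% HPD^a interval) | Median | 97.50% (95% HPD^a interval) | Convergence |
|---|---|---|---|---|---|---|
| Top page | 17.01 | 9.61 | 1.26 | 15.96 | 37.08 | |
| Top page (again) | −0.02 | 0.45 | −0.92 | −0.01 | 0.86 | |
| News | 13.94 | 12.05 | −0.59 | 10.49 | 40.41 | |
| Contents | 0.01 | 0.41 | −0.78 | 0 | 0.85 | |
| Mammography screening | −0.41 | 0.4 | −1.24 | −0.39 | 0.34 | |
| Information | −0.88 | 0.43 | −1.80 | −0.86 | −0.09 | × |
| Holiday duty hospital | 0.25 | 0.48 | −0.71 | 0.25 | 1.17 | |
| Others | 0.03 | 0.56 | −1.17 | 0.06 | 1.06 | |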

aHPD: highest posterior density.

bSD: standard deviation.

Table 4. Posterior distribution presumption result by keyword “mammography screening”.
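| Contents | Posterior mean | SD^b | 2.50% (95% HPD^a interval) | Median | 97.50% (95% HPD^a interval) | Convergence |
|---|---|---|---|---|---|---|
| Top page | −2.34 | 0.51 | −3.35 | −2.33 | −1.37 | |
| Top page (again) | −0.27 | 0.4 | −1.07 | −0.27 | 0.51 | |
| News | −0.29 | 0.75 | −1.79 | −0.29 | 1.17 | × |
| Contents | −0.5 | 0.33 | −1.16 | −0.5 | 0.14 | |
| Mammography screening | 14.84 | 6.59 | 3.38 | 14.58 | 26.66 | × |
| Information | −0.71 | 0.35 | −1.41 | −0.71 | −0.05 | × |
| Holiday duty hospital | 0.41 | 0.4 | −0.37 | 0.41 | 1.22 | |
| Others | −0.43 | 0.46 | −1.38 | −0.42 | 0.42 | |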

aHPD: highest posterior density.

bSD: standard deviation.

Next, we show the simulation results for the case in which a visitor used the keyword “clinic name.” Figure 10 shows the estimated posterior distribution; the horizontal axis is the value of the parameter beta, and the vertical axis is the probability density. The posterior distribution obtained from this simulation was unimodal.

Figure 11 shows the sampling convergence by MCMC, with the vertical axis as the beta value and the horizontal axis as the sampling number.

Figure 12 shows the ACF obtained by the simulation. For the keyword “clinic name,” the autocorrelation was small, and the chain had fully converged [15].

Figure 6. HPD of posterior distribution by keyword "clinic name".
Figure 7. HPD of posterior distribution by keyword "clinic name + regional name".
Figure 8. HPD of posterior distribution by keyword "clinic name + medical examination".
Figure 9. HPD of posterior distribution by keyword "mammography screening".
Figure 10. Posterior distribution by keyword "clinic name".
Figure 11. The simulation convergence situation by keyword "clinic name".
Figure 12. ACF by keyword "clinic name".

Analysis of Searching Behavior of Medical Institution’s Website Visitors

From the MCMC results, the ACF converged on most pages. This means that we obtained consistent results. Therefore, we expect our results to be generally valid.

When a visitor referred to the keyword “clinic name,” the visit probability to the main, website (again), and contents pages addressing medical examinations was positive. Thus, search by keyword “clinic name” had the effect of increasing the probability of visits to the main, website (again), and contents pages. In particular, it is possible that the primary concern of a visitor who referred to a keyword “clinic name” was to reach the contents page addressing medical examinations. The visit probability to the holiday duty hospital page was negative. Visitors to the holiday duty hospital information page did not refer to a clinic name, and it is possible that many people visited this page from other linked pages.

When a visitor referred to the keyword “clinic name and regional name,” the visit probability to the website (again) and the mammography screening page was negative. Thus, search by the keyword “clinic name and regional name” had the effect of decreasing the probability of visits to the website (again) and mammography screening pages. Visitors using this keyword tended not to visit the website for a second time within the same session, and it is likely that they were uninterested in the mammography screening page.

When visitors referred to the keyword “clinic name and medical examination,” the visit probability to the main page was positive, and the visit probability to the information page was negative. Search by the keyword “clinic name and medical examination” therefore had the effect of increasing the probability of visits to the main page and decreasing the probability of visits to the information page. Although these visitors included “medical examination” in their keywords, their probability of visiting the contents page for medical examinations was low. In such instances, we believe the visitors did not get the information they wanted, and we concluded that the website did not lead them to the page containing the information they required. We suggest that the hospital administration change its webpage design to address this problem.

When visitors referred to the keyword “mammography screening,” the visit probability to the mammography screening page was positive. Thus, search by the keyword “mammography screening” had the effect of increasing the probability of visits to the mammography screening page, indicating that the website led these visitors to the page they wanted. Few medical institutions in this area offer mammography screening; therefore, this result is as expected.

For the keyword “mammography screening,” the visit probability to the main and information pages was negative, so we concluded that these visitors had no interest in those pages. Information about access to the clinic was published on the website. As the visit probability to the website was low, we concluded that visitors who referred to the keyword “mammography screening” had not yet become patients of the clinic.

These results reveal that the tendency of the visit probabilities in each category was different for different keywords. Therefore, it is possible to increase the visit probability to the page a visitor wants by understanding the search behaviors based on visitor needs and therefore improve website effectiveness.

Problems and Overview

This study identified 4 problems for consideration.

A Setup of an Interest Level

Although page views were used, they captured only the presence or absence of a visit to each page, not whether the visitor actually read the page. How well page visits reflect visitor needs is therefore unclear. A model that includes the time visitors spend viewing each page would improve our ability to gauge visitor interest.

Six-Month Study Period

The fact that medical institution patient numbers fluctuate with the seasons should be taken into account. In this research, the access log covered a period of 6 months only; therefore, the fluctuations in patient numbers by season were not considered. In the study of medical institution websites, seasonality has not been studied. As we believe that visitor access to website pages may also follow seasonal patterns, future research periods should also span one full year. Moreover, we think that a larger dataset is needed to accurately determine convergence in website access samples. Regardless of whether the amount of data used in this study was sufficient, we think that it is necessary to compare the analysis results using more data.

Problem of a Prior Distribution Setup

In this study, because we had access only to data from the access log collection period, a noninformative prior distribution was assumed, so no actual past data are reflected. To model a prior distribution, Ueda et al developed a highly flexible prior distribution based on a nonparametric Bayesian model [17]. Bayesian estimation allows the determination of a posterior distribution that takes past data into account. We would like to consider using this technique for future research.

In this study, we used a noninformative prior distribution. By using the data collected here to set the prior distribution, we will be able to build a new model of medical institution webpage browsing behavior. In addition, by using beta values obtained with improved prior distributions to estimate the behavior of website visitors, it becomes possible to build websites appropriate to medical institutions.
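As an illustration of how past results could enter the analysis, suppose, purely as an assumption, that the prior for the coefficients is normal with mean b_0 and covariance B_0 taken from an earlier study, and that the probit model is written with latent variables y^* and a design matrix X of keyword indicators. The conditional distribution of β used in Gibbs sampling then becomes:

$$
\beta \mid y^{*} \sim N(b_1, B_1),
\qquad
B_1 = \left(B_0^{-1} + X^{\top}X\right)^{-1},
\qquad
b_1 = B_1\left(B_0^{-1} b_0 + X^{\top} y^{*}\right)
$$

When B_0^{-1} tends to zero, this reduces to the flat-prior case, with mean (X^T X)^{-1} X^T y^* and covariance (X^T X)^{-1}. A more concentrated prior pulls the estimate toward b_0.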

Problem of Model Selection

Although the binomial probit model was used in this research, we could not verify the validity of the model. As the logit model, the classic Bayesian model, the nonparametric Bayesian model, and so forth are proposed in the literature as discrete selection models for use with a Bayesian approach, it is necessary to validate our model by comparing it with those of others [15-17]. One feature of medical institutions is that patient region and age groups differ by hospital scale, department, and region. In our research, because the only clinic website we targeted was the one near Sapporo, we could not address the characteristics of regionality or hospital scale. To better identify the browsing characteristics of visitors to the websites of many medical institutions, we would like to analyze other departments, regional areas, hospital scale, and so forth.

 Recently, branding has become a marketing technique for hospital networks, and many patients select hospitals by recognizing their brands. In these situations, some hospitals are adopting a differentiated marketing strategy. Moreover, they are beginning to undertake customer relationship management (CRM), recognizing the lifetime value of a customer. As a hospital is an organization providing medical treatment as a service, it essentially has the same marketing challenges of any other company, while varying to a considerable degree in terms of the services provided for the public benefit. However, research by Kim states that the health care field can effectively apply CRM as well as any other field. The Bayesian approach used in this research is also a useful technique in CRM. As it is possible to perform heterogeneity modeling between consumers, this tool can be developed as one of the database marketing strategies for medical treatment [18].

For the modeling of heterogeneity among consumers, purchasing history data analysis, estimation of heterogeneous price thresholds, and e-commerce site visitor behavior analysis have been conducted recently. However, behavior analysis on the websites of medical institutions remains unstudied [19-21].

The selection of a medical institution, like the selection of products and services other than medical services, is affected by the relationship between the patient’s preferences and competing brands. In medical institution marketing activities, it is very important to know these variables. Therefore, it is necessary to further ascertain the heterogeneity between patients.

Conclusion

To clarify the information that citizens want when searching the Web, we developed a searching behavior model for visitors to a medical institution's website using a Bayesian approach and determined the visit probability to each category of interest, classified by search keyword. We targeted the website access log of a clinic near Sapporo, for the January 1 to June 31, 2011 period and determined the visit probability to each category using the predetermined search keywords. In the case of the keyword “clinic name,” the visit probability to the website, the website (again), and the contents page for medical examination was positive. For the holiday person-on-duty page, visit probability was negative. In the case of the keyword “clinic name and regional name,” the visit probability to the website (again) and the mammography screening page was negative. In the case of the keyword “clinic name + medical examination,” the visit probability to the website was positive, and the visit probability to the information page was negative. When visitors referred to the keywords “mammography screening,” the visit probability to the mammography screening page was positive. Further analysis for not only the clinic website but also various other medical institution websites is necessary to build a general inspection model for medical institution websites; we want to consider this in future research. In addition, we hope to use the results obtained in this study as a prior distribution for future work and to conduct higher precision analysis.

Acknowledgments

The authors gratefully acknowledge the work of past and present members of their laboratory. They would also like to thank the clinic and Dr. Katayama who provided data required for this study.

Conflicts of Interest

None declared.

Multimedia Appendix 1

R-code.

PDF File (Adobe PDF File), 15KB

  1. Fukunaga W, Satomura Y. The environment and conditions for the establishment of Internet medicine. The Chiba Medical Society 2005;81:47-57.
  2. The text of medical management talent. Japan: Ministry of Economy, Trade and Industry. URL: http://www.meti.go.jp/report/whitepaper/ [accessed 2016-07-01]
  3. Murakami T, Suyama A, Orihara R, TOSHIBA Corp. Research & Development Center. Consumer behavior modeling using Bayesian networks. 2006 Feb 11 Presented at: Conference of the Japanese Society for Artificial Intelligence; February 11, 2006; Japan.
  4. Saito Y. Behavioral Analysis of Website Visitor by Markov model. The operations research of Japan 2007:46-47.
  5. Bucklin RE, Sismeiro C. A Model of Web Site Browsing Behavior Estimated on Clickstream Data. Journal of marketing research 2003;XL:249-267.
  6. Şenel T, Cengiz MA. A Bayesian Approach for Evaluation of Determinants of Health System Efficiency Using Stochastic Frontier Analysis and Beta Regression. Comput Math Methods Med 2016;2016:2801081.
  7. Nobuhiko T. Marketing Analysis by Bayes Modeling. Tokyo, Japan: Tokyo Denki University Press; 2008.
  8. Amano T, Nakazato H, Nakamura T. Web Mining Using Bayesian Method. The Institute of Electronics, Information and Communication Engineers 2004:43-48.
  9. Google Analytics. URL: http://www.google.com [accessed 2016-07-01]
  10. Yamamura M. Probit Analysis Based on “National Livelihood Survey” Relevant to Use Nursing Care Service. Proceedings of Institute of Statistical Mathematics 2007 Jan 26:125-142.
  11. Tsuchida N. A review on discrete choice models in marketing science. Journal of Business and Institutions 2010;8:63.
  12. Satomura S. Data Science Learning Through R. Japan: Kyoritsu Shuppan Co., Ltd; 2010.
  13. Jin M. Data Science by R. Japan: Morikita Publishing Co., Ltd; 2007.
  14. Ogata Y. Biases and Uncertainties When Estimating the Hazard of the Next Nankai Earthquake. Journal of Geography 2001;110(4):602-614.
  15. Markov Chain Monte Carlo Method. URL: http://www.omori.e.u-tokyo.ac.jp/ [accessed 2016-07-01]
  16. Andrews RL, Ansari A, Currim IS. Hierarchical Bayes versus Finite mixture Conjoint Analysis models: A comparison of fit, prediction, and partworth recovery. Journal of Marketing Research 2002;39(1):87-98.
  17. Ueda N, Yamada T. Introduction to Nonparametric Bayesian Models. Bulletin of the Japan Society for Industrial and Applied Mathematics 2007;17(3):196-214.
  18. Kim Y, Shin DH. A Study on the Success Factor of Hospital CRM. Kawasaki Journal of Medical Welfare 2011;20:319-329.
  19. Motohashi Y, Higuchi T. Shijyoukouzou no Henka wo kouryo shita Brand Sentaku ni yoru Koubairireki Data no Kaiseki. Japan Institute of Marketing Science (The Japanese Journal of Marketing Research) 2013;21:37-59.
  20. Yamaguchi K. An Analysis of Website visit Behavior by a Hierarchical Bayesian Model Considering the Time Variation of Frequency. Japan Institute of Marketing Science (The Japanese Journal of Marketing Research) 2014;22:13-29.
  21. Terui N, Wirawan D. Estimating Heterogeneous Price Thresholds. Marketing Science 2006;25:384-391.


ACF: autocorrelation function
CRM: customer relationship management
MCMC: Markov Chain Monte Carlo
HPD: highest posterior density


Edited by G Eysenbach; submitted 16.09.15; peer-reviewed by H Potts, G Spyrou, E Nortey, NA Ismail; comments to author 29.03.16; revised version received 24.05.16; accepted 10.06.16; published 25.07.16

Copyright

©Teppei Suzuki, Yuji Tani, Katsuhiko Ogasawara. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 25.07.2016.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.