Objective User Engagement With Mental Health Apps: Systematic Search and Panel-Based Usage Analysis

doi:10.2196/14567

Original Paper

¹Department of Community Mental Health, University of Haifa, Haifa, Israel

²The Partnership for Drug-Free Kids, New York, NY, United States

³Psychiatry Research, The Zucker Hillside Hospital, Northwell Health, Glen Oaks, NY, United States

Corresponding Author:

Amit Baumel, PhD

Department of Community Mental Health

University of Haifa

Abba Khoushy Ave 199

Haifa, 3498838

Israel

Phone: 972 048240111

Email: abaumel@univ.haifa.ac.il

Background: Understanding patterns of real-world usage of mental health apps is key to maximizing their potential to increase public self-management of care. Although developer-led studies have published results on the use of mental health apps in real-world settings, no study yet has systematically examined usage patterns of a large sample of mental health apps relying on independently collected data.

Objective: Our aim is to present real-world objective data on user engagement with popular mental health apps.

Methods: A systematic engine search was conducted using Google Play to identify Android apps with 10,000 installs or more targeting anxiety, depression, or emotional well-being. Coding of apps included primary incorporated techniques and mental health focus. Behavioral data on real-world usage were obtained from a panel that provides aggregated nonpersonal information on user engagement with mobile apps.

Results: In total, 93 apps met the inclusion criteria (installs: median 100,000, IQR 90,000). The median percentage of daily active users (open rate) was 4.0% (IQR 4.7%), with tracker (median 6.3%, IQR 10.2%) and peer-support apps (median 17.0%, n=2) having significantly higher open rates than breathing exercise apps (median 1.6%, IQR 1.6%; all z≥3.42, all P<.001). Among active users, daily minutes of use were significantly higher for mindfulness/meditation (median 21.47, IQR 15.00) and peer support (median 35.08, n=2) apps than for apps incorporating other techniques (tracker, breathing exercise, psychoeducation: medians ranged from 3.53 to 8.32; all z≥2.11, all P<.05). The median 15-day and 30-day retention rates were 3.9% (IQR 10.3%) and 3.3% (IQR 6.2%), respectively. On day 30, peer support (median 8.9%, n=2), mindfulness/meditation (median 4.7%, IQR 6.2%), and tracker apps (median 6.1%, IQR 20.4%) had significantly higher retention rates than breathing exercise apps (median 0.0%, IQR 0.0%; all z≥2.18, all P≤.04). Daily use showed a descriptive peak toward the evening for apps incorporating most techniques (tracker, psychoeducation, and peer support), whereas mindfulness/meditation apps exhibited two peaks (morning and night).

Conclusions: Although the number of app installs and daily active minutes of use may seem high, only a small portion of users actually used the apps for a long period of time. More studies using different datasets are needed to understand this phenomenon and the ways in which users self-manage their condition in real-world settings.

J Med Internet Res 2019;21(9):e14567


Keywords

user engagement; usage; adherence; retention; mental health; depression; anxiety; mHealth

The wide dissemination of mobile phones and the leap in the development and distribution of mobile health (mHealth) apps have altered the ways in which scholars conceptualize care management in the behavioral health domain. The conversation has shifted from patients and providers to individuals who can now engage in self-care around the clock outside of traditional health care settings (eg, [1-3]). Approximately 77% of the US adult population, and more than 89% of those younger than 50 years, now own a mobile phone [4,5] on which they can store and use computerized apps. This widespread use has established a market for mHealth apps. Accordingly, a 2015 World Health Organization survey identified approximately 15,000 mobile apps for health care, with at least 29% designed for mental health [6].

The use of unguided apps has the potential to increase access to care in a scalable manner by reducing the costs associated with service uptake [7,8]. However, the impact of digital interventions is limited by their ability to engage users in therapeutic activities and to support user adherence to the therapeutic process [9,10]. Digital interventions require individuals to engage with self-care outside of traditional settings; therefore, individuals’ engagement must compete with other events in their daily lives and endure fluctuating motivation to be involved in effortful behavior [11]. As a result, user engagement with mobile apps and websites across the behavior change spectrum is low in the absence of human support [12-14].
Furthermore, various studies have suggested that most users of unguided Web-based programs exit websites before completing the full program offered [9,10,15,16]. For example, Christensen and colleagues [17] reported that less than 1% of users completed all modules in MoodGym, an open-access website for depression. In a systematic review of published articles reporting real-world user engagement with unguided programs for depression, anxiety, or mood enhancement, Fleming and colleagues [18] reported that 7% to 42% of users of Web- and app-based programs engaged in moderate use (completing between 40% and 60% of modular fixed-length programs or continuing to use the app after 4 weeks).
For example, the developers of the PTSD Coach mobile app reported a usage decline over time, with 41.6% of users continuing to use the app 1 month after installation and 19.4% after 6 months [19]. Among Happify mobile app users, 3.5% completed a 6-week assessment; however, the authors noted that these users might have completed assessments without engaging in other content [20] (see [18] for a review).

Understanding patterns of real-world usage of e-mental health apps outside of empirical trials is key to maximizing the potential of apps to increase the public self-management of care. Utilization in real-world settings may differ from that in study settings for several reasons. First, empirical study settings include enrollment and assessment procedures that are not part of real-world utilization of the app, as trials largely emphasize internal validity over real-world generalizability [13]. Ebert and Baumeister [21] claim, for example, that within randomized trials “the securing of commitment represents an adherence-promoting element in self-help interventions.” It is reasonable to assume that the human contact provided by research coordinators, the provision of ongoing assessments, and reimbursement to incentivize the completion of assessments (none of which are available in real-world use) impact engagement patterns with the interventions. Second, from an external validity perspective, recruitment challenges in trials are often addressed by increasing the reach to potential participants through the expansion of participating venues and the refinement of social media strategies [13].
In this way, researchers may unintentionally recruit people who are much more likely to adhere to e-mental health technologies than people in the general population who download and try available programs “in the wild.” Such assumptions are supported by a systematic review of internet interventions for anxiety and depression, which found that the rates of attrition in randomized controlled trials were lower than the reported dropout rates from open-access websites [22].

Overall, there is a need to understand how the general population engages with the most popular unguided mobile apps targeting anxiety, depression, or emotional well-being, and whether individuals engage with these apps differently depending on the mental health focus or incorporated techniques. Although some developer-led studies have published results on the use of individual mental health apps deployed in real-world settings, to the best of our knowledge, no study has examined a large sample of mental health apps relying on independently collected data. This investigation is made feasible by leveraging the big data commonly generated and stored by digital platforms that record user traffic in the wild [23,24]. Leveraging such data, this examination provides benchmarks of app usage in the real world, where the general public is expected to benefit from their engagement with unguided programs. This information could shed light on specific engagement problems and opportunities for new intervention development and may offer a resource for researchers and developers who want to study and compare their app performance with similar apps.

For this study, we used objective, aggregated, nonpersonal data on user engagement with mobile apps, provided by a panel, to analyze patterns of mental health app usage. The three primary aims were to (1) describe common usage patterns of popular unguided apps based on available metrics, (2) identify patterns of user retention over the first 30 days after app installation, and (3) explore whether these patterns differ based on the app’s mental health focus and primary incorporated techniques.

Search Strategy

The search strategy aimed at identifying the most-installed unguided apps targeting depression, anxiety-related problems, or emotional well-being. We used keywords related to depression and anxiety because of the high prevalence of these conditions [25,26]. We also included mental health apps that focused on happiness or the enhancement of mental health (eg, mindfulness meditation) because our previous work identified them as highly popular mental health tools [27,28]. We conducted a systematic engine search of the Google Play Store in November 2018 using the following terms: “depression” OR “mood” OR “anxiety” OR “panic attack” OR “phobia” OR “social phobia” OR “PTSD” OR “posttraumatic stress disorder” OR “stress reduction” OR “worry relief” OR “OCD” OR “obsessive compulsive disorder” OR “mental health” OR “emotional well-being” OR “happiness.” One researcher documented all the apps emerging from the first 100 search results for each keyword, removed duplicates, and sorted them alphabetically. We also conducted a manual search of apps presented on MindTools.io [27] and PsyberGuide [29].
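The merging step described above (first 100 results per keyword, duplicate removal, alphabetical sorting) can be sketched as a small function. This is a minimal illustration, assuming the raw results have already been collected into a hypothetical mapping `results_by_keyword` from each search term to the ordered list of app titles that Google Play returned. The study describes a manual procedure performed by one researcher; treating titles that differ only in letter case or surrounding whitespace as duplicates is our assumption.

```python
def build_candidate_list(results_by_keyword, top_n=100):
    """Merge the first top_n results of every keyword, drop duplicates, sort A-Z.

    results_by_keyword: hypothetical dict mapping a search term to the ordered
    list of app titles returned for it.
    """
    seen = set()
    candidates = []
    for titles in results_by_keyword.values():
        for title in titles[:top_n]:       # only the first top_n results per keyword
            cleaned = title.strip()
            key = cleaned.casefold()       # assumption: case/whitespace variants are duplicates
            if key not in seen:
                seen.add(key)
                candidates.append(cleaned)
    return sorted(candidates, key=str.casefold)  # alphabetical, case-insensitive


merged = build_candidate_list({
    "depression": ["App B", "App A"],
    "mood": ["app a ", "App C"],
})
print(merged)  # ['App A', 'App B', 'App C']
```

The app titles in the usage example are placeholders, not apps from the study sample.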

App Screening and Inclusion Criteria

Determining Apps’ Number of Installs Threshold

To avoid including apps without a representative number of users, and to determine a minimum threshold for inclusion, we assessed the install categories presented by Google Play based on the number of app installs (eg, 10,000, 50,000 installs). Table 1 presents a preliminary analysis of the number of identified apps in each install category and the aggregated minimum number of app installs and corresponding percentages. Apps in this preliminary analysis had at least 5000 installs and remained after we removed nonrelevant apps based on their titles (ie, apps that were clearly not targeted at emotional well-being, such as Heart Rate Monitor & Pulse Checker, 7 Minute Workout, and 30 Day Fitness Challenge). Adding all the apps in the 5000 installs category would have increased the total number of installs in the sample by less than 0.5%. Therefore, we set an inclusion threshold of 10,000 app installs. Table 1 also shows that a small number of apps within the higher install categories accounted for most app installs. To make sure that including a large portion of apps with relatively few installs would not bias the results, we also examined whether the pattern of results differed based on the number of app installs. This is further explained in the Data Analysis section.

Table 1. Analysis of install categories based on the number of apps in each category.

Install category	Apps identified, n	Minimum identified app installs within this category^a, n	Cumulative frequency of app installs based on category threshold^b, n	Added percentage of installs to the overall sample^c, %
≥10,000,000	2	20,000,000	20,000,000	100.00
5,000,000-9,999,999	6	30,000,000	50,000,000	60.00
1,000,000-4,999,999	21	21,000,000	71,000,000	29.58
500,000-999,999	23	11,500,000	82,500,000	13.94
100,000-499,999	69	6,900,000	89,400,000	7.72
50,000-99,999	33	1,650,000	91,050,000	1.81
10,000-49,999	103	1,030,000	92,080,000	1.12
5000-9999	66	330,000	92,410,000	0.36

^aThe number of apps multiplied by the minimum number of installs based on the install category.

^bThe accumulated number of app installs in all install categories above and including the current install category.

^cThe percentage of installs added to the total sample if the current install category is included in the analysis; it equals the minimum number of app installs within this category divided by the cumulative number of app installs at the current category threshold.
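The three derived columns of Table 1 follow directly from footnotes a to c. As a sketch (the function and variable names are ours, not the authors’), the code below recomputes them from the two input columns, counting each app at exactly its category’s minimum number of installs, as footnote a specifies.

```python
# Install categories from Table 1: (lower bound of the category, apps identified)
CATEGORIES = [
    (10_000_000, 2),
    (5_000_000, 6),
    (1_000_000, 21),
    (500_000, 23),
    (100_000, 69),
    (50_000, 33),
    (10_000, 103),
    (5_000, 66),
]


def install_table(categories):
    """Return (lower bound, minimum installs, cumulative installs, added %) rows."""
    rows = []
    cumulative = 0
    for lower_bound, n_apps in categories:       # ordered from largest to smallest category
        minimum = lower_bound * n_apps           # footnote a: apps x category minimum
        cumulative += minimum                    # footnote b: running total incl. this category
        added_pct = 100 * minimum / cumulative   # footnote c: share contributed by this category
        rows.append((lower_bound, minimum, cumulative, round(added_pct, 2)))
    return rows


for row in install_table(CATEGORIES):
    print(row)
```

Running it reproduces the last column of Table 1, including the 0.36% that the 5000-9999 category would have added.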

Inclusion and Exclusion Criteria

To be included in this review, apps had to:

Be in English;
Have at least 10,000 installs documented on Google Play;
Focus on mental illness, mental health, or emotional well-being not specifically related to another medical condition (for example, we excluded apps specifically focused on stress reduction due to a physical medical issue such as heart attack); and
Incorporate recognized techniques aimed at promoting self-management of mental health problems such as coping with negative symptoms (eg, feeling nervous, loss of energy), achieving positive results (eg, feeling better), or symptom management (eg, mood tracking). We excluded apps focused on the incorporation of sham techniques (see Multimedia Appendix 1 for a definition of sham techniques).

We excluded apps that:

Required payment for installation or offered a free trial only for a limited time, because this would be expected to bias program usage (free-to-install apps that included in-app purchases were not excluded);
Were therapist-based (eg, telepsychiatry) because the study was focused on unguided interventions; and
Were not meant to be used for more than a few times (eg, tests, one-time exposure technique) or were merely magazines.

Two independent reviewers screened the apps based on the inclusion and exclusion criteria. All disagreements were discussed with a third author with reference to the apps until consensus was reached.

Coding

Two independent reviewers coded the apps’ incorporated techniques based on the following categories: mindfulness/meditation, tracker (including diary or journal), psychoeducation, peer support, and breathing exercise (not performed as part of a meditation program). These categories were based on previous work on the therapeutic components of mental health apps [27,30], drawing on the thematic analysis method suggested by Braun and Clarke [31]. The categories were designed to represent nonoverlapping components of potential therapeutic engagement (see Multimedia Appendix 2 for definitions of categories). Although our goal was to identify how specific techniques related to patterns of app use, our metrics did not enable us to differentiate between various techniques incorporated within the same app (ie, we could not tell which parts of the app users were using). Therefore, we also coded a “primary technique” in cases where the app mostly incorporated one technique that was deemed to be the main reason for the app’s use (eg, mindfulness/meditation). Note that this limitation also prevented us from examining app features that might influence user engagement but were not identified as a primary incorporated technique. Similarly, it was not feasible to target specific theoretical modalities, such as cognitive behavioral therapy: because nearly all apps included some components of cognitive behavioral therapy, these components were impossible to dismantle given our data.

An app’s mental health focus was determined in the following manner: first, the app’s description had to explicitly state that it targeted people with [mental health focus] and, second, most of the techniques used within the app had to have been built to help users cope with or manage their symptoms directly related to the mental health focus. We grouped apps based on several mental health foci. Under “mental health problems,” we included apps that were focused on supporting people coping with depression, anxiety-related disorders, and emotional difficulties. We also subcoded an app as (a) anxiety-related disorders or (b) depression if it specifically targeted only one of these conditions. (During our coding process, we did not identify another theme for the remaining apps.) Under “happiness,” we included apps that focused on nurturing happiness or general positivity (eg, exercising gratitude, happiness assessment, suggestions for activities nurturing positive feelings), rather than the management of mental health states or problems.

During our coding process, we found greater ambiguity in the descriptions of apps whose primary incorporated technique was mindfulness/meditation: these descriptions leaned more toward enhancing emotional well-being (ie, helping users achieve a positive sense of experience and good mental health) but also targeted stress reduction. Therefore, we grouped mindfulness/meditation apps separately and did not attribute either of the two mental health foci to them. For this reason, and to enable a proper comparison between categories, we present the mindfulness/meditation category in both the mental health focus and technique outcomes, even though the results are identical.

A Cohen kappa interrater agreement of .92 was obtained for coding the variables of interest (incorporated technique, primary technique, and mental health focus). All disagreements were discussed with a third author with reference to the apps until consensus was reached.
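For reference, Cohen’s kappa compares the observed agreement p_o between two raters with the agreement p_e expected by chance from each rater’s marginal code frequencies, kappa = (p_o - p_e) / (1 - p_e). The sketch below computes it for a single coded variable with illustrative labels; the paper reports one coefficient across three coded variables without stating how they were pooled, so this is not presented as the authors’ exact procedure.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assign one nominal code per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # p_o
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # p_e: for each code, product of the two raters' marginal proportions, summed
    expected = sum(counts_a[code] * counts_b[code] for code in counts_a) / (n * n)
    if expected == 1:
        # both raters used one identical code for every item; kappa is undefined,
        # and we return 1.0 by convention
        return 1.0
    return (observed - expected) / (1 - expected)


reviewer_1 = ["tracker", "tracker", "meditation", "meditation"]
reviewer_2 = ["tracker", "meditation", "meditation", "meditation"]
print(cohen_kappa(reviewer_1, reviewer_2))  # 0.5
```

In this toy example the reviewers agree on 3 of 4 apps (p_o = .75), chance agreement is .50, and kappa is therefore .50, well below the .92 the study obtained.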

Behavioral Data on User Engagement in the Real World

Information on user traffic was obtained from SimilarWeb’s Pro panel data [32]. The panel provides aggregated nonpersonal information on user engagement with websites and mobile apps all over the world to enable Web and mobile app traffic research and analytics. The panel is based on several sources of anonymized usage data, such as data obtained from consenting users of mobile apps (ie, products). A dedicated product team at SimilarWeb is responsible for building and partnering with hundreds of high-value consumer products that make up the panel. According to SimilarWeb, the products are used across diverse audiences, without cluttering the user with advertisements. While benefiting from the products, users contribute to the panel by allowing their online or mobile app usage activities to be documented seamlessly and anonymously [32]. The data are not used by SimilarWeb or provided to any third parties for the purposes of marketing, advertising, or targeting of individual subjects. The data-gathering procedures comply with data privacy laws, including the way data are collected, anonymized, stored, secured, and used. These procedures are updated regularly based on evolving data privacy legislation and requirements, such as the European Union’s General Data Protection Regulation [33].

The validity of these data was examined in a previous study [28]. A researcher at Oath [34] (RW) examined 30 randomly selected mobile apps that had data on SimilarWeb and usage data in Oath’s independent records, and computed the correlation between the average number of user sessions per day in the two datasets, finding a very strong Spearman correlation (N=30, r=.77, P<.001). In our study, we also examined the Spearman correlation between the app install categories presented on Google Play (eg, 10,000, 50,000) and the number of downloads documented on SimilarWeb and found a very strong correlation (N=93, r=.81, P<.001). These findings suggest sufficient convergent validity, for which correlations above .70 are recommended [35].
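Spearman’s correlation is the Pearson correlation computed on ranks, with tied values (such as apps sharing the same Google Play install category) assigned the average of the ranks they span. The following is a minimal sketch; the example values are hypothetical, and the sketch omits the P value, which statistical software derives separately.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1                          # extend the run of tied values
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # positions i..j share ranks i+1..j+1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Hypothetical apps: Google Play install category vs panel-recorded downloads
install_category = [10_000, 10_000, 50_000, 100_000, 1_000_000]
panel_downloads = [3_200, 4_100, 9_800, 75_000, 410_000]
print(round(spearman_rho(install_category, panel_downloads), 3))  # 0.975
```

Because install categories are coarse, many apps tie on that variable, which is why the average-rank treatment of ties matters here.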

The study was approved by the University of Haifa Institutional Review Board, Haifa, Israel. The measures included data gathered over a 12-month period from August 1, 2017, to July 31, 2018. For each app, the metrics available on the panel included app open rate (the average percentage of daily active users out of all people who currently had the app installed), average number of sessions per day per daily active user, and average daily minutes of use per daily active user. User 30-day retention was reported, for each day from day 1 to day 30, as the percentage of users who opened the app on that day out of the number of users who installed and opened the app on day 0. Usage patterns by time were available only for apps with a very large number of users. These patterns were represented by two metrics, the average percentage of use per hour (24 hours; eg, 7:00 am, 8:00 am) and per day of the week (7 days; eg, Sunday, Monday), both calculated based on total app usage.
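Our reading of the open rate and day-by-day retention definitions can be illustrated with a hypothetical per-user log. SimilarWeb reports these metrics only as aggregates, and its internal computation is not described here, so the input format (`open_days_by_user`, mapping each user to the day offsets on which they opened the app, with day 0 as the install day) and both function names are assumptions made for illustration.

```python
def retention_curve(open_days_by_user, horizon=30):
    """Percentage of the day-0 cohort that opens the app on each day 1..horizon.

    open_days_by_user: hypothetical dict mapping a user id to the set of day
    offsets (0 = install day) on which that user opened the app.
    """
    # cohort: users who installed and opened the app on day 0
    cohort = [days for days in open_days_by_user.values() if 0 in days]
    if not cohort:
        return {}
    return {
        day: 100 * sum(day in days for days in cohort) / len(cohort)
        for day in range(1, horizon + 1)
    }


def open_rate(daily_active_users, users_with_app_installed):
    """Percentage of people who currently have the app installed and opened it that day."""
    return 100 * daily_active_users / users_with_app_installed


log = {"u1": {0, 1, 15, 30}, "u2": {0, 1}, "u3": {0}, "u4": {2}}
curve = retention_curve(log)
print(round(curve[1], 1), round(curve[30], 1))  # 66.7 33.3
print(open_rate(40, 1000))                      # 4.0
```

User `u4` never opened the app on day 0, so it is excluded from the cohort and does not count toward day 2 retention.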

Data Analysis

We did not assume a normal distribution of the metrics; therefore, medians and interquartile ranges (IQRs) were used as descriptive statistics. When a category included a small number of apps (n≤5), we used the range instead of the IQR. To examine differences in usage metrics between apps with different mental health foci or techniques, a Kruskal-Wallis one-way analysis of variance (ANOVA) was performed, followed by Mann-Whitney U tests to identify the source of the difference. To examine dependencies in the distribution of categorical values where relevant, we used chi-square tests. Because most app installs came from a small number of apps with a large number of installs (see Table 1), we conducted a sensitivity analysis to examine whether including apps with fewer installs would bias the results. Mann-Whitney U tests were conducted to compare the distributions of the usage metrics between the top 5 installed apps and the remaining apps in each category presented in the Results section that included more than five apps. We picked the top 5 apps based on their install category in Google Play. When several apps “competed” for fifth place within the same install category, the app with the higher number of downloads (as documented in the SimilarWeb user panel) was chosen.
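The descriptive rule, the pairwise test, and the top-5 split described above can be sketched in plain Python. This is an illustrative sketch, not the authors’ code: the paper does not state which software or quantile definition was used, so the default `statistics.quantiles` method and the omission of a tie correction in the z score are our assumptions, and all function names are hypothetical. In practice, `scipy.stats.kruskal` and `scipy.stats.mannwhitneyu` provide the omnibus and pairwise tests with tie handling.

```python
import math
import statistics


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def describe(values):
    """Median with IQR, or with the range (min-max) when the category has n <= 5 apps."""
    med = statistics.median(values)
    if len(values) <= 5:
        return med, ("range", min(values), max(values))
    q1, _, q3 = statistics.quantiles(values, n=4)  # 'exclusive' method by default (assumption)
    return med, ("IQR", q3 - q1)


def mann_whitney_z(x, y):
    """U statistic for group x and its normal-approximation z score (no tie correction)."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2       # rank sum of x minus its minimum
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u1, (u1 - mean_u) / sd_u


def split_top_five(apps):
    """Split (name, install_category, panel_downloads) tuples into top 5 and the rest.

    Apps are ordered by Google Play install category; panel downloads break
    ties within the same category.
    """
    ranked = sorted(apps, key=lambda app: (app[1], app[2]), reverse=True)
    return ranked[:5], ranked[5:]


print(describe([1, 2, 3]))              # (2, ('range', 1, 3))
print(describe(list(range(1, 9))))      # (4.5, ('IQR', 4.5))
print(mann_whitney_z([1, 2, 3], [4, 5, 6])[0])  # 0.0
```

For the sensitivity analysis, one would pass a usage metric (eg, daily minutes of use) for the apps returned as the top 5 and for the remaining apps of a category to `mann_whitney_z`.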

Screening

Figure 1 presents the app inclusion flow diagram. The engine and manual searches produced a total of 386 apps with 10,000 installs or more. Through the first screening process, 299 apps were identified and assessed in a detailed evaluation, and 93 apps were finally included in this analysis (see Multimedia Appendix 3 for a complete list of included apps).

Description of Apps

The mental health focus of 59 (63%) apps was a mental health problem. Of these, 19 focused specifically on anxiety-related disorders and 4 focused specifically on depression. In addition, 8 (9%) apps focused on happiness, and 26 (28%) apps focused on the enhancement of emotional well-being through mindfulness/meditation. The distribution of apps based on incorporated techniques is presented in Table 2. Overall, 60 of 93 (65%) apps had a primary incorporated technique, and 33 (35%) apps had two or more incorporated techniques, none of which were primary. Mindfulness/meditation was the most frequent primary technique (26/93, 28%), followed by tracker (22/93, 24%). Psychoeducation (35/93, 38%) was the most frequent cotechnique (ie, a salient technique that was not primary), followed by tracker (28/93, 30%).

Table 2. Distribution of incorporated techniques in the app sample (N=93).

Incorporated technique	Primary technique, n (%)	Cotechnique^a, n (%)	Total, n (%)
Mindfulness/meditation	26 (28)	14 (15)	40 (43)
Tracker	22 (24)	28 (30)	50 (54)
Breathing exercise	7 (8)	20 (22)	27 (29)
Psychoeducation	3 (3)	35 (38)	38 (41)
Peer support	2 (2)	7 (8)	9 (10)

^aThe technique is saliently presented in the app but is not considered a primary technique.

App Usage by Daily Active Users

All apps had complete metrics on app usage by daily active users. Medians and IQRs of daily app usage are presented in Table 3 based on the app’s mental health focus and in Table 4 based on the app’s incorporated techniques. As shown in Table 3, the median app open rate was 4.0% (IQR 4.7%), with medians of 3.28 (IQR 2.53) daily sessions and 13.03 (IQR 14.27) daily minutes of use per active user. Daily minutes of use per active user were significantly higher for mindfulness/meditation apps (median 21.47, IQR 15.00) than for apps targeting mental health problems (median 10.02, IQR 10.60; z=4.64, P<.001) or happiness (median 7.77, IQR 6.90; z=3.82, P<.001). No other significant differences in app usage were found between mental health foci, including between anxiety- and depression-related apps. As seen in Table 4, daily minutes of use were significantly higher for apps whose primary technique was mindfulness/meditation (median 21.47, IQR 15.00) or peer support (median 35.08, n=2) than for apps with other primary techniques (all z≥2.11, all P<.05). In addition, tracker (median 6.3%, IQR 10.2%) and peer support (median 17.0%, n=2) apps had significantly higher open rates than breathing exercise apps (median 1.6%, IQR 1.6%; all z≥3.42, all P<.001). Among apps that had no primary technique and incorporated more than one technique, no significant differences in usage patterns were found.

Table 3. App usage based on app mental health focus (N=93).

Mental health focus		Apps, n	Installation category, median (IQR)	Open rate (%), median (IQR)	Daily number of sessions per active user, median (IQR)	Daily minutes of use per active user, median (IQR)^a
All apps		93	100,000 (90,000)	4.0 (4.7)	3.28 (2.53)	13.03 (14.27)
Mental health problems		59	50,000 (90,000)	4.0 (5.1)	3.77 (3.15)	10.02 (10.60)^*
	Anxiety	19	10,000 (40,000)	2.6 (2.5)	3.58 (3.49)	8.17 (9.42)
	Depression	4	100,000 (50,000-100,000^b)	4.8 (3.0-6.8^b)	5.22 (3.97-6.55^b)	6.97 (2.05-15.12^b)
Happiness		8	100,000 (50,000)	3.7 (5.3)	3.50 (4.18)	7.77 (6.90)^*
Mindfulness/meditation^c		26	100,000 (650,000)	4.1 (3.3)	2.96 (1.66)	21.47 (15.00)^**

^aCategories with different number of asterisks (*, **) within a column are significantly different (P<.05) based on our analytical approach, which included Kruskal-Wallis one-way ANOVA at the variable level, followed by Mann-Whitney U tests.

^bDue to a small number of included apps, brackets in this cell reflect the range (minimum-maximum value) and not the IQR.

^cMindfulness/meditation is presented as a separate mental health focus because none of the apps in this category were attributed to another focus; these apps target the enhancement of well-being as well as stress reduction.
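Footnote a of Table 3 describes a two-stage nonparametric approach: a Kruskal-Wallis test across all categories of a variable, followed by pairwise Mann-Whitney U tests. The standard-library Python sketch below shows one way to implement that approach; it is not the authors' code. Assumptions: tied values receive average ranks and both statistics are tie-corrected; the Mann-Whitney z uses a normal approximation without continuity correction; and pairwise tests run only when the omnibus test reaches P<.05. The footnote states neither whether pairwise tests were gated nor whether they were adjusted for multiple comparisons, so no adjustment is applied here. The chi-square tail probability uses the exact closed form for integer degrees of freedom.

```python
import math
from itertools import combinations


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        i = j + 1
    return ranks


def tie_sum(values):
    """Sum of t**3 - t over each run of t tied values (0 when there are no ties)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    term, total = 1.0, 0.0
    for j in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * j + 1)
    return math.erfc(math.sqrt(half)) + math.sqrt(2 * x / math.pi) * math.exp(-half) * total


def kruskal_wallis(groups):
    """Tie-corrected Kruskal-Wallis H, its degrees of freedom, and chi-square P value."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        r = sum(ranks[start:start + len(g)])
        rank_term += r * r / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    h /= 1 - tie_sum(pooled) / (n ** 3 - n)  # undefined if every value is tied
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


def mann_whitney(x, y):
    """Normal-approximation z (tie-corrected variance) and two-sided P for Mann-Whitney U."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_sum(pooled) / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return z, math.erfc(abs(z) / math.sqrt(2))


def omnibus_then_pairwise(groups, alpha=0.05):
    """Kruskal-Wallis across all categories; pairwise Mann-Whitney only if significant."""
    _, _, p = kruskal_wallis(list(groups.values()))
    if p >= alpha:
        return {}
    return {(a, b): mann_whitney(groups[a], groups[b]) for a, b in combinations(groups, 2)}
```

The sign of z depends on argument order; the text reports absolute values. As a check on the tail-probability helper, `chi2_sf(1.88, 2)` is about .39 and `chi2_sf(11.31, 5)` is about .046, which agree with the P values reported for the day-30 Kruskal-Wallis tests in the retention results.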

Table 4. App usage based on app incorporated technique (N=93).

Incorporated technique		Apps, n	Installation category, median (IQR)	Open rate (%), median (IQR)^a	Daily number of sessions per active user, median (IQR)	Daily minutes of use per active user, median (IQR)^a
Primary technique
	Mindfulness/meditation	26	100,000 (650,000)	4.1 (3.3)	2.96 (1.66)	21.47 (15.00)^*
	Tracker	22	50,000 (90,000)	6.3 (10.2)^*	4.58 (4.47)	7.27 (8.83)^**
	Breathing exercise^b	7	10,000 (40,000)	1.6 (1.6)^**	2.19 (1.23)	8.32 (19.02)^**
	Psychoeducation	3	10,000 (10,000-100,000^c)	3.0 (2.5-3.3^c)	4.16 (2.57-4.80^c)	3.53 (2.07-19.23^c)^**
	Peer support^d	2	300,000 (N/A^e)	17.0 (N/A)^*	8.67 (N/A)	35.08 (N/A)^*
No primary technique, by number of incorporated techniques
	2 techniques	17	50,000 (90,000)	4.0 (5.6)	3.18 (1.40)	7.83 (11.93)
	≥3 techniques^f	16	100,000 (50,000)	3.2 (3.1)	4.06 (3.91)	12.88 (7.13)

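^aCategories with different number of asterisks (*, **) within a column are significantly different (P<.05) based on our analytical approach, which included Kruskal-Wallis one-way ANOVA at the variable level, followed by Mann-Whitney U tests.

^bNot including mindfulness/meditation.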

^cDue to the small number of included apps, brackets in this cell reflect the range (minimum-maximum value) and not the IQR.

^dDue to the small number of included apps, IQR or range could not be calculated (marked with N/A).

^eN/A: not applicable.

^fIncludes two apps that use a chatbot (Wysa, Woebot), which did not show a distinct pattern of results.
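Tables 3 and 4 switch between three formats depending on how many apps a category contains: median (IQR), median (minimum-maximum) for small categories, and median (N/A) when only two apps are available. The Python sketch below captures that convention. The cut-offs (n≤2 and n≤4) are inferred from the tables' footnotes and are assumptions, and `statistics.quantiles` is left at its default "exclusive" method because the section does not state which quartile definition was used; different quartile methods can give noticeably different IQRs for small samples.

```python
from statistics import median, quantiles


def summarize(values, decimals=2):
    """Format one category's values in the style of Tables 3-4 (thresholds assumed).

    n <= 2 -> 'median (N/A)'; n <= 4 -> 'median (min-max)'; otherwise 'median (IQR)'.
    """
    n = len(values)
    med = median(values)
    if n <= 2:
        return f"{med:.{decimals}f} (N/A)"
    if n <= 4:
        return f"{med:.{decimals}f} ({min(values):.{decimals}f}-{max(values):.{decimals}f})"
    q1, _, q3 = quantiles(values, n=4)  # default method='exclusive'
    return f"{med:.{decimals}f} ({q3 - q1:.{decimals}f})"


if __name__ == "__main__":
    # Toy values for illustration only.
    print(summarize([1.0, 3.0]))
    print(summarize([2.0, 4.0, 9.0]))
    print(summarize([1.0, 2.0, 3.0, 4.0, 5.0]))
```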

User 30-Day Retention

Fifty-nine apps (63%) had data on user retention. Chi-square tests of independence revealed no difference between apps with and without retention data in the distribution of mental health foci (χ²₂=2.1, P=.36) or primary incorporated techniques (χ²₄=3.8, P=.44). Figure 2 presents user 30-day retention by the app’s mental health focus, and Figure 3 presents it by the app’s incorporated technique. Both figures show a sharp decline of more than 80% in app open rates between day 1 and day 10, whereas the decline between day 15 and day 30 is smaller, at approximately 20%. As Figure 2 shows, relative to users who opened the app on day 0, the median app open rate was 69.4% (IQR 27.8%) on day 1, 3.9% (IQR 10.3%) on day 15, and 3.3% (IQR 6.2%) on day 30. Kruskal-Wallis one-way ANOVAs revealed no significant difference in day-30 open rates by mental health focus (H₂=1.88, P=.39) but a significant difference by incorporated technique (H₅=11.31, P=.046). Mann-Whitney U tests showed that on day 30, peer support (median 8.9%, n=2), mindfulness/meditation (median 4.7%, IQR 6.2%), and tracker/diary apps (median 6.1%, IQR 20.4%) had significantly higher retention rates than breathing exercise apps (median 0.0%, IQR 0.0%; all z≥2.18, all P≤.04). This pattern is also descriptively apparent in 15-day retention: the median for breathing exercise apps was 0.0% (IQR 0.0%), whereas the medians for peer support, mindfulness/meditation, and tracker/diary apps ranged from 4.9% (IQR 7.1%) to 11.9% (IQR 0.7%).

Figure 2. App 30-day retention by mental health focus. For each day from day 1 to day 30, the percentage reflects the number of users who opened the app on that day out of the number of users who installed and opened the app on day 0.

Figure 3. App 30-day retention by primary incorporated technique. For each day from day 1 to day 30, the percentage reflects the number of users who opened the app on that day out of the number of users who installed and opened the app on day 0.
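The retention curves use a fixed denominator: users who installed and opened the app on day 0. Each later day's value is the share of that cohort who opened the app on that day. The Python sketch below computes such a curve and the decline between two days. The input mapping is hypothetical, and the sketch offers both a relative decline and a percentage-point decline because the text does not specify which reading "a decline of X%" refers to.

```python
def retention_curve(day0_users, opened_on_day):
    """Percentage of the day-0 cohort who opened the app on each later day.

    day0_users: number of users who installed and opened the app on day 0.
    opened_on_day: hypothetical mapping {day: cohort members who opened the app that day}.
    """
    if day0_users <= 0:
        raise ValueError("day-0 cohort must be non-empty")
    return {day: 100 * count / day0_users for day, count in sorted(opened_on_day.items())}


def relative_decline(curve, start, end):
    """Relative drop (%) in the retention percentage from day `start` to day `end`."""
    return 100 * (curve[start] - curve[end]) / curve[start]


def point_decline(curve, start, end):
    """Absolute drop in percentage points from day `start` to day `end`."""
    return curve[start] - curve[end]


if __name__ == "__main__":
    # Toy cohort for illustration only.
    curve = retention_curve(1000, {1: 700, 10: 120, 15: 40, 30: 32})
    print(curve)
    print(relative_decline(curve, 1, 10), point_decline(curve, 1, 10))
```

Applied to the reported day-1 and day-15 medians (69.4% and 3.9%), the relative decline is about 94%, or 65.5 percentage points; because these are computed from medians rather than from individual curves, they are only indicative.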

Usage Pattern by Hours and Days

Sixteen apps had data on hourly and daily app usage. Figure 4 presents hourly usage patterns, and Figure 5 presents daily usage patterns. Because the number of apps with available data was small, we present only categories with data on more than three apps, and we did not conduct statistical tests to compare usage among categories. For hourly usage, apps targeting mental health problems showed a usage peak in the evening (8 pm). Mindfulness/meditation apps showed two usage peaks: one in the morning (7 am-9 am) and the other in the late evening (10 pm-midnight). For daily usage, mindfulness/meditation apps showed a usage peak on Thursday.
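Figures 4 and 5 normalize each category's usage so that the time bins sum to 100%, and the peaks described above are identified descriptively rather than tested. The Python sketch below shows one way to do both: it converts raw usage per bin to percentages and flags bins that exceed both neighbours, treating the bins as cyclic so that hour 23 neighbours hour 0. The input format is hypothetical, and the strict local-maximum rule is an assumption; it misses flat-topped peaks and flags small bumps, so real data might need smoothing first.

```python
def usage_shares(usage_by_bin):
    """Convert raw usage per time bin (e.g., hour or weekday) into percentages summing to 100."""
    total = sum(usage_by_bin.values())
    return {b: 100 * u / total for b, u in usage_by_bin.items()}


def cyclic_peaks(shares, order):
    """Bins whose share is strictly greater than both neighbours in a cyclic ordering."""
    n = len(order)
    peaks = []
    for i, b in enumerate(order):
        before, after = order[i - 1], order[(i + 1) % n]  # order[-1] wraps to the last bin
        if shares[b] > shares[before] and shares[b] > shares[after]:
            peaks.append(b)
    return peaks


if __name__ == "__main__":
    # Toy hourly usage (hypothetical): flat, with bumps at 8 am and 11 pm.
    usage = {hour: 1.0 for hour in range(24)}
    usage[8], usage[23] = 5.0, 4.0
    shares = usage_shares(usage)
    print(round(sum(shares.values()), 6), cyclic_peaks(shares, list(range(24))))
```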

Sensitivity Analysis

We conducted a series of Mann-Whitney U tests comparing the top 5 installed apps with the remaining apps within each mental health focus and incorporated technique on app open rate, number of sessions, daily minutes of use, and 30-day retention. Among apps targeting mental health problems, the open rate was significantly higher for the top 5 installed apps (median 9.0%, IQR 6.9%) than for the remaining apps (n=54, median 4.0%, IQR 4.7%; z=1.68, P≤.05). Of these five apps, one incorporated online peer support and three incorporated mood trackers. No other differences were found. A further series of Mann-Whitney U tests examined whether app usage (open rate, daily number of sessions, daily minutes of use) within each app category (mental health focus, incorporated technique) differed between apps with and without in-app purchases; no significant differences were found (all P>.05).
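Because installs are reported as installation categories (eg, 100,000) rather than exact counts, several apps may share the value at the top-5 cut-off, and the section does not say how such ties were resolved. The Python sketch below makes the split explicit and reports whether the cut-off falls inside a tie. The tuple layout and the alphabetical tie-break are hypothetical choices, picked only because they do not depend on the outcome being compared. The resulting groups would then be compared with a Mann-Whitney U test, as described in the text.

```python
from statistics import median


def split_top_k(apps, k=5):
    """Split apps into the k most-installed apps and the rest.

    apps: list of (name, install_category, metric) tuples (hypothetical layout).
    Ties on install category are broken alphabetically by name, an arbitrary but
    outcome-neutral rule. Also reports whether the cut-off splits a tie.
    """
    ranked = sorted(apps, key=lambda a: (-a[1], a[0]))
    top, rest = ranked[:k], ranked[k:]
    tie_at_cutoff = bool(top) and bool(rest) and top[-1][1] == rest[0][1]
    return top, rest, tie_at_cutoff


def group_medians(top, rest):
    """Median of the metric (third tuple field) in each group."""
    return median(a[2] for a in top), median(a[2] for a in rest)


if __name__ == "__main__":
    # Toy data for illustration only: (name, install category, open rate %).
    apps = [("a", 1_000_000, 9.0), ("b", 500_000, 8.0), ("c", 100_000, 7.0),
            ("d", 100_000, 6.0), ("e", 100_000, 5.0), ("f", 100_000, 4.0),
            ("g", 10_000, 3.0)]
    top, rest, tie = split_top_k(apps)
    print([a[0] for a in top], tie, group_medians(top, rest))
```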

Figure 4. Hourly usage pattern. Usage is presented by hour as a percentage of total app usage; therefore, the percentages within each category sum to 100%. Note: only the subset of apps for which these data were available is included; “All apps” includes both categories and one app targeting happiness.

Figure 5. Daily usage pattern. Usage is presented by day as a percentage of total app usage; therefore, the percentages within each category sum to 100%. Note: only the subset of apps for which these data were available is included; “All apps” includes both categories and one app targeting happiness.

Principal Findings

This is the first study to report the usage and retention metrics of a large number of frequently installed, unguided mental health apps as recorded “in the wild” and independent of developer-led data. Based on Google Play Store data (using keyword search terms), there were over 90 million mental health app installs documented by the end of 2018 (ie, reach). Although our findings revealed that daily active users use apps for a substantial amount of time during the day (median daily usage of 13.03 minutes), most people with an app installed on their device do not open it on any given day (median open rate of 4.0%). Furthermore, general user retention is poor, with a median 15-day retention of 3.9% and 30-day retention of 3.3%. These findings are at the lower end of the real-world retention rates reported in developer-led studies [17-20,22].

Our results also indicate significant differences in app usage and user retention associated with the app’s incorporated techniques. Daily minutes of use were significantly higher for mindfulness/meditation (median 21.47) and peer support (median 35.08) apps than for apps incorporating other techniques. Daily open rates were significantly lower for breathing exercise apps (median 1.6%) than for apps incorporating the two techniques with the highest open rates (tracker: median 6.3%; peer support: median 17.0%). User 30-day retention was significantly lower for breathing exercise apps (median 0.0%) than for all other incorporated techniques (mindfulness/meditation: 4.7%; trackers: 6.1%; peer support: 8.9%) except psychoeducation, which exhibited a 30-day retention pattern similar to that of breathing exercise apps. These patterns could be explained by the notion of effective engagement described by Yardley and colleagues [36], wherein there is “sufficient engagement with the intervention to achieve intended outcomes.” From this perspective, it might be that once people acquire the desired skills (breathing exercise) or knowledge (psychoeducation), they no longer use the app, which shapes the pattern of retention over a longer period. By contrast, mindfulness/meditation apps often include guided meditations designed for repeated use over long periods of time rather than for learning or direct skill acquisition.

Our findings on user retention highlight the low engagement with these apps. Although this warrants a re-evaluation of current engagement and retention strategies, it does not necessarily mean that these apps help only a small number of users. First, we have no data indicating that users engage with only one app to self-manage their states or conditions. However, we cannot assume that users know about the different apps available, which apps to use, and when to use them. Although there are some recommender websites [27,29,37] and approaches to help users identify the right apps [38-41], a therapeutic framework that guides users in using the right app at the right time could be useful. For example, in their novel study of IntelliCare (a suite of 13 apps and one Hub app, accompanied by 8 weeks of coaching that encouraged participants to try the apps recommended to them through the Hub app), Mohr and colleagues [42] found that 95% of participants eventually downloaded five or more of the IntelliCare apps as part of their therapeutic process. In another study, patients with schizophrenia spectrum disorders received 6 months of treatment that included health technology coaching on the use of three digital tools offered to them based on their needs; 96% of patients rated the program as beneficial [43]. Future studies are needed to examine the feasibility of a scalable framework of care in which users receive the right app recommendation at the right time as part of a self-management routine.

Second, user retention patterns might also reflect the low burden of app installation (ie, the simplicity of opening the Google Play Store and clicking the app download and installation buttons), which implies that users’ context, motivation, and ability to engage [44] with these apps were not tested before installation. The low active user rates found in our analysis (median open rate of 4%) suggest that the number of app installs reported in app stores does not provide a proper estimate of the proportion of users who actually self-manage their state using the app. These issues further justify a previous call to develop models that conceptualize the relationships among user state, need, ability, and motivation to engage with early interventions in the digital public space [8]. Although we need to substantially improve our ability to engage users who have made initial attempts at help-seeking, taking a public health engagement approach that also focuses on sustainability represents an important step forward in scaling effective care.

Finally, we found that the two apps that incorporated peer support as a primary technique had relatively high engagement and retention rates. In our previous work, we defined a program’s relatability as “a good representation of a human factor that is easily relatable within the therapeutic context/process” [38]. Relational factors have also been acknowledged to nurture a therapeutic alliance with users [45-47] and have been shown to be a quality aspect that predicts user engagement with mobile health interventions [28]. Future studies are needed to determine whether technology has a special advantage as an infrastructure that connects users with one another and thereby yields better engagement rates.

Limitations

This study has several limitations that should be considered. First, because we used an anonymous user panel, we did not have data on how different users use the apps or which parts of the apps were more engaging. Without such data, some apps might have appeared more engaging because of the characteristics of their users, a phenomenon previously suggested by Ernsting and colleagues [48]. In addition, because of this limitation, we could focus only on the primary incorporated techniques within the apps and not on how other design features (not deemed primary techniques) may have affected the results. Furthermore, because we relied on off-the-shelf programs available to the public, we could not manipulate the programs themselves to account for aspects that lacked variability in our data, such as the impact of theoretical modalities on usage. That is, although our study has the advantage of presenting benchmarks of real-world use independent of trial settings, direct experiments have the advantage of controlling participant identity and manipulating intervention modalities and features to identify the set of active components leading to the best outcome (eg, [49]). Such experiments could also help determine causal relationships between intervention modalities and user behaviors, taking the context of use into account.

Second, some techniques, such as peer support, were incorporated by only a small number of highly installed apps (median installation category of 300,000). However, our results did not indicate significant differences in app installs across incorporated techniques, which suggests that the usage patterns of these apps are not explained by app popularity alone.

Third, because we were limited to the metrics available on the platform, we could not examine retention beyond the first 30 days. The retention curve showed a slower decline in app open rates between day 15 and day 30, and, based on previous reports, it would be reasonable to assume that usage continues to decline over time (eg, [19,50]), but more studies are needed to determine the magnitude of the decline.

Finally, this study was based only on Android users. Current estimates suggest that the Android market share is approximately 88% globally [51] and approximately 42.7% in the United States [52]. Although these data suggest that a sufficient portion of users use the Android operating system, it would be beneficial to validate these results with datasets from the Apple market.

Conclusions

The use of digital platforms that record user traffic “in the wild” enables us to examine patterns of app usage outside of study settings and to assess real-world public engagement. Although we found daily minutes of use among active users to be relatively high, only a small proportion of users actually used popular apps regularly. More studies leveraging different datasets are needed to understand these phenomena. More broadly, our findings point to the importance of how we measure, report, and address aspects of user engagement in the real world. It would be helpful to track the context of users who eventually use apps, possibly through their digital footprints, while also tracking the use of multiple apps and websites over time. Data security and privacy issues would have to be addressed. In addition, new studies are needed to better conceptualize users’ contexts and the ways they search for and engage with beneficial services outside of traditional health care settings.

Acknowledgments

This study was supported by the Donald & Barbara Zucker Foundation.

Conflicts of Interest

None declared.


Multimedia Appendix 1

Definition of sham techniques.

PDF File (Adobe PDF File), 75 KB
