Behavior Change Content, Understandability, and Actionability of Chronic Condition Self-Management Apps Available in France: Systematic Search and Evaluation

Background
The quality of life of people living with chronic conditions is highly dependent on self-management behaviors. Mobile health (mHealth) apps could facilitate self-management and thus help improve population health. To achieve their potential, apps need to target specific behaviors with appropriate techniques that support change and do so in a way that allows users to understand and act upon the content with which they interact.

Objective
Our objective was to identify apps targeted toward the self-management of chronic conditions and that are available in France. We aimed to examine what target behaviors and behavior change techniques (BCTs) they include, their level of understandability and actionability, and the associations between these characteristics.

Methods
We extracted data from the Google Play store on apps labelled as Top in the Medicine category. We also extracted data on apps that were found through 12 popular terms (ie, keywords) for the four most common chronic condition groups (cardiovascular diseases, cancers, respiratory diseases, and diabetes), along with apps identified through a literature search. We selected and downloaded native Android apps available in French for the self-management of any chronic condition in one of the four groups and extracted background characteristics (eg, stars and number of ratings), coded the presence of target behaviors and BCTs using the BCT taxonomy, and coded the understandability and actionability of apps using the Patient Education Material Assessment Tool for audiovisual materials (PEMAT-A/V). We performed descriptive statistics and bivariate statistical tests.

Results
A total of 44 distinct native apps were available for download in France and in French: 39 (89%) were found via the Google Play store and 5 (11%) were found via literature search. A total of 19 (43%) apps were for diabetes, 10 for cardiovascular diseases (23%), 8 for more than one condition in the four groups (18%), 6 for respiratory diseases (14%), and 1 for cancer (2%). The median number of target behaviors per app was 2 (range 0-7) and of BCTs per app was 3 (range 0-12). The most common BCT was self-monitoring of outcome(s) of behavior (31 apps), while the most common target behavior was tracking symptoms (30 apps). The median level of understandability was 42% and of actionability was 0%. Apps with more target behaviors and more BCTs were also more understandable (ρ=.31, P=.04 and ρ=.35, P=.02, respectively), but were not significantly more actionable (ρ=.24, P=.12 and ρ=.29, P=.054, respectively).

Conclusions
These apps target few behaviors and include few BCTs, limiting their potential for behavior change. While content is moderately understandable, clear instructions on when and how to act are uncommon. Developers need to work closely with health professionals, users, and behavior change experts to improve content and format so apps can better support patients in coping with chronic conditions. Developers may use these criteria for assessing content and format to guide app development and evaluation of app performance.

Trial Registration
PROSPERO CRD42018094012; https://www.crd.york.ac.uk/prospero/display_record.php?RecordID=94012


Behavioural content
Target behaviours (TBs)
The number and percentage of apps in which each target behaviour was identified in this sample are presented in Table 2. The distribution of the total number of TBs per app is presented in Figure 1 as a histogram.

Interpretation
We observed 10 distinct TBs in the sample. More than a third of the apps (n = 17, 38.64%) targeted at most one behaviour. The median number of TBs per app was 2 (range 0 to 7).

Behaviour Change Techniques (BCTs)
The number and percentage of apps in which each BCT was identified are presented in Table 3 for BCTs with at least one occurrence in this sample. Figure 2 presents a histogram of the total number of BCTs per app.

Interpretation
We observed 20 distinct BCTs in the sample. Nearly one fifth of the apps (n = 8, 18.18%) included at most one BCT. The median number of BCTs per app was 3 (range 0 to 12).
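The per-app summaries reported for TBs and BCTs (median, and the share of apps with at most one item) can be reproduced along the following lines. This is a minimal sketch: the function name `summarize_counts` and the example counts are hypothetical and are not the study data.

```python
from statistics import median


def summarize_counts(counts):
    """Return (median, number of apps with <= 1 item, that share in percent)."""
    n_low = sum(1 for c in counts if c <= 1)
    pct_low = round(100 * n_low / len(counts), 2)
    return median(counts), n_low, pct_low


# Hypothetical per-app counts for 8 apps (illustration only, not the study data).
example_counts = [0, 1, 1, 2, 3, 3, 5, 12]
print(summarize_counts(example_counts))

# The reported percentages follow from the sample size of 44 apps.
print(round(100 * 17 / 44, 2))  # 38.64 (apps with at most one TB)
print(round(100 * 8 / 44, 2))   # 18.18 (apps with at most one BCT)
```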

PEMAT
Histograms with Understandability and Actionability scores are presented in Figure 3 and Figure 4, respectively. Figure 5 shows the co-occurrence of Understandability (x-axis) and Actionability (y-axis) scores in this sample (circle size and label indicate the number of apps with the corresponding 2 scores).
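For readers unfamiliar with PEMAT percentages, the sketch below illustrates the general PEMAT scoring rule: each item is rated agree (1), disagree (0), or not applicable, and the score is the percentage of applicable items rated agree. The function name `pemat_score` and the example ratings are hypothetical; the item-level ratings of the apps in this sample are not reproduced here.

```python
def pemat_score(ratings):
    """Percentage of applicable items rated 'agree'.

    ratings: list with 1 (agree), 0 (disagree), or None (not applicable).
    """
    applicable = [r for r in ratings if r is not None]
    if not applicable:
        raise ValueError("no applicable items to score")
    return 100 * sum(applicable) / len(applicable)


# Hypothetical ratings for one app (illustration only).
print(pemat_score([1, 0, None, 0]))  # one of three applicable items agreed
print(pemat_score([0, 0, 0, 0]))     # a null score: no item agreed
```

Under this rule, a null Actionability score means that none of the applicable Actionability items was rated agree.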

Interpretation
Overall, Understandability scores were higher than Actionability scores. In the sample, 30 apps had an Actionability score of 0%, while no app had an Understandability score of 0%. Conversely, 5 apps had an Actionability score of 100%, whereas the maximum Understandability score was 92%. Figure 5 shows that apps with higher Actionability scores also had higher Understandability scores. We also performed a Wilcoxon rank sum test to examine this difference: Understandability scores differed significantly between apps with non-null Actionability (mean = 64.5) and apps with null Actionability (mean = 33.7) (p = 0.00002).
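The group comparison described above can be sketched as follows. This is an illustrative implementation of the two-sided Wilcoxon rank sum test using the normal approximation with a tie correction and no continuity correction; the source does not state which software or which variant (exact or approximate) was used, so p-values may differ slightly from the reported ones. The helper names and the example scores are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]


def rank_sum_test(x, y):
    """Return (U statistic for x, z, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    pooled = list(x) + list(y)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Variance of U, reduced when ties are present.
    tie_term = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u, z, p


# Hypothetical Understandability scores (illustration only, not the study data).
non_null_actionability = [60, 65, 70, 75, 92]
null_actionability = [25, 30, 35, 40, 45, 50]
u, z, p = rank_sum_test(non_null_actionability, null_actionability)
print(u, round(z, 3), round(p, 4))
```

In practice a statistics library would normally be used instead; for example, SciPy provides `scipy.stats.mannwhitneyu` and `scipy.stats.ranksums` for this test.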

Inferential statistics
This section presents bivariate and multivariate analyses examining the relationships between the computed variables (TB per app, BCT per app, Understandability and Actionability) and the app characteristics extracted from the Google Play store (Stars, Ratings, Downloads and Sales on app).
Normality tests were performed on the quantitative variables to choose between parametric and non-parametric statistical tests (presented below). Stars, BCT per app, TB per app and Actionability were not normally distributed, while Ratings and Understandability were. Because most variables were not normally distributed, we used non-parametric tests throughout: the Wilcoxon rank sum test for bivariate analysis and the Kruskal-Wallis test for multivariate analysis. We also used Spearman's coefficient (rho) to compute correlations.
Table 4 presents Spearman's correlation coefficient (rho) between ranking characteristics (Stars and Ratings) and computed variables (TB per app, BCT per app, Understandability and Actionability). Figure 4 is a heatmap illustrating these correlations.
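Spearman's rho is the Pearson correlation computed on ranks, with tied values receiving the mean of their ranks. The sketch below shows that computation; the function names and the example values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]


def spearman_rho(x, y):
    """Pearson correlation of the rank-transformed variables."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical values for 6 apps (illustration only).
bct_per_app = [0, 1, 3, 3, 5, 8]
understandability = [17, 33, 42, 58, 50, 92]
print(round(spearman_rho(bct_per_app, understandability), 2))
```

Because only ranks enter the calculation, rho is unaffected by the skewed distributions noted above, which is one reason it suits variables that are not normally distributed.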

Are stars and ratings associated with the presence of specific BCTs?
We performed Wilcoxon rank sum tests to compare ranking characteristics (Stars and Ratings) between apps with and without each BCT that was present in more than 10% of apps. Results are presented in Table 5 and Table 6.

Interpretation
There was a significant difference in mean Stars between apps with (mean = 4.07) and without (mean = 4.45) BCT 2.4 Self-monitoring of outcome(s) of behaviour (p = 0.025). For Ratings, there was also a significant difference between apps with (mean = 22734.71) and without (mean = 414.46) BCT 2.4 Self-monitoring of outcome(s) of behaviour (p = 0.016), as well as between apps with (mean = 25889.12) and without (mean = 3312.42) BCT 2.7 Feedback on outcome(s) of behaviour (p = 0.047).
Table 7 presents comparisons of computed variables (TB per app, BCT per app, Understandability and Actionability) between the categories of developer ("Private company", "Non-Private", "Pharma/MedTech"). A paired test for Actionability scores by developer type is also presented.

Are BC content and/or PEMAT scores related to number of downloads?
To examine the relationship between computed variables (BCT per app, TB per app, Understandability and Actionability scores) and number of downloads, we performed Kruskal-Wallis tests, because the variable Downloads is reported in 12 ranges. Table 8 presents the results; no significant differences by number of downloads were observed.

Are there differences in PEMAT scores and BC content between apps with/without sales in app?
To examine the relationship between computed variables (BCT per app, TB per app, Understandability and Actionability scores) and the presence of paid features, we also performed Kruskal-Wallis tests. Table 9 presents the results; no significant differences were observed among the paid, with sales, and without sales groups.
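A Kruskal-Wallis comparison across three groups, such as paid, with sales, and without sales, can be sketched as follows. The H statistic is computed from pooled ranks with a tie correction, and the p-value uses the chi-square approximation with k - 1 degrees of freedom; for exactly three groups this reduces to exp(-H/2). The function names and example scores are hypothetical, and the source does not state which software or approximation was used.

```python
from math import exp


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    s = sorted(values)
    return [s.index(v) + 1 + (s.count(v) - 1) / 2 for v in values]


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    tie_term = sum(pooled.count(v) ** 3 - pooled.count(v) for v in set(pooled))
    return h / (1 - tie_term / (n ** 3 - n))


# Hypothetical Understandability scores in three groups (illustration only).
paid = [33, 42, 50]
with_sales = [58, 67, 75]
without_sales = [17, 25, 83]
h = kruskal_wallis_h([paid, with_sales, without_sales])
p = exp(-h / 2)  # chi-square survival function, valid for 2 degrees of freedom only
print(round(h, 3), round(p, 3))
```

With more than three groups, as for the 12 download ranges, the p-value requires the general chi-square distribution; SciPy's `scipy.stats.kruskal` handles that case.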

What BCTs and TBs occur together?
A Spearman's correlation heatmap between observed BCTs and TBs is presented in Figure 7.

Interpretation
This figure shows the strength of each correlation; a positive correlation can be read as co-occurrence of the two items. For example, BCT 2.4 Self-monitoring of outcome(s) of behaviour tends to occur together with TB Tracking symptoms (rho = 0.84).
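When both variables are presence/absence indicators, as with a BCT and a TB coded per app, Spearman's rho equals the phi coefficient of the 2x2 co-occurrence table, because ranking a binary variable is only a linear rescaling of it. The sketch below computes phi directly; the function name and the indicator vectors are hypothetical and are not the study data.

```python
from math import sqrt


def phi_coefficient(a, b):
    """Phi coefficient for two equal-length lists of 0/1 indicators."""
    n11 = sum(1 for x, y in zip(a, b) if x and y)
    n10 = sum(1 for x, y in zip(a, b) if x and not y)
    n01 = sum(1 for x, y in zip(a, b) if not x and y)
    n00 = sum(1 for x, y in zip(a, b) if not x and not y)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        raise ValueError("an item is present in all apps or in none")
    return (n11 * n00 - n10 * n01) / denom


# Hypothetical presence indicators for 8 apps (illustration only).
bct_present = [1, 1, 1, 0, 0, 1, 0, 1]
tb_present = [1, 1, 1, 0, 0, 1, 1, 0]
print(round(phi_coefficient(bct_present, tb_present), 3))
```

A value near 1 means the two items are almost always present or absent together; the coefficient is undefined for an item found in every app or in none.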

Are there differences in app characteristics between apps identified via Google Play search versus literature search?
In this work, mHealth self-management apps were found through two types of searches: a Google Play store search and a literature search. Wilcoxon rank sum tests were performed on the variables by search group; Table 10 presents the statistics and corresponding p-values.

[Figure: Correlations between observed BCTs and TBs (heatmap; axis labels include "Tracking emotional symptoms")]