The Mobile Health App Trustworthiness Checklist: Usability Assessment

Background The mobile health (mHealth) app trustworthiness (mHAT) checklist was created to identify end users’ opinions on the characteristics of trustworthy mHealth apps and to communicate this information to app developers. To ensure that the checklist is suited for all relevant stakeholders, it is necessary to validate its contents. Objective The purpose of this study was to assess the feasibility of the mHAT checklist by modifying its contents according to ratings and suggestions from stakeholders familiar with the process of developing, managing, or curating mHealth apps. Methods A 44-item online survey was administered to relevant stakeholders. The survey was largely comprised of the mHAT checklist items, which respondents rated on a 5-point Likert scale, ranging from completely disagree (1) to completely agree (5). Results In total, seven professional backgrounds were represented in the survey: administrators (n=6), health professionals (n=7), information technology personnel (n=6), managers (n=2), marketing personnel (n=3), researchers (n=5), and user experience researchers (n=8). Aside from one checklist item—“the app can inform end users about errors in measurements”—the combined positive ratings (ie, completely agree and agree) of the checklist items overwhelmingly exceeded the combined negative ratings (ie, completely disagree and disagree). Meanwhile, two additional items were included in the checklist: (1) business or funding model of the app and (2) details on app uninstallation statistics. Conclusions Our results indicate that the mHAT checklist is a valuable resource for a broad range of stakeholders to develop trustworthy mHealth apps. Future studies should examine if the checklist works best for certain mHealth apps or in specific settings.


Introduction
From self-diagnosis to wellness, mobile health (mHealth) apps have evolved as a conduit for individuals to play more pronounced roles in their own health care [1,2]. Sustaining the uptake of mHealth apps is particularly necessary to realize their potential in health systems. Among others, mHealth apps are perceived as a vehicle to enhance patient-provider communication, boost patient attempts to self-monitor health conditions, as well as to advance patient empowerment [3]. Despite these positive outcomes, several studies have found significant flaws in the privacy, security, and safety claims of several mHealth apps on the market [4,5].
There is evidence to suggest that end users abandon or reject mHealth apps that they perceive as untrustworthy [6][7][8]. So far, regulatory bodies such as the US Food and Drug Administration have been more concerned about the safety of mHealth apps that purport to be medical devices or their accompanying add-ons [9]. The wellness app space lacks quality assurance frameworks and clear guidance from respective authorities [10]. In light of the relaxed regulatory approach to mHealth apps, evaluation tools have been proposed to assist end users in determining which apps are safe or secure and, thus, which can be trusted [11,12]. Very few of these tools target app developers, although they are vital stakeholders in ensuring that mHealth apps are trustworthy from the outset [13]. To fill this gap, we created the mHealth app trustworthiness (mHAT) checklist via a focus group study with end users of health apps. The mHAT checklist is displayed in Multimedia Appendix 1 and details of the study can be found elsewhere [14].
The purpose of this study was to validate the mHAT checklist by modifying its contents according to ratings and suggestions from stakeholders familiar with the process of developing, managing, or curating mHealth apps to ensure that its contents are applicable. Since the checklist goes beyond the technical aspects of app development, procuring the suggestions of different stakeholders will ensure that it is suited for anyone likely to be involved in app development. The value of validating the checklist was reinforced by a preliminary feedback exercise during the development of the mHAT checklist. Among the six experts with information technology (IT) backgrounds that provided feedback, there was a consensus that the checklist must be validated among a wider range of stakeholders.

Study Design
We designed a cross-sectional online survey to validate the mHAT checklist. Ethical approval for this study was granted by the ETH Zurich (Swiss Federal Institute of Technology) Ethics Commission (EK 2019-N-20). To ensure that this study poses little to no risks to the respondents, we did not require any personal details [15]. Taking part in the survey implied consent. The survey was administered through the SurveyMonkey online survey tool (SurveyMonkey Inc) between March 14 and July 10, 2019.

Survey Development
The first question in the 44-item open survey screened for eligible respondents to ensure that only individuals with experience in developing apps advanced on to the next page. Another question requested participants' areas of expertise and was accompanied by four professions-administrators; marketing personnel; software designers, programmers, or developers; and user experience (UX) researchers-as well as a field for free-text answers. To provide an overview of the types of apps created by participants, the next question asked for the function of an app respondents have created within the last 5 years. The remaining 41 survey items consisted of the mHAT checklist items and required answers on a 5-point Likert scale, ranging from completely disagree (1) to completely agree (5) [16].
Aside from the screening question, which blocked noneligible participants from proceeding further, no other special functions, such as skip logic, were applied to the survey. The number of questions on each page of the survey differed according to the items in each segment of the checklist. Consequently, page 1 contained the 11 questions from the informational content section, whereas page 2 was comprised of the seven questions under organizational attributes. SurveyMonkey estimated 10 minutes as the average completion time.

Participant Recruitment
Our target population was a convenience sample of adults over the age of 18 years who are knowledgeable about the processes involved in developing mobile apps. Among others, administrators (ie, business and systems); marketing personnel; software designers, programmers, and developers; and UX researchers from all geographical locations were eligible for this study. In line with the principles for calculating sample sizes for surveys, the minimum desired sample size was 30 [17]. Participants were recruited by propagating a link of the survey hosted on the SurveyMonkey platform (1) to relevant individuals via email and (2) on social media (ie, Twitter). In line with minimizing access to participants' personal information, participants' Internet Protocol (IP) addresses were not tracked, and their computers were also not assigned unique identification numbers.

Data Analysis
In accordance with recommended practice, the data from the Likert scales were analyzed with descriptive statistics, such as ranks, modes, frequencies, and ranges [16,18]. The frequencies and percentages of the survey items were presented as positive, neutral, and negative using Microsoft Excel. Positive ratings refer to the sum of completely agree and agree, whereas negative ratings are the sum of completely disagree and disagree. Meanwhile, the neutral values present the raw ratings of each item on the Likert scale. Only the checklist items with positive ratings (ie, completely agree and agree) exceeding negative ratings (ie, completely disagree and disagree) were retained. The data derived from the open-ended questions were analyzed by content analysis in NVivo 12 (QSR International) [19]. Novel suggestions of items to include in the checklist were catalogued accordingly.

Participant Characteristics
Of the 144 individuals that responded to calls to participate in this survey, 49 (34.0%) with some professional experience in developing mobile apps were eligible to participate. Out of these 49 participants, 23 (47%) completed the survey in its entirety, while 26 (53%) others completed it partially.
Overall, 22 respondents indicated their professional expertise among the four fixed-choice professions provided in the survey: 6 (27%) administrators; 3 (14%) marketing personnel; 5 (23%) software designers, programmers, or developers; and 8 (36%) UX researchers. Another 15 free-text responses showed additional professional backgrounds: 7 (47%) health professionals, 5 (33%) researchers, 2 (13%) managers, and 1 (7%) IT professional. Figure 1 presents the frequencies of the professions represented in this survey. The respondents had collectively created apps with 33 functions within the last 5 years. By aggregating similar apps, 20 out of 33 (61%) sought to assist end users to self-manage or monitor chronic diseases, such as diabetes and mental health, while 8 (24%) apps intended to promote healthy lifestyles through diet and or exercise. The complete list of app functions can be found in Multimedia Appendix 2.

Survey Responses According to mHAT Categories
In Table 1, we rank the positive (ie, sum of completely agree and agree), neutral, and negative (ie, sum of completely disagree and disagree) ratings according to categories of the checklist. Aside from one checklist item-"the app can inform end users about errors in measurements"-the combined positive ratings (ie, completely agree and agree) of the checklist items overwhelmingly exceeded the combined negative ratings (ie, completely disagree and disagree). The individual ratings for completely disagree, disagree, neutral, agree, and completely agree can be retrieved from Multimedia Appendix 3.

Suggestions for Additional Survey Items
There were two new suggestions of items to include in the checklist: (1) transparency about business models and funding streams as well as (2) statistics on app uninstallations. These items were catalogued under the organizational attributes (reputation) and informational content (transparency) categories of the checklist, respectively. In the final version of the mHAT checklist found in Multimedia Appendix 4, the new items are indicated with an asterisk.

Principal Findings
We have conducted a stakeholder survey to validate the mHAT checklist, a practical tool for app developers to create trustworthy mHealth apps. The checklist items can be considered feasible in many different settings, since the majority of the items were rated positively by the disparate stakeholders-UX researchers, administrators, IT personnel, health care professionals, researchers, managers, and marketing personnel-who reviewed its contents.
Throughout the survey, it appeared that those items that could apply to disparate mHealth apps were rated higher than those items that may be relevant to certain apps but not others. For example, "the app can inform end users about errors in measurements" was the most negatively rated item, whereas "the app can be easy to use and have a friendly end-user interface" was the most positively rated. One plausible explanation for this difference in ratings may be due to the aspiration to design any app, regardless of function, to be user friendly for its end users. The extent to which an app can have no measurement errors, however, will differ from the function of the app (eg, diabetes versus mental health).
The most negatively rated items in the survey-"the app can inform end users about error in measurements" and "the app can highlight potential risks or side effects resulting from its use"-demonstrated that app developers place marginal value in informing end users about the errors, risks, or side effects that may arise from their apps. Regardless of the reasons that contributed to these ratings, such information is vital to upholding end users' trust in mHealth apps. Since plenty of evidence suggests that some apps on the market perpetuate inaccurate advice that may threaten patient safety, it is imperative that app developers are transparent about these issues [4,20]. Without a clear list of app risks or errors, end-user concerns about the trustworthiness of mHealth apps are bound to continue.
Since the checklist item "the app can inform end users about error in measurements" received more negative than positive ratings, it was excluded from the checklist. Meanwhile, respondents suggested including two new items: (1) a disclosure of the business model and (2) uninstallation statistics. Indeed, when app developers obscure their business models, they are perceived negatively by end users [6]. Including the uninstallation statistics of an app, however, is interesting for two reasons. On the one hand, it could signal transparency on the part of app developers and thus boost end users' trust. On the other hand, it may deter end users from downloading the app altogether, since unpopular apps are unlikely to be downloaded in the first place [21].

Strengths and Limitations
To the best of our knowledge, the mHAT checklist is the only empirically validated checklist that presents the attributes of trustworthy mHealth apps. This validated checklist is a robust tool for assisting app developers to create trustworthy apps. Our study has limitations. Despite multiple attempts to obtain a minimum sample size of 30 participants, challenges in recruiting participants meant that fewer participants could take part in the survey. Nonetheless, 23 participants answered all the survey items affording informative statistical analysis. The convenience sample recruited for this study is another limitation. Since these samples are usually unrepresentative of an entire population, our study may have missed out on capturing the opinions of some relevant stakeholders [22,23]. Further, ineligible individuals may have participated in the survey for two reasons: (1) it was an open survey and (2) to minimize access to respondents' personal information, their computers were not assigned unique identification numbers. Nonetheless, there was no incentive to participate, since no compensation was awarded.

Implications for Future Research
The mHAT checklist informs the conversation on the expectations of trustworthy mHealth apps. Future studies should assess whether the checklist is suitable for all mHealth apps or whether it is better suited for certain apps than others. Through additional studies, the contents of the checklist can be improved to retain useful items and exclude redundant ones. More enquiries about the underlying reasons why app developers see little value in informing end users about the risks, side effects, or errors of their apps is also warranted.

Conclusions
This study presents a validated mHAT checklist: a useful guide for app developers to create trustworthy health apps. The 41-item checklist is comprised of five main categories-informational content, organizational attributes, societal influences, technology-related factors, and user control-and 11 subcategories-information accuracy, understandability, transparency, brand familiarity, reputation, recommendations, external factor, usability, privacy, autonomy, and empowerment (see Multimedia Appendix 4).