Published in Vol 26 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/53863.
Exploring the Applications of Explainability in Wearable Data Analytics: Systematic Literature Review

Review

1College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar

2Qatar Computing Research Institute, Hamad Bin Khalifa University, Doha, Qatar

*all authors contributed equally

Corresponding Author:

Yasmin Abdelaal, MSc

College of Science and Engineering

Hamad Bin Khalifa University

Penrose Building

Doha

Qatar

Phone: 974 74024682

Email: y.abdelaal98@gmail.com


Background: Wearable technologies have become increasingly prominent in health care. However, intricate machine learning and deep learning algorithms often lead to the development of “black box” models, which lack transparency and comprehensibility for medical professionals and end users. In this context, the integration of explainable artificial intelligence (XAI) has emerged as a crucial solution. By providing insights into the inner workings of complex algorithms, XAI aims to foster trust and empower stakeholders to use wearable technologies responsibly.

Objective: This paper aims to review the recent literature and explore the application of explainability in wearables. By examining how XAI can enhance the interpretability of generated data and models, this review sought to shed light on the possibilities that arise at the intersection of wearable technologies and XAI.

Methods: We collected publications from ACM Digital Library, IEEE Xplore, PubMed, SpringerLink, JMIR, Nature, and Scopus. The eligible studies included technology-based research involving wearable devices, sensors, or mobile phones focused on explainability, machine learning, or deep learning and that used quantified self data in medical contexts. Only peer-reviewed articles, proceedings, or book chapters published in English between 2018 and 2022 were considered. We excluded duplicates, reviews, books, workshops, courses, tutorials, and talks. We analyzed 25 research papers to gain insights into the current state of explainability in wearables in the health care context.

Results: Our findings revealed that wrist-worn wearables such as Fitbit and Empatica E4 are prevalent in health care applications. However, more emphasis must be placed on making the data generated by these devices explainable. Among various explainability methods, post hoc approaches stand out, with Shapley Additive Explanations as a prominent choice due to its adaptability. The outputs of explainability methods are commonly presented visually, often in the form of graphs or user-friendly reports. Nevertheless, our review highlights a limitation in user evaluation and underscores the importance of involving users in the development process.

Conclusions: The integration of XAI into wearable health care technologies is crucial to address the issue of black box models. While wrist-worn wearables are widespread, there is a notable gap in making the data they generate explainable. Post hoc methods such as Shapley Additive Explanations have gained traction for their adaptability in explaining complex algorithms visually. However, user evaluation remains an area in which improvement is needed, and involving users in the development process can contribute to more transparent and reliable artificial intelligence models in health care applications. Further research in this area is essential to enhance the transparency and trustworthiness of artificial intelligence models used in wearable health care technology.

J Med Internet Res 2024;26:e53863

doi:10.2196/53863




Background

Wearable technologies have become indispensable and dominant in the health care landscape [1]. A notable recent shift in their use involves leveraging their capabilities for the continuous monitoring of users, which proves particularly beneficial in patient care scenarios, such as monitoring patients with diabetes to preempt hyperglycemia [2] or, in the case of athletes, analyzing heart rate (HR) data to tailor personalized exercise regimes for enhanced progress [3]. However, to develop such sophisticated models, machine learning (ML) or deep learning (DL) algorithms are used, which often function as “black boxes,” lacking transparency and making them challenging for medical professionals and end users to comprehend. In this context, explainability becomes crucial in ensuring the responsible and ethical use of wearable technologies in health care. By providing transparent insights into the inner workings of complex algorithms, explainable artificial intelligence (XAI) empowers medical professionals and end users to trust and confidently use these technologies for improved patient outcomes and personalized interventions [4]. This bridges the gap between ML experts and health care professionals, empowering the latter with actionable insights derived from these models.

Although interest and research in explainability surged in 2017, the association between artificial intelligence (AI) and explainability dates back to the mid-1980s [5-7]. Over time, the significance of explainability has grown, and its potential has been recognized across various domains. In 2018, to further promote the importance of explainability, organizations such as the Association for Computing Machinery issued statements emphasizing algorithmic transparency and accountability [8], encouraging researchers and institutions to prioritize explainability when designing AI systems. Institutions such as the Defense Advanced Research Projects Agency have also contributed to the focus on explainability by funding the Explainable AI (XAI) Program [9]. This initiative emphasizes the importance of transparency and interpretability in AI systems, further highlighting the growing recognition of explainability’s significance in the field of AI. As a result, research on explainability has increased notably, underscoring its growing importance in promoting the ethical use of AI across domains.

In parallel, the recent emphasis on XAI has sparked interest in integrating it with wearable technologies. Wearables have demonstrated significant potential and effectiveness in health monitoring, paving the way for innovative health care applications. What sets XAI apart when applied to wearable data compared to other datasets lies in the unique characteristics of wearables. Wearables capture highly personalized, granular data, often in dynamic real-world settings. This personal and real-time nature introduces complexities that demand a specialized XAI approach. The interpretability and transparency of AI models become even more critical as users must understand not just the decisions but also their impact on health and well-being. Moreover, wearables often integrate diverse data types, from physiological signals to activity tracking, requiring XAI techniques capable of handling multimodal data. Thus, delving into XAI within the domain of wearables is essential to address these distinct challenges and harness the full potential of wearable technology in health care, fitness, and personal well-being. However, the incorporation of XAI techniques into wearables remains an emerging research frontier. A recent review, which analyzed papers from 2011 to 2022, underscored an existing gap in the field: the lack of XAI research specifically focused on interpreting 1D biosignals obtained from wearable devices [10]. To address this gap, this paper aimed to review the recent literature, exploring the application of explainability in wearables. By examining how XAI can enhance the interpretability of generated data and models, this review sought to shed light on the possibilities that arise at the intersection of wearable technologies and XAI.

Related Studies

In the current AI era, a notable transformation is being witnessed in health care [1]. Various applications are powered by AI systems, leading to the emergence of ML and DL. As AI complexity increases, the demand for enhanced transparency is being recognized. This demand is met by the implementation of XAI, which allows AI model workings to be understood. The following paragraphs provide a more comprehensive exploration of the terminology associated with AI, ML, DL, and XAI. Following this, the paper proceeds to delve into specific research questions (RQs) that will be addressed.

AI involves the creation of systems and machines designed to replicate human intelligence, enabling them to perform real-world tasks effectively. AI systems are trained on data, allowing them to learn from experience and solve specific problems. They continuously refine their performance based on the information they receive. AI applications are diverse and include advanced web search engines, self-driving cars, gaming, speech recognition, recommendation systems, and health care AI systems and applications. AI essentially emulates human cognitive processes, making it invaluable when dealing with extensive datasets [1].

ML, a subset of AI, empowers computers to recognize patterns, make highly accurate predictions, and self-improve through experiential learning without the need for explicit programming. Building AI-driven applications relies heavily on ML methodologies. These models undergo training using extensive datasets, enabling them to deliver precise predictions [2]. ML is broadly categorized into supervised and unsupervised learning, complemented by paradigms such as semisupervised learning [4] and reinforcement learning [5].

DL, a specialized field within ML, draws inspiration from the human brain’s structure and functionality. DL effectively uses both structured and unstructured data for model training. It plays a vital role in predicting life-threatening diseases in medical research, with deep neural networks achieving remarkable predictive capabilities [11]. Prominent DL models include convolutional neural networks (CNNs), residual networks, fully convolutional networks, and long short-term memory (LSTM) [12]. Figure 1 [13] depicts the relationship among AI, ML, DL, and XAI.

Figure 1. Relationship among artificial intelligence (AI), machine learning (ML), deep learning (DL), and explainable AI (XAI) [13].

XAI enriches AI models with information comprehensible to end users. While AI algorithms enable users to make informed business decisions, the opacity of these algorithms often leaves users uninformed about the decision-making processes [14]. This lack of transparency is where XAI comes into play. XAI strives to elucidate the inner workings of AI models, offering users comprehensible explanations of the methodologies, procedures, and outputs. XAI is therefore often described as a “white box” approach because of its emphasis on revealing the model’s processes.

In the field of XAI, training data serve as input, and users select the prediction methodology and XAI techniques based on specific application requirements. The input data vary depending on their source. For example, they can be electronic health records, vital sign recordings, medical scans, and wearable data. This review focuses on wearable data as the training data. Wearable data vary among physiological, activity, environmental, behavioral, biometric, and social interaction data. These techniques shed light on the model’s internal operations and provide an explanatory interface, as shown in Figure 2, adapted from the study by Saranya and Subhashini [13]. In the modified diagram, the input data are changed to “wearable data” to be more specific to this review. This transparency empowers users with insights into AI model outputs, fostering trust [15]. Armed with this understanding, users can enhance output accuracy and identify model shortcomings, facilitating informed decision-making to improve the model.

Figure 2. Process of explainable artificial intelligence (AI) from wearable data, adapted from Saranya and Subhashini [13].

Despite the recent surge in wearable devices within the health care sector, there is a notable gap in research on XAI applied to wearable data. The application of XAI in wearable data analysis is crucial due to several key factors. First, the high granularity and personal nature of the data collected via wearables emphasize the need for transparency. Users must understand how their data are processed and interpreted to trust the insights generated. Second, given the inherently personal nature of wearables, establishing trust becomes paramount. Users need confidence in the accuracy and privacy of the data collected, making transparency in algorithms and data handling essential. Finally, wearables operate in dynamic environments and integrate diverse data types. XAI can help unravel complex relationships within these data streams, enabling better insights and decisions in various health care and well-being applications. Although recent reviews have explored the broader applications of XAI in health care, the potential of wearables in this context has remained somewhat overshadowed. Some of these reviews have touched on the relevance of wearable biosensors in health care applications, yet they often fail to fully uncover the diverse opportunities presented by both commercial and noninvasive smartwatches [16,17]. Similarly, certain reviews have focused on health care Internet of Things devices, but they tend to only scratch the surface regarding the intricate field of wearable technology [18,19]. The limited number of studies on XAI from wearable data highlights a significant research gap in this emerging field. While wearables have gained traction in health care and other domains for data collection and analysis, the incorporation of XAI principles to ensure the interpretability of AI models is an area that holds great potential but has seen limited exploration. This review aimed to explore the use of XAI within wearables (RQ 1).

In the domain of wearable technology, these devices offer a diverse array of data, encompassing activity metrics such as calorie expenditure and step counts as well as physiological signals such as electrodermal activity (EDA) and electrocardiography (ECG). Furthermore, wearables extend their capabilities to encompass behavioral patterns and environmental cues, thereby enriching the contextual understanding derived from the data they collect. Recent reviews in the health care field have illuminated the widespread adoption of wearables across various anatomical regions, including the head, limbs, and torso [20]. These versatile devices have been instrumental in monitoring a range of medical conditions, spanning stroke and poststroke rehabilitation to Parkinson disease, among others [21]. Hence, the adoption of wearables in health care is evident, yet some studies have delved deeper and adopted XAI for wearable data. For example, a study used EDA for pain recognition using the gradient-weighted class activation mapping technique [22], and another adopted accelerometer data for fall detection using local interpretable model-agnostic explanations (LIME) as the explainability method [23]. This review sought to address the question of which specific wearable data types have been explored within the context of explainability (RQ 2). By shedding light on this aspect, this review sought to bridge the gap and provide insights into the areas of wearable data that warrant greater attention from researchers and practitioners in the field of XAI. In addition, this review sheds light on the various explainability methods deployed specifically for each type of wearable data (RQ 3).

Previous reviews on XAI in the health care field have focused on explainability. They have highlighted the dominance of feature explanations over textual and example-based explanation methods. In addition, most explainability methods use the post hoc approach while focusing on the local rather than the global scope of the data. This review aimed to identify whether wearable data follow this trend from previous reviews (RQ 4).

The evaluation of AI models after applying an explainability method is crucial for ensuring transparency, accountability, and user understanding. Several researchers have stressed the need for formal evaluation metrics and a more systematic evaluation of XAI methods [5,24]. Evaluation allows for a formal comparison of the available explanation methods and offers a systematic way to assess whether explainability is achieved in an application [25]. A previous review focused on the importance of assessing explanations, and the results revealed that 1 in 3 studies relied solely on anecdotal evidence for evaluation, whereas only 1 in 5 incorporated user evaluations [26]. This highlights the gap in evaluating XAI outcomes through either anecdotal evidence or user evaluations. To address this gap with wearable data, this review aimed to identify the evaluation methods used for XAI techniques in the field of wearable data (RQ 5).

These aspects led to the following RQs, which guided our survey:

  1. In the health care domain, how is XAI being used within the context of wearables? (RQ 1)
  2. What is the predominant data type used in building XAI models from wearables? (RQ 2)
  3. How is the explainability of AI models represented to the users? (RQ 3)
  4. To what extent do model-agnostic, post hoc, and global explainability methods prevail compared to other approaches in line with existing literature? (RQ 4)
  5. What are the evaluation methods used for various explainability techniques in the context of wearables and XAI in health care? (RQ 5)

A Typology of XAI Features

Overview

To guide the exploration of XAI within the domain of wearable technology, recent reviews on explainability have provided categorizations of XAI methods. These reviews have extensively examined various dimensions of XAI. One review focused on 6 dimensions, namely, the type of explanation, type of task, type of data, type of explainability method, type of problem, and type of model to be explained [26]. Another review focused on the application domain, model type, stage, scope, and output format [16]. We opted to build our survey on a previous XAI taxonomy [27] as it informs the different forms of explainability deployed in wearable technology, encompassing stage, scope, problem type, input data, and output format. This taxonomy is recent, covers the relevant concepts, and has been highly cited by other researchers.

The following sections provide a brief definition of each explainability dimension.

Input Data of the XAI System

The term input data within the context of XAI systems refers to the data used to train the AI model, as shown in Figure 2 [13]. The nature of these input data can vary based on their source. In the context of this review, which focuses on wearable data, the primary source of input data is wearable devices. Wearable data encompass various forms, including physiological signals such as HR, ECG, and electroencephalography. In addition, wearables provide data in the form of activity metrics such as step and calorie counts. Moreover, wearable technology captures behavioral and emotional states, allowing users to log factors such as stress levels. Environmental data, including temperature and ambient noise levels, can also be collected via wearables. Furthermore, wearables can capture social data, such as monitoring the time spent on specific social media apps and related interactions.
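To make this concrete, the short sketch below assembles a hypothetical daily feature table from separate physiological, activity, and sleep streams of the kind described above; the column names, dates, and values are invented for illustration and do not come from any reviewed study.

```python
# Hypothetical sketch: merging multisource wearable inputs into one daily feature table.
# All column names and values below are invented for illustration.
import pandas as pd

heart = pd.DataFrame({"date": ["2022-01-01", "2022-01-02"], "resting_hr": [61, 64]})
activity = pd.DataFrame({"date": ["2022-01-01", "2022-01-02"], "steps": [10432, 8210]})
sleep = pd.DataFrame({"date": ["2022-01-01", "2022-01-02"], "sleep_minutes": [412, 388]})

# One row per day, one column per signal: the input matrix an XAI pipeline would consume.
features = heart.merge(activity, on="date").merge(sleep, on="date")
print(features)
```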

Output Format of the XAI System

Output format of explainability refers to how the explanations are presented. This can vary from visual representations such as graphs and images to textual, numerical, rule-based, or mixed formats [27]. The presentation of explainability encompasses diverse approaches that depend on several factors, such as the target population and the nature of the input data. For instance, when the target population consists of lay users, explainability must be presented in a simple manner that aligns with their level of understanding. Conversely, when explainability is intended for medical professionals or researchers, a more detailed and in-depth approach is necessary as this population possesses a higher level of expertise and requires a comprehensive understanding of the underlying mechanisms and processes driving the model’s outcomes. In addition, each type of input data may require specific methods of explanation. The nature and characteristics of the input data play a role in determining the most effective approach for conveying their insights and interpretations.

Stage of Explainability

Stage of explainability refers to the point during the XAI process (Figure 2 [13]) at which a method generates explanations. The stage when the explainability is introduced can be post hoc or ante hoc. Ante hoc methods aim to consider the explainability of a model from the beginning and during training to make it naturally explainable. In contrast, post hoc methods maintain pretrained models without any structural modifications and introduce explainability mechanisms after the model’s training phase. The post hoc methods frequently use external explainer techniques during testing. An example of a post hoc method is Shapley Additive Explanations (SHAP) [28], which aims to provide a way to attribute the contribution of each feature to a model’s prediction. Conversely, an example of an ante hoc method is recurrent lexicon networks [29], a method that models lexicons as naïve gated recurrent networks while seamlessly integrating explainability principles throughout the training process. This approach ensures that the model inherently possesses explainability characteristics from the beginning.
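As a minimal sketch of the post hoc stage (assuming the shap and scikit-learn packages; the wearable-style feature names and the synthetic label are illustrative, not taken from any reviewed study), the model below is trained first and a SHAP explainer is attached only afterward:

```python
# Minimal post hoc explanation sketch: train first, explain afterward with SHAP.
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
feature_names = ["heart_rate", "steps", "sleep_minutes", "eda_mean"]  # illustrative
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)  # synthetic "high stress" label

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Post hoc step: the trained model is left untouched; an external explainer is added.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:5])
print(shap_values)  # per-feature contributions for the first 5 predictions
```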

Scope of Explainability

The scope of explainability refers to the extent to which an explanation clarifies the inner workings of the AI model. The scope of explainability can be either local or global [30]. Local explainability focuses on clarifying the reasoning behind individual predictions, offering insights into why a specific decision was made for a particular instance. For instance, by using techniques such as LIME [31], a model may reveal that it diagnosed a rare medical condition for a patient based on a combination of relevant biomarkers and wearable data. Conversely, global explainability seeks to provide a broader perspective, offering a holistic understanding of the model’s behavior and feature importance across an entire dataset. Using methods such as SHAP [28], one can analyze the model’s tendencies in an entire patient population. Thus, local explainability delves into explaining individual predictions, whereas global explainability offers insights into the model’s behavior and feature importance across an entire dataset.
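The following self-contained sketch contrasts the two scopes under simplified assumptions (the lime and scikit-learn packages; synthetic data and illustrative feature names): LIME explains a single prediction locally, while the linear model’s coefficients stand in for a rough global view.

```python
# Local vs global scope, illustrated on synthetic wearable-style data.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
feature_names = ["heart_rate", "steps", "sleep_minutes", "eda_mean"]  # illustrative
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
model = LogisticRegression().fit(X, y)

# Local scope: why did the model make this prediction for this one individual?
explainer = LimeTabularExplainer(
    X, feature_names=feature_names, class_names=["low risk", "high risk"], mode="classification"
)
local_explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(local_explanation.as_list())

# Global scope (rough proxy): average direction and strength of each feature across the
# dataset, read here directly from the linear model's coefficients.
print(dict(zip(feature_names, model.coef_[0].round(3))))
```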

Problem Types Addressed by AI Models

Each AI model that is constructed is tailored to address a specific underlying problem type, which could fall into either regression or classification categories. Studies that use regression aim to predict a continuous numerical value, whereas classification focuses on categorizing data into distinct classes or groups. Regression models are valuable for predicting patient outcomes and estimating essential health parameters. For instance, these models can be used to predict blood pressure levels in patients with diabetes based on factors such as food intake, exercise, and insulin dosage [32]. On the other hand, classification models are instrumental in disease diagnosis and treatment planning. For example, ML models can analyze physiological data to classify an individual’s stress levels [33].
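The sketch below illustrates the two problem types side by side on synthetic wearable-style data; the feature meanings, targets, and coefficients are assumptions for illustration only.

```python
# Regression (continuous target) vs classification (discrete target) on synthetic data.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, GradientBoostingRegressor

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))                      # e.g., food_intake, exercise_minutes, insulin_dose
blood_pressure = 120 + 5 * X[:, 0] - 3 * X[:, 1]   # continuous outcome -> regression
stressed = (X[:, 1] < 0).astype(int)               # binary outcome -> classification

regressor = GradientBoostingRegressor().fit(X, blood_pressure)
classifier = GradientBoostingClassifier().fit(X, stressed)
print(regressor.predict(X[:1]), classifier.predict(X[:1]))
```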


Study Design

We followed a systematic review design using qualitative methods. We adhered to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) statement [34].

Data Sources and Search Strategy

At the time of conducting our study, we captured the most recent publications in the rapidly evolving fields of XAI and wearable devices. Since the Defense Advanced Research Projects Agency introduced the XAI program in 2017, there has been an increase in papers focusing on XAI. Recent reviews have shown a significant rise in publications in the field of XAI during the period from 2018 to 2022, highlighting the growing research interest and developments [13] in the field, and this trend continues today. In this review, we collected publications covering 5 years, from January 1, 2018, to December 31, 2022, and then conducted the analysis and reporting process. We consider the qualitative outcomes of our review to remain representative of more recent publications in the field. We collected the publications from ACM Digital Library, IEEE Xplore, PubMed, SpringerLink, JMIR, Nature, and Scopus. The selected publishers are renowned for their high-quality and impactful research across computer science, engineering, health care, and interdisciplinary studies. In addition, our preliminary search indicated the potential of these publishers to provide research on XAI and wearable data. The search strategy encompassed a comprehensive range of terms from various domains. These included explainability concepts, AI (Medical Subject Heading; MeSH) terminology, target population (MeSH) descriptors, wellness (MeSH) terms, and technology (MeSH) keywords. XAI, ML, and DL were considered alongside target populations such as patients and physicians. Wellness-related terms such as physical activity, sleep, and exercise were also incorporated. In addition, technology aspects encompassed wearable electronic devices and sensors. Multimedia Appendix 1 provides detailed keywords for each database.

Eligibility Criteria

The inclusion criteria for this review encompassed technology-based research involving wearable devices, sensors, and mobile phones. The studies were required to incorporate explainability or ML or DL techniques and use quantified self data. The quantified self domain, which began in 2007, uses technology such as apps and wearable smart devices to monitor, measure, and quantify different aspects of daily life [35]. In addition, the focus of the studies needed to be medical, and they had to be proceedings, book chapters, or peer-reviewed journals; written in English; and published between 2018 and 2022. On the other hand, the exclusion criteria included duplicate studies, review articles, and books, as well as workshop papers, courses, tutorials, and talks. Textbox 1 provides detailed information.

Textbox 1. Inclusion and exclusion criteria.

Inclusion criteria

  • Technology-based research (eg, wearable devices, sensors, and mobile phones)
  • Studies incorporating explainability
  • Studies incorporating machine learning or deep learning techniques
  • Studies using quantified self data
  • Medical-based studies
  • Studies published in peer-reviewed journals
  • Proceedings or book chapters
  • Studies written in English
  • Studies published from 2018 to 2022

Exclusion criteria

  • Duplicate studies and review articles
  • Books
  • Workshop papers
  • Courses
  • Tutorials
  • Talks

Study Screening

Screening of potentially eligible studies was performed in 3 steps: duplicate removal, title and abstract screening, and full-text screening. Duplicates were removed using Rayyan (Rayyan Systems Inc) [36]. Additional duplicates that were not removed during this process were removed manually. Two review authors (YA and MA) independently screened the titles and abstracts for inclusion using the predefined inclusion and exclusion criteria specified previously. The other 2 review authors (DA-T and AB) independently screened a random 30% (n=197) of the studies remaining after duplicate removal. Agreement among all authors was confirmed using a Cohen κ test. Table 1 provides more details.

Table 1. Cohen κ test of reviewer agreement.
Reviewer pair: Cohen κ (percentage of agreement)

  • YA vs DA-T: 0.63 (94.17%)
  • YA vs MA: 1.00 (100%)
  • YA vs AB: 0.78 (96.58%)
  • DA-T vs MA: 0.63 (94.17%)
  • DA-T vs AB: 0.63 (94.17%)
  • MA vs AB: 0.78 (96.57%)

All studies not discarded through this process were then screened by 1 review author (YA) in a full-text review process from which studies were identified for inclusion. Subsequently, a backward and forward referencing approach was used to further uncover potentially relevant papers. Conflicts among the 4 review authors were addressed through a majority vote system where a 3-versus-1 decision was made. Only 3 papers were found to have 2-versus-2 conflicts, which were subsequently resolved through discussions with each review author. Full data extraction, categorization, and labeling of papers were performed by 1 author (YA) and validated by 2 authors (MA and AB).

Feature Extraction

During the analysis, various features were extracted from different perspectives. Metadata features provided information about the publication and dissemination of the selected studies. Explainability features delved into the specific problem, the AI model used, and the input data used, following the taxonomy [27]. Evaluation of explainability included whether it was evaluated through user studies. For studies involving user evaluations, additional features such as the materials used, participant count, data collected, and outcome of the evaluation were extracted. Technology features focused on the type of wearable device used and its placement on the body. Table 2 provides detailed information. All features except the explainability features were created by the authors to fit the topic.

Table 2. Feature extraction.
Metadata (PRISMA Flowchart and Selection Statistics and Geographic and Time Statistics sections)

  • Country: country of the article based on the affiliation of the first author
  • Domain: targeted domain of the study, such as a specific disease or medical condition (eg, diabetes or stress management)
  • Publication date: when the article was published

Explainability [27] (Analysis Using the Typology of XAI Features section)

  • Model: type of machine learning or deep learning model used for performing the primary task
  • Problem type: the underlying problem (classification or regression), according to which explainability methods can vary
  • Output format: form of explanation generated for the model’s outcome
  • Target group: the group targeted by the explainability generated
  • Input data: type of quantified self data used, such as step count or calories
  • Stage: the stage at which a method generates explanations; can be either ante hoc or post hoc
  • Scope: the scope of an explanation; can be either global or local

Evaluation (Human-Centered Evaluation section)

  • Evaluation method: method of evaluation; can be with or without end users
  • Study design: the study design used for verifying the explainability
  • Study method: the method used, such as qualitative, quantitative, or mixed methods
  • Participants for building the model: number of participants recruited for collecting data to build the model
  • Participants for testing the model: number of participants recruited for collecting data to test the model
  • Type of participants: participant types, such as healthy or disease or condition specific
  • Data collection methods: methods used to collect the data for the user study
  • Duration of the study: duration of the user study
  • Medium of interaction: medium used to communicate the explainability
  • Study procedure: the intervention of the user study
  • Data analysis: techniques used for analyzing the user study

Technology (Technologies for Capturing Quantified Self Data section)

  • Type of technology: different types of wearable technology used, such as the Empatica E4 wristband or Fitbit wearables
  • Position of the wearable: position of the wearable device, such as on the wrist or arm

Data Synthesis and Analysis

We used descriptive statistics to describe the metadata, explainability features, human-centered evaluation, and technology used.


PRISMA Flowchart and Selection Statistics

The study selection sequence is outlined in a PRISMA flowchart, which is shown in Figure 3. Our search yielded 690 articles, of which 32 (4.6%) were identified as duplicates and removed. After screening abstracts, a further 609 (88.3%) were excluded, leaving 49 (7.1%) assessed for eligibility via full-text review. Among these, 29 (59.2%) were excluded. Forward and backward citation searching yielded an additional 6 included papers. A total of 25 studies met the inclusion criteria after completing the backward and forward citation check. Refer to Multimedia Appendix 2 for the PRISMA checklist.

Figure 3. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flow diagram of the systematic review. DL: deep learning; ML: machine learning.

Geographic and Time Statistics

Table 3 illustrates the distribution of the 25 included papers, with 10 (40%) from the United States, 8 (32%) from various European countries, 6 (24%) from South Korea, and 1 (4%) from China. This widespread interest and research focus on explainability in wearable devices indicates its global significance and relevance in diverse regions. Furthermore, Figure 4 highlights that the selected papers were primarily published between 2019 and 2022, indicating a recent uptrend in research activity in this field. This recent attention highlights the growing recognition and importance of integrating explainability with wearable technologies for advancing health care applications.

Table 3. Distribution of studies per country (N=25).
  • United States: 10 (40%)
  • South Korea: 6 (24%)
  • The Netherlands: 2 (8%)
  • Switzerland: 2 (8%)
  • China: 1 (4%)
  • Italy: 1 (4%)
  • Norway: 1 (4%)
  • Turkey: 1 (4%)
  • United Kingdom: 1 (4%)
Figure 4. Distribution of included papers per year.

Application Domains

In this systematic review, certain domains received more exploration and attention in the context of explainability in health care monitoring compared to others, as shown in Table 4. When assessing the depth of investigation, certain domains stand out, with health conditions and diseases being the most extensively studied, accounting for 24% (6/25) of the research focus. Notably, these studies delved into conditions such as kidney disease, multiple sclerosis, influenza, sarcopenia, osteopenia, and the ongoing COVID-19 pandemic. Similarly, sleep and activity monitoring attracted some research interest, constituting 24% (6/25) of the reviewed studies. These investigations provided valuable insights into individuals’ sleep patterns, personal health, and activity levels. Vital sign and health monitoring, encompassing parameters such as blood pressure and blood glucose levels, shared a comparable portion of the research landscape, also constituting 24% (6/25) of the total. This attention stems from the critical role of vital sign and health monitoring in the early detection and management of various health conditions, including diabetes. In comparison, domains such as mental health, stress management, and weight management collectively accounted for 16% (4/25) of the research focus. While these areas received some attention, they stood slightly behind the aforementioned domains. Finally, domains such as activity recognition, neurological monitoring (specifically brain signals), and substance abuse detection with a focus on opioid detection collectively constituted 12% (3/25) of the reviewed papers. These domains, while important, received relatively less exploration within the context of explainability in health care monitoring.

The explainability of wearable devices was applied in various health domains. It was widely used in the physical activity and health domain, such as predicting user-specific health risks [35] and identifying effective representations of fitness goals to enhance user physical activity and trust in the system. Wearable devices were also extensively used for diabetes control. For example, some studies (2/25, 8%) used the wearable device Empatica E4 and glucose monitoring to detect hypoglycemia [37] and hyperglycemia [2] with a lead time of up to 60 minutes. Another study focused on detecting eating moments and explaining glucose levels using wearables [32]. Similarly, other studies (2/25, 8%) focused on blood pressure monitoring and generated personalized lifestyle recommendations based on the user’s blood pressure [38,39]. It is worth noting that there were not many studies exploring XAI for weight management [40]. This is interesting because it could have practical applications for a large user base, including nonexperts. One possible reason for this gap in research might be that weight-related data are relatively easy to understand for most people and something that can be easily measured. In contrast, more complex data types received more attention in the field of XAI.

Table 4. The different wearable data applications using explainability.
  • Vital sign and health monitoring: 6 (24%)
  • Sleep and activity monitoring: 6 (24%)
  • Health conditions or diseases: 6 (24%)
  • Mental health: 3 (12%)
  • Opioid abuse and detection: 1 (4%)
  • Weight management: 1 (4%)
  • Neurological and brain signals: 1 (4%)
  • Human activity recognition: 1 (4%)

Stress detection and management also received attention in wearable research. A study focused on stress detection and coping strategies [33], whereas another explored the prediction of next-day perceived stress using physiological signals such as ECG [41]. Comparative studies were also conducted to compare the effectiveness of different wearables in stress detection [42]. Furthermore, wearables were used in medical applications such as opioid detection [35] and identifying COVID-19 [43]. These studies highlight the diverse applications and potential of explainable wearable devices in health care.

Analysis Using the Typology of XAI Features

Overview

The following sections aim to provide concise descriptions of the primary categories of explainability methods identified in this systematic review (Multimedia Appendix 3 [2,3,33,38-53]). This is followed by a summarization of their stage, scope, problem type, input data, and output format. Figure 5 provides a summary of the 25 included articles, categorizing them into the five explainability features identified in the study.

Figure 5. Overview of the explainability features.
Input Data of the XAI System

This systematic review identified 5 primary categories of input data used in the research, including physiological signals (eg, HR), activity data (eg, steps), sleep data (eg, sleep duration and stages), nutritional data (eg, calorie intake), and mood (measured through surveys). Among these categories, physiological signals emerged as the most used input data for developing AI models, as shown in Figure 6 [44,54]. This prevalence can be attributed to the widespread use of wearable technology for collecting quantified self data. Commercial wearables such as Fitbit and Apple Watch excel in capturing physiological information such as HR, whereas medical wearables such as Empatica E4 offer advanced capabilities for gathering signals such as EDA, photoplethysmography, HR variability, and accelerometer data. Several studies in the reviewed literature (3/25, 12%) opted for a multisource approach, using various input data types to obtain a comprehensive understanding of the user’s health, as shown in Figure 6 [44,54]. For instance, researchers incorporated a combination of physiological signals, activity data, and sleep metrics into their analyses [43].

Figure 6. UpSet diagram [45,46] of the various combinations of input data.

The input data in this review can also be classified into manual and automatic data collection methods, as shown in Textbox 2. Automatic data collection involves using technological devices such as wearables or mobile apps to gather data without direct user involvement, whereas manual data collection requires users to actively provide information or respond to specific queries. For instance, nutritional data are manually entered by users to estimate calorie intake [32], and mood data are collected through questionnaires to identify stress levels [33]. By considering both manual and automatic data collection, researchers can obtain a comprehensive and diverse set of input data, leading to a more holistic understanding [39]. Automatic data collection occurs with devices such as continuous glucose monitoring systems, which continuously measure and record glucose levels. Some studies in this review (3/25, 12%) used devices such as Freestyle Libre for blood glucose monitoring [2,32,37].

Textbox 2. Categorization of input data into automatically and manually captured.

Automatic (sensor signals)

  • Physiological signals
  • Activity
  • Mobility
  • App use
  • Environmental data
  • Sleep

Manual (logs)

  • Weight
  • Food intake
  • Mood
Output Format of the XAI System
Overview

The following sections delve into the various explainability methods found in this review. Figure 7 summarizes the various output formats in this review. Among the reviewed studies, 64% (16/25) primarily used visualizations for explanations, whereas 16% (4/25) relied solely on text. A total of 12% (3/25) of the studies combined visual and textual explanations, and only 4% (1/25) used rule-based explanations, highlighting the diversity in interpretability approaches for health care–monitoring AI models.

Figure 7. Overview of the explainability formats used in the studies included in this review.
Visual Explanations

Visual explanations provide a natural and engaging means of conveying information and are highly effective in facilitating understanding. They are particularly useful in explaining complex concepts or processes as they can leverage the power of visual imagery to enhance comprehension. In the context of explainability, visual explanations can play a crucial role in explaining the “black-box” models in simpler means. Figure 7 shows the prevalence of visual explainability methods, highlighting their significance in the field. These methods use graphical tools and visual representations to provide insights into how a model operates and arrives at its predictions or decisions.

In some cases, additional visual aids such as graphs and scatter plots are used to generate visual explanations. For instance, a study focusing on COVID-19 detection used a gradient boosting prediction model based on decision trees. To illustrate the importance of different variables in the detection process, a bar plot was used, allowing researchers to visually identify the feature significance, as shown in Figure 8 [43]. Similarly, frequency bubble plots were used in certain studies to visualize glucose levels in healthy individuals, as shown in Figure 9 [32]. This visual representation assists in supporting individuals’ self-management of their health by providing clear and easily interpretable information about their glucose levels [32]. Overall, visual explanations offer a powerful and accessible approach to communicating complex concepts, system behavior, and feature importance, enabling better understanding and informed decision-making.

Figure 8. Explainable gradient boosting of COVID-19 symptoms using bar charts [43].
Figure 9. Shapley Additive Explanations explainability using a bubble frequency plot [32].
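A minimal, hypothetical sketch of such a feature-importance bar plot is shown below; the feature names and importance values are invented and would normally come from a fitted model rather than being hard coded.

```python
# Hypothetical feature-importance bar plot in the spirit of Figure 8.
import matplotlib.pyplot as plt

features = ["resting_hr", "sleep_duration", "step_count", "skin_temp"]  # illustrative
importances = [0.41, 0.27, 0.20, 0.12]  # placeholders for values from a fitted model

plt.barh(features, importances)
plt.xlabel("Relative importance")
plt.title("Which wearable features drove the prediction?")
plt.tight_layout()
plt.show()
```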
Textual Explanations

Textual explanations are another intuitive and widely used method for providing explanations. They are presented as natural language statements, whether written or spoken. Figure 10 [55] shows the significant use of text in the explainability of AI models. In various studies (3/25, 12%) [39,55-57], textual explanations enhanced transparency and trust in different domains.

Figure 10. Fitness tracker goals displayed as plain text [47].

For example, one study compared 2 textual formats for suggesting fitness tracker goals, focusing on improving the transparency of step goal computation, thereby fostering user–AI model understanding and trust [55]. Furthermore, textual explanations were used to generate comprehensive summaries of personal health data, enhancing their explainability [56]. These summaries involve generating natural language descriptions of temporal trends and patterns from time-series health data. Such summaries help users evaluate their health data and compare them to their goals or general health guidelines. In addition, pattern-based summaries help identify hidden behaviors and provide explanations to the user. A linguistic summarization approach leveraging time-series protoforms was used to generate these summaries [56]. Protoforms act as sentence templates with placeholders automatically selected to reflect trends and patterns supported by the data.
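As a rough sketch of the protoform idea (the template wording, goal threshold, and aggregation below are invented for illustration and are not those used in the cited study), a sentence template can be filled from a window of time-series data:

```python
# Toy protoform-style textual summary of a week of step counts.
import statistics

def summarize_steps(daily_steps, goal=8000):
    # Protoform template: "Last week your <attribute> <trend> and was on average <qualifier> your goal."
    mean_steps = statistics.mean(daily_steps)
    trend = "increased" if daily_steps[-1] > daily_steps[0] else "decreased"
    qualifier = "above" if mean_steps >= goal else "below"
    return (f"Last week your step count {trend} overall and was on average "
            f"{qualifier} your goal of {goal} steps ({mean_steps:.0f} steps/day).")

print(summarize_steps([6200, 7100, 7500, 8300, 9000, 8800, 9400]))
```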

Multimodal Explanations

In explainability research, tailoring the presentation to end users is crucial for effective comprehension and engagement. One approach involves leveraging familiar and user-friendly graphics to convey information in an accessible manner. Figure 11A [37] shows the use of activity rings from the Apple Watch, which are already well known to users [39]. In the context of hypoglycemia detection for patients with diabetes, the nearly closed violet ring in the visual representation serves as a warning sign for low blood glucose levels. This intuitive approach incorporates the individual’s physiological state as a significant factor in identifying hypoglycemia.

Figure 11. (A) Detecting hypoglycemia using wearables [37] displayed as a ring chart; (B) stress-monitoring app [33] using text and emoticons; (C) blood pressure monitoring and lifestyle recommendations given as plain text and bar and line charts [38].

Similarly, when developing an app for stress prediction, it is essential to ensure that end users can easily interpret the results of the AI model. Visual representations prove valuable in illustrating the level of stress experienced by the user. Figure 11B [33] showcases how different colors and graphical elements can be used to enhance the explanation of the model’s outcomes and ultimately support the user in managing their stress levels effectively [33]. Another notable application of multimodal explainability lies in providing personalized recommendations to end users. Researchers introduced a multimodal explainability approach in a study investigating the relationship between blood pressure and lifestyle factors [38]. This approach generates a report summarizing the relevant blood pressure features and offers tailored lifestyle modifications to improve blood pressure. An example of such a report can be observed in Figure 11C [38], which combines textual and graphical elements to present comprehensive and actionable information to the user. By using user-friendly graphics, visual representations, and multimodal explanations, researchers aim to optimize end-user understanding, engagement, and decision-making processes. These approaches leverage familiar and intuitive formats to convey complex information effectively, empowering users to monitor their health, manage stress, and make informed lifestyle choices.

Stage of Explainability

In this systematic review, the application of different explainability methods in health care monitoring varied, with some methods being more extensively explored than others. A total of 88% (22/25) of the reviewed studies used post hoc methods, whereas only 12% (3/25) used ante hoc methods. Model-agnostic methods were used in 60% (15/25) of the reviewed studies, and the rest were distributed among specific-domain (3/25, 12%), attention-based (3/25, 12%), gradient-based (3/25, 12%), and general explainability (1/25, 4%) methods, as seen in Table 5. Among the model-agnostic methods, SHAP was the most widely applied method, appearing in 48% (12/25) of the studies. SHAP aims to provide interpretable explanations for individual predictions made by complex ML models [28]. This also highlights its prevalence as a post hoc method in explainability research. Ante hoc approaches, by contrast, were far less common; for example, only 4% (1/25) of the studies used explainable gradient boosting, which modifies or enhances the original Extreme Gradient Boosting algorithm to improve interpretability, indicating the limited use of ante hoc techniques in the reviewed studies.

Table 5. Distribution of explainability methods.

  • Model-agnostic methods: 15 (60%)
  • Specific-domain methods: 3 (12%)
  • Attention-based methods: 3 (12%)
  • Gradient-based methods: 3 (12%)
  • General model explanation methods: 1 (4%)

Gradient-based methods garnered attention in 12% (3/25) of the studies, encompassing explainable gradient boosting [43], deep learning important features [58], and layer-wise relevance propagation [45], with each technique being featured once. Explainable gradient boosting represents an advanced iteration of the gradient boosting algorithm, integrating mechanisms to render transparent insights into its decision-making process and feature significance [59]. Deep learning important features, on the other hand, serves as a technique that dissects a neural network’s output prediction for a specific input, unraveling the contributions of all network neurons to each feature of the input [60]. Layer-wise relevance propagation, functioning as a framework, facilitates the deconstruction of deep neural network predictions on a sample into relevance scores [61]. These methods collectively serve the purpose of unveiling the significance of individual features and comprehending the intricate decision-making of complex models. While their numbers are limited, these techniques remain instrumental in providing invaluable insights into model behavior.
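As a toy illustration of the relevance-propagation idea only (a simplified epsilon rule applied to a random 2-layer network; this is not the implementation used in the cited studies), relevance can be passed from the output back to the input features as follows:

```python
# Toy layer-wise relevance propagation (epsilon rule) on a tiny random ReLU network.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=4)                        # e.g., 4 wearable-derived features
W1, b1 = rng.normal(size=(4, 6)), np.zeros(6)
W2, b2 = rng.normal(size=(6, 1)), np.zeros(1)

a1 = np.maximum(0, x @ W1 + b1)               # hidden activations (ReLU)
out = a1 @ W2 + b2                            # network output: the score to explain

eps = 1e-6
# Propagate relevance from the output to the hidden layer...
z2 = a1 @ W2 + b2
relevance_hidden = a1 * (W2 * (out / (z2 + eps))).sum(axis=1)
# ...and from the hidden layer back to the input features.
z1 = x @ W1 + b1
relevance_input = x * (W1 * (relevance_hidden / (z1 + eps))).sum(axis=1)
print(relevance_input)  # per-feature relevance; sums approximately to the output score
```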

Attention-based methods were also studied in 12% (3/25) of the studies, involving interpretable LSTM-attention [46], class activation mapping (CAM) [47], and an interpretable recurrent neural network (RNN) [40]. In the case of interpretable LSTM-attention, an attention model is used to assign varying weights to input features of financial time series at each time step. This attention feature is then used to effectively select relevant feature sequences for input into the LSTM neural network, aiding prediction in subsequent time frames [62]. Meanwhile, CAM is an explainability technique used to identify significant regions within an input image that contribute to specific class predictions within CNNs [63]. Furthermore, the interpretable RNN is a variant of the RNN architecture aimed at providing transparent insights into its decision-making process and internal representations, thereby enhancing its interpretability [64]. These methods excel in capturing salient regions and identifying sequential dependencies in health care data. Their use is driven by the need to understand how the model focuses on specific areas or patterns when making predictions.
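The sketch below shows, in simplified numpy form, how attention weights over time steps can be computed and inspected; the random hidden states and scoring vector stand in for quantities a trained LSTM-attention model would learn.

```python
# Simplified attention over time steps of a wearable signal (illustrative only).
import numpy as np

rng = np.random.default_rng(4)
hidden_states = rng.normal(size=(24, 8))   # 24 hourly time steps x 8 hidden units
score_vector = rng.normal(size=8)          # stands in for a learned attention parameter

scores = hidden_states @ score_vector
weights = np.exp(scores - scores.max())
weights /= weights.sum()                   # softmax over the 24 time steps

context = weights @ hidden_states          # weighted summary fed to the predictor
top_hours = np.argsort(weights)[::-1][:3]
print("Hours the model attended to most:", top_hours, weights[top_hours].round(3))
```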

Specific-domain methods such as GNNExplainer for graph neural networks and temporal summaries for time-series data were investigated in 12% (3/25) of the studies [47,48,56]. These methods cater to the unique characteristics of specific domains, allowing for domain-specific insights. Temporal summaries, in particular, were applied twice [56,57], emphasizing the significance of time-series data in health care monitoring. Finally, general model explanation methods were explored in a single study [65]. This broader category encompasses various techniques, but its limited application in this review suggests that researchers focused more on domain-specific or model-specific approaches in health care monitoring. The variation in the number of studies that applied each method indicates the varying levels of interest and emphasis placed on different explainability techniques. Researchers may choose certain methods based on their effectiveness in providing understandable explanations, compatibility with the data type, or the specific requirements of the health care monitoring domain being studied.

Scope of Explainability

This review encompassed studies using 15 local explainability methods and 10 global explainability methods applied in the context of health care monitoring. The inclusion of a larger number of local explainability methods, accounting for 60% (15/25) of the studies, indicates the significance of understanding individual predictions and the specific factors influencing them. These methods provide granular insights into model behavior and help build trust by explaining the rationale behind individual decisions. However, the presence of global explainability methods is also noteworthy as they offer a broader perspective on model behavior, identifying general patterns and highlighting features that consistently contribute to predictions across the entire dataset. The balance between local and global explainability methods in this review demonstrates the importance of both instance-level interpretability and a comprehensive understanding of model behavior in the context of health care monitoring. Researchers recognize the need for a multifaceted approach to ensure transparency, reliability, and generalizability in interpreting the outcomes of health care models.

Problem Types Addressed by AI Models

Overall, classification emerged as the most prevalent underlying problem in the reviewed studies, accounting for 52% (13/25), followed by regression at 28% (7/25). In addition, 4% (1/25) of the studies used a rule-based approach [56], whereas another (1/25, 4%) used a model explanation [65] as the underlying problem type. It is noteworthy that the combined total of regression and classification studies exceeded the total number of reviewed papers (N=25). This disparity arises from certain studies using models for both classification and regression tasks (3/25, 12%). The predominance of classification problems (16/25, 64%) in the reviewed studies indicates the significance of accurately categorizing health care data for diagnostic, predictive, or decision-making purposes. Gradient boosting emerged as the most frequently used algorithm for classification (5/25, 20%), highlighting its effectiveness in achieving high predictive performance and capturing complex relationships within health care datasets. Support vector machine (SVM) was also commonly used (3/25, 12%), known for its ability to handle both linear and nonlinear classification problems [66]. SVM generally exhibits improved performance when applied to smaller datasets [67]. In addition, decision trees, Neural Structured Learning, graph neural networks, and CNNs were selected for specific classification tasks (5/25, 20%), reflecting the diversity of approaches used to tackle different health care monitoring challenges.

In regression problems (13/25, 52%), the reviewed studies used various algorithms to predict continuous or numerical outcomes relevant to health care. Gradient boosting, known for its powerful ensemble learning capabilities, was the most prevalent algorithm (3/25, 12%), indicating its effectiveness in capturing nonlinear relationships and providing accurate regression predictions. Random forest (5/25, 20%) and SVM (1/25, 4%) were also used for regression tasks, leveraging their ability to handle complex datasets and capture intricate relationships between input features and target variables. Furthermore, CNNs, fully convolutional networks, ExtraTree, and RNNs were each used in specific regression studies (4/25, 16%), showcasing their suitability for capturing temporal or spatial patterns and making accurate predictions in health care monitoring contexts. It is worth noting that decision trees are generally considered more interpretable or transparent compared to some other ML models. The distribution of problem types observed in this review reflects the complexity and diversity of health care monitoring tasks (Figure 12). By leveraging these algorithms effectively, researchers can develop robust and accurate health care–monitoring models that cater to different types of problems.

Figure 12. (A) Overview of problem types in the reviewed papers; (B) regression and classification problem types with the corresponding machine learning or deep learning model application. CNN: convolutional neural network; FCN: fully convolutional network; GNN: graph neural network; LSTM: long short-term memory; NSL: neural structured learning; RNN: recurrent neural network; SVM: support vector machine; TCN: temporal convolutional network.

Training Dataset

In the development of AI models, they can be constructed using either user datasets, which are derived from real-world user interactions and activities and capture user behaviors and characteristics, or benchmark datasets, which are widely recognized standards for comparison and evaluation within the field [68]. Irrespective of the dataset used to train the AI model, some explainability models were evaluated with end users, whereas others were not. As the primary goal of explainability is to enhance user interaction with AI models, real-world evaluation involving lay users is essential.

Table 6 provides an overview of the datasets used in training AI models for health care monitoring, where 80% (20/25) of the studies used user datasets in comparison to 20% (5/25) of the studies, which used a benchmark dataset. Researchers adopted various strategies when selecting these datasets. Some chose to create custom datasets gathered by having users wear devices such as the Empatica E4 wristband to capture physiological signals while performing specific tasks. Conversely, others used benchmark datasets, exemplified by the MyFitnessPal food log dataset, comprising 587,187 days of food log data across 9000 users for 180 days [69]. The decision to use benchmark datasets often stems from resource constraints, such as limited time or access to participants. Conversely, the creation of custom datasets was preferred when specific testing conditions such as unique population groups or environments were not covered by existing datasets. Of the 25 studies reviewed, 20 (80%) opted to construct models using a dataset created by the authors, whereas 5 (20%) chose established benchmark datasets, such as the Floodlight Proof-of-Concept dataset used in the study by Creagh et al [45], the MyFitnessPal dataset used in the study by Harris et al [56], the MHEALTH dataset used in the study by Uddin and Soylu [49], the Continuous Glucose Monitoring Intervention in Teens and Young Adults With Type 1 Diabetes study dataset [70], and the Human Activity Recognition database [71].

An analysis of the experimental setups for training datasets revealed distinct approaches, as shown in Table 6. It is noteworthy that the environment in which data are collected, whether in the real world or in a controlled setting, shapes the resulting dataset and the subsequent evaluation. In this review, data collection occurred in real-world settings in 80% (20/25) of the studies, reflecting the natural environmental conditions in which users typically interact with the technology, such as registering the food they consume in a day [56]. Conversely, 20% (5/25) of the studies opted for a controlled environment for their data collection. This controlled setup involves specific conditions designed to eliminate external influences and noise. For instance, a study collected gait signals using sensors embedded in shoe insoles along a 27-meter straight corridor, ensuring a controlled and consistent testing environment [50]. In addition, physiological signals such as photoplethysmography and EDA were gathered using wearable wristbands such as Empatica E4 or Samsung Gear within soundproof rooms, eliminating auditory and visual distractions [42]. In another instance, during a private study, EDA and HR signals were collected using Empatica E4 devices within the controlled environment of a hospital setting [58]. After training the model, some studies went a step further and tested the explainability model with lay users. Of the 25 studies, only 5 (20%) tested the explainability output with potential end users. The Human-Centered Evaluation section explores these evaluations in detail.

Table 6. Experimental setup of the user and benchmark datasets.
Training dataset | In the wild, n (%) | Controlled, n (%) | Total datasets, n (%)
User dataset | 16 (64) | 4 (16) | 20 (80)
Benchmark dataset | 5 (20) | 0 (0) | 5 (20)
Total experimental setup | 20 (80) | 5 (20) | a

aNot applicable.

Human-Centered Evaluation

Overview

The following sections focus on the 20% (5/25) of the studies included in this review that evaluated user perception of explainability (Table 7). These sections examine key aspects such as the studied population, materials and methods, study duration, study design, medium of interaction, and type of interaction. Table 7 summarizes these 5 user studies.

Table 7. Human-centered evaluation of the explainability model.
Study | Studied population | Materials | Study duration | Study design | Method | Medium of interaction | Type of interaction
Kim et al [33] | Healthy participants | Interviews, questionnaire, and use logs | 30-min introductory session, 25-day MindScope use, and 50-min follow-up interview | Pretest-posttest study | Mixed methods | Apps | Interactive
Wozniak et al [55] | Healthy participants | Survey | 3 min 35 s to 5 min 43 s | Between subject | Mixed methods | Apps | Passive
Leitner et al [39] | Patients with prehypertension or hypertension | Survey and data logs | 6 mo | Between subject | Quantitative | SMS text messages | Passive
Harris et al [56] | Healthy participants | Survey | Not mentioned | Within subject | Quantitative | No instruments | Passive
Chiang et al [38] | Patients with hypertension | Data log | 1 mo | Between-subject and pretest-posttest study | Quantitative | Emails | Interactive

Studied Population

In all the studies (5/5, 100%), the primary focus was on providing explainability to end users who were not medical experts, researchers, or AI experts. The emphasis was on ensuring that the explainability methods used were simple and easily understandable for the target audience. Of these 5 studies, 3 (60%) specifically targeted healthy individuals [33,55,56]. The focus of these studies was on monitoring and addressing aspects such as stress levels, promoting trust in AI-generated step goals, and providing fitness summaries. The goal was to provide simple and interpretable explanations to help individuals monitor and improve their overall well-being. The remaining studies focused on patients with stage-1 hypertension [38,39], defined as systolic blood pressure of 130 to 139 mm Hg or diastolic blood pressure of 80 to 89 mm Hg most of the time. In these studies, the primary objective was to provide explainability for lifestyle recommendations aimed at managing and improving blood pressure. The goal was to empower patients with understandable explanations regarding the recommended lifestyle changes, facilitating their active participation in their health management. While several of the reviewed papers developed explainable models intended for clinicians, such as those for opioid use detection [35] and the characterization of ambulation in multiple sclerosis [45], none of the user evaluations specifically addressed this audience. This observation highlights a gap in the current research landscape as there is potential for further exploration and development of explainability models tailored to the needs and understanding of clinicians and other health care professionals.

Instruments for Delivering Explainability

In the reviewed studies, explainability was presented and communicated to the users through various instruments and platforms. Mobile apps were the most frequently used medium for delivering the explainability components to the users [33,39,55]. Only 20% (1/5) of the studies used email as the medium for presenting the explainability [38]. This suggests that alternative communication channels, such as computers or wearable wristbands, could also be used to deliver explainability depending on the specific study context and target audience. By using different instruments for delivering explainability, the researchers demonstrated their flexibility in tailoring the presentation of information to suit the needs and preferences of users.

Type of Data Collected

In the user studies, various types of data were collected to gather insights and perspectives from the participants. These methods included interviews, questionnaires, and use logs, which provided qualitative and quantitative information about user experiences, preferences, and use patterns [33]. In addition, surveys were used to gather structured feedback and opinions from the participants [55,56]. Examples include the Perceived Stress Scale [72] used in the study by Kim et al [33] as well as the Goal Commitment Scale [73] and the Trust Scale [74] used in the study by Wozniak et al [55]. In one study, a combination of surveys and data logs was used to capture both subjective responses and objective use data [39]. One study (1/5, 20%) also used data logs to record and analyze user interactions and behavior [38]. This diverse range of data collection methods reflects the adaptability of researchers in selecting appropriate tools for evaluating explainability models, with no discernible pattern emerging from the reviewed studies in terms of preferred data types.

Duration of the Study

The data collection duration in the user studies varied. In one study, the time for data collection per participant ranged from 3 minutes and 35 seconds to 5 minutes and 43 seconds, indicating brief interactions or tasks [55]. Another study involved a 30-minute introductory session followed by a 25-day MindScope use study and concluded with a 40-minute follow-up interview, allowing for a more extensive examination of participant experiences [33]. In addition, data collection occurred over 1 month in a specific study [38], providing a medium-term perspective on user engagement. One study spanned a longer duration of 6 months [39], enabling researchers to comprehensively assess user experiences and outcomes over an extended period. These variations in data collection duration highlight the diverse approaches used in user studies and provide insights into the range of participant engagement and the depth of data collection achieved in each study.

Study Design

The user studies incorporated various study designs to evaluate the effectiveness of the explainability models. These designs included pretest-posttest studies in which data were collected before and after participant interactions with the system to assess any changes or improvements [33]. Between-subject and between-subject vignette designs involved dividing participants into different groups to compare the impact of different conditions or scenarios [39,55]. Within-subject designs allowed participants to experience multiple conditions or treatments, enabling comparisons within individual contexts [56]. In addition, one study (1/5, 20%) combined a between-subject design with a pretest-posttest study to comprehensively evaluate the system's impact across different groups [38]. This review did not reveal a clear preference or consistent pattern in the selection of a specific study design; rather, the choice of design appears to be driven by the researchers' individual preferences and the specific objectives of their investigations.

Challenges Reported

Researchers reported several challenges while conducting user studies. One notable issue pertained to the experimental design, wherein participants had limited interaction with each explanation type, restricted to 5-day intervals [33]. This abbreviated time frame may not have been sufficient to adequately assess the varying impacts and distinctions among different explanation methods. To address this limitation, a longitudinal study could be considered to offer a more comprehensive understanding. Another challenge centered on the need for a control group [33]. The inclusion of a control group would enable a clearer differentiation of the effects of prediction and explanation on stress reduction and management outcomes. Moreover, the task of counterbalancing the diverse explanations also posed a challenge [33]. Ensuring that each participant receives explanations in a balanced manner could enhance the validity of the findings. Furthermore, some studies (2/5, 40%) had a restricted participant demographic, such as solely involving college students [33] or relying on a single platform such as Mechanical Turk [75] for data collection [55]. In addition, some studies (2/5, 40%) had to contend with a limited number of participants [38,39]. These challenges collectively underline the importance of carefully addressing methodological constraints to ensure the robustness and generalizability of the study’s conclusions.

Technologies for Capturing Quantified Self Data

Among the wearable devices reviewed, the wrist was the most common placement location (13/25, 52%), as illustrated in Figure 13. Some studies (4/25, 16%) used commercially available devices such as Fitbit and Galaxy Watch [43,46,48,55], which gather data such as step count, distance traveled, calories burned, weight, HR, sleep stages, active minutes, and even location. On the other hand, research-based wristbands such as the Empatica E4 focus on capturing data such as EDA, blood volume pulse, HR, and interbeat interval [35,37,58]. In addition, wearables were predominantly utilized in the studies (16/25, 64%), while mobile applications were used in a smaller proportion (7/25, 28%), as shown in Figure 14.

Figure 13. Sources of the data collected.
Figure 14. Data collection medium.

In addition, other wearables were identified, specifically arm-based devices (6/25, 24%) such as glucose monitoring devices (eg, Freestyle Libre and BioStampRC) [2,41]. These devices serve the purpose of monitoring glucose levels. Furthermore, arm-based devices were also found to collect blood pressure data, with the Omron EVOLV wireless blood pressure monitor being one notable example [39]. In a couple of studies (2/25, 8%), the researchers explored the use of sensors embedded within the shoe insole to capture quantified self data [50,51]. This approach allowed for the collection of data related to foot movement and pressure distribution. In several other studies (7/25, 28%), the focus shifted toward using mobile apps for gathering quantified self data [40,47,52,56,57]. These apps often leveraged GPS data and logs to obtain information about user activities and movement patterns. Overall, this systematic review revealed a wide range of technologies used to capture quantified self data, with wristbands, arm-based devices, shoe insole sensors, and mobile apps playing prominent roles in the collection process.


Discussion

Principal Findings

Overview

The emergence of wearable technologies has revolutionized the field of health care, enabling the collection of vast amounts of quantified self data and opening new possibilities for AI models. However, the inherent “black box” nature of AI models has hindered their seamless integration into medical practices and public acceptance. This emphasizes the critical role of explainability in bridging the gap between health care professionals and AI experts. In this review, we delved into the adoption of explainable models in health care, with a specific focus on wearable technologies and quantified self data. The following sections provide a comprehensive exploration of the 3 main themes: wearables, explainability, and human-centered evaluation.

Wearables and Data

To address RQs 1 and 2, the widespread application of wearables in the health care field has been extensively explored in various reviews [20,76]. These wearable applications encompass a diverse range, from assessing wearables’ usability [77] to using wearable cameras for self-management [78], revealing their broad potential. However, despite their prevalence and versatility, the aspect of explainability in wearable technologies has received less attention. Previous reviews focusing on wearables have covered large numbers of papers, for example, 82 [20] and 73 [79], whereas reviews on explainability have been even more extensive, covering 91 [13], 93 [80], or 196 [81] papers. In contrast, this review specifically delved deeper into the investigation of explainability in wearables, and thus, the number of relevant results decreased significantly, with only 25 papers. This discrepancy highlights the relatively limited focus on explainability within the realm of wearable technologies despite their vast potential and relevance in the health care domain. Notably, explainability is particularly vital for the lay public and health care professionals, who form the primary target audience for wearable devices.

Despite the limited focus on explainability in wearables, this review revealed that some of the findings align with general trends observed about wearables. Notably, wrist-worn wearables emerged as the most commonly used technology, as seen in the Technologies for Capturing Quantified Self Data section, which is in line with previous research [82,83]. Specifically, Fitbit and Empatica E4 wristbands were identified as the most used wearables, with Fitbit being more dominant than Empatica E4 in the literature [82,83]. This may be due to the higher cost of the Empatica E4 compared with Fitbit devices [84]: the Empatica E4 costs an average of US $1700, whereas the Fitbit Sense costs US $170, roughly one-tenth of the price. However, the choice of a specific wearable depends on the use case and aim of the study as some signals, such as those from temperature sensors and accelerometers, are not readily accessible from Fitbit devices. For example, the Empatica E4 wristband stands out for its promising potential in the unobtrusive measurement of HR variability [85]. The data collected from these wearables predominantly consisted of physiological signals, making them the most used input data for developing AI models. User-generated datasets significantly outnumbered benchmark datasets. When examining the experimental setups, it became evident that most studies (20/25, 80%) operated within uncontrolled real-world settings. A recent review underscored the constraints associated with imbalanced datasets stemming primarily from publicly available sources rather than real-world data [13], which is contrary to the findings of this review.

While wearables in health care have shown immense promise, there are critical considerations and recommendations to ensure their effective use while aligning with the goals of explainability. First, there is an urgent need to prioritize the development of XAI models and user interfaces for wearables. This is essential to bridge the gap in understanding between users and the insights derived from wearable data, thereby enhancing trust, reliability, and usability—key components of explainability. Second, when choosing wearables for health care applications, researchers and practitioners must carefully assess their specific data requirements against factors such as cost, comfort, and data capture capabilities. This strategic selection of wearables can significantly impact the quality and relevance of collected data, contributing to transparency and fairness in data collection. Moreover, it is crucial to continue emphasizing the use of real-world data in wearable studies. Such data better reflect the complexity of health care scenarios and enhance the practicality of research findings, ensuring that causal relationships and privacy concerns are appropriately addressed. By adhering to these recommendations and aligning them with the goals of explainability, we can ensure that wearable technology in health care not only realizes its potential but also serves users effectively, ethically, and transparently.

Explainability for Wearable Data

To address RQs 3 and 4, various explainability methods were explored, and a noticeable trend emerged toward using model-agnostic approaches (15/25, 60%) such as SHAP, as seen in the Stage of Explainability section. Model-agnostic methods have gained popularity due to their ability to be applied to a wide range of ML models regardless of complexity or type, unlike other methods such as gradient-based or attention-based methods. These methods provide explanations independently of the model’s internal workings [86]. Interpretable models face a trade-off between accuracy and interpretability [87]. Model-agnostic interpretability treats the model as a black box, creating a separation between interpretability and the model. This approach allows the model to be flexible and versatile, accommodating diverse ML methods, including complex ones such as deep neural networks [86]. Decoupling interpretability from the model enables a balance to be achieved between accuracy and interpretability, providing a valuable tool for understanding and explaining the model’s behavior [86]. Furthermore, this approach empowers medical professionals to gain valuable insights into the decision-making process and understand the impact of different features on predictions for conditions or diseases. By understanding how the AI models arrive at their conclusions, medical professionals can build trust and confidence in using AI technology [88].

SHAP stands out as a widely used model-agnostic method (12/25, 48%), offering feature importance values based on Shapley values from cooperative game theory [28]. This framework helps interpret the impact of individual features on model predictions [28]. The trends in explainability identified in this review align with previous findings in the literature. A previous review categorized explainability into different types, such as feature, example-based, and textual explanations [89]. Feature explainability involves techniques such as SHAP, LIME, and CAM. In this review, a similar pattern emerged, with feature explainability being more prevalent. However, the choice of explainability method depends on the specific use case, the model in question, and the desired level of interpretability. While model-agnostic methods such as SHAP have gained popularity, other techniques such as rule-based models, decision trees, and feature importance analysis may be more suitable for certain scenarios.
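
As a concrete illustration of this mechanism, the following minimal sketch (not taken from any reviewed study) assumes the open-source shap Python package and a synthetic dataset: it computes Shapley-value–based feature attributions for a tree-based classifier trained on hypothetical wearable-style features and prints a local explanation for a single prediction. The feature names, labels, and data are invented for illustration.

```python
# Minimal, illustrative sketch: post hoc feature attribution with SHAP for a
# tree-based classifier trained on synthetic stand-ins for wearable features.
import numpy as np
import shap  # assumes the shap package is installed
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))                   # hypothetical features: [mean_hr, steps, eda]
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # hypothetical binary label, eg, stressed vs not
clf = GradientBoostingClassifier(random_state=0).fit(X, y)

# Post hoc explanation: the trained model is treated as given, and Shapley-value
# attributions are computed for its predictions. TreeExplainer is the fast
# tree-specific implementation; shap.KernelExplainer offers a fully
# model-agnostic alternative for arbitrary models.
explainer = shap.TreeExplainer(clf)
shap_values = explainer.shap_values(X)
# For a binary sklearn gradient boosting model this is typically a single
# (n_samples, n_features) array; the exact return shape can vary across shap versions.

feature_names = ["mean_hr", "steps", "eda"]

# Local explanation: per-feature contributions to one individual prediction.
for name, contribution in zip(feature_names, shap_values[0]):
    print(f"{name}: {contribution:+.3f}")

# A beeswarm summary of all attributions is a common visual output, eg:
# shap.summary_plot(shap_values, X, feature_names=feature_names)
```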

In this review, it was observed that post hoc explainability methods were more commonly used (22/25, 88%) than ante hoc explainability methods (3/25, 12%) in the field of wearables, as seen in the Stage of Explainability section. This trend is attributed to the fact that post hoc methods can be applied to any existing model regardless of its initial explainability capabilities. On the other hand, ante hoc methods require models to be designed and trained with explicit explainability mechanisms, making these methods less prevalent. These findings align with those of previous studies, which have reported a higher prevalence of post hoc methods [81,89]. However, in reviews focusing on explainability in general, ante hoc methods have been reported to be used more frequently than post hoc methods [26,90]. This discrepancy may be attributed to different research contexts and objectives.

Furthermore, it was observed that most of the explainability methods reviewed in this study had a local scope (15/25, 60%) rather than a global one (10/25, 40%), as seen in the Scope of Explainability section. Local explainability methods focus on providing explanations for individual predictions or instances, allowing for a detailed understanding of how the model arrived at a specific decision. On the other hand, global explainability methods aim to provide insights into the model’s overall behavior and decision-making process. The prevalence of local explainability methods in this review suggests a more targeted approach to understanding specific model predictions, which can be valuable in practical applications in which users may be more interested in individual predictions rather than a holistic view of the entire model. When discussing local and global explainability, it is valuable to consider and present both types together rather than focusing on only one [91].
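This distinction can be illustrated with a small, self-contained sketch: the same per-instance attribution matrix (the numbers below are invented for illustration and do not come from any reviewed study) supports a local reading, explaining one prediction, and a global reading, ranking features across the dataset.

```python
# Minimal, illustrative sketch: one attribution matrix read at two scopes.
# attributions[i, j] is assumed to be the contribution of feature j to the
# prediction for instance i (eg, SHAP values); the values are made up.
import numpy as np

feature_names = ["mean_hr", "steps", "eda"]   # hypothetical features
attributions = np.array([
    [0.42, -0.10, 0.31],
    [0.05, -0.22, 0.12],
    [0.38,  0.02, 0.27],
])

# Local scope: why did the model make this one prediction?
print("local (instance 0):", dict(zip(feature_names, attributions[0])))

# Global scope: which features matter on average across the dataset?
global_importance = np.abs(attributions).mean(axis=0)
print("global (mean |attribution|):", dict(zip(feature_names, global_importance)))
```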

The explainability of models was often conveyed through visual formats (17/25, 68%) using different types of graphs. Visual explainability has demonstrated greater prominence when compared to other outputs such as numerical, rule-based, and textual explanations [26,81,92]. This preference for visual representations highlights the effectiveness of using visualizations to enhance the interpretability and understanding of complex AI models. The level of explainability output is linked to the end-user audience. For instance, when the end user comprises the public, the output is tailored to be visually engaging, interactive, and user-friendly, as seen in the Multimodal Explanations section. In contrast, when the intended audience consists of researchers or medical professionals, the explanation may be presented through more technical means, such as data-rich visualizations (eg, heat maps), as seen in the Visual Explanations section.

Model explainability is paramount in the evolving landscape of AI research, and the design decisions made in this realm are critical. Model-agnostic methods, notably SHAP, have garnered attention for their adaptability. Most studies (22/25, 88%) adopted post hoc explainability as it offers the convenience of integration after model development. While there is a discernible tilt toward elucidating local predictions, a holistic approach encompassing both local and global perspectives is essential. Visual presentations stand out as the preferred mode to enhance the comprehension of AI models. The predominance of user datasets (20/25, 80%) over benchmark datasets (5/25, 20%) for building the AI models, together with the emphasis on local (15/25, 60%) rather than global (10/25, 40%) explainability, indicates a distinct move toward personalization. This shift is anticipated given the growing momentum of precision medicine in both research and practice and the emphasis on patient-centered health care.

Human-Centered XAI Needs More User-Based Evaluation

To address RQ 5, the primary objective of explainability is to create models that are transparent and understandable for end users, making it crucial to involve them in the validation process. However, in this review, of the 25 papers analyzed, only 5 (20%) reported user studies, indicating a limited focus on user evaluations. This trend aligns with findings in the broader literature, where only 1 in 5 papers included user evaluations [26]. Furthermore, the user evaluations conducted in the reviewed studies primarily targeted healthy individuals and patients with hypertension, limiting the representation of other populations. Despite the diversity of domains in the studies, the scope of user evaluation remains narrow. Expanding the range of user evaluations across various populations is essential to enhance the real-world applicability and acceptance of AI models in the health care domain.

Assessing the explainability of wearables is crucial. As wearables rapidly evolve into essential tools for persistent health monitoring, they produce massive amounts of data daily. These complex data necessitate clear interpretation to make well-informed decisions [93]. As these devices become more woven into a person’s daily life, the decisions based on their data significantly affect health and well-being. Such decisions are deeply personal. Hence, it is imperative not just to have accurate data interpretation but also for users to easily understand, use, and trust the results. If users, including medical professionals, cannot comprehend or have confidence in these insights, they might hesitate to act on them or could make incorrect choices.

Considering the personal and intimate nature of data from wearables, user studies are essential. Furthermore, gauging XAI’s effectiveness with users is paramount to foster trust, customize explanations to the users’ needs, understand the users’ context, and ensure overall usability [93]. This assessment not only supports and evaluates the alignment with AI’s ethical and regulatory standards but also enhances its overall acceptability. However, in-the-wild user studies remain a notable gap in this area [94]. The design, development, and evaluation of AI and ML systems must transition to real-world applications. This shift calls for interdisciplinary collaboration and multiple design and evaluation iterations [95]. While real-world evidence is beneficial for patients, caregivers, medical professionals, and society at large, adopting a human-centric approach can be challenging due to the sociotechnical aspects involved [53].

Limitations

While our literature search was comprehensive, the application of strict inclusion and exclusion criteria resulted in a relatively low number of studies being included in the review. We also stopped collecting publications on December 31, 2022, before beginning the analysis and reporting work. This selection process, while ensuring the quality of the included studies, may limit the generalizability of our findings to a broader population of studies. In addition, the limited number of studies involving end users in our review underscores the need for caution when drawing conclusions from this subset of the data. To address these limitations and enhance the robustness of the findings, further research with more extensive and diverse samples is recommended.

Implications for Practice and Future Development

The findings of this review have several implications for both current practice and future development in the field of XAI in wearable technologies for health care. For practice, this review highlights the importance of incorporating explainability into the design and implementation of wearable technologies used in health care. As these technologies become more prevalent in medical settings, medical professionals and end users must understand how AI models arrive at their decisions. By providing interpretable and transparent explanations, medical professionals can gain insights into the decision-making process of complex models and build trust and confidence in using AI technology. This, in turn, can lead to more effective and informed decision-making in patient care and treatment.

Moreover, this review emphasizes the need for more user evaluation and involvement in the development of XAI models. Understanding the perspectives and needs of end users is essential to ensure that the explanations provided by the models are meaningful, useful, and user-friendly. Future development should focus on integrating user feedback into the design process, enabling personalized explanations. In addition, this review highlights the significance of visual outputs for presenting explainability. While the current research landscape in XAI primarily relies on visualization and text-based explanations, there needs to be more exploration into alternative modalities for XAI, such as audio or gamification. To advance future development in this field, it is imperative to shift our focus from existing trends toward addressing these gaps. Instead of solely concentrating on established approaches, we should actively investigate novel methods that may offer more effective means of conveying complex model behavior to diverse audiences, including medical professionals and patients. By diversifying our approach to XAI and considering alternatives such as audio or gamification, we can strive to create simpler and more user-friendly explanations that not only foster greater understanding but also enhance the acceptance and adoption of AI technologies in health care.

Conclusions

This review highlights the nascent but growing significance of XAI in wearable technologies for health care, with 25 research papers included from the last 5 years. The results showed that, while wrist-worn wearables such as Fitbit and Empatica E4 are commonly used, the focus on explainability is relatively limited. Post hoc methods such as SHAP emerged as popular choices due to their versatility, and visual outputs were commonly used for user-friendly representations. However, this review also revealed limitations in user evaluation and underscored the importance of involving users in the development process. The small number of human-centered evaluation studies limits the generalizability of the results. Overall, further research at the intersection of XAI and wearables can pave the way for more transparent and reliable AI models in health care applications.

Acknowledgments

This contribution was made possible by grant NPRP11C-0115-180010 from the Qatar National Research Fund (a member of the Qatar Foundation). The statements made herein are solely the responsibility of the authors.

Data Availability

All data generated or analyzed during this study are included in this published article (and its supplementary information files).

Conflicts of Interest

None declared.

Multimedia Appendix 1

Search strategy used for the databases.

DOCX File , 14 KB

Multimedia Appendix 2

PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) checklist.

DOCX File , 32 KB

Multimedia Appendix 3

Overview of explainability features per reviewed paper.

DOCX File , 49 KB

  1. Iqbal MH, Aydin A, Brunckhorst O, Dasgupta P, Ahmed K. A review of wearable technology in medicine. J R Soc Med. Oct 11, 2016;109(10):372-380. [FREE Full text] [CrossRef] [Medline]
  2. Duckworth C, Guy MJ, Kumaran A, O'Kane AA, Ayobi A, Chapman A, et al. Explainable machine learning for real-time hypoglycemia and hyperglycemia prediction and personalized control recommendations. J Diabetes Sci Technol. Jan 13, 2024;18(1):113-123. [FREE Full text] [CrossRef] [Medline]
  3. Barricelli BR, Casiraghi E, Gliozzo J, Petrini A, Valtolina S. Human digital twin for fitness management. IEEE Access. 2020;8:26637-26664. [CrossRef]
  4. Saeed W, Omlin C. Explainable AI (XAI): a systematic meta-survey of current challenges and future opportunities. Knowl Based Syst. Mar 2023;263:110273. [CrossRef]
  5. Adadi A, Berrada M. Peeking inside the black-box: a survey on explainable artificial intelligence (XAI). IEEE Access. 2018;6:52138-52160. [CrossRef]
  6. Nagahisarchoghaei M, Nur N, Cummins L, Nur N, Karimi MM, Nandanwar S, et al. An empirical survey on explainable AI technologies: recent trends, use-cases, and categories from technical and application perspectives. Electronics. Feb 22, 2023;12(5):1092. [CrossRef]
  7. Neches R, Swartout WR, Moore JD. Enhanced maintenance and explanation of expert systems through explicit models of their development. IEEE Trans Softw Eng. Nov 1985;SE-11(11):1337-1351. [CrossRef]
  8. USACM issues statement on algorithmic transparency and accountability. Association for Computing Machinery. 2017. URL: https://www.acm.org/articles/bulletins/2017/january/usacm-statement-algorithmic-accountability [accessed 2024-04-29]
  9. Gunning D, Aha D. DARPA’s explainable artificial intelligence (XAI) program. AI Mag. Jun 24, 2019;40(2):44-58. [CrossRef]
  10. Loh HW, Ooi CP, Seoni S, Barua PD, Molinari F, Acharya UR. Application of explainable artificial intelligence for healthcare: a systematic review of the last decade (2011-2022). Comput Methods Programs Biomed. Nov 2022;226:107161. [CrossRef] [Medline]
  11. Bai X, Wang X, Liu X, Liu Q, Song J, Sebe N, et al. Explainable deep learning for efficient and robust pattern recognition: a survey of recent developments. Pattern Recognit. Dec 2021;120:108102. [CrossRef]
  12. Watson C, Cooper N, Palacio DN, Moran K, Poshyvanyk D. A systematic literature review on the use of deep learning in software engineering research. ACM Trans Softw Eng Methodol. Mar 04, 2022;31(2):1-58. [FREE Full text] [CrossRef]
  13. Saranya A, Subhashini R. A systematic review of explainable artificial intelligence models and applications: recent developments and future trends. Decis Anal J. Jun 2023;7:100230. [CrossRef]
  14. Glikson E, Woolley AW. Human trust in artificial intelligence: review of empirical research. Acad Manag Ann. Jul 2020;14(2):627-660. [CrossRef]
  15. Ahmad MA, Teredesai A, Eckert C. Interpretable machine learning in healthcare. In: Proceedings of the 2018 IEEE International Conference on Healthcare Informatics. 2018. Presented at: ICHI '18; June 4-7, 2018:447; New York, NY. URL: https://ieeexplore.ieee.org/document/8419428 [CrossRef]
  16. Islam MR, Ahmed MU, Barua S, Begum S. A systematic review of explainable artificial intelligence in terms of different application domains and tasks. Appl Sci. Jan 27, 2022;12(3):1353. [CrossRef]
  17. Qureshi R, Irfan M, Ali H, Khan A, Nittala AS, Ali S, et al. Artificial intelligence and biosensors in healthcare and its clinical relevance: a review. IEEE Access. 2023;11:61600-61620. [CrossRef]
  18. Chaudhari A, Sarode V, Udtewar S, Moharkar L, Patil L, Barreto F. A review of artificial intelligence for predictive healthcare analytics and healthcare IoT applications. In: Proceedings of the 2023 Conference on Intelligent Computing and Networking. 2023. Presented at: IC-ICN '23; December 16-17, 2023:555-562; Bhubaneswar, India. URL: https://link.springer.com/chapter/10.1007/978-981-99-3177-4_42 [CrossRef]
  19. Jagatheesaperumal SK, Pham QV, Ruby R, Yang Z, Xu C, Zhang Z. Explainable AI over the internet of things (IoT): overview, state-of-the-art and future directions. IEEE Open J Commun Soc. 2022;3:2106-2136. [CrossRef]
  20. Lu L, Zhang J, Xie Y, Gao F, Xu S, Wu X, et al. Wearable health devices in health care: narrative systematic review. JMIR Mhealth Uhealth. Nov 09, 2020;8(11):e18907. [FREE Full text] [CrossRef] [Medline]
  21. Vijayan V, Connolly JP, Condell J, McKelvey N, Gardiner P. Review of wearable devices and data collection considerations for connected health. Sensors (Basel). Aug 19, 2021;21(16):5589. [FREE Full text] [CrossRef] [Medline]
  22. Gouverneur P, Li F, Shirahama K, Luebke L, Adamczyk WM, Szikszay TM, et al. Explainable artificial intelligence (XAI) in pain research: understanding the role of electrodermal activity for automated pain recognition. Sensors (Basel). Feb 09, 2023;23(4):1959. [FREE Full text] [CrossRef] [Medline]
  23. Mankodiya H, Jadav D, Gupta R, Tanwar S, Alharbi A, Tolba A, et al. XAI-fall: explainable AI for fall detection on wearable devices using sequence models and XAI techniques. Mathematics. Jun 09, 2022;10(12):1990. [CrossRef]
  24. Guidotti R, Monreale A, Ruggieri S, Turini F, Giannotti F, Pedreschi D. A survey of methods for explaining black box models. ACM Comput Surv. Aug 22, 2018;51(5):1-42. [CrossRef]
  25. Markus AF, Kors JA, Rijnbeek PR. The role of explainability in creating trustworthy artificial intelligence for health care: a comprehensive survey of the terminology, design choices, and evaluation strategies. J Biomed Inform. Jan 2021;113:103655. [FREE Full text] [CrossRef] [Medline]
  26. Nauta M, Trienes J, Pathak S, Nguyen E, Peters M, Schmitt Y, et al. From anecdotal evidence to quantitative evaluation methods: a systematic review on evaluating explainable AI. ACM Comput Surv. Jul 13, 2023;55(13s):1-42. [CrossRef]
  27. Vilone G, Longo L. Classification of explainable artificial intelligence methods through their output formats. Mach Learn Knowl Extr. Aug 04, 2021;3(3):615-661. [CrossRef]
  28. Lundberg SM, Lee SI. A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems. 2017. Presented at: NIPS '17; December 4-9, 2017:4768-4777; Long Beach, CA. URL: https://dl.acm.org/doi/10.5555/3295222.3295230
  29. Clos J, Wiratunga N, Massie S. Towards explainable text classification by jointly learning lexicon and modifier terms. Semantic Scholar. URL: https:/​/www.​semanticscholar.org/​paper/​Towards-Explainable-Text-Classification-by-Jointly-Clos-Wiratunga/​ca68496cb928636b8d735b4b3ad864daad50d386 [accessed 2024-09-25]
  30. Das A, Rad P. Opportunities and challenges in explainable artificial intelligence (XAI): a survey. arXiv. Preprint posted online June 16, 2020. 2020. [FREE Full text]
  31. Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Demonstrations. 2016. Presented at: NAACL '16; June 12-17, 2016:97-101; San Diego, CA. URL: https://aclanthology.org/N16-3020.pdf [CrossRef]
  32. van den Brink WJ, van den Broek TJ, Palmisano S, Wopereis S, de Hoogh IM. Digital biomarkers for personalized nutrition: predicting meal moments and interstitial glucose with non-invasive, wearable technologies. Nutrients. Oct 24, 2022;14(21):4465. [FREE Full text] [CrossRef] [Medline]
  33. Kim T, Kim H, Lee HY, Goh H, Abdigapporov S, Jeong M, et al. Prediction for retrospection: integrating algorithmic stress prediction into personal informatics systems for college students’ mental health. In: Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems. 2022. Presented at: CHI '22; April 29-May 5, 2022:1-20; New Orleans, LA. URL: https://dl.acm.org/doi/10.1145/3491102.3517701 [CrossRef]
  34. Moher D, Liberati A, Tetzlaff J, Altman DG, PRISMA Group. Preferred reporting items for systematic reviews and meta-analyses: the PRISMA statement. PLoS Med. Jul 21, 2009;6(7):e1000097. [FREE Full text] [CrossRef] [Medline]
  35. Gullapalli BT, Carreiro S, Chapman BP, Ganesan D, Sjoquist J, Rahman T. OpiTrack: a wearable-based clinical opioid use tracker with temporal convolutional attention networks. Proc ACM Interact Mob Wearable Ubiquitous Technol. Sep 14, 2021;5(3):1-29. [FREE Full text] [CrossRef] [Medline]
  36. Ouzzani M, Hammady H, Fedorowicz Z, Elmagarmid A. Rayyan-a web and mobile app for systematic reviews. Syst Rev. Dec 05, 2016;5(1):210. [FREE Full text] [CrossRef] [Medline]
  37. Maritsch M, Föll S, Lehmann V, Bérubé C, Kraus M, Feuerriegel S, et al. Towards wearable-based hypoglycemia detection and warning in diabetes. In: Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 2020. Presented at: CHI EA '20; April 25-30, 2020:1-8; Honolulu, HI. URL: https://dl.acm.org/doi/abs/10.1145/3334480.3382808 [CrossRef]
  38. Chiang PH, Wong M, Dey S. Using wearables and machine learning to enable personalized lifestyle recommendations to improve blood pressure. IEEE J Transl Eng Health Med. 2021;9:1-13. [CrossRef]
  39. Leitner J, Chiang PH, Khan B, Dey S. An mHealth lifestyle intervention service for improving blood pressure using machine learning and IoMTs. In: Proceedings of the 2022 IEEE International Conference on Digital Health. 2022. Presented at: ICDH '22; July 10-16, 2022:142-150; Barcelona, Spain. URL: https://ieeexplore.ieee.org/document/9861082 [CrossRef]
  40. Kim HH, Kim Y, Park YR. Interpretable conditional recurrent neural network for weight change prediction: algorithm development and validation study. JMIR Mhealth Uhealth. Mar 29, 2021;9(3):e22183. [FREE Full text] [CrossRef] [Medline]
  41. Ng A, Wei B, Jain J, Ward EA, Tandon SD, Moskowitz JT, et al. Predicting the next-day perceived and physiological stress of pregnant women by using machine learning and explainability: algorithm development and validation. JMIR Mhealth Uhealth. Aug 02, 2022;10(8):e33850. [FREE Full text] [CrossRef] [Medline]
  42. Chalabianloo N, Can YS, Umair M, Sas C, Ersoy C. Application level performance evaluation of wearable devices for stress classification with explainable AI. Pervasive Mob Comput. Dec 2022;87:101703. [CrossRef]
  43. Gadaleta M, Radin JM, Baca-Motes K, Ramos E, Kheterpal V, Topol EJ, et al. Passive detection of COVID-19 with wearable sensors and explainable machine learning algorithms. NPJ Digit Med. Dec 08, 2021;4(1):22. [CrossRef]
  44. Lex A, Gehlenborg N, Strobelt H, Vuillemot R, Pfister H. UpSet: visualization of intersecting sets. IEEE Trans Visual Comput Graphics. Dec 31, 2014;20(12):1983-1992. [CrossRef]
  45. Creagh AP, Lipsmeier F, Lindemann M, Vos MD. Interpretable deep learning for the remote characterisation of ambulation in multiple sclerosis using smartphones. Sci Rep. Jul 12, 2021;11(1):14301. [FREE Full text] [CrossRef] [Medline]
  46. Park S, Li CT, Han S, Hsu C, Lee SW, Cha M. Learning sleep quality from daily logs. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019. Presented at: KDD '19; August 4-8, 2019:2421-2429; Anchorage, AK. URL: https://dl.acm.org/doi/10.1145/3292500.3330792 [CrossRef]
  47. Dong G, Cai L, Datta D, Kumar S, Barnes LE, Boukhechba M. Influenza-like symptom recognition using mobile sensing and graph neural networks. In: Proceedings of the 2021 Conference on Health, Inference, and Learning. 2021. Presented at: ACM CHIL '21; April 8-10, 2021:291-300; Virtual Event. URL: https://dl.acm.org/doi/10.1145/3450439.3451880 [CrossRef]
  48. Tang M, Dong G, Zoellner J, Bowman B, Abel-Rahman E, Boukhechba M. Using ubiquitous mobile sensing and temporal sensor-relation graph neural network to predict fluid intake of end stage kidney patients. In: Proceedings of the 21st ACM/IEEE International Conference on Information Processing in Sensor Networks. 2022. Presented at: IPSN '22; May 4-6, 2022:398-309; Milano, Italy. URL: https://ieeexplore.ieee.org/document/9825934 [CrossRef]
  49. Uddin MZ, Soylu A. Human activity recognition using wearable sensors, discriminant analysis, and long short-term memory-based neural structured learning. Sci Rep. Aug 12, 2021;11(1):16455. [FREE Full text] [CrossRef] [Medline]
  50. Kim JK, Bae MN, Lee KB, Hong SG. Identification of patients with sarcopenia using gait parameters based on inertial sensors. Sensors (Basel). Mar 04, 2021;21(5):1786. [FREE Full text] [CrossRef] [Medline]
  51. Kim JK, Bae MN, Lee K, Kim J, Hong SG. Explainable artificial intelligence and wearable sensor-based gait analysis to identify patients with osteopenia and sarcopenia in daily life. Biosensors (Basel). Mar 07, 2022;12(3):167. [FREE Full text] [CrossRef] [Medline]
  52. Guthrie NL, Berman MA, Edwards KL, Appelbaum KJ, Dey S, Carpenter J, et al. Achieving rapid blood pressure control with digital therapeutics: retrospective cohort and machine learning study. JMIR Cardio. Mar 12, 2019;3(1):e13030. [FREE Full text] [CrossRef] [Medline]
  53. Blandford A, Gibbs J, Newhouse N, Perski O, Singh A, Murray E. Seven lessons for interdisciplinary research on interactive digital health interventions. Digit Health. 2018;4:2055207618770325. [FREE Full text] [CrossRef] [Medline]
  54. Conway JR, Lex A, Gehlenborg N. UpSetR: an R package for the visualization of intersecting sets and their properties. Bioinformatics. Sep 15, 2017;33(18):2938-2940. [FREE Full text] [CrossRef] [Medline]
  55. Woźniak PW, Kucharski PP, de Graaf MM, Niess J. Exploring understandable algorithms to suggest fitness tracker goals that foster commitment. In: Proceedings of the 11th Nordic Conference on Human-Computer Interaction: Shaping Experiences, Shaping Society. 2020. Presented at: NordiCHI '20; October 25-29, 2020:1-12; Tallinn, Estonia. URL: https://dl.acm.org/doi/10.1145/3419249.3420131 [CrossRef]
  56. Harris JJ, Chen CH, Zaki MJ. A framework for generating summaries from temporal personal health data. ACM Trans Comput Healthcare. Jul 15, 2021;2(3):1-43. [CrossRef]
  57. Harris J, Zaki MJ. Towards neural numeric-to-text generation from temporal personal health data. arXiv. Preprint posted online July 11, 2022. 2022. [FREE Full text]
  58. Hatteland AH, Marcinkevičs R, Marquis M, Frick T, Hubbard I, Vogt JE. Exploring relationships between cerebral and peripheral biosignals with neural networks. In: Proceedings of the 2021 IEEE International Conference on Digital Health. 2021. Presented at: ICDH '21; September 5-10, 2021:103-113; Chicago, IL. URL: https://ieeexplore.ieee.org/document/9581203 [CrossRef]
  59. Delgado-Panadero Á, Hernández-Lorca B, García-Ordás MT, Benítez-Andrades JA. Implementing local-explainability in gradient boosting trees: feature contribution. Inf Sci. Apr 2022;589:199-212. [CrossRef]
  60. Shrikumar A, Greenside P, Kundaje A. Learning important features through propagating activation differences. In: Proceedings of the 34th International Conference on Machine Learning. 2017. Presented at: ICML '17; August 6-11, 2017:3145-3153; Sydney, Australia. URL: https://dl.acm.org/doi/10.5555/3305890.3306006
  61. Binder A, Montavon G, Bach S, Müller KR, Samek W. Layer-wise relevance propagation for neural networks with local renormalization layers. arXiv. Preprint posted online April 4, 2016. 2016. [FREE Full text] [CrossRef]
  62. Zhang X, Liang X, Zhiyuli A, Zhang S, Xu R, Wu B. AT-LSTM: an attention-based LSTM model for financial time series prediction. IOP Conf Ser Mater Sci Eng. Jul 01, 2019;569(5):052037. [CrossRef]
  63. Zhou B, Khosla A, Lapedriza A, Oliva A, Torralba A. Learning deep features for discriminative localization. arXiv. Preprint posted online December 14, 2015. 2015. [FREE Full text] [CrossRef]
  64. Wisdom S, Powers T, Pitton J, Atlas L. Interpretable recurrent neural networks using sequential sparse recovery. arXiv. Preprint posted online November 22, 2016. 2016. [FREE Full text] [CrossRef]
  65. Alarcón ÁS, Madrid NM, Seepold R, Ortega JA. The role of digital twins in personalized sleep medicine. In: Proceedings of the 2021 German-Italian Workshop on Social Innovation in Long-Term Care Through Digitalization. 2021. Presented at: WS-LTC '21; November 2-4, 2021:71-79; Berlin, Germany. URL: https://link.springer.com/chapter/10.1007/978-3-031-16855-0_8 [CrossRef]
  66. Support vector machine (SVM) algorithm. GeeksforGeeks. URL: https://towardsdatascience.com/https-medium-com-pupalerushikesh-svm-f4b42800e989 [accessed 2024-08-25]
  67. Yu H, Yang J, Han J. Classifying large data sets using SVMs with hierarchical clusters. In: Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2003. Presented at: KDD '03; August 24-27, 2003:306-316; Washington, DC. URL: https://dl.acm.org/doi/10.1145/956750.956786 [CrossRef]
  68. Sarkar A, Yang Y, Vihinen M. Variation benchmark datasets: update, criteria, quality and applications. Database (Oxford). Jan 01, 2020;2020:2969. [FREE Full text] [CrossRef] [Medline]
  69. Weber I, Achananuparp P. Insights from machine-learned diet success prediction. Pac Symp Biocomput. 2016;21:540-551. [FREE Full text] [Medline]
  70. Shinoda I, Tokida N, Kurobe M, Furukawa S, Hayashi K. Demonstration of a considerable amount of mouse epidermal growth factor in aqueous humor. Biochem Int. Aug 1988;17(2):243-248. [Medline]
  71. Anguita D, Ghio A, Oneto L, Parra X, Reyes-Ortiz JL. A public domain dataset for real-life human activity recognition using smartphone sensors. Comput Intell. 2013;20:21. [FREE Full text]
  72. Cohen S, Kamarck T, Mermelstein R. A global measure of perceived stress. J Health Soc Behav. Dec 1983;24(4):385. [CrossRef]
  73. Seijts GH, Latham GP. The construct of goal commitment: measurement and relationships with task performance. In: Goffin RD, Helmes E, editors. Problems and Solutions in Human: Assessment Honoring Douglas N. Jackson at Seventy. Cham, Switzerland. Springer; 2012:315-332.
  74. Cramer H, Evers V, Ramlal S, van Someren M, Rutledge L, Stash N, et al. The effects of transparency on trust in and acceptance of a content-based art recommender. User Model User Adapt Interact. Aug 20, 2008;18(5):455-496. [CrossRef]
  75. Crowston K. Amazon Mechanical Turk: a research tool for organizations and information systems scholars. In: Proceedings of the 2012 Conference on Shaping the Future of ICT Research: Methods and Approaches. 2012. Presented at: IFIP WG '12; December 13-14, 2012:210-221; Tampa, FL. URL: https://link.springer.com/chapter/10.1007/978-3-642-35142-6_14 [CrossRef]
  76. Lu L, Zhang J, Xie Y, Gao F, Xu S, Wu X, et al. Wearable health devices in health care: narrative systematic review. JMIR Mhealth Uhealth. Nov 09, 2020;8(11):e18907. [FREE Full text] [CrossRef] [Medline]
  77. Andrade E, Quinlan L, Harte R, Byrne D, Fallon E, Kelly M, et al. Augmenting critical care patient monitoring using wearable technology: review of usability and human factors. JMIR Hum Factors. May 25, 2021;8(2):e16491. [FREE Full text] [CrossRef] [Medline]
  78. Maddison R, Cartledge S, Rogerson M, Goedhart NS, Ragbir Singh T, Neil C, et al. Usefulness of wearable cameras as a tool to enhance chronic disease self-management: scoping review. JMIR Mhealth Uhealth. Jan 03, 2019;7(1):e10371. [FREE Full text] [CrossRef] [Medline]
  79. Kristoffersson A, Lindén M. A systematic review on the use of wearable body sensors for health monitoring: a qualitative synthesis. Sensors (Basel). Mar 09, 2020;20(5):1502. [FREE Full text] [CrossRef] [Medline]
  80. Giuste F, Shi W, Zhu Y, Naren T, Isgut M, Sha Y, et al. Explainable artificial intelligence methods in combating pandemics: a systematic review. IEEE Rev Biomed Eng. 2023;16:5-21. [CrossRef]
  81. Vilone G, Longo L. Explainable artificial intelligence: a systematic review. arXiv. Preprint posted online May 29, 2020. 2020. [FREE Full text]
  82. Huhn S, Axt M, Gunga HC, Maggioni MA, Munga S, Obor D, et al. The impact of wearable technologies in health research: scoping review. JMIR Mhealth Uhealth. Jan 25, 2022;10(1):e34384. [FREE Full text] [CrossRef] [Medline]
  83. Mattison G, Canfell O, Forrester D, Dobbins C, Smith D, Töyräs J, et al. Sullivan C. The influence of wearables on health care outcomes in chronic disease: systematic review. J Med Internet Res. Jul 01, 2022;24(7):e36690. [FREE Full text] [CrossRef] [Medline]
  84. Ronca V, Martinez-Levy AC, Vozzi A, Giorgi A, Aricò P, Capotorto R, et al. Wearable technologies for electrodermal and cardiac activity measurements: a comparison between Fitbit sense, Empatica E4 and Shimmer GSR3. Sensors (Basel). Jun 23, 2023;23(13):5847. [CrossRef] [Medline]
  85. Schuurmans AA, de Looff P, Nijhof KS, Rosada C, Scholte RH, Popma A, et al. Validity of the Empatica E4 wristband to measure heart rate variability (HRV) parameters: a comparison to electrocardiography (ECG). J Med Syst. Sep 23, 2020;44(11):190. [FREE Full text] [CrossRef] [Medline]
  86. Ribeiro MT, Singh S, Guestrin C. Model-agnostic interpretability of machine learning. arXiv. Preprint posted online June 16, 2016. 2016. [FREE Full text]
  87. Freitas AA. Comprehensible classification models: a position paper. SIGKDD Explor Newsl. Mar 17, 2014;15(1):1-10. [CrossRef]
  88. Amann J, Blasimme A, Vayena E, Frey D, Madai VI, Precise4Q consortium. Explainability for artificial intelligence in healthcare: a multidisciplinary perspective. BMC Med Inform Decis Mak. Nov 30, 2020;20(1):310. [FREE Full text] [CrossRef] [Medline]
  89. Chaddad A, Peng J, Xu J, Bouridane A. Survey of explainable AI techniques in healthcare. Sensors (Basel). Jan 05, 2023;23(2):634. [FREE Full text] [CrossRef] [Medline]
  90. Bharati S, Mondal MR, Podder P. A review on explainable artificial intelligence for healthcare: why, how, and when? IEEE Trans Artif Intell. Apr 2024;5(4):1429-1442. [CrossRef]
  91. Radensky M, Downey D, Lo K, Popović Z, Weld DS. Exploring the role of local and global explanations in recommender systems. arXiv. Preprint posted online September 27, 2021. 2021. [FREE Full text] [CrossRef]
  92. Chakrobartty S, El-Gayar OF. Explainable artificial intelligence in the medical domain: a systematic review. In: Proceedings of the 2021 AMCIS Conference on Digital Innovation and Entrepreneurship. 2021. Presented at: AMCIS '21; August 9-13, 2021:2; Virtual Event. URL: https:/​/www.​researchgate.net/​publication/​361361000_Explainable_Artificial_Intelligence_in_the_Medical_Domain_A_Systematic_Review
  93. Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion. Jun 2020;58:82-115. [CrossRef]
  94. Andersen TO, Nunes F, Wilcox L, Coiera E, Rogers Y. Introduction to the special issue on human-centred AI in healthcare: challenges appearing in the wild. ACM Trans Comput Hum Interact. Jun 30, 2023;30(2):1-12. [CrossRef]
  95. Colusso L, Jones R, Munson SA, Hsieh G. A translational science model for HCI. In: Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019. Presented at: CHI '19; May 4-9, 2019:1-13; Scotland, UK. URL: https://dl.acm.org/doi/10.1145/3290605.3300231 [CrossRef]


AI: artificial intelligence
CNN: convolutional neural network
DL: deep learning
ECG: electrocardiography
EDA: electrodermal activity
HR: heart rate
LIME: local interpretable model-agnostic explanations
LSTM: long short-term memory
MeSH: Medical Subject Heading
ML: machine learning
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RQ: research question
SHAP: Shapley Additive Explanations
SVM: support vector machine
XAI: explainable artificial intelligence


Edited by T de Azevedo Cardoso; submitted 23.10.23; peer-reviewed by TAR Sure, G Vos, GA Tsihrintzis, CI Nwakanma; comments to author 06.05.24; revised version received 31.08.24; accepted 06.11.24; published 24.12.24.

Copyright

©Yasmin Abdelaal, Michaël Aupetit, Abdelkader Baggag, Dena Al-Thani. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 24.12.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.