Published in Vol 26 (2024)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/58187.
Detection of Sleep Apnea Using Wearable AI: Systematic Review and Meta-Analysis


Review

1 AI Center for Precision Health, Weill Cornell Medicine-Qatar, Qatar Foundation, Doha, Qatar

2 Health Informatics Department, College of Health Science, Saudi Electronic University, Riyadh, Saudi Arabia

3 Department of Computer Science and Software Engineering, United Arab Emirates University, Al Ain, Abu Dhabi, United Arab Emirates

Corresponding Author:

Alaa Abd-alrazaq, PhD

AI Center for Precision Health

Weill Cornell Medicine-Qatar

Qatar Foundation

A31 Luqta street

Education City

Doha

Qatar

Phone: 974 55787845654

Email: aaa4027@qatar-med.cornell.edu


Background: Early detection of sleep apnea, the health condition where airflow either ceases or decreases episodically during sleep, is crucial to initiate timely interventions and avoid complications. Wearable artificial intelligence (AI), the integration of AI algorithms into wearable devices to collect and analyze data to offer various functionalities and insights, can efficiently detect sleep apnea due to its convenience, accessibility, affordability, objectivity, and real-time monitoring capabilities, thereby addressing the limitations of traditional approaches such as polysomnography.

Objective: The objective of this systematic review was to examine the effectiveness of wearable AI in detecting sleep apnea, its type, and its severity.

Methods: Our search was conducted in 6 electronic databases. This review included English research articles evaluating wearable AI’s performance in identifying sleep apnea, distinguishing its type, and gauging its severity. Two researchers independently conducted study selection, extracted data, and assessed the risk of bias using an adapted Quality Assessment of Studies of Diagnostic Accuracy-Revised tool. We used both narrative and statistical techniques for evidence synthesis.

Results: Among 615 studies, 38 (6.2%) met the eligibility criteria for this review. The pooled mean accuracy, sensitivity, and specificity of wearable AI in detecting apnea events in respiration (apnea and nonapnea events) were 0.893, 0.793, and 0.947, respectively. The pooled mean accuracy of wearable AI in differentiating types of apnea events in respiration (normal, obstructive sleep apnea, central sleep apnea, mixed apnea, and hypopnea) was 0.815. The pooled mean accuracy, sensitivity, and specificity of wearable AI in detecting sleep apnea were 0.869, 0.938, and 0.752, respectively. The pooled mean accuracy of wearable AI in identifying the severity level of sleep apnea (normal, mild, moderate, and severe) and estimating the severity score (Apnea-Hypopnea Index) was 0.651 and 0.877, respectively. Subgroup analyses found different moderators of wearable AI performance for different outcomes, such as the type of algorithm, type of data, type of sleep apnea, and placement of wearable devices.

Conclusions: Wearable AI shows potential in identifying and classifying sleep apnea, but its current performance is suboptimal for routine clinical use. We recommend concurrent use with traditional assessments until improved evidence supports its reliability. Certified commercial wearables are needed for effectively detecting sleep apnea, predicting its occurrence, and delivering proactive interventions. Researchers should conduct further studies on detecting central sleep apnea, prioritize deep learning algorithms, incorporate self-reported and nonwearable data, evaluate performance across different device placements, and provide detailed findings for effective meta-analyses.

J Med Internet Res 2024;26:e58187

doi:10.2196/58187


Background

Sleep apnea refers to a health condition where airflow either ceases or decreases episodically during sleep [1]. According to the American Academy of Sleep Medicine, sleep apnea is categorized as a sleep disorder wherein an individual experiences challenges pertaining to breathing when they are asleep [2]. Primarily, there are 3 kinds of sleep apnea. First, obstructive sleep apnea (OSA) is the consequence of issues with the operation of the upper respiratory tract and is considered a chronic breathing disorder associated with sleep [3]. By contrast, a condition where signals required to regulate breathing muscles are not generated or transmitted is referred to as central sleep apnea (CSA). Complex or mixed sleep apnea is a condition that involves a combination of both OSA and CSA [4]. It often begins as OSA and evolves into CSA [4].

According to global estimates, approximately 936 million adults aged between 30 and 69 years experience OSA [5]. A systematic review showed that the global prevalence of OSA is between 9% and 38% [6]. In the United States alone, the number of people struggling with sleep apnea may exceed 30 million, as per the American Medical Association [7]. Furthermore, studies showed that >80% of sleep apnea cases remain undiagnosed [7-10]. If not diagnosed and treated, sleep apnea may result in severe health issues, such as mood disorders [11-13], cardiovascular diseases [14,15], cognitive deterioration [16,17], increased risk of road accidents [18,19], and all-cause mortality [20,21]. Therefore, the timely detection of sleep apnea for prompt initiation of treatment is imperative.

Conventionally, polysomnography is a comprehensive diagnostic test used in the field of sleep medicine to evaluate and monitor various physiological parameters during sleep to help diagnose sleep disorders, such as sleep apnea [22]. Despite being considered the gold standard for diagnosing sleep apnea, it does have some disadvantages and limitations: (1) it is relatively expensive; (2) access to sleep laboratories may be limited, particularly in certain geographic areas; (3) it can be inconvenient for patients, as they must spend a full night in a sleep laboratory with numerous sensors and electrodes attached to their body; (4) the physiological parameters recorded using polysomnography may not fully reflect the individual’s typical sleep behavior due to a first-night effect in a sleep laboratory, where sleep patterns are different from those at home due to the novelty of the environment; and (5) it is a subjective process, as analyzing polysomnography data depends on sleep clinicians’ experience [22-24]. Hence, there is a dire need to develop and integrate automated technologies and tools that are more efficient and capable of addressing the challenges posed by the current system of diagnosing sleep apnea.

One of the promising solutions that have been used to address the limitations of polysomnography is wearable artificial intelligence (AI), which refers to the integration of AI algorithms and techniques into wearable devices (eg, smartwatches, fitness trackers, and smart glasses) to collect and analyze data (eg, heart rate [HR], respiration rate, and oxygen saturation) to offer various functionalities and insights. Sleep apnea can be efficiently detected with wearable AI due to its convenience, accessibility, affordability, objectivity, and real-time monitoring capabilities. Various types of wearable devices can be used for gathering biomarkers associated with sleep apnea: on-body devices (worn directly on the body or skin), near-body devices (worn close to the body but not touching the skin), in-body devices (implanted within the body), and electronic textiles (clothes with built-in technology). Wearable AI can be used for (1) detecting apnea events in respiration, (2) identifying the type of apnea events in respiration (hypopnea, OSA, CSA, and mixed), (3) detecting patients with sleep apnea, and (4) estimating the severity of sleep apnea.

Research Problem and Aim

In the last decade, numerous investigations have been carried out to evaluate the effectiveness of wearable AI in detecting sleep apnea. Consolidating the results of these studies can contribute to forming more conclusive judgments regarding the effectiveness of wearable AI in detecting sleep apnea. Previous literature reviews attempted to summarize the evidence, but they were constrained by the following limitations. First, most previous reviews were literature reviews rather than systematic reviews [22-28]. Second, many reviews concentrated solely on OSA rather than considering all types of sleep apnea [22,23,25-29]. Third, some reviews focused on a specific type of data, such as HR variability [2,25] and electrocardiography [1,2,25], for sleep apnea detection. Fourth, main databases, such as Embase [1,2,22-29], ACM [1,2,22-29], IEEE [22-25,27-29], and Scopus [1,2,22-25,28], were not incorporated in the searches of previous reviews. Fifth, all prior reviews focused on the performance of various sensors rather than specifically addressing wearable devices [1,2,22-29]. Sixth, one of the reviews focused on non-AI tools for detecting sleep apnea [29]. Seventh, the risk of bias was not taken into account in most of the reviews [1,2,22-28]. Finally, none of these reviews used statistical techniques (eg, meta-analysis) to aggregate findings from previous studies [1,2,22-29]. Hence, this review aimed to bridge all these identified gaps with a focus on examining the performance of wearable AI when it comes to both the detection and prediction of sleep apnea, thereby making it the first of its kind in this field.


Overview

This review was undertaken and reported in line with the PRISMA-DTA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Diagnostic Test Accuracy) guidelines [30]. Multimedia Appendix 1 provides this review’s PRISMA-DTA checklist. The protocol was registered with PROSPERO (CRD42023495554).

Search Strategy

On December 7, 2023, a comprehensive search was performed across the following electronic repositories: MEDLINE (via Ovid), Embase (via Ovid), ACM Digital Library, Scopus, IEEE Xplore, and Google Scholar. MEDLINE and Embase were chosen due to their reputation as authoritative sources for biomedical and health sciences literature. ACM Digital Library and IEEE Xplore were selected for their status as leading repositories for publications in computing, information technology, electrical engineering, and electronics. Scopus was included because of its comprehensive coverage of scientific literature across multiple disciplines, including health sciences, engineering, computer science, and social sciences. Google Scholar was incorporated, as it indexes scholarly literature from diverse sources and serves as a valuable supplementary tool for identifying relevant studies and gray literature. We set an autoalert to run the search query biweekly for 3 months, concluding on March 6, 2024. Because Google Scholar returned a massive number of results, this review assessed only the first 100 results (equivalent to 10 pages). To identify additional relevant studies, we examined the references cited in the studies already included (backward reference list checking) and studies that had cited the included studies (forward reference list checking).

Relevant literature reviews were assessed, and 2 experts holding doctoral degrees in digital health and health informatics were consulted to compile and collate search terms [31]. The final search query combined three categories of search terms: (1) terms related to AI (eg, “artificial intelligence,” “machine learning,” and “deep learning”), (2) terms associated with wearable devices (eg, “wearable,” “smartwatch,” and “smartband”), and (3) terms linked to sleep apnea (eg, “sleep apnea” and “sleep aponea”). The Boolean operators “OR” and “AND” were used to combine terms within the same category and across different categories, respectively. The specific search query used for searching each database is detailed in Multimedia Appendix 2 for reference.

Study Eligibility Criteria

This review included studies that used AI algorithms to detect sleep apnea or predict its occurrence by leveraging data derived from wearable devices. The research articles deemed suitable for inclusion in this review were those that concentrated on individuals diagnosed with or suspected of having any type of sleep apnea. No limitations were imposed based on age, gender, or ethnicity. In addition, for inclusion in this review, studies were required to evaluate the performance of AI algorithms in detecting or predicting apnea events in respiration, identifying types of apnea events in respiration, detecting patients with sleep apnea, or estimating the severity of sleep apnea. The studies had to provide the confusion matrix or performance metrics (eg, accuracy, sensitivity, and specificity). Studies using AI solely for detecting sleep quality, sleep stages, or other sleep disorders or forecasting the outcomes of sleep apnea interventions were excluded. This review included studies that gathered data using, at a minimum, on-body wearable devices. Conversely, research papers exclusively relying on the following devices for data collection were not considered: nonwearable devices; handheld devices (eg, mobile phones); near-body wearable devices; in-body wearable devices; wearable devices physically connected to nonwearable devices; and wearable devices necessitating expert oversight, such as those demanding precise electrode placement. This review included only peer-reviewed journal articles, conference papers, and dissertations, without restrictions on study setting, study design, reference standard (ie, ground truth), year of publication, or country of study. However, papers not published in English were excluded from consideration. The decision to exclude studies not written in English was based on practical considerations related to resource constraints and the accessibility of non-English literature. 
While including studies in languages other than English may enhance the comprehensiveness of the review, it can also pose challenges in terms of language translation, interpretation, and the synthesis of findings. Furthermore, English is widely recognized as the dominant language of scholarly communication in many scientific disciplines, including health care and biomedical research. We have transparently acknowledged its implications for the review’s scope and findings in the Limitations section. We also excluded studies that fell into the categories of editorials, reviews, protocols, posters, conference abstracts, and research highlights. The decision to exclude these publication types was primarily guided by the need to maintain the focus and rigor of our review process. While editorials, reviews, and research highlights provide valuable insights into and perspectives on a topic, they typically do not present original research findings or empirical data that meet the objectives of our study. Similarly, protocols, posters, and conference abstracts often offer preliminary or incomplete results that may not undergo peer review or provide sufficient detail for a comprehensive analysis. This helps maintain the quality and reliability of the evidence synthesized in our review while minimizing the risk of bias introduced by including non–peer-reviewed or preliminary findings.

Study Selection

The study selection process comprised 3 key steps. Initially, the EndNote (version X9; Clarivate) software was used to eliminate any duplicate papers from the initial pool. Subsequently, 2 reviewers assessed the titles and abstracts of the remaining studies, separately deciding on their inclusion. Finally, the reviewers independently scrutinized the full texts of the remaining articles. Any discrepancies were deliberated upon and resolved through discussion. The level of agreement between the reviewers was almost perfect, indicated by a κ score of 0.92 for the evaluation of titles and abstracts and 0.95 for the examination of full texts.

Data Extraction

Initially, 5 studies were used to develop and test the data extraction form shown in Multimedia Appendix 3. Independently, 2 reviewers used Excel (Microsoft Corp) to extract metadata from the studies, participants’ characteristics, wearable devices’ specifications, and AI algorithms’ features. In addition to the previously mentioned extracted data, we collected the highest performance score for each metric, algorithm, and measured outcome. When studies provided raw data or confusion matrices, we calculated all possible performance metrics (eg, accuracy, specificity, and sensitivity). In case of the unavailability of such data, we attempted to obtain them by reaching out to the studies’ first and corresponding authors. Any discrepancies between the 2 reviewers were addressed through discussion between them.
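As a concrete illustration of the metric calculation described above, the following sketch derives accuracy, sensitivity, and specificity from a 2×2 confusion matrix. The counts used are hypothetical and are not drawn from any included study.

```python
def metrics_from_confusion(tp, fp, fn, tn):
    """Derive common diagnostic-accuracy metrics from a 2x2 confusion matrix.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives, with "positive" meaning an apnea event or case.
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,       # proportion of all correct calls
        "sensitivity": tp / (tp + fn),       # recall for the apnea class
        "specificity": tn / (tn + fp),       # recall for the non-apnea class
    }

# Hypothetical counts, for illustration only
print(metrics_from_confusion(tp=80, fp=10, fn=20, tn=90))
# {'accuracy': 0.85, 'sensitivity': 0.8, 'specificity': 0.9}
```

The same arithmetic extends to any other metric (eg, precision or F1-score) that a confusion matrix supports, which is why reporting the full matrix enables meta-analysis even when a study omits a metric.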

Risk of Bias and Applicability Appraisal

To evaluate the quality of the studies included in our review, we adapted the Quality Assessment of Studies of Diagnostic Accuracy-Revised (QUADAS-2) [32] tool to align with our review’s specific objectives. This adaptation involved substituting some of the original criteria, which were not applicable to our context, with more relevant criteria from the Prediction Model Risk of Bias Assessment Tool [33]. We modified the QUADAS-2 tool to encompass 4 main domains tailored to our review: “participants,” “index test” (focused on AI algorithms), “reference standard” (representing the ground truth), and “analysis.” For each domain, we developed 4 targeted questions aligned with our review’s objectives. In addition, our evaluation assessed the practical applicability of the results derived from the first 3 domains. To optimize our adapted tool, we initially tested it on 5 studies for fine-tuning purposes. The included studies were independently evaluated by 2 reviewers using the modified QUADAS-2 tool (Multimedia Appendix 4). Any differences in their assessments were discussed and resolved through consensus.

Data Synthesis

We used both narrative and statistical techniques to synthesize the data extracted from the included studies. In our narrative synthesis, we used textual descriptions and tabulated summaries to elucidate the characteristics of the included studies, encompassing study metadata, wearable devices, and AI techniques. As for the statistical approach, a meta-analysis was carried out when at least 2 different studies presented enough data to perform the analysis. We conducted conventional meta-analyses for results associated with the following outcomes, given that they were extracted from different unique studies (ie, independent effect sizes): identification of types of apnea events in respiration, detection of patients with sleep apnea, and estimation of the severity of sleep apnea. Specifically, DerSimonian-Laird random-effects models [34] using the Freeman-Tukey double arcsine transformation [35,36] were performed to pool the extracted results. This method accounts for both sampling variation and between-study heterogeneity in the estimates. The analysis was carried out using the meta package in R (version 4.2.2; The R Foundation) [37].
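The pooling approach described above can be sketched as follows. This is a simplified Python illustration of the Freeman-Tukey double arcsine transformation and DerSimonian-Laird random-effects pooling, with hypothetical study data; the review itself used the meta package in R, which applies a more exact back-transformation than the simple sin² inversion shown here.

```python
import math

def ft_double_arcsine(x, n):
    """Freeman-Tukey double arcsine transform of a proportion x/n,
    with its approximate sampling variance 1/(4n + 2)."""
    t = 0.5 * (math.asin(math.sqrt(x / (n + 1))) +
               math.asin(math.sqrt((x + 1) / (n + 1))))
    return t, 1.0 / (4 * n + 2)

def dersimonian_laird(effects, variances):
    """DerSimonian-Laird random-effects pooling of independent effect sizes."""
    k = len(effects)
    w = [1 / v for v in variances]                       # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                   # between-study variance
    w_star = [1 / (v + tau2) for v in variances]         # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    return pooled, tau2

# Hypothetical per-study accuracies: (correct classifications, sample size)
studies = [(85, 100), (160, 200), (45, 60)]
ts, vs = zip(*(ft_double_arcsine(x, n) for x, n in studies))
pooled_t, tau2 = dersimonian_laird(list(ts), list(vs))
# Simple sin^2 back-transform to the proportion scale (a simplification)
print(round(math.sin(pooled_t) ** 2, 3))
```

The transformation stabilizes the variance of proportions near 0 or 1, which is why it is preferred over pooling raw accuracies directly.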

We also performed multilevel meta-analyses for results related to the detection of apnea events in respiration, as certain results originated from the same study (ie, dependent effect sizes) [34,38]. Multilevel meta-analyses were used to address this dependency in effect sizes, thereby minimizing the risk of type I errors. These analyses were carried out using the metafor package in R (version 4.2.2) [35].

When applicable, subgroup meta-analyses were conducted to explore how different factors might influence the effectiveness of wearable AI [34,38]. These factors included AI algorithms, the type of algorithm (ie, machine learning [ML] vs deep learning), the number of participants, the type of sleep apnea, the status of the wearable device (ie, commercial vs noncommercial), the placement of the wearable device, data set size, data type, ground truth, and validation method. We considered differences in results between subgroups to be statistically significant if the statistical probability (P value) was <.05.

To assess how consistent the studies were in their findings (heterogeneity), we used 2 statistical tests. The first test is the Cochran Q statistic, which indicates whether the observed differences in results could be due to chance alone. A P value <.05 indicates significant heterogeneity, meaning the results varied more than expected by chance. The second test is the I2 statistic, which quantifies the proportion of observed variability due to real differences between studies rather than chance [35,39]. Heterogeneity was considered insignificant when I2 ranged from 0% to 40%, moderate when I2 fell within the 30% to 60% range, substantial when I2 ranged from 50% to 90%, or considerable when I2 extended from 75% to 100%.
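The 2 heterogeneity statistics described above can be computed from study effect sizes and their variances as in the following minimal sketch (hypothetical inputs; not the review's actual analysis code, which ran in R):

```python
def heterogeneity(effects, variances):
    """Cochran Q and the I2 statistic (as a percentage) for study effect sizes."""
    w = [1 / v for v in variances]                       # inverse-variance weights
    mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - mean) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # I2: share of observed variability beyond what chance (df) would explain
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

# Two hypothetical study effects with equal variances but very different values
q, i2 = heterogeneity([0.2, 0.8], [0.01, 0.01])
print(q, i2)  # a large Q and an I2 above 75% indicate considerable heterogeneity
```

Identical effect sizes yield Q = 0 and I2 = 0%, matching the "insignificant heterogeneity" band above.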


Search Results

As depicted in Figure 1, a total of 615 citations were retrieved when the above-identified databases were searched. Of the retrieved citations, 161 (26.2%) duplicates were removed using EndNote X9, leaving 454 (73.8%) studies. Further, 362 (79.7%) studies were removed after screening the titles and abstracts of these 454 studies. After retrieving and reading the full text of all the remaining 92 (20.3%) studies, it was determined that 57 (62%) of these studies were ineligible for inclusion. The main reasons for exclusion were that they did not use wearable devices (23/92, 25%), did not use AI algorithms (11/92, 12%), did not focus on sleep apnea (6/92, 7%), were irrelevant publication types (16/92, 17%), or were not written in English (1/92, 1%). We identified 3 additional studies relevant to this review through backward reference list checking. In total, 38 studies were included in this review [40-77], and 27 (71%) of them were eligible for meta-analyses [40,41,45-49,52-55,57,58,61-64,66,68,69,71-77].

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) flowchart of the study selection process. AI: artificial intelligence.

Characteristics of the Included Studies

As displayed in Table 1, the number of studies has varied over the years, with the highest number reached in 2020 (11/38, 29%). While the included studies were conducted in 16 different countries, the studies were predominantly from the United States (9/38, 24%). Most of the studies were journal articles (29/38, 76%), but conference papers also made a substantial contribution (9/38, 24%). The average number of participants across studies was 155.9 (SD 374.9). The number of participants ranged from 4 to 2252. The mean age of participants was identified in 25 (66%) of the 38 included studies and ranged from 25.6 to 61.1 years, with an average of 47.3 (SD 9.3) years. Across 25 studies reporting the proportion of female participants, female participants constituted an average of 37.4% of the total participants, ranging from 12% to 65%. A total of 20 studies reported the BMI, which ranged from 22.1 to 38.7 kg/m2, with an average of 28.6 (SD 3.81) kg/m2. About two-thirds (25/38, 66%) of studies did not focus on a specific type of sleep apnea. The characteristics of each included study are listed in Multimedia Appendix 5.

Table 1. Characteristics of the included studies (N=38).
Features | Studies | References
Year of publication, n (%)
2023 | 8 (21) | [54,55,60,64,67,71,75,76]
2022 | 9 (24) | [40,42,46,52,61,63,68,73,77]
2021 | 4 (11) | [43,53,70,72]
2020 | 11 (29) | [41,45,47,49,51,57-59,65,66,74]
2019 | 1 (3) | [48]
2018 | 3 (8) | [44,50,69]
2014 | 1 (3) | [62]
2013 | 1 (3) | [56]
Country of publication, n (%)
United States | 9 (24) | [43,44,46,47,55-57,64,72]
China | 8 (21) | [42,63,67,68,70,71,75,76]
South Korea | 4 (11) | [45,51,73,77]
Canada | 2 (5) | [48,49]
Italy | 2 (5) | [40,60]
Norway | 2 (5) | [53,54]
Taiwan | 2 (5) | [41,69]
Others (<2) | 9 (24) | [50,52,58,59,61,62,65,66,74]
Publication type, n (%)
Conference paper | 9 (24) | [43-45,48,50,51,59,62,76]
Journal article | 29 (76) | [40-42,46,47,49,52-58,60,61,63-75,77]
Number of participants, mean (SD; range) | 155.9 (374.9; 4-2252) | [40-77]
Age (y)
Value, mean (SD; range) | 47.3 (9.3; 25.6-61.1) | [40,41,43,45-49,53,54,57-62,64,66,67,69,71-73,75,77]
Not reported, n (%) | 13 (34) | [42,44,50-52,55,56,63,65,68,70,74,76]
Female participants (%)
Value, mean (SD; range) | 37.4 (14.76; 12-65) | [40,41,43,46-49,52-54,57-62,64,66-69,71-73,75]
Not reported, n (%) | 13 (34) | [42,44,45,50,51,55,56,63,65,70,74,76,77]
BMI (kg/m2)
Value, mean (SD; range) | 28.6 (3.813; 22.1-38.7) | [40,41,45,47-49,53,54,57,58,61,62,64,66,68,69,71-73,77]
Not reported, n (%) | 18 (47) | [42-44,46,50-52,55,56,59,60,63,65,67,70,74-76]
Sleep apnea type, n (%)
All | 25 (66) | [41,43,45,46,48,49,51-55,57,59-63,66,67,69,70,73-75,77]
Obstructive sleep apnea | 12 (32) | [40,42,44,47,56,58,64,65,68,71,72,76]
Central sleep apnea | 1 (3) | [50]

Features of Wearable Devices

Commercial wearable devices constituted the majority of wearable devices in the included studies (24/38, 63%; Table 2). The most mentioned wearable device in the included studies was the Belun Ring (3/38, 8%). Wearable devices were placed on various body parts, with the chest (16/38, 42%), wrist (11/38, 29%), and abdomen (9/38, 24%) being the most common locations. Wearable devices were worn for 1 full night (6-8 hours) in 29 studies (76%). The features of wearable devices in each included study are shown in Multimedia Appendix 6.

Table 2. Features of wearable devices (N=38).
Features | Studies, n (%) | References
Status of the wearable device
Commercial | 24 (63) | [40,42-47,51,53,54,57,58,60-66,70,72,73,76,77]
Noncommercial | 14 (37) | [41,48-50,52,55,56,59,67-69,71,74,75]
Name of the wearable device
Belun Ring | 3 (8) | [47,64,72]
Patch | 2 (5) | [48,49]
T-REX TR100A | 2 (5) | [73,77]
Others | 17 (45) | [43-46,51,53,54,57,60-62,64-66,71,74,76]
Not reported | 14 (37) | [41,42,50,52,55,56,58,59,63,67-70,75]
Placement of the wearable device
Chest | 16 (42) | [41,46,50,52-54,56,57,59-62,65,66,68,69]
Wrist | 11 (29) | [40,42-45,51,54,58,63,70,76]
Abdomen | 9 (24) | [41,53,54,57,65,69,73,74,77]
Finger | 6 (16) | [47,53,64,67,69,72]
Neck | 2 (5) | [48,49]
Nose | 2 (5) | [53,75]
Face | 1 (3) | [55]
Duration of wearing the wearable device
1 full night | 29 (76) | [41-43,45-49,53-56,58,60-69,71-73,75-77]
<1 full night | 3 (8) | [50,52,59]
>1 full night | 2 (5) | [44,51]
Not reported | 3 (8) | [57,70,74]

Features of AI

Classification was the dominant problem-solving approach used in the included studies (38/38, 100%; Table 3). Various AI algorithms were used in the included studies, with convolutional neural networks (CNNs) being the most common (14/38, 37%). Among the 38 included studies, most (n=37, 97%) used AI to detect current sleep apnea, whereas 3 (8%) used wearable AI to predict sleep apnea before its occurrence. The mean data set size reported in 28 (74%) studies was 60,554 (SD 133,059), with the range spanning from 12 to 561,480. Most studies (36/38, 95%) used closed-source data, while only 2 (5%) of 38 studies used open-source data. Data were gathered through wearable devices in all studies (38/38, 100%), via self-reported questionnaires in 3 (8%) studies, and using nonwearable devices (eg, smartphones) in 2 (5%) studies. Respiration data (eg, respiratory rate and respiratory efforts; 25/38, 66%) and HR data (eg, HR, HR variability, and interbeat interval; 21/38, 55%) were the most frequently used data for developing the models in the included studies. The number of features, reported in 21 (55%) of the 38 studies, ranged from 3 to 212, with an average of 44.3 (SD 62.5). Most studies used polysomnography as the ground truth assessment method (26/38, 68%), followed by the wearable device (8/38, 21%) and the context of the experiment (eg, performing different patterns of breathing; 4/38, 11%). Among the 28 studies that reported the assessor of the ground truth, sleep technicians were the most common assessors (23/38, 61%), followed by sleep physicians (8/38, 21%). American Academy of Sleep Medicine guidelines were followed in 84% (32/38) of studies to assess the ground truth. Train-test split was the most common approach used in the included studies to validate the performance of AI models (20/38, 53%), followed by k-fold cross-validation (17/38, 45%).
The included studies used wearable AI to detect apnea events in respiration (24/38, 63%), detect patients with sleep apnea (15/38, 40%), identify the severity of sleep apnea (21/38, 55%), and identify types of apnea events in respiration (8/38, 21%). The features of AI in each included study are described in Multimedia Appendix 7.

Table 3. Features of artificial intelligence (N=38).
Features | Studies | References
Problem-solving approaches, n (%)
Classification | 38 (100) | [40-77]
Regression | 15 (40) | [45-49,54,56,58,62,64,70,71,76]
AI algorithms, n (%)
Convolutional neural network | 14 (37) | [48,49,53-55,57-60,63,64,68,73,75,77]
Random forest | 10 (26) | [40,42,43,46,52-54,67,70,73]
Long short-term memory | 9 (24) | [41,44,45,48,49,52-54,66]
Support vector machines | 8 (21) | [43,52,53,56,62,67,69,73]
K-nearest neighbors | 7 (18) | [42,51,52,54,61,67,70]
Artificial neural network | 5 (13) | [47,51,65,72,74]
Multilayer perceptron | 5 (13) | [40,50,53,54,73]
Naive Bayes | 5 (13) | [41,42,51,52,70]
Decision trees | 4 (11) | [42,43,52,70]
AdaBoost | 3 (8) | [41,52,61]
Others (<3) | 5 (13) | [52,67,71,73,76]
Aim of AI algorithms, n (%)
Detection | 37 (97) | [40-55,57-76]
Prediction | 3 (8) | [44,56,65]
Data set size
Value, mean (SD; range) | 60,554 (133,059; 12-561,480) | [40,42-44,46-49,52-55,58-66,68,72-76]
Not reported, n (%) | 10 (26) | [41,45,50,51,56,57,67,69-71]
Data sources, n (%)
Closed source | 36 (95) | [40-43,45-64,66-77]
Open source | 2 (5) | [44,65]
Data types, n (%)
Wearable device data | 38 (100) | [40-77]
Self-reported data | 3 (8) | [44,69,76]
Nonwearable device data | 2 (5) | [69,76]
Data input to AI algorithms, n (%)
Respiration data | 25 (66) | [41,43,45,46,48-50,52-54,56-62,65,66,69,73-77]
Heart rate | 21 (55) | [40,42,45,47,50-52,56,58,62-64,66-68,70-73,76,77]
Body movement | 14 (37) | [40,44,45,47,51,52,58,60,62,64,66,71,72,76]
Oxygen saturation | 13 (34) | [41,46,47,53,54,56,60,64,67,69,71,72,76]
Acoustic data | 3 (8) | [56,60,76]
Others (<3) | 4 (11) | [44,55,58,76]
Number of features
Value, mean (SD; range) | 44.33 (62.5; 3-212) | [40-43,45-50,52,56-58,61,64,65,69,70,72,73]
Not reported, n (%) | 17 (45) | [44,51,53-55,59,60,62,63,66-68,71,73-77]
Ground truth assessment methods, n (%)
Polysomnography | 26 (68) | [41-43,45-49,55,58,61-73,75-77]
Wearable device | 8 (21) | [40,44,51,53,54,56,60,74]
Context | 4 (11) | [50,52,57,59]
Guidelines for ground truth assessment, n (%)
American Academy of Sleep Medicine guidelines | 32 (84) | [40-49,51,53-55,58,61-77]
Not reported | 6 (16) | [50,52,56,57,59,60]
Assessors of ground truth, n (%)
Sleep technician | 23 (61) | [40-42,44-47,53-56,58,60-62,64-66,69,72,73,76,77]
Sleep physician | 8 (21) | [42,57,63,64,68,71,72]
Not reported | 10 (26) | [48-52,59,68,70,74,75]
Validation methods, n (%)
Train-test split | 20 (53) | [41,45,47,51,52,55,57-60,63-65,68,71-75,77]
K-fold cross-validation | 17 (45) | [42-44,46,48-51,53,54,56,61,63,66,68,70,76]
Leave-one-out cross-validation | 5 (13) | [40,54,60,62,69]
Not reported | 1 (3) | [67]
Measured outcomes, n (%)
Apnea events in respiration | 24 (63) | [41,43-46,50-59,61,63,66-68,73-75,77]
Sleep apnea severity | 21 (55) | [40-43,45-49,53,58,62,64,65,69-73,76,77]
Patients with sleep apnea | 15 (40) | [40-44,46-48,53,58,63,64,69,71,72,76]
Type of apnea events | 8 (21) | [41,46,53,57,58,60,66,75]

Results of Risk-of-Bias Appraisal

Nearly half of the included studies (17/38, 45%) reported comprehensive details to determine whether an appropriate consecutive or random sample of eligible participants was used. Over half of the studies (22/38, 58%) avoided inappropriate exclusions. A substantial majority, 30 (79%) out of 38 studies, ensured a balanced number of patients across subgroups. In addition, around two-thirds (25/38, 66%) of the studies reported a sufficient sample size. Consequently, a little less than half of the studies (16/38, 42%) were assessed as having a low risk of bias in the “selection of participants” domain, as shown in Figure 2. In terms of matching participants to the predefined requirements in the review question, a low level of concern was identified in nearly 40% (15/38) of the included studies, as shown in Figure 3.

Figure 2. Results of the assessment of risk of bias in the included studies.

A substantial majority of the included studies comprehensively detailed their AI models, with 34 (89%) out of 38 studies providing thorough descriptions. Almost all, 35 (92%) out of 38 studies, clearly reported the features (predictors) used. Moreover, an overwhelming majority, 36 (95%) out of 38 studies, ensured that these features were sourced without prior knowledge of the outcome data. Consistency in feature assessment across participants was observed in 35 (92%) out of 38 studies. Consequently, the potential for bias in the “index test” domain was assessed as low in the vast majority of the studies (32/38, 84%), as shown in Figure 2. In addition, 32 (84%) out of 38 studies were found to have minimal concerns regarding the alignment between the model’s predictors and the review question’s criteria, as illustrated in Figure 3.

Figure 3. Results of the assessment of applicability concerns in the included studies.

In most of the included studies (32/38, 84%), the outcome of interest, specifically sleep apnea, was consistently assessed using appropriate methodologies. Nearly all studies (37/38, 97%) defined and determined the outcome in a uniform manner for all participants. An overwhelming majority of the studies (36/38, 95%) determined the outcome without prior knowledge of the predictor information. In a substantial portion of the studies (33/38, 87%), the diagnostic test was conducted for an appropriate duration to ensure accurate results. As a result, the potential for bias in the “reference standard” domain was deemed low in the vast majority of the studies (32/38, 84%), as shown in Figure 2. In addition, the same number of studies (32/38, 84%) showed minimal concerns regarding any discrepancies between the outcome’s definition, timing, or determination and the review question’s criteria, as indicated in Figure 3.

Finally, a significant majority of the studies (34/38, 89%) ensured the inclusion of all enrolled participants in the data analysis. A substantial number of these studies (32/38, 84%) executed proper data preprocessing. Similarly, a high proportion (34/38, 89%) adopted suitable measures to evaluate the performance of their models. Nearly half of the studies (17/38, 45%) demonstrated an appropriate split among training, validation, and test sets. However, the risk of bias in the validation methods used by the remaining studies remained unclear due to insufficient information being provided. Consequently, slightly more than half of the studies (20/38, 53%) were deemed to have a low risk of bias in the “analysis” domain, as indicated in Figure 2. A detailed breakdown of the “risk of bias” and “applicability concerns” for each domain in every study is available in Multimedia Appendix 8.
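The "appropriate split" criterion above is easiest to satisfy when partitioning is done at the participant level, so that respiratory events from the same sleeper can never appear in both the training and test sets, a common source of optimistic bias. A minimal sketch of such a split (the function name and the 70/15/15 ratios are illustrative, not drawn from any included study):

```python
import random

def patient_level_split(patient_ids, train=0.7, val=0.15, seed=0):
    """Split patients (not individual events) into train/validation/test
    sets so that events from one sleeper never leak across sets.
    The 70/15/15 ratios are illustrative only."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)          # reproducible shuffle
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * val)
    return (set(ids[:n_train]),
            set(ids[n_train:n_train + n_val]),
            set(ids[n_train + n_val:]))
```

Event-level records would then be routed to each set according to the patient ID they belong to.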

Results of the Studies

As mentioned earlier, meta-analyses were carried out to pool results related to 4 outcomes: detection of apnea events in respiration, identification of types of apnea events in respiration, detection of patients with sleep apnea, and estimation of the severity of sleep apnea. The following subsections present the results of the meta-analyses for each outcome.

Apnea Events in Respiration
Accuracy

We conducted meta-analyses of 36 estimates of accuracy derived from 2,702,305 respiratory events across 17 (45%) of the 38 studies (Table 4). The pooled mean accuracy of these estimates was 0.893 (95% CI 0.82-0.94). The meta-analyzed evidence exhibited considerable statistical heterogeneity (P<.001; I2=100%). Further, Table 4 shows that there is a statistically significant difference in the pooled mean accuracy between subgroups in the “algorithms” group (P<.001) and “type of algorithms” group (P=.02), whereas no statistically significant difference (P>.05) was found in the pooled mean accuracy between subgroups in the remaining groups.
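The pooled means, CIs, and heterogeneity measures (Tau², Q, I²) reported in Table 4 and the subsequent tables are products of random-effects meta-analysis. As an illustration of how these quantities relate, the following sketch implements the DerSimonian-Laird estimator, a standard random-effects method (the review does not state which estimator its software used):

```python
import math

def dersimonian_laird(estimates, variances):
    """Random-effects pooling (DerSimonian-Laird): returns the pooled
    mean, its 95% CI, and the heterogeneity measures Q, Tau^2, and I^2."""
    w = [1.0 / v for v in variances]                       # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = len(estimates) - 1
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                          # between-study variance
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0  # % of variation due to heterogeneity
    w_re = [1.0 / (v + tau2) for v in variances]           # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), q, tau2, i2
```

An I² near 100%, as in most rows below, indicates that almost all observed variation across estimates reflects between-study differences rather than sampling error.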

Table 4. Pooled mean estimates of accuracy in detecting respiratory events by several factors.
| Groups | Studies (N=38), n (%)a | Sample size, N | Accuracy (%), range | Pooled mean accuracy (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup differences, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Algorithm |  |  |  |  |  |  |  | <.001b |
| Convolutional neural network | 9 (24) | 437,593 | 0.76-0.97 | 0.884 (0.84-0.92) | 0.0102 | 8523.6 (<.001) | 100 |  |
| Recurrent neural networks | 6 (16) | 665,091 | 0.73-0.92 | 0.848 (0.79-0.90) | 0.0086 | 18,227 (<.001) | 100 |  |
| Random forest | 4 (11) | 807,225 | 0.81-0.96 | 0.867 (0.77-0.94) | 0.0154 | 41,658 (<.001) | 100 |  |
| Support vector machine | 3 (8) | 245,745 | 0.79-0.84 | 0.807 (0.77-0.84) | 0.0014 | 221.4 (<.01) | 99 |  |
| K-nearest neighbors | 3 (8) | 233,647 | 0.69-0.77 | 0.736 (0.69-0.78) | 0.0019 | 78.1 (<.01) | 97 |  |
| AdaBoost | 2 (5) | 5629 | 0.71-0.72 | 0.716 (0.71-0.73) | 0.0000 | 0.99 (.32) | 0 |  |
| Multilayer perceptron | 2 (5) | 242,505 | 0.80-0.81 | 0.804 (0.79-0.86) | 0.0001 | 11.3 (<.01) | 91 |  |
| Quadratic discriminant analysis | 2 (5) | 17,727 | 0.60-0.73 | 0.664 (0.53-0.79) | 0.0103 | 218.8 (<.01) | 100 |  |
| Type of algorithm |  |  |  |  |  |  |  | .02 |
| Machine learning | 18 (47) | 1,334,180 | 0.60-0.97 | 0.831 (0.65-0.92) | 0.1926 | 289,302.4 (<.001) | 100 |  |
| Deep learning | 18 (47) | 1,368,125 | 0.73-1.00 | 0.899 (0.82-0.94) | 0.2659 | 160,871.2 (<.001) | 100 |  |
| Sample size, n |  |  |  |  |  |  |  | .93 |
| <100 | 24 (63) | 326,322 | 0.60-1.00 | 0.885 (0.74-0.95) | 0.4783 | 111,302.0 (<.001) | 100 |  |
| 100-200 | 3 (8) | 276,572 | 0.82-0.93 | 0.896 (0.82-0.95) | 0.0086 | 2658.6 (<.001) | 100 |  |
| >200 | 9 (24) | 2,099,411 | 0.77-0.97 | 0.907 (0.75-0.97) | 0.1238 | 314,844.0 (<.001) | 100 |  |
| Type of sleep apnea |  |  |  |  |  |  |  | >.99 |
| All | 34 (89) | 2,635,188 | 0.60-1.00 | 0.893 (0.80-0.94) | 0.3616 | 457,420.0 (<.001) | 96 |  |
| Obstructive sleep apnea | 2 (5) | 67,117 | 0.88-0.90 | 0.892 (0.87-0.91) | 0.0007 | 25.4 (<.001) | 100 |  |
| Status of the WDc |  |  |  |  |  |  |  | .05 |
| Commercial | 22 (58) | 2,581,505 | 0.69-0.97 | 0.844 (0.78-0.89) | 0.0705 | 370,236.6 (<.001) | 100 |  |
| Noncommercial | 14 (37) | 120,800 | 0.60-1.00 | 0.947 (0.80-0.99) | 0.6243 | 73,404.3 (<.001) | 100 |  |
| Placement of the WD |  |  |  |  |  |  |  | .61 |
| Chest | 14 (37) | 719,055 | 0.60-0.97 | 0.845 (0.64-0.94) | 0.2173 | 143,429.2 (<.001) | 100 |  |
| Abdomen | 7 (18) | 127,375 | 0.73-1.00 | 0.951 (0.13-0.99) | 1.4650 | 84,686.0 (<.001) | 100 |  |
| Chest and abdomen | 3 (8) | 261,949 | 0.76-0.93 | 0.880 (0.76-0.96) | 0.0186 | 2849.4 (<.001) | 100 |  |
| Wrist | 3 (8) | 113,432 | 0.82-0.88 | 0.841 (0.80-0.88) | 0.0024 | 790.1 (<.001) | 100 |  |
| Data set size, n |  |  |  |  |  |  |  | .90 |
| <10,000 | 12 (32) | 34,088 | 0.60-0.97 | 0.863 (0.39-0.98) | 0.4747 | 1136.8 (<.001) | 100 |  |
| 10,000-50,000 | 11 (29) | 173,171 | 0.73-1.00 | 0.907 (0.69-0.97) | 0.6016 | 89,963.2 (<.001) | 100 |  |
| >50,000 | 10 (26) | 2,218,474 | 0.73-0.97 | 0.881 (0.71-0.95) | 0.1391 | 343,840.9 (<.001) | 100 |  |
| Data type |  |  |  |  |  |  |  | .30 |
| Respiration data | 5 (13) | 217,684 | 0.69-1.00 | 0.962 (0.53-1.0) | 0.9962 | 56,612.8 (<.001) | 100 |  |
| HRd data | 2 (5) | 27,735 | 0.82-0.90 | 0.864 (0.77-0.94) | 0.0078 | 240.24 (<.001) | 100 |  |
| Respiration data and HR data | 6 (16) | 104,439 | 0.73-0.84 | 0.813 (0.77-0.85) | 0.0116 | 852.8 (<.001) | 100 |  |
| Respiration data and SpO2e | 10 (26) | 2,108,997 | 0.76-0.97 | 0.896 (0.75-0.96) | 0.1341 | 319,067.7 (<.001) | 100 |  |
| Respiration data, HR data, and body movement | 12 (32) | 238,494 | 0.60-0.88 | 0.787 (0.69-0.86) | 0.0196 | 9958.4 (<.001) | 100 |  |
| Ground truth |  |  |  |  |  |  |  | .47 |
| Polysomnography | 17 (45) | 971,896 | 0.69-0.97 | 0.877 (0.81-0.92) | 0.1433 | 185,452.6 (<.001) | 100 |  |
| WD | 9 (24) | 1,511,429 | 0.76-1.00 | 0.949 (0.19-1.0) | 1.4613 | 148,018.6 (<.001) | 100 |  |
| Experiment context | 10 (26) | 218,980 | 0.60-0.93 | 0.856 (0.45-0.97) | 0.2302 | 13,488.7 (<.001) | 100 |  |
| Validation method |  |  |  |  |  |  |  | .31 |
| K-fold cross-validation | 12 (32) | 2,173,814 | 0.69-0.97 | 0.835 (0.64-0.93) | 0.1643 | 347,304 (<.001) | 100 |  |
| Train-test split | 24 (63) | 528,491 | 0.60-1.00 | 0.911 (0.82-0.96) | 0.3667 | 106,748.9 (<.001) | 100 |  |
| Overall accuracy | 36 (95) | 2,702,305 | 0.60-1.0 | 0.893 (0.82-0.94) | 0.3130 | 457,567.0 (<.001) | 100 | N/Af |

aMany studies were included >1 time in most meta-analyses, given that the studies assessed the performance of >1 algorithm.

bItalicized values are statistically significant (P<.05).

cWD: wearable device.

dHR: heart rate.

eSpO2: blood oxygen saturation.

fNot applicable.

Sensitivity

As shown in Table 5, meta-analyses were carried out on 22 estimates of sensitivity derived from 872,443 respiratory events across 15 (39%) of the 38 studies. The pooled mean sensitivity of these estimates was 0.793 (95% CI 0.67-0.87). The meta-analyzed evidence exhibited considerable statistical heterogeneity (P<.001; I2=100%). With regard to subgroup analyses, there was no statistically significant difference in the pooled mean sensitivity between subgroups in any group.
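Sensitivity here is an event-level quantity: the share of true apnea events the model flags, as opposed to specificity, the share of non-apnea events it correctly rules out. A small sketch of these definitions over confusion-matrix counts (the variable names are ours, not the review's):

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Event-level metrics from confusion-matrix counts: sensitivity is
    the share of apnea events detected; specificity is the share of
    non-apnea events correctly ruled out."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)   # recall over true apnea events
    specificity = tn / (tn + fp)   # recall over non-apnea events
    return accuracy, sensitivity, specificity
```

For example, 80 detected apnea events out of 100, with 90 of 100 non-apnea events correctly ruled out, gives sensitivity 0.80 and specificity 0.90.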

Table 5. Pooled mean estimates of sensitivity in detecting respiratory events by several factors.
| Groups | Studies, n (%)a | Sample size, N | Sensitivity (%), range | Pooled mean sensitivity (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup difference, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Algorithm |  |  |  |  |  |  |  | .39 |
| Convolutional neural network | 8 (21) | 107,274 | 0.25-0.94 | 0.752 (0.56-0.90) | 0.0862 | 14,383.9 (<.001) | 100 |  |
| Recurrent neural networks | 6 (16) | 279,369 | 0.68-0.89 | 0.799 (0.72-0.86) | 0.0119 | 7460.8 (<.001) | 100 |  |
| Random forest | 2 (5) | 141,601 | 0.68-0.80 | 0.737 (0.61-0.84) | 0.0092 | 1628.9 (<.001) | 100 |  |
| K-nearest neighbors | 2 (5) | 114,368 | 0.30-0.69 | 0.499 (0.15-0.85) | 0.0794 | 228.8 (<.01) | 100 |  |
| Type of algorithm |  |  |  |  |  |  |  | .80 |
| Machine learning | 6 (16) | 370,337 | 0.30-0.80 | 0.682 (0.50-0.80) | 0.0696 | 3872.9 (<.001) | 100 |  |
| Deep learning | 16 (42) | 502,106 | 0.25-0.98 | 0.819 (0.69-0.90) | 0.1397 | 42,230.7 (<.001) | 100 |  |
| Sample size, n |  |  |  |  |  |  |  | .41 |
| <100 | 10 (26) | 41,761 | 0.25-0.98 | 0.813 (0.58-0.92) | 0.1676 | 4692.1 (<.001) | 100 |  |
| 100-200 | 3 (8) | 57,022 | 0.70-0.87 | 0.801 (0.70-0.89) | 0.0113 | 658.0 (<.001) | 100 |  |
| >200 | 9 (24) | 773,660 | 0.44-0.94 | 0.718 (0.38-0.89) | 0.0802 | 55,208.4 (<.001) | 100 |  |
| Type of sleep apnea |  |  |  |  |  |  |  | .99 |
| All | 20 (53) | 862,060 | 0.25-0.98 | 0.791 (0.67-0.87) | 0.1006 | 58,431.0 (<.001) | 100 |  |
| Obstructive sleep apnea | 2 (5) | 10,383 | 0.44-0.93 | 0.724 (0.18-1.0) | 0.1709 | 2217.2 (<.001) | 100 |  |
| Status of the WDb |  |  |  |  |  |  |  | .05 |
| Commercial | 16 (42) | 848,034 | 0.30-0.94 | 0.726 (0.61-0.81) | 0.0214 | 58,113.7 (<.001) | 100 |  |
| Noncommercial | 6 (16) | 24,409 | 0.25-0.98 | 0.830 (0.60-0.97) | 0.1051 | 1790.4 (<.001) | 100 |  |
| Placement of the WD |  |  |  |  |  |  |  | .36 |
| Chest | 6 (16) | 58,175 | 0.30-0.93 | 0.745 (0.45-0.89) | 0.0079 | 1621.1 (<.001) | 100 |  |
| Chest and abdomen | 3 (8) | 55,783 | 0.78-0.87 | 0.826 (0.77-0.87) | 0.0032 | 285.5 (<.001) | 99 |  |
| Wrist | 3 (8) | 18,548 | 0.44-0.70 | 0.617 (0.44-0.78) | 0.0238 | 1316.3 (<.001) | 100 |  |
| Data set size, n |  |  |  |  |  |  |  | .63 |
| <10,000 | 4 (11) | 1657 | 0.30-0.94 | 0.796 (0.71-0.97) | 0.1338 | 191.7 (<.001) | 99 |  |
| 10,000-50,000 | 5 (13) | 13,137 | 0.25-0.98 | 0.768 (0.48-0.96) | 0.1224 | 2218.8 (<.001) | 100 |  |
| >50,000 | 10 (26) | 800,627 | 0.44-0.94 | 0.718 (0.48-0.86) | 0.0523 | 57,265.8 (<.001) | 100 |  |
| Data type |  |  |  |  |  |  |  | .41 |
| Respiration data | 5 (13) | 35,749 | 0.30-0.98 | 0.888 (0.37-0.98) | 0.3566 | 2046.3 (<.001) | 100 |  |
| HRc data | 2 (5) | 7956 | 0.70-0.93 | 0.835 (0.56-0.99) | 0.0499 | 600.2 (<.001) | 100 |  |
| Respiration data and SpO2d | 10 (26) | 787,542 | 0.68-0.94 | 0.814 (0.73-0.87) | 0.0876 | 52,794.7 (<.001) | 100 |  |
| Respiration data, HR data, and body movement | 4 (11) | 40,457 | 0.44-0.80 | 0.658 (0.50-0.80) | 0.0253 | 1688.2 (<.001) | 100 |  |
| Ground truth |  |  |  |  |  |  |  | .29 |
| Polysomnography | 11 (29) | 95,891 | 0.25-0.94 | 0.726 (0.53-0.85) | 0.2373 | 7131.5 (<.001) | 100 |  |
| WD | 9 (24) | 742,136 | 0.69-0.98 | 0.900 (0.55-0.98) | 0.1081 | 51,466.2 (<.001) | 100 |  |
| Experiment context | 2 (5) | 34,416 | 0.80-0.82 | 0.813 (0.79-0.83) | 0.0002 | 2.47 (.12) | 60 |  |
| Validation method |  |  |  |  |  |  |  | .36 |
| K-fold cross-validation | 12 (32) | 795,959 | 0.30-0.94 | 0.743 (0.59-0.85) | 0.0253 | 54,250.9 (<.001) | 100 |  |
| Train-test split | 10 (26) | 76,484 | 0.25-0.98 | 0.770 (0.61-0.90) | 0.0765 | 8020.8 (<.001) | 100 |  |
| Overall sensitivity | 22 (58) | 872,443 | 0.25-0.98 | 0.793 (0.67-0.87) | 0.1196 | 62,433.8 (<.001) | 100 | N/Ae |

aMany studies were included >1 time in most meta-analyses, given that the studies assessed the performance of >1 algorithm.

bWD: wearable device.

cHR: heart rate.

dSpO2: blood oxygen saturation.

eNot applicable.

Specificity

Meta-analyses were performed to pool 22 estimates of specificity derived from 1,699,503 respiratory events across 15 (39%) of the 38 studies (Table 6). The pooled mean specificity of these estimates was 0.946 (95% CI 0.88-0.98). There was considerable statistical heterogeneity (P<.001; I2=100%) in the meta-analyzed studies. We also found a statistically significant difference in the pooled mean specificity between subgroups in the “status of wearable device” group (P=.01), while there was no statistically significant difference (P>.05) in the pooled mean specificity between subgroups in the rest of the groups.
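The "test for subgroup differences" column in these tables is a chi-square test: Q_between compares each subgroup's pooled mean against the weighted grand mean. A sketch of that computation (a standard fixed-effect formulation; the review's exact software routine is not reported):

```python
def subgroup_difference_q(means, standard_errors):
    """Q_between: chi-square statistic for differences between subgroup
    pooled means, with (number of subgroups - 1) degrees of freedom."""
    w = [1.0 / se ** 2 for se in standard_errors]          # inverse-variance weights
    grand = sum(wi * m for wi, m in zip(w, means)) / sum(w)
    q_between = sum(wi * (m - grand) ** 2 for wi, m in zip(w, means))
    return q_between, len(means) - 1
```

The reported P value is then the upper tail of the chi-square distribution with the returned degrees of freedom (eg, `scipy.stats.chi2.sf(q, df)`); identical subgroup means yield Q_between of 0 and a nonsignificant P value.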

Table 6. Pooled mean estimates of specificity in detecting respiratory events by several factors.
| Groups | Studies, n (%)a | Sample size, N | Specificity (%), range | Pooled mean specificity (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup differences, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Algorithm |  |  |  |  |  |  |  | .14 |
| Convolutional neural network | 8 (21) | 298,315 | 0.72-0.99 | 0.932 (0.87-0.98) | 0.0236 | 6463.7 (<.001) | 100 |  |
| Recurrent neural networks | 6 (16) | 385,722 | 0.76-0.95 | 0.870 (0.81-0.92) | 0.0090 | 11,722.7 (<.001) | 100 |  |
| Random forest | 2 (5) | 647,897 | 0.85-0.98 | 0.930 (0.74-1.00) | 0.0374 | 28,107.9 (<.001) | 100 |  |
| K-nearest neighbors | 2 (5) | 116,039 | 0.76-0.86 | 0.812 (0.71-0.90) | 0.0079 | 127.1 (<.01) | 99 |  |
| Type of algorithm |  |  |  |  |  |  |  | .10 |
| Machine learning | 6 (16) | 879,975 | 0.70-0.98 | 0.910 (0.38-0.99) | 0.5700 | 259,327.0 (<.001) | 100 |  |
| Deep learning | 16 (42) | 819,528 | 0.72-1.00 | 0.949 (0.87-0.98) | 0.6454 | 158,624.8 (<.001) | 100 |  |
| Sample size, n |  |  |  |  |  |  |  | .94 |
| <100 | 10 (26) | 154,202 | 0.70-1.00 | 0.951 (0.79-0.99) | 1.0272 | 123,494.7 (<.001) | 100 |  |
| 100-200 | 3 (8) | 219,550 | 0.84-0.95 | 0.922 (0.85-0.97) | 0.0112 | 2860.8 (<.001) | 100 |  |
| >200 | 9 (24) | 1,325,751 | 0.85-0.98 | 0.949 (0.81-0.99) | 0.2549 | 340,888.9 (<.001) | 100 |  |
| Type of sleep apnea |  |  |  |  |  |  |  | .97 |
| All | 20 (53) | 1,642,769 | 0.70-1.00 | 0.947 (0.86-0.98) | 0.7459 | 487,691.1 (<.001) | 100 |  |
| Obstructive sleep apnea | 2 (5) | 56,734 | 0.94-0.95 | 0.943 (0.94-0.95) | 0.0100 | 3.0 (.09) | 66 |  |
| Status of the WDb |  |  |  |  |  |  |  | .01c |
| Commercial | 16 (42) | 1,629,032 | 0.70-0.98 | 0.887 (0.79-0.94) | 0.2257 | 413,387.5 (<.001) | 100 |  |
| Noncommercial | 6 (16) | 70,471 | 0.85-1.00 | 0.969 (0.92-1.00) | 0.0181 | 2607.4 (<.001) | 100 |  |
| Placement of the WD |  |  |  |  |  |  |  | .97 |
| Chest | 6 (16) | 634,960 | 0.70-0.98 | 0.900 (0.65-0.97) | 0.3678 | 157,806.3 (<.001) | 100 |  |
| Chest and abdomen | 3 (8) | 206,166 | 0.72-0.95 | 0.893 (0.73-0.99) | 0.0370 | 4312.8 (<.001) | 100 |  |
| Wrist | 3 (8) | 94,884 | 0.84-0.94 | 0.885 (0.82-0.94) | 0.0074 | 2106.2 (<.001) | 100 |  |
| Data set size, n |  |  |  |  |  |  |  | .64 |
| <10,000 | 4 (11) | 6511 | 0.70-0.99 | 0.923 (0.06-1.00) | 0.8131 | 416.7 (<.001) | 100 |  |
| 10,000-50,000 | 5 (13) | 55,595 | 0.72-1.00 | 0.937 (0.81-1.00) | 0.0482 | 9402.4 (<.001) | 100 |  |
| >50,000 | 10 (26) | 1,417,847 | 0.76-0.98 | 0.888 (0.84-0.93) | 0.1031 | 93,530.5 (<.001) | 100 |  |
| Data type |  |  |  |  |  |  |  | .37 |
| Respiration data | 5 (13) | 181,935 | 0.70-1.00 | 0.977 (0.61-1.00) | 1.1621 | 53,359.8 (<.001) | 100 |  |
| HRd data | 2 (5) | 19,779 | 0.86-0.95 | 0.908 (0.80-0.98) | 0.0121 | 217.8 (<.001) | 99 |  |
| Respiration data and SpO2e | 10 (26) | 1,321,455 | 0.72-0.98 | 0.925 (0.72-0.98) | 0.3791 | 348,529.0 (<.001) | 100 |  |
| Respiration data, HR data, and body movement | 4 (11) | 172,117 | 0.76-0.94 | 0.864 (0.67-0.95) | 0.0117 | 9381.8 (<.001) | 100 |  |
| Ground truth |  |  |  |  |  |  |  | .90 |
| Polysomnography | 11 (29) | 771,566 | 0.70-1.00 | 0.948 (0.86-0.98) | 0.5619 | 202,709.3 (<.001) | 100 |  |
| WD | 9 (24) | 769,293 | 0.72-1.00 | 0.957 (0.11-1.00) | 1.8198 | 103,183.0 (<.001) | 100 |  |
| Experiment context | 2 (5) | 158,644 | 0.85-0.95 | 0.908 (0.78-0.98) | 0.0158 | 293.1 (<.001) | 100 |  |
| Validation method |  |  |  |  |  |  |  | .10 |
| K-fold cross-validation | 12 (32) | 1,377,855 | 0.70-0.98 | 0.866 (0.61-0.96) | 0.3674 | 401,832.4 (<.001) | 100 |  |
| Train-test split | 10 (26) | 321,648 | 0.84-1.00 | 0.947 (0.90-0.98) | 0.0178 | 8482.6 (<.001) | 100 |  |
| Overall specificity | 22 (58) | 1,699,503 | 0.70-1.00 | 0.946 (0.88-0.98) | 0.6373 | 487,706.6 (<.001) | 100 | N/Af |

aMany studies were included more than one time in all meta-analyses given that the studies assessed the performance of more than one algorithm.

bWD: wearable device.

cItalicized values are statistically significant (P<.05).

dHR: heart rate.

eSpO2: blood oxygen saturation.

fNot applicable.

Type of Apnea Events in Respiration

We conducted meta-analyses of 6 estimates of accuracy derived from 637,250 respiratory events across 6 (16%) of the 38 studies (Table 7). The pooled mean accuracy of these estimates was 0.815 (95% CI 0.64-0.94). The meta-analyzed studies exhibited considerable statistical heterogeneity (P<.001; I2=100%). In addition, there was a statistically significant difference in the pooled mean accuracy between subgroups in the “data type” group (P=.001), while no statistically significant difference (P>.05) was found in the pooled mean accuracy between subgroups in the remaining groups.

Table 7. Pooled mean estimates of accuracy in detecting the type of respiratory events by several factors.
| Groups | Studies, n (%) | Sample size, N | Accuracy (%), range | Pooled mean accuracy (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup differences, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Algorithm |  |  |  |  |  |  |  | .76 |
| Convolutional neural network | 4 (11) | 459,163 | 0.40-0.97 | 0.829 (0.55-0.99) | 0.0973 | 26,906.1 (<.001) | 100 |  |
| Long short-term memory | 2 (5) | 178,087 | 0.73-0.84 | 0.788 (0.68-0.88) | 0.0073 | 2299.2 (<.001) | 100 |  |
| Sample size, n |  |  |  |  |  |  |  | .27 |
| ≤100 | 3 (8) | 309,033 | 0.74-0.97 | 0.892 (0.73-0.98) | 0.0328 | 21,243.3 (<.001) | 100 |  |
| >100 | 3 (8) | 328,217 | 0.40-0.88 | 0.724 (0.41-0.95) | 0.0824 | 19,923.0 (<.001) | 100 |  |
| Status of the WDa |  |  |  |  |  |  |  | .25 |
| Commercial | 4 (11) | 578,076 | 0.40-0.93 | 0.759 (0.51-0.94) | 0.0735 | 41,330.6 (<.001) | 100 |  |
| Noncommercial | 2 (5) | 59,174 | 0.84-0.97 | 0.909 (0.74-0.99) | 0.0249 | 30.9 (<.01) | 97 |  |
| Data type |  |  |  |  |  |  |  | .001b |
| Respiration data | 2 (5) | 189,970 | 0.93-0.97 | 0.944 (0.91-0.97) | 0.0002 | 3.7 (.05) | 73 |  |
| Respiration data and SpO2c | 2 (5) | 308,810 | 0.84-0.88 | 0.857 (0.81-0.90) | 0.0018 | 697.4 (<.01) | 100 |  |
| Respiration data, HRd data, and body movement | 2 (5) | 138,470 | 0.40-0.74 | 0.574 (0.25-0.87) | 0.0590 | 7875.2 (<.001) | 100 |  |
| Ground truth |  |  |  |  |  |  |  | .19 |
| Polysomnography | 4 (11) | 197,644 | 0.40-0.97 | 0.762 (0.49-0.95) | 0.0822 | 12,665.3 (<.001) | 100 |  |
| Nonpolysomnography | 2 (5) | 439,606 | 0.88-0.93 | 0.905 (0.85-0.95) | 0.0039 | 3353.7 (<.001) | 100 |  |
| Validation method |  |  |  |  |  |  |  | .97 |
| K-fold cross-validation | 2 (5) | 368,849 | 0.74-0.88 | 0.812 (0.66-0.93) | 0.0164 | 10,579.9 (<.001) | 100 |  |
| Train-test split | 4 (11) | 268,401 | 0.40-0.97 | 0.818 (0.54-0.98) | 0.0954 | 28,010.1 (<.001) | 100 |  |
| Overall accuracy | 6 (16) | 637,250 | 0.40-0.97 | 0.815 (0.64-0.94) | 0.0603 | 41,608.1 (<.001) | 100 | N/Ae |

aWD: wearable device.

bItalicized values are statistically significant (P<.05).

cSpO2: blood oxygen saturation.

dHR: heart rate.

eNot applicable.

Patients With Sleep Apnea
Accuracy

We carried out meta-analyses of 13 estimates of accuracy derived from 2015 participants across 13 (34%) of the 38 studies (Table 8). The pooled mean accuracy of these estimates was 0.869 (95% CI 0.81-0.92). The meta-analyzed estimates showed considerable statistical heterogeneity (P<.001; I2=100%). Further, there was a statistically significant difference in the pooled mean accuracy between subgroups in the “type of sleep apnea” group (P=.049). However, no statistically significant difference (P>.05) was found in the pooled mean accuracy between subgroups in the remaining groups.

Table 8. Pooled mean estimates of accuracy in detecting sleep apnea by several factors.
| Groups | Studies, n (%) | Sample size, N | Accuracy (%), range | Pooled mean accuracy (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup differences, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Type of algorithm |  |  |  |  |  |  |  | .20 |
| Machine learning | 3 (8) | 1141 | 0.88-0.92 | 0.896 (0.87-0.92) | 0.0003 | 2.2 (.34) | 7 |  |
| Deep learning | 9 (24) | 678 | 0.71-1.00 | 0.849 (0.76-0.92) | 0.0226 | 55.4 (<.01) | 86 |  |
| Sample size, n |  |  |  |  |  |  |  | .20 |
| ≤100 | 8 (21) | 496 | 0.71-0.96 | 0.838 (0.77-0.90) | 0.0095 | 23.3 (<.01) | 70 |  |
| >100 | 5 (13) | 1519 | 0.75-1.00 | 0.905 (0.81-0.97) | 0.0224 | 45.5 (<.01) | 91 |  |
| Type of sleep apnea |  |  |  |  |  |  |  | .049a |
| All | 6 (16) | 671 | 0.78-1.00 | 0.920 (0.84-0.97) | 0.0163 | 24.4 (<.01) | 80 |  |
| Obstructive sleep apnea | 7 (18) | 1344 | 0.71-0.91 | 0.823 (0.76-0.88) | 0.0091 | 40.5 (<.01) | 85 |  |
| Status of the WDb |  |  |  |  |  |  |  | .18 |
| Commercial | 9 (24) | 1632 | 0.71-0.96 | 0.841 (0.78-0.89) | 0.0098 | 49.5 (<.01) | 84 |  |
| Noncommercial | 4 (11) | 383 | 0.78-1.00 | 0.923 (0.81-0.99) | 0.0252 | 22.4 (<.01) | 87 |  |
| Placement of the WD |  |  |  |  |  |  |  | .17 |
| Wrist | 4 (11) | 982 | 0.74-0.96 | 0.840 (0.73-0.93) | 0.0165 | 28.4 (<.01) | 98 |  |
| Finger | 3 (8) | 212 | 0.71-0.86 | 0.805 (0.70-0.89) | 0.0070 | 5.99 (.05) | 67 |  |
| Chest and abdomen | 3 (8) | 147 | 0.86-1.00 | 0.949 (0.83-1.00) | 0.0230 | 10.6 (<.01) | 81 |  |
| Data type |  |  |  |  |  |  |  | .09 |
| Respiration data and SpO2c | 4 (11) | 556 | 0.86-1.00 | 0.938 (0.86-0.99) | 0.0139 | 13.4 (<.01) | 78 |  |
| Respiration data, HRd data, and body movement | 4 (11) | 408 | 0.71-0.91 | 0.840 (0.74-0.92) | 0.0111 | 17.1 (<.01) | 82 |  |
| Ground truth |  |  |  |  |  |  |  | .12 |
| Polysomnography | 11 (29) | 1908 | 0.71-1.00 | 0.878 (0.82-0.93) | 0.0164 | 71.1 (<.01) | 86 |  |
| WD | 2 (5) | 107 | 0.74-0.86 | 0.789 (0.67-0.89) | 0.0036 | 1.6 (.20) | 38 |  |
| Validation method |  |  |  |  |  |  |  | .91 |
| K-fold cross-validation | 4 (11) | 1177 | 0.78-0.91 | 0.873 (0.82-0.92) | 0.0035 | 8.2 (.04) | 63 |  |
| Train-test split | 7 (18) | 698 | 0.71-1.00 | 0.880 (0.78-0.95) | 0.0267 | 59.2 (<.01) | 90 |  |
| Leave-one-out cross-validation | 2 (5) | 140 | 0.74-0.92 | 0.839 (0.64-0.97) | 0.0240 | 7.7 (<.01) | 87 |  |
| Overall accuracy | 13 (34) | 2015 | 0.71-1.00 | 0.869 (0.81-0.92) | 0.0156 | 80.3 (<.001) | 100 | N/Ae |

aItalicized values are statistically significant (P<.05).

bWD: wearable device.

cSpO2: blood oxygen saturation.

dHR: heart rate.

eNot applicable.

Sensitivity

As shown in Table 9, meta-analyses were carried out on 13 estimates of sensitivity derived from 1580 participants across 13 (34%) of the 38 studies. The pooled mean sensitivity of these estimates was 0.938 (95% CI 0.89-0.97). The meta-analyzed evidence exhibited considerable statistical heterogeneity (P<.001; I2=82%). With regard to subgroup analyses, there was no statistically significant difference in the pooled mean sensitivity between subgroups in any group except the “placement of wearable device” group (P<.001).

Table 9. Pooled mean estimates of sensitivity in detecting sleep apnea by several factors.
| Groups | Studies, n (%) | Sample size, N | Sensitivity (%), range | Pooled mean sensitivity (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup differences, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Type of algorithm |  |  |  |  |  |  |  | .78 |
| Machine learning | 3 (8) | 921 | 0.89-0.98 | 0.926 (0.88-0.98) | 0.0038 | 6.9 (.03) | 71 |  |
| Deep learning | 9 (24) | 485 | 0.77-1.00 | 0.942 (0.87-0.99) | 0.0256 | 59.0 (<.01) | 86 |  |
| Sample size, n |  |  |  |  |  |  |  | .50 |
| ≤100 | 8 (21) | 363 | 0.77-1.00 | 0.953 (0.90-0.99) | 0.0154 | 28.0 (<.01) | 75 |  |
| >100 | 5 (13) | 1217 | 0.77-1.00 | 0.917 (0.83-0.97) | 0.0196 | 33.9 (<.01) | 88 |  |
| Type of sleep apnea |  |  |  |  |  |  |  | .06 |
| All | 6 (16) | 456 | 0.90-1.00 | 0.959 (0.93-1.00) | 0.0059 | 13.0 (.02) | 62 |  |
| Obstructive sleep apnea | 7 (18) | 1124 | 0.77-1.00 | 0.903 (0.83-0.96) | 0.0179 | 37.8 (<.01) | 84 |  |
| Status of the WDa |  |  |  |  |  |  |  | .06 |
| Commercial | 9 (24) | 1254 | 0.77-1.00 | 0.916 (0.85-0.96) | 0.0170 | 43.2 (<.01) | 81 |  |
| Noncommercial | 4 (11) | 326 | 0.93-1.00 | 0.974 (0.93-1.00) | 0.0048 | 7.5 (.06) | 60 |  |
| Placement of the WD |  |  |  |  |  |  |  | <.001b |
| Wrist | 4 (11) | 833 | 0.77-0.90 | 0.837 (0.76-0.91) | 0.0073 | 16.1 (<.01) | 81 |  |
| Finger | 3 (8) | 148 | 0.92-1.00 | 0.966 (0.90-1.00) | 0.0000 | 5.3 (.07) | 62 |  |
| Chest and abdomen | 3 (8) | 130 | 0.98-1.00 | 0.997 (0.97-1.00) | 0.0083 | 1.1 (.59) | 0 |  |
| Data type |  |  |  |  |  |  |  | .39 |
| Respiration data and SpO2c | 4 (11) | 378 | 0.92-1.00 | 0.980 (0.93-1.00) | 0.0080 | 11.0 (.01) | 73 |  |
| Respiration data, HRd data, and body movement | 4 (11) | 322 | 0.92-0.95 | 0.954 (0.91-0.99) | 0.0040 | 6.2 (.10) | 52 |  |
| Ground truth |  |  |  |  |  |  |  | .80 |
| Polysomnography | 11 (29) | 1495 | 0.77-1.00 | 0.941 (0.89-0.97) | 0.0126 | 52.6 (<.01) | 81 |  |
| WD | 2 (5) | 85 | 0.77-1.00 | 0.917 (0.57-1.00) | 0.0773 | 12.1 (<.01) | 92 |  |
| Validation method |  |  |  |  |  |  |  | .89 |
| K-fold cross-validation | 4 (11) | 941 | 0.89-1.00 | 0.944 (0.89-0.98) | 0.0063 | 10.7 (.01) | 72 |  |
| Train-test split | 7 (18) | 527 | 0.77-1.00 | 0.941 (0.87-0.99) | 0.0199 | 41.9 (<.01) | 86 |  |
| Leave-one-out cross-validation | 2 (5) | 112 | 0.77-0.98 | 0.896 (0.61-1.00) | 0.0542 | 13.2 (<.01) | 92 |  |
| Overall sensitivity | 13 (34) | 1580 | 0.77-1.00 | 0.938 (0.89-0.97) | 0.0162 | 67.0 (<.001) | 82 | N/Ae |

aWD: wearable device.

bItalicized values are statistically significant (P<.05).

cSpO2: blood oxygen saturation.

dHR: heart rate.

eNot applicable.

Specificity

Meta-analyses were performed to pool 13 estimates of specificity derived from 436 participants across 13 (34%) of the 38 studies (Table 10). The pooled mean specificity of these estimates was 0.752 (95% CI 0.63-0.86). There was considerable statistical heterogeneity (P<.001; I2=78%) in the meta-analyzed studies. Our subgroup meta-analyses showed no statistically significant difference in the pooled mean specificity between subgroups in any group.

Table 10. Pooled mean estimates of specificity in detecting sleep apnea by several factors.
| Groups | Studies, n (%) | Sample size, N | Specificity (%), range | Pooled mean specificity (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup differences, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Type of algorithm |  |  |  |  |  |  |  | .53 |
| Machine learning | 3 (8) | 220 | 0.60-0.88 | 0.796 (0.63-0.92) | 0.0168 | 7.9 (.02) | 75 |  |
| Deep learning | 9 (24) | 194 | 0.29-1.00 | 0.735 (0.55-0.89) | 0.0575 | 37.1 (<.01) | 78 |  |
| Sample size, n |  |  |  |  |  |  |  | .21 |
| ≤100 | 8 (21) | 133 | 0.29-1.00 | 0.690 (0.48-0.87) | 0.0615 | 35.8 (<.01) | 80 |  |
| >100 | 5 (13) | 303 | 0.72-1.00 | 0.818 (0.72-0.90) | 0.0077 | 11.6 (.02) | 65 |  |
| Type of sleep apnea |  |  |  |  |  |  |  | .62 |
| All | 6 (16) | 220 | 0.29-0.89 | 0.810 (0.55-0.99) | 0.0780 | 34.6 (<.01) | 86 |  |
| Obstructive sleep apnea | 7 (18) | 216 | 0.36-1.00 | 0.730 (0.64-0.81) | 0.0068 | 12.3 (.06) | 51 |  |
| Status of the WDa |  |  |  |  |  |  |  | .43 |
| Commercial | 9 (24) | 378 | 0.29-1.00 | 0.784 (0.66-0.89) | 0.0295 | 32.9 (<.01) | 76 |  |
| Noncommercial | 4 (11) | 58 | 0.36-1.00 | 0.672 (0.39-0.91) | 0.0497 | 11.3 (.01) | 73 |  |
| Placement of the WD |  |  |  |  |  |  |  | .71 |
| Wrist | 4 (11) | 149 | 0.72-1.00 | 0.803 (0.62-0.94) | 0.0309 | 10.9 (.01) | 73 |  |
| Finger | 3 (8) | 64 | 0.29-0.88 | 0.658 (0.29-0.89) | 0.0727 | 11.5 (<.01) | 83 |  |
| Chest and abdomen | 3 (8) | 18 | 0.60-1.00 | 0.777 (0.47-0.99) | 0.0157 | 2.7 (.26) | 26 |  |
| Data type |  |  |  |  |  |  |  | .35 |
| Respiration data and SpO2b | 4 (11) | 179 | 0.60-1.00 | 0.855 (0.66-0.99) | 0.0186 | 5.6 (.13) | 47 |  |
| Respiration data, HRc data, and body movement | 4 (11) | 86 | 0.29-0.89 | 0.700 (0.46-0.90) | 0.0411 | 11.5 (<.01) | 74 |  |
| Ground truth |  |  |  |  |  |  |  | .53 |
| Polysomnography | 11 (29) | 414 | 0.29-1.00 | 0.756 (0.62-0.88) | 0.0444 | 52.8 (<.01) | 81 |  |
| WD | 2 (5) | 22 | 0.67-0.75 | 0.691 (0.46-0.89) | 0.0000 | 0.04 (.85) | 0 |  |
| Validation method |  |  |  |  |  |  |  | .48 |
| K-fold cross-validation | 4 (11) | 236 | 0.36-0.88 | 0.713 (0.44-0.93) | 0.0547 | 26.9 (<.01) | 89 |  |
| Train-test split | 7 (18) | 172 | 0.29-1.00 | 0.793 (0.72-0.86) | 0.0456 | 22.9 (<.01) | 74 |  |
| Leave-one-out cross-validation | 2 (5) | 28 | 0.60-0.67 | 0.644 (0.45-0.82) | 0.0000 | 0.13 (.72) | 0 |  |
| Overall specificity | 13 (34) | 436 | 0.29-1.00 | 0.752 (0.63-0.86) | 0.0366 | 54.5 (<.001) | 78 | N/Ad |

aWD: wearable device.

bSpO2: blood oxygen saturation.

cHR: heart rate.

dNot applicable.

Severity of Sleep Apnea
Accuracy

We performed meta-analyses of 9 estimates of accuracy derived from 1661 participants across 9 (24%) of the 38 studies (Table 11). The pooled mean accuracy of these estimates was 0.651 (95% CI 0.54-0.75). The meta-analyzed studies exhibited considerable statistical heterogeneity (P<.001; I2=93%). In addition, there was a statistically significant difference in the pooled mean accuracy between subgroups in the “type of sleep apnea” group (P=.03) and the “data type” group (P=.01), while no statistically significant difference (P>.05) was found in the pooled mean accuracy between subgroups in the remaining groups.

Table 11. Pooled mean estimates of accuracy in detecting the severity of sleep apnea by several factors.
| Groups | Studies, n (%) | Sample size, N | Accuracy (%), range | Pooled mean accuracy (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup difference, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Type of algorithm |  |  |  |  |  |  |  | .28 |
| Machine learning | 3 (8) | 1141 | 0.63-0.80 | 0.716 (0.60-0.82) | 0.0105 | 40.6 (<.01) | 95 |  |
| Deep learning | 6 (16) | 520 | 0.36-0.89 | 0.615 (0.46-0.76) | 0.0318 | 47.8 (<.01) | 90 |  |
| Sample size, n |  |  |  |  |  |  |  | .28 |
| ≤100 | 4 (11) | 274 | 0.36-0.71 | 0.584 (0.43-0.74) | 0.0219 | 22.4 (<.01) | 87 |  |
| >100 | 5 (13) | 1387 | 0.55-0.89 | 0.698 (0.56-0.82) | 0.0252 | 72.9 (<.01) | 95 |  |
| Type of sleep apnea |  |  |  |  |  |  |  | .03a |
| All | 4 (11) | 591 | 0.58-0.89 | 0.757 (0.62-0.87) | 0.0196 | 20.3 (<.01) | 85 |  |
| Obstructive sleep apnea | 5 (13) | 1070 | 0.36-0.67 | 0.564 (0.46-0.66) | 0.0111 | 23.3 (<.01) | 83 |  |
| Status of the WDb |  |  |  |  |  |  |  | .08 |
| Commercial | 7 (18) | 1543 | 0.36-0.80 | 0.606 (0.50-0.71) | 0.0219 | 87.9 (<.01) | 93 |  |
| Noncommercial | 2 (5) | 118 | 0.71-0.89 | 0.809 (0.60-0.95) | 0.0181 | 6.2 (.01) | 84 |  |
| Placement of the WD |  |  |  |  |  |  |  | .11 |
| Wrist | 3 (8) | 922 | 0.55-0.63 | 0.596 (0.54-0.65) | 0.0011 | 3.5 (.18) | 42 |  |
| Finger | 3 (8) | 212 | 0.36-0.67 | 0.542 (0.35-0.72) | 0.0235 | 16.5 (<.01) | 88 |  |
| Chest and abdomen | 2 (5) | 118 | 0.71-0.89 | 0.809 (0.60-0.95) | 0.0219 | 6.2 (.01) | 84 |  |
| Data type |  |  |  |  |  |  |  | .01 |
| Respiration data and SpO2c | 3 (8) | 527 | 0.71-0.89 | 0.807 (0.71-0.89) | 0.0073 | 6.2 (.04) | 68 |  |
| Body movement, HRd data, and SpO2 | 3 (8) | 212 | 0.36-0.67 | 0.542 (0.35-0.72) | 0.0235 | 16.5 (<.01) | 88 |  |
| Validation method |  |  |  |  |  |  |  | .37 |
| K-fold cross-validation | 2 (5) | 1079 | 0.63-0.80 | 0.719 (0.53-0.87) | 0.0195 | 40.6 (<.01) | 98 |  |
| Train-test split | 6 (16) | 520 | 0.36-0.89 | 0.615 (0.46-0.76) | 0.0318 | 47.8 (<.01) | 90 |  |
| Overall accuracy | 9 (24) | 1661 | 0.36-0.89 | 0.651 (0.54-0.75) | 0.0243 | 106.1 (<.001) | 93 | N/Ae |

aItalicized values are statistically significant (P<.05).

bWD: wearable device.

cSpO2: blood oxygen saturation.

dHR: heart rate.

eNot applicable.

Correlation Coefficient

As shown in Table 12, meta-analyses were carried out on 12 estimates of the correlation coefficient (r) derived from 1266 participants across 12 (32%) of the 38 studies. The pooled mean r of these estimates was 0.877 (95% CI 0.82-0.92). The meta-analyzed evidence exhibited considerable statistical heterogeneity (P<.001; I2=82%). With regard to subgroup analyses, there was a statistically significant difference in the pooled mean r between subgroups in the “placement of wearable device” group (P<.001) and the “data type” group (P<.001). However, no statistically significant difference (P>.05) was found in the pooled mean r between subgroups in the remaining groups.
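Correlation coefficients are commonly pooled on Fisher's z scale, where their sampling distribution is approximately normal, and then back-transformed; the review does not state its exact method, so the following is a sketch under that standard assumption:

```python
import math

def pool_correlations(rs, ns):
    """Pool correlation coefficients via Fisher's z transform,
    weighting each study by n - 3 (the inverse variance of z).
    Returns the back-transformed pooled r and its 95% CI."""
    zs = [math.atanh(r) for r in rs]          # Fisher z = artanh(r)
    w = [n - 3 for n in ns]                   # inverse-variance weights
    z_bar = sum(wi * zi for wi, zi in zip(w, zs)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    lo, hi = z_bar - 1.96 * se, z_bar + 1.96 * se
    return math.tanh(z_bar), (math.tanh(lo), math.tanh(hi))
```

A random-effects variant would widen the weights by the between-study variance (Tau²) before pooling, matching the heterogeneity columns in Table 12.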

Table 12. Pooled mean estimates of correlation coefficient (r) in detecting the severity of sleep apnea by several factors.
| Groups | Studies, n (%) | Sample size, N | Correlation coefficient (%), range | Pooled mean correlation coefficient (%; 95% CI) | Tau² | Q (P value) | I² (%) | Test for subgroup differences, P value |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Type of algorithm |  |  |  |  |  |  |  | .12 |
| Machine learning | 3 (8) | 526 | 0.90 to 0.96 | 0.922 (0.66 to 0.98) | 0.0987 | 25.6 (<.01) | 92 |  |
| Deep learning | 9 (24) | 740 | 0.64 to 0.91 | 0.856 (0.79 to 0.90) | 0.0552 | 53.1 (<.01) | 85 |  |
| Sample size, n |  |  |  |  |  |  |  | .79 |
| ≤100 | 9 (24) | 541 | 0.64 to 0.90 | 0.879 (0.85 to 0.90) | 0.0001 | 12.7 (.12) | 37 |  |
| >100 | 3 (8) | 725 | 0.71 to 0.96 | 0.896 (0.12 to 0.99) | 0.2849 | 173.0 (<.01) | 99 |  |
| Type of sleep apnea |  |  |  |  |  |  |  | .54 |
| All | 8 (21) | 802 | 0.64 to 0.96 | 0.886 (0.81 to 0.93) | 0.0918 | 80.2 (<.01) | 91 |  |
| Obstructive sleep apnea | 4 (11) | 464 | 0.71 to 0.90 | 0.859 (0.68 to 0.94) | 0.759 | 34.7 (<.01) | 91 |  |
| Status of the WDa |  |  |  |  |  |  |  | .35 |
| Commercial | 10 (26) | 1177 | 0.64 to 0.96 | 0.881 (0.82 to 0.92) | 0.0977 | 191.1 (<.01) | 95 |  |
| Noncommercial | 2 (5) | 89 | 0.84 to 0.86 | 0.856 (0.72 to 0.93) | 0.0000 | 0.1 (.79) | 0 |  |
| Placement of the WD |  |  |  |  |  |  |  | <.001b |
| Wrist | 2 (5) | 316 | 0.71 to 0.91 | 0.833 (–0.99 to 1.00) | 0.1948 | 20.1 (<.01) | 95 |  |
| Chest | 2 (5) | 452 | 0.87 to 0.96 | 0.929 (–0.98 to 1.00) | 0.1765 | 16.7 (<.01) | 94 |  |
| Finger | 3 (8) | 212 | 0.89 to 0.90 | 0.894 (0.88 to 0.91) | 0.0000 | 0.1 (.93) | 0 |  |
| Neck | 2 (5) | 89 | 0.84 to 0.86 | 0.856 (0.72 to 0.93) | 0.0000 | 0.1 (.79) | 0 |  |
| Abdomen | 2 (5) | 158 | 0.89 to 0.90 | 0.894 (0.81 to 0.94) | 0.0000 | 0.1 (.76) | 0 |  |
| Data type |  |  |  |  |  |  |  | <.001 |
| Respiration data | 2 (5) | 89 | 0.84 to 0.86 | 0.856 (0.72 to 0.93) | 0.0000 | 0.1 (.79) | 0 |  |
| Respiration data and SpO2c | 2 (5) | 438 | 0.64 to 0.96 | 0.878 (–1.00 to 1.00) | 0.6849 | 34.5 (<.01) | 97 |  |
| Respiration data and HRd data | 2 (5) | 158 | 0.89 to 0.90 | 0.894 (0.81 to 0.94) | 0.0000 | 0.1 (.76) | 0 |  |
| Respiration data, HR data, and body movement | 3 (8) | 369 | 0.71 to 0.91 | 0.844 (0.38 to 0.97) | 0.1030 | 24.5 (<.01) | 92 |  |
| Body movement, HR data, and SpO2 | 3 (8) | 212 | 0.89 to 0.90 | 0.894 (0.88 to 0.91) | 0.0000 | 0.1 (.93) | 0 |  |
| Validation method |  |  |  |  |  |  |  | .90 |
| K-fold cross-validation | 4 (11) | 527 | 0.64 to 0.96 | 0.869 (0.49 to 0.97) | 0.2315 | 59.3 (<.01) | 95 |  |
| Train-test split | 7 (18) | 450 | 0.71 to 0.91 | 0.877 (0.82 to 0.92) | 0.0486 | 50.8 (<.01) | 88 |  |
| Overall correlation coefficient | 12 (32) | 1266 | 0.64 to 0.96 | 0.877 (0.82 to 0.92) | 0.0828 | 194.5 (<.001) | 94 | N/Ae |

aWD: wearable device.

bItalicized values are statistically significant (P<.05).

cSpO2: blood oxygen saturation.

dHR: heart rate.

eNot applicable.


Principal Findings

This systematic review investigated how well wearable AI performs in detecting sleep apnea. Overall, the findings indicate that wearable AI demonstrated a performance that is deemed acceptable, although not optimal, for detecting sleep apnea. Specifically, wearable AI was able to correctly classify apnea events and nonapnea events in 89.3% of respiratory events. This performance was notably higher when using CNN in particular or deep learning algorithms in general. The superiority of CNN architectures can be attributed to their ability to capture the localized dependencies inherent in apnea patterns through convolution kernels. The meta-analyses conducted in this review revealed that wearable AI performed better in detecting nonapnea respiratory events (94.6%) compared to apnea respiratory events (79.3%). This could be linked to the training of AI models using an unrepresentative sample, wherein the number of nonapnea respiratory events (n=1,699,503) was approximately twice as high as the number of apnea respiratory events (n=872,443). This highlights the challenge of applying data balancing techniques for heterogeneous and time-dependent measurements, particularly evident in longitudinal recordings as observed in apnea studies.

Although the sensitivity of wearable AI in detecting apnea events in respiration remained unaffected by any moderating factors, the specificity was influenced by the status of the wearable device, where noncommercial devices exhibited higher specificity than commercial devices. This may be because all studies that used noncommercial wearable devices applied deep learning algorithms, whereas more than one-third (6/16, 38%) of studies that used commercial wearable devices applied ML algorithms (eg, random forest, AdaBoost, and k-nearest neighbors). Introducing scalable AI models, such as deep learning models, into commercial apnea detection applications presents challenges due to their computational expense and resource requirements, thereby complicating market penetration and impacting profit margins. However, recent advancements in tiny ML models and edge AI implementations offer potential solutions to mitigate these challenges. This review also demonstrated that wearable AI was able to correctly differentiate between different types of apnea events (eg, apnea, hypopnea, obstructive apnea, and central apnea) in 81.5% of respiratory events, and this performance was not influenced by any moderators. This may be attributable to the small number of studies (≤4) in all subgroup analyses related to this outcome.

In this review, wearable AI demonstrated 86.9% accuracy in correctly identifying patients with and patients without sleep apnea. This performance was notably higher when the wearable AI was used for detecting sleep apnea in general (92%) rather than OSA in particular (82.3%). This difference may be attributed to the fact that approximately 83% (5/6) of studies focusing on general sleep apnea detection used respiration data to develop the AI models. By contrast, only 29% (2/7) of studies concentrating on OSA detection incorporated respiration data. Given that respiration data are widely acknowledged as the most crucial indicator of sleep apnea, this disparity in use may explain the varying performance levels observed in the review.

Unlike apnea event detection, wearable AI exhibited superior performance in identifying patients with sleep apnea (93.8%) compared to those without sleep apnea (75.2%). This could be associated with the training of AI models using an unrepresentative sample, wherein the number of patients with sleep apnea (1580) was >3 times higher than the number of patients without sleep apnea (436). The specificity of wearable AI in detecting sleep apnea was not affected by any moderator, while its sensitivity was higher when wearable devices were placed on both the chest and abdomen in comparison with other placements (wrist or fingers). This moderation effect could be attributed to the fact that all studies that placed wearable devices on both the chest and abdomen focused on detecting sleep apnea in general, while 6 (86%) out of 7 studies that placed wearable devices in other places focused on detecting OSA in particular. Further, all studies that placed wearable devices on other body parts used commercial wearable devices, whereas only 1 of the studies that placed wearable devices on both the chest and abdomen used commercial wearable devices.
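A standard remedy for such unrepresentative training samples is to reweight the loss by inverse class frequency, so that the minority class contributes proportionally more during training. A minimal, hypothetical sketch (the function name is ours; the 3:1 toy ratio loosely mirrors the review's patient counts):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class inversely to its frequency so that the
    minority class contributes as much to the loss as the majority.
    Equivalent to the common 'balanced' heuristic n / (k * n_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in counts}

# Toy 3:1 imbalance loosely mirroring the review's patient counts
labels = ["apnea"] * 3 + ["no_apnea"]
weights = inverse_frequency_weights(labels)
```

The minority class receives the larger weight, which counteracts a model's tendency to favor the majority class; as noted above, applying such balancing to heterogeneous, time-dependent recordings remains harder than this per-label sketch suggests.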

Our meta-analyses also revealed that wearable AI accurately differentiated between various levels of sleep apnea severity (normal, mild, moderate, and severe) in 65.1% of cases. This performance was higher when the wearable AI was used for detecting the severity of sleep apnea in general rather than OSA in particular. This could be linked to the fact that all studies that aimed to detect OSA used commercial devices placed on either fingers or wrists, while two-thirds of the studies that focused on sleep apnea in general used noncommercial devices placed on both the abdomen and chest. This performance was also higher when the model was developed using both respiration and oxygen saturation data in comparison with using a combination of body movement, HR, and oxygen saturation data. This could be associated with the fact that all studies using the combination of body movement, HR, and oxygen saturation data focused on the detection of OSA using commercial devices placed on fingers, while all studies using both respiration and oxygen saturation data focused on detecting any sleep apnea type using devices (noncommercial in 2 of 3 studies, 67%) placed on the abdomen and chest.

Finally, the accuracy of wearable AI in estimating the severity of sleep apnea (ie, the apnea-hypopnea index score) reached 87.7%. This accuracy was higher when the wearable device was placed on the chest and when using both respiration and HR data or a combination of HR, oxygen saturation, and body movement.

Research and Practical Implications

Our analysis revealed that wearable AI shows promise in identifying sleep apnea, distinguishing its type, and gauging its severity; however, it is not yet ready for widespread use in clinical practice for 3 reasons. First, its current performance falls below the optimal level. Second, only 9 (24%) of the 38 studies were judged to have a low risk of bias in all domains. Third, heterogeneity between studies was considerable in most meta-analyses. Therefore, we cannot propose the use of wearable AI as a replacement for traditional sleep assessments (eg, polysomnography and home sleep apnea testing), but we recommend that wearable AI be used in conjunction with these assessments, taking into account factors such as cost-effectiveness and practical challenges in real-world implementation.

Among all wearable devices used in the included studies, only 1 was specifically designed for diagnosing sleep and monitoring sleep health and obtained clearance from the US Food and Drug Administration. This may be due to a shortage of such wearable devices in the market or a scarcity of studies evaluating them. We urge manufacturers of wearable devices to extend their focus beyond evaluating sleep quality and incorporate AI into their devices for identifying sleep apnea, its various types, and its severity. Further, researchers should pay more attention to such wearable devices in their future studies. The main challenge in conducting such studies is the cost of these devices.

Our meta-analyses indicate that the performance of wearable AI was notably higher when using CNN in particular or deep learning algorithms in general. Therefore, we recommend that manufacturers of wearable devices and researchers prioritize these techniques during the development of devices intended for the detection of sleep apnea. However, obtaining large, high-quality, and standardized data sets for training and validating CNN or deep learning models can be challenging.

Our meta-analysis suggests the need for implementing AI on the edge through specially crafted tiny ML modules with federated learning protocols. Such an approach not only enhances performance metrics but also addresses critical considerations regarding resource efficiency, latency reduction, and privacy preservation inherent in commercial apnea detection systems. However, implementing AI on wearable devices, especially with tiny ML modules, poses challenges related to hardware constraints, such as limited processing power, limited memory, and high energy consumption. Further, ensuring that AI algorithms can run efficiently on resource-constrained devices without compromising performance is a significant challenge. Implementing federated learning protocols for edge devices introduces additional complexities related to communication, synchronization, and security. Designing robust federated learning frameworks that can effectively train AI models across distributed devices while preserving data privacy and security requires careful consideration and expertise.
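The aggregation step at the heart of the federated protocol described above can be sketched compactly: each wearable trains locally, and the server averages only model parameters, weighted by each client's data size, so raw sleep recordings never leave the device. A minimal FedAvg-style illustration (all names and values are hypothetical, and real deployments add the communication, synchronization, and security machinery noted above):

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Server-side federated averaging: aggregate locally trained
    parameter vectors weighted by each client's data size. Raw
    recordings stay on the devices; only parameters are shared."""
    total = sum(client_sizes)
    return sum(p * (n / total) for p, n in zip(client_params, client_sizes))

# Two hypothetical wearables; the second holds 3x as much local data,
# so its parameters count 3x as much in the global model.
w_a = np.array([1.0, 2.0])
w_b = np.array([3.0, 4.0])
global_w = fedavg([w_a, w_b], [100, 300])
```

This weighted average is repeated over many communication rounds in practice; the sketch shows why on-device memory and energy budgets matter, since each round requires local training on the wearable itself.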

Most studies included in this review focused on the application of wearable AI for the detection of existing sleep apnea, its type, or its severity, rather than the anticipation of its occurrence. Foreseeing the onset of sleep apnea in the future is as pivotal as, if not more pivotal than, recognizing the current sleep apnea state, as it can pave the way for the development and implementation of proactive interventions. Consequently, we encourage researchers to undertake additional investigations into the capacity of wearable AI to predict future instances of sleep apnea. Such studies require the collection of longitudinal data over an extended period to train and validate predictive models accurately. However, obtaining continuous and comprehensive sleep data from individuals over time can be challenging due to factors such as participant compliance, dropout rates, and the need for long-term monitoring.

In this review, only a single study evaluated the effectiveness of wearable AI in identifying CSA. In addition, only 7 (18%) of the 38 studies investigated the capability of wearable AI to differentiate between different types of sleep apnea. More research is urgently needed to evaluate the performance of wearable AI in these crucial areas. Our study also suggests that more open-source data sets with prepared manual labels for different types of sleep apnea are needed. Collecting large-scale, comprehensive, and well-annotated data from individuals with CSA poses challenges due to the rarity of CSA cases. Furthermore, identifying informative features and physiological signals from wearable devices that can distinguish between different types of sleep apnea is challenging due to the overlap in the clinical presentation and physiological characteristics of different types of sleep apnea, particularly in cases of mixed sleep apnea where both obstructive and central events occur concurrently.

Only 3 (8%) of the 38 included studies used self-reported data and nonwearable device data alongside wearable device data for the detection of sleep apnea. The inclusion of self-reported data (eg, data regarding demographics, BMI, medical history, family history, and medications) and nonwearable device data (eg, data collected via mobile phones, smart pillows, smart mattresses, voice recorders, and Internet of Things) has the potential to enhance the efficacy of wearable AI in identifying sleep apnea. Hence, manufacturers and researchers are encouraged to take these types of data into consideration, alongside wearable device data, when developing wearable AI for the diagnosis of sleep apnea. However, challenges arise in transferring nonwearable data to wearable devices and the potential impact on the performance of wearable devices in terms of processing speed, memory use, energy consumption, synchronization, and security.

Few studies in this review compared the performance of wearable devices worn on different parts of the body (eg, wrist, abdomen, and chest) or developed wearable AI for not only detecting but also intervening in sleep apnea. This points to a crucial research gap and calls for further investigation into how wearable AI performs across different placements and into integrating treatment delivery into wearable AI for sleep apnea management.

Among the 38 studies in our review, 11 (29%) were excluded from the meta-analyses due to insufficient details crucial for their conduct (eg, confusion matrices and the number of apnea and nonapnea cases). They also did not provide multiple performance measures (eg, accuracy, sensitivity, and specificity), which are essential for estimating the necessary information. It is recommended that researchers include these specific details in their reports to facilitate the conduct of meta-analyses by other researchers. However, we acknowledge that the space constraints imposed by journals and conference proceedings may present a challenge for researchers seeking to include more comprehensive details in their reports.
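For reference, the pooled measures used in this review follow directly from a 2x2 confusion matrix, which is why its omission prevents inclusion in a meta-analysis. A sketch with hypothetical counts (the function name is ours):

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Derive the standard diagnostic accuracy measures from a
    2x2 confusion matrix (true/false positives and negatives)."""
    return {
        "sensitivity": tp / (tp + fn),                # apnea cases correctly flagged
        "specificity": tn / (tn + fp),                # nonapnea cases correctly cleared
        "accuracy": (tp + tn) / (tp + fp + fn + tn),  # all cases correctly classified
    }

# Hypothetical counts a study might report
m = diagnostic_metrics(tp=80, fp=10, fn=20, tn=90)
```

Conversely, a single reported accuracy figure without the underlying counts cannot be decomposed back into sensitivity and specificity, which is what excluded 11 studies from our meta-analyses.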

Limitations

Our review intentionally excluded studies involving (1) nonwearable devices, near-body wearable devices, in-body wearable devices, wearable devices wired to nonwearable devices, and wearable devices requiring an expert for their placement on users; (2) wearable AI in detecting other sleep disorders (eg, insomnia, narcolepsy, and restless legs syndrome); and (3) wearable AI in predicting outcomes of sleep apnea interventions or detecting sleep quality or sleep stages. Therefore, our findings are specifically applicable to wearable AI for sleep apnea detection and may not be generalizable to the excluded devices, disorders, or outcomes. Our findings are based on studies conducted in only 16 countries. Further, while most studies were carried out in hospitals, only 4 (11%) of the 38 studies were conducted in health care centers. Therefore, extrapolating our results to broader populations and clinical settings requires caution. This limitation acknowledges the need for further reviews in these broader areas.

Another limitation of this review is the likelihood of an overestimation or underestimation of the results of our meta-analyses for 2 reasons. First, some relevant studies could have been overlooked, as our search was confined to English-language publications, and we did not explore other widely used databases, such as CINAHL and Web of Science. Second, 11 of the 38 studies in this review were excluded from the meta-analyses, as they did not report the details required for meta-analyses.

Conclusions

Our review underscores the potential of wearable AI in identifying sleep apnea, differentiating its type, and gauging its severity. However, wearable AI is not yet ready for integration into routine clinical practices due to its suboptimal performance. Therefore, until further evidence demonstrates an ideal performance, we suggest the concurrent use of wearable AI with traditional sleep apnea assessments (eg, polysomnography and home sleep apnea testing), rather than a complete substitution. Manufacturers need to develop certified commercial wearable devices that can easily detect sleep apnea, predict its occurrence, and deliver proactive interventions. CNN in particular or deep learning algorithms in general should be prioritized during the development of wearable AI for the detection of sleep apnea. Further studies are needed to assess the performance of wearable AI in detecting CSA and distinguishing it from other types of sleep apnea. Researchers should consider incorporating self-reported and nonwearable device data alongside wearable data to enhance the efficacy of wearable AI in detecting sleep apnea. Additional research is required to evaluate the varying performance of wearable devices with different placements. Researchers should also report sufficient details about their findings to enable other researchers to conduct meta-analyses effectively.

Data Availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Authors' Contributions

A Abd-alrazaq, A Ahmed, RA, and SA developed the protocol with guidance from and under the supervision of JS. A Abd-alrazaq searched the electronic databases and conducted backward and forward reference list checking. The study selection process and data extraction were carried out by HA and A Abd-alrazaq. A risk-of-bias assessment was conducted by HA and RA. Data synthesis was conducted by A Abd-alrazaq. The Introduction and Methods sections were written by MA and A Abd-alrazaq. The Results section was written by A Abd-alrazaq and RA. The Discussion and Conclusions sections were written by A Abd-alrazaq and RD. All authors critically revised the manuscript for important intellectual content, approved the manuscript for publication, and agreed to be accountable for all aspects of the work.

Conflicts of Interest

None declared.

Multimedia Appendix 1

PRISMA-DTA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Diagnostic Test Accuracy) checklist.

DOC File , 66 KB

Multimedia Appendix 2

Search strategy.

DOCX File , 37 KB

Multimedia Appendix 3

Data extraction form.

DOCX File , 26 KB

Multimedia Appendix 4

Modified version of Quality Assessment of Studies of Diagnostic Accuracy-Revised.

DOCX File , 27 KB

Multimedia Appendix 5

Characteristics of each included study.

DOCX File , 57 KB

Multimedia Appendix 6

Features of wearable devices.

DOCX File , 59 KB

Multimedia Appendix 7

Features of artificial intelligence algorithms.

DOCX File , 76 KB

Multimedia Appendix 8

Reviewers’ judgments about each “risk of bias” and applicability domain for each included study.

DOCX File , 72 KB

  1. Salari N, Hosseinian-Far A, Mohammadi M, Ghasemi H, Khazaie H, Daneshkhah A, et al. Detection of sleep apnea using machine learning algorithms based on ECG signals: a comprehensive systematic review. Expert Syst Appl. Jan 2022;187:115950. [CrossRef]
  2. Singh N, Talwekar RH. "Comparison of machine learning and deep learning classifier to detect sleep apnea using single-channel ECG and HRV: a systematic literature review". J Phys Conf Ser. May 01, 2022;2273(1):012015. [CrossRef]
  3. Ferreira-Santos DA, Amorim P, Silva Martins T, Monteiro-Soares M, Pereira Rodrigues P. Enabling early obstructive sleep apnea diagnosis with machine learning: systematic review. J Med Internet Res. Sep 30, 2022;24(9):e39452. [FREE Full text] [CrossRef] [Medline]
  4. Sateia MJ. International classification of sleep disorders-third edition: highlights and modifications. Chest. Nov 2014;146(5):1387-1394. [CrossRef] [Medline]
  5. Benjafield AV, Ayas NT, Eastwood PR, Heinzer R, Ip MS, Morrell MJ, et al. Estimation of the global prevalence and burden of obstructive sleep apnoea: a literature-based analysis. Lancet Respir Med. Aug 2019;7(8):687-698. [CrossRef]
  6. Senaratna CV, Perret JL, Lodge CJ, Lowe AJ, Campbell BE, Matheson MC, et al. Prevalence of obstructive sleep apnea in the general population: a systematic review. Sleep Med Rev. Aug 2017;34:70-81. [CrossRef] [Medline]
  7. Berg S. What doctors wish patients knew about sleep apnea. American Medical Association. URL: https:/​/www.​ama-assn.org/​delivering-care/​public-health/​what-doctors-wish-patients-knew-about-sleep-apnea [accessed 2024-04-29]
  8. Young T, Evans L, Finn L, Palta M. Estimation of the clinically diagnosed proportion of sleep apnea syndrome in middle-aged men and women. Sleep. Sep 1997;20(9):705-706. [CrossRef] [Medline]
  9. Braley TJ, Dunietz GL, Chervin RD, Lisabeth LD, Skolarus LE, Burke JF. Recognition and diagnosis of obstructive sleep apnea in older Americans. J Am Geriatr Soc. Jul 09, 2018;66(7):1296-1302. [FREE Full text] [CrossRef] [Medline]
  10. Kapur V, Strohl KP, Redline S, Iber C, O'Connor G, Nieto J. Underdiagnosis of sleep apnea syndrome in U.S. communities. Sleep Breath. 2002;06(2):49-54. [CrossRef]
  11. Lu MK, Tan H, Tsai I, Huang L, Liao X, Lin S. Sleep apnea is associated with an increased risk of mood disorders: a population-based cohort study. Sleep Breath. May 5, 2017;21(2):243-253. [CrossRef] [Medline]
  12. Quan SF, Budhiraja R, Batool-Anwar S, Gottlieb D, Eichling P, Patel S, et al. Lack of impact of mild obstructive sleep apnea on sleepiness, mood and quality of life. Southwest J Pulm Crit Care. Jul 25, 2014;9(1):44-56. [FREE Full text] [CrossRef] [Medline]
  13. Lang CJ, Appleton SL, Vakulin A, McEvoy RD, Vincent AD, Wittert GA, et al. Associations of undiagnosed obstructive sleep apnea and excessive daytime sleepiness with depression: an Australian population study. J Clin Sleep Med. Apr 15, 2017;13(4):575-582. [FREE Full text] [CrossRef] [Medline]
  14. Marin JM, Agusti A, Villar I, Forner M, Nieto D, Carrizo SJ, et al. Association between treated and untreated obstructive sleep apnea and risk of hypertension. JAMA. May 23, 2012;307(20):2169-2176. [FREE Full text] [CrossRef] [Medline]
  15. Redline S. Obstructive sleep apnea–hypopnea and incident stroke: the sleep heart health study. Am J Respir Crit Care Med. Nov 15, 2010;182(10):1332-1333. [CrossRef]
  16. Marchi NA, Solelhac G, Berger M, Haba-Rubio J, Gosselin N, Vollenweider P, et al. Obstructive sleep apnoea and 5-year cognitive decline in the elderly. Eur Respir J. Apr 16, 2023;61(4):2201621. [FREE Full text] [CrossRef] [Medline]
  17. Yaffe K, Laffan AM, Harrison SL, Redline S, Spira AP, Ensrud KE, et al. Sleep-disordered breathing, hypoxia, and risk of mild cognitive impairment and dementia in older women. JAMA. Aug 10, 2011;306(6):613-619. [FREE Full text] [CrossRef] [Medline]
  18. Gottlieb DJ, Ellenbogen JM, Bianchi MT, Czeisler CA. Sleep deficiency and motor vehicle crash risk in the general population: a prospective cohort study. BMC Med. Mar 20, 2018;16(1):44. [FREE Full text] [CrossRef] [Medline]
  19. Kales SN, Straubel MG. Obstructive sleep apnea in North American commercial drivers. Ind Health. 2014;52(1):13-24. [FREE Full text] [CrossRef] [Medline]
  20. Young T, Finn L, Peppard PE, Szklo-Coxe M, Austin D, Nieto FJ, et al. Sleep disordered breathing and mortality: eighteen-year follow-up of the Wisconsin sleep cohort. Sleep. Aug 2008;31(8):1071-1078. [FREE Full text] [Medline]
  21. Punjabi NM, Caffo BS, Goodwin JL, Gottlieb DJ, Newman AB, O'Connor GT, et al. Sleep-disordered breathing and mortality: a prospective cohort study. PLoS Med. Aug 18, 2009;6(8):e1000132. [FREE Full text] [CrossRef] [Medline]
  22. Abdel-Basset M, Ding W, Abdel-Fatah L. The fusion of internet of intelligent things (IoIT) in remote diagnosis of obstructive sleep apnea: a survey and a new model. Inf Fusion. Sep 2020;61:84-100. [CrossRef]
  23. Aiyer I, Shaik L, Sheta A, Surani S. Review of application of machine learning as a screening tool for diagnosis of obstructive sleep apnea. Medicina (Kaunas). Nov 01, 2022;58(11):1574. [FREE Full text] [CrossRef] [Medline]
  24. Ramachandran A, Karuppiah A. A survey on recent advances in machine learning based sleep apnea detection systems. Healthcare (Basel). Jul 20, 2021;9(7):914. [FREE Full text] [CrossRef] [Medline]
  25. Ankitha V, Manimegalai P, Jose PS, Raji P. Literature review on sleep APNEA analysis by machine learning algorithms using ECG signals. J Phys Conf Ser. Jun 01, 2021;1937(1):012054. [CrossRef]
  26. Mendonca F, Mostafa SS, Ravelo-Garcia AG, Morgado-Dias F, Penzel T. A review of obstructive sleep apnea detection approaches. IEEE J Biomed Health Inform. Mar 2019;23(2):825-837. [CrossRef]
  27. Pei B, Xia M, Jiang H. Artificial intelligence in screening for obstructive sleep apnoea syndrome (OSAS): a narrative review. J Med Artif Intell. Feb 2023;6:1. [CrossRef]
  28. Tran NT, Tran HN, Mai AT. A wearable device for at-home obstructive sleep apnea assessment: state-of-the-art and research challenges. Front Neurol. Feb 7, 2023;14:1123227. [FREE Full text] [CrossRef] [Medline]
  29. Duarte MA, Pereira-Rodrigues P, Ferreira-Santos D. The role of novel digital clinical tools in the screening or diagnosis of obstructive sleep apnea: systematic review. J Med Internet Res. Jul 26, 2023;25:e47735. [FREE Full text] [CrossRef] [Medline]
  30. McInnes MD, Moher D, Thombs BD, McGrath TA, Bossuyt PM, the PRISMA-DTA Group, et al. Preferred reporting items for a systematic review and meta-analysis of diagnostic test accuracy studies: the PRISMA-DTA statement. JAMA. Jan 23, 2018;319(4):388-396. [CrossRef] [Medline]
  31. Lefebvre C, Glanville J, Briscoe S, Littlewood A, Marshall C, Metzendorf MI, et al. Searching for and selecting studies. In: Higgins PT, Thomas J, Chandler J, Cumpston M, Li T, Page MJ, et al, editors. Cochrane Handbook for Systematic Reviews of Interventions. Hoboken, NJ. John Wiley & Sons; 2019:67.
  32. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, et al. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. Oct 18, 2011;155(8):529-536. [FREE Full text] [CrossRef] [Medline]
  33. Wolff RF, Moons KG, Riley RD, Whiting PF, Westwood M, Collins GS, et al. PROBAST: a tool to assess the risk of bias and applicability of prediction model studies. Ann Intern Med. Jan 01, 2019;170(1):51. [CrossRef]
  34. Ebert D, Harrer M, Cuijpers P, Furukawa T. Doing Meta-Analysis With R: A Hands-On Guide. Boca Raton, FL. CRC Press; 2012.
  35. Borenstein MH, Hedges LV, Higgins JP, Rothstein HR. Introduction to Meta‐Analysis. Hoboken, NJ. John Wiley & Sons; 2009.
  36. Freeman MF, Tukey JW. Transformations related to the angular and the square root. Ann Math Stat. Dec 1950;21(4):607-611. [CrossRef]
  37. Schwarzer G. Meta: an R package for meta-analysis. R News. 2007. URL: https://cran.r-project.org/web/packages/meta/meta.pdf [accessed 2024-04-29]
  38. Assink M, Wibbelink CJ. Fitting three-level meta-analytic models in R: a step-by-step tutorial. Quant Method Psychol. Oct 01, 2016;12(3):154-174. [CrossRef]
  39. Higgins JP, Thompson SG, Deeks JJ, Altman DS. Measuring inconsistency in meta-analyses. BMJ. Sep 06, 2003;327(7414):557-560. [FREE Full text] [CrossRef] [Medline]
  40. Benedetti D, Olcese U, Bruno S, Barsotti M, Maestri Tassoni M, Bonanni E, et al. Obstructive sleep apnoea syndrome screening through wrist-worn smartbands: a machine-learning approach. Nat Sci Sleep. May 2022;14:941-956. [CrossRef]
  41. Chang HC, Wu H, Huang P, Ma H, Lo Y, Huang Y. Portable sleep apnea syndrome screening and event detection using long short-term memory recurrent neural network. Sensors (Basel). Oct 25, 2020;20(21):6067. [FREE Full text] [CrossRef] [Medline]
  42. Chen M, Wu S, Chen T, Wang C, Liu G. Information-based similarity of ordinal pattern sequences as a novel descriptor in obstructive sleep apnea screening based on wearable photoplethysmography bracelets. Biosensors (Basel). Nov 28, 2022;12(12):1089. [FREE Full text] [CrossRef] [Medline]
  43. Chen X, Xiao Y, Tang Y, Fernandez-Mendoza J, Cao G. ApneaDetector: detecting sleep apnea with smartwatches. Proc ACM Interact Mob Wearable Ubiquitous Technol. Jun 24, 2021;5(2):1-22. [CrossRef]
  44. Fallmann S, Chen L. Detecting chronic diseases from sleep-wake behaviour and clinical features. In: Proceedings of the 5th International Conference on Systems and Informatics. 2018. Presented at: ICSAI '18; November 10-12, 2018:1076-1084; Nanjing, China. URL: https://ieeexplore.ieee.org/document/8599388 [CrossRef]
  45. Fedorin I, Slyusarenko K, Nastenko M. Respiratory events screening using consumer smartwatches. In: Proceedings of the 2020 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2020 ACM International Symposium on Wearable Computers. 2020. Presented at: UbiComp/ISWC '20; September 12-17, 2020:25-28; Virtual Event. URL: https://dl.acm.org/doi/10.1145/3410530.3414399 [CrossRef]
  46. Ganglberger W, Bucklin AA, Tesh RA, Da Silva Cardoso M, Sun H, Leone MJ, et al. Sleep apnea and respiratory anomaly detection from a wearable band and oxygen saturation. Sleep Breath. Sep 18, 2022;26(3):1033-1044. [FREE Full text] [CrossRef] [Medline]
  47. Gu W, Leung L, Kwok KC, Wu I, Folz RJ, Chiang AA. Belun Ring platform: a novel home sleep apnea testing system for assessment of obstructive sleep apnea. J Clin Sleep Med. Sep 15, 2020;16(9):1611-1617. [FREE Full text] [CrossRef] [Medline]
  48. Hafezi M, Montazeri N, Zhu K, Alshaer H, Yadollahi A, Taati B. Sleep apnea severity estimation from respiratory related movements using deep learning. In: Proceedings of the 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2019. Presented at: EMBC '19; July 23-27, 2019:1601-1604; Berlin, Germany. URL: https://ieeexplore.ieee.org/abstract/document/8857524 [CrossRef]
  49. Hafezi M, Montazeri N, Saha S, Zhu K, Gavrilovic B, Yadollahi A, et al. Sleep apnea severity estimation from tracheal movements using a deep learning model. IEEE Access. 2020;8:22641-22649. [CrossRef]
  50. Hung PD. Central sleep apnea detection using an accelerometer. In: Proceedings of the 1st International Conference on Control and Computer Vision. 2018. Presented at: ICCCV '18; June 15-18, 2018:106-111; Singapore, Singapore. URL: https://dl.acm.org/doi/abs/10.1145/3232651.3232660 [CrossRef]
  51. Jeon Y, Heo K, Kang SJ. Real-time sleep apnea diagnosis method using wearable device without external sensors. In: Proceedings of the 2020 IEEE International Conference on Pervasive Computing and Communications Workshops. 2020. Presented at: PerCom Workshops '20; March 23-27, 2020:1-5; Austin, TX. URL: https://ieeexplore.ieee.org/document/9156119 [CrossRef]
  52. Ji X, Rao Z, Zhang W, Liu C, Wang Z, Zhang S, et al. Airline point-of-care system on seat belt for hybrid physiological signal monitoring. Micromachines (Basel). Nov 01, 2022;13(11):1880. [FREE Full text] [CrossRef] [Medline]
  53. Kristiansen S, Nikolaidis K, Plagemann T, Goebel V, Traaen GM, Øverland B, et al. Machine learning for sleep apnea detection with unattended sleep monitoring at home. ACM Trans Comput Healthcare. Feb 09, 2021;2(2):1-25. [CrossRef]
  54. Kristiansen S, Nikolaidis K, Plagemann T, Goebel V, Traaen GM, Øverland B, et al. A clinical evaluation of a low-cost strain gauge respiration belt and machine learning to detect sleep apnea. Smart Health. Mar 2023;27:100373. [CrossRef]
  55. Kwon S, Kim HS, Kwon K, Kim H, Kim YS, Lee SH, et al. At-home wireless sleep monitoring patches for the clinical assessment of sleep quality and sleep apnea. Sci Adv. May 24, 2023;9(21):eadg9671. [FREE Full text] [CrossRef] [Medline]
  56. Le TQ, Cheng C, Sangasoongsong A, Wongdhamma W, Bukkapatnam ST. Wireless wearable multisensory suite and real-time prediction of obstructive sleep apnea episodes. IEEE J Transl Eng Health Med. 2013;1:2700109. [FREE Full text] [CrossRef] [Medline]
  57. McClure K, Erdreich B, Bates JH, McGinnis RS, Masquelin A, Wshah S. Classification and detection of breathing patterns with wearable sensors and deep learning. Sensors (Basel). Nov 13, 2020;20(22):6481. [FREE Full text] [CrossRef] [Medline]
  58. Papini GB, Fonseca P, van Gilst MM, Bergmans JW, Vullings R, Overeem S. Wearable monitoring of sleep-disordered breathing: estimation of the apnea-hypopnea index using wrist-worn reflective photoplethysmography. Sci Rep. Aug 11, 2020;10(1):13512. [FREE Full text] [CrossRef] [Medline]
  59. Petrenko A. Breathmonitor: sleep apnea mobile detector. In: Proceedings of the 2nd International Conference on System Analysis & Intelligent Computing. 2020. Presented at: SAIC '20; October 5-9, 2020:1-4; Kyiv, Ukraine. URL: https://ieeexplore.ieee.org/document/9239236 [CrossRef]
  60. Rossi M, Sala D, Bovio D, Salito C, Alessandrelli G, Lombardi C, et al. SLEEP-SEE-THROUGH: explainable deep learning for sleep event detection and quantification from wearable somnography. IEEE J Biomed Health Inform. Jul 2023;27(7):3129-3140. [CrossRef]
  61. Ryser F, Hanassab S, Lambercy O, Werth E, Gassert R. Respiratory analysis during sleep using a chest-worn accelerometer: a machine learning approach. Biomed Signal Process Control. Sep 2022;78:104014. [CrossRef]
  62. Selvaraj N, Narasimhan R. Automated prediction of the apnea-hypopnea index using a wireless patch sensor. Annu Int Conf IEEE Eng Med Biol Soc. 2014;2014:1897-1900. [CrossRef] [Medline]
  63. Shen Q, Yang X, Zou L, Wei K, Wang C, Liu G. Multitask residual shrinkage convolutional neural network for sleep apnea detection based on wearable bracelet photoplethysmography. IEEE Internet Things J. Dec 15, 2022;9(24):25207-25222. [CrossRef]
  64. Strumpf Z, Gu W, Tsai C, Chen P, Yeh E, Leung L, et al. Belun Ring (Belun Sleep System BLS-100): deep learning-facilitated wearable enables obstructive sleep apnea detection, apnea severity categorization, and sleep stage classification in patients suspected of obstructive sleep apnea. Sleep Health. Aug 2023;9(4):430-440. [FREE Full text] [CrossRef] [Medline]
  65. Tsouti V, Kanaris A, Tsoutis K, Chatzandroulis S. Development of an automated system for obstructive sleep apnea treatment based on machine learning and breath effort monitoring. Microelectron Eng. Jul 2020;231:111376. [CrossRef]
  66. van Steenkiste T, Groenendaal W, Dreesen P, Lee S, Klerkx S, de Francisco R, et al. Portable detection of apnea and hypopnea events using bio-impedance of the chest and deep learning. IEEE J Biomed Health Inform. Sep 2020;24(9):2589-2598. [CrossRef]
  67. Wang S, Xuan W, Chen D, Gu Y, Liu F, Chen J, et al. Machine learning assisted wearable wireless device for sleep apnea syndrome diagnosis. Biosensors (Basel). Apr 17, 2023;13(4):483. [FREE Full text] [CrossRef] [Medline]
  68. Wang Z, Peng C, Li B, Penzel T, Liu R, Zhang Y, et al. Single-lead ECG based multiscale neural network for obstructive sleep apnea detection. Internet Things. Nov 2022;20:100613. [CrossRef]
  69. Wu HT, Wu J, Huang P, Lin T, Wang T, Huang Y, et al. Phenotype-based and self-learning inter-individual sleep apnea screening with a level IV-like monitoring system. Front Physiol. Jul 2, 2018;9:723. [FREE Full text] [CrossRef] [Medline]
  70. Wu S, Chen M, Wei K, Liu G. Sleep apnea screening based on Photoplethysmography data from wearable bracelets using an information-based similarity approach. Comput Methods Programs Biomed. Nov 2021;211:106442. [CrossRef] [Medline]
  71. Xu Y, Ou Q, Cheng Y, Lao M, Pei G. Comparative study of a wearable intelligent sleep monitor and polysomnography monitor for the diagnosis of obstructive sleep apnea. Sleep Breath. Mar 26, 2023;27(1):205-212. [FREE Full text] [CrossRef] [Medline]
  72. Yeh E, Wong E, Tsai C, Gu W, Chen P, Leung L, et al. Detection of obstructive sleep apnea using Belun Sleep platform wearable with neural network-based algorithm and its combined use with STOP-Bang questionnaire. PLoS One. Oct 11, 2021;16(10):e0258040. [FREE Full text] [CrossRef] [Medline]
  73. Yeo M, Byun H, Lee J, Byun J, Rhee HY, Shin W, et al. Respiratory event detection during sleep using electrocardiogram and respiratory related signals: using polysomnogram and patch-type wearable device data. IEEE J Biomed Health Inform. Feb 2022;26(2):550-560. [CrossRef]
  74. Yüzer AH, Sümbül H, Nour M, Polat K. A different sleep apnea classification system with neural network based on the acceleration signals. Appl Acoust. Jun 2020;163:107225. [CrossRef]
  75. Zhang H, Fu B, Su K, Yang Z. Long-term sleep respiratory monitoring by dual-channel flexible wearable system and deep learning-aided analysis. IEEE Trans Instrum Meas. 2023;72:1-9. [CrossRef]
  76. Zhou G, Zhou W, Zhang Y, Zeng Z, Zhao W. Automatic monitoring of obstructive sleep apnea based on multi-modal signals by phone and smartwatch. Annu Int Conf IEEE Eng Med Biol Soc. Jul 2023;2023:1-4. [CrossRef] [Medline]
  77. Yeo M, Byun H, Lee J, Byun J, Rhee H, Shin W, et al. Robust method for screening sleep apnea with single-lead ECG using deep residual network: evaluation with open database and patch-type wearable device data. IEEE J Biomed Health Inform. Nov 2022;26(11):5428-5438. [CrossRef]


AI: artificial intelligence
CNN: convolutional neural network
CSA: central sleep apnea
HR: heart rate
ML: machine learning
OSA: obstructive sleep apnea
PRISMA-DTA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Diagnostic Test Accuracy
QUADAS-2: Quality Assessment of Studies of Diagnostic Accuracy-Revised


Edited by A Mavragani; submitted 08.03.24; peer-reviewed by BS Ibrahim, K Dhou, A Wani; comments to author 26.04.24; revised version received 07.05.24; accepted 23.07.24; published 10.09.24.

Copyright

©Alaa Abd-alrazaq, Hania Aslam, Rawan AlSaad, Mohammed Alsahli, Arfan Ahmed, Rafat Damseh, Sarah Aziz, Javaid Sheikh. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 10.09.2024.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research (ISSN 1438-8871), is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.