A New Approach for Detecting Sleep Apnea Using a Contactless Bed Sensor: Comparison Study

Background
At present, there is an increased demand for accurate and personalized patient monitoring because of the various challenges facing health care systems. For instance, rising costs and a lack of physicians are two serious problems affecting patient care. Nonintrusive monitoring of vital signs is a potential solution to close current gaps in patient monitoring. As an example, bed-embedded ballistocardiogram (BCG) sensors can help physicians identify cardiac arrhythmia and obstructive sleep apnea (OSA) nonintrusively without interfering with the patient’s everyday activities. Detecting OSA using BCG sensors is gaining popularity among researchers because of their simple installation and accessibility, that is, their nonwearable nature. In the field of nonintrusive vital sign monitoring, a microbend fiber optic sensor (MFOS), among other sensors, has proven to be suitable. Nevertheless, few studies have examined apnea detection.

Objective
This study aims to assess the capabilities of an MFOS for nonintrusive vital sign and sleep apnea detection during an in-lab sleep study. Data were collected from patients with sleep apnea in the sleep laboratory at Khoo Teck Puat Hospital.

Methods
In total, 10 participants underwent full polysomnography (PSG), and the MFOS was placed under the patient’s mattress for BCG data collection. The apneic event detection algorithm was evaluated against the manually scored events obtained from the PSG study on a minute-by-minute basis. Furthermore, normalized mean absolute error (NMAE), normalized root mean square error (NRMSE), and mean absolute percentage error (MAPE) were employed to evaluate the sensor capabilities for vital sign detection, comprising heart rate (HR) and respiratory rate (RR). Vital signs were evaluated based on a 30-second time window, with an overlap of 15 seconds. In this study, electrocardiogram and thoracic effort signals were used as references to estimate the performance of the proposed vital sign detection algorithms.

Results
For the 10 patients recruited for the study, the proposed system achieved reasonable results compared with PSG for sleep apnea detection, with an accuracy of 49.96% (SD 6.39), a sensitivity of 57.07% (SD 12.63), and a specificity of 45.26% (SD 9.51). In addition, the system achieved close results for HR and RR estimation, with an NMAE of 5.42% (SD 0.57), an NRMSE of 6.54% (SD 0.56), and an MAPE of 5.41% (SD 0.58) for HR, and an NMAE of 11.42% (SD 2.62), an NRMSE of 13.85% (SD 2.78), and an MAPE of 11.60% (SD 2.84) for RR.

Conclusions
Overall, the proposed system produced reasonably good results for apneic event detection, considering that it uses a single-channel BCG sensor. By comparison, satisfactory results were obtained for vital sign detection when compared with the PSG outcomes. These results provide preliminary support for the potential use of the MFOS for sleep apnea detection.


Data Analysis
For each patient, encrypted binary files were first decrypted using proprietary software and stored in comma-separated values (CSV) format. Afterward, all CSV files were concatenated into a single CSV file representing a one-night recording. Each CSV file contained the following data columns: Unix timestamp, amplified raw data (1e7 × electric current), filtered BCG signal, ambient sound, ambient temperature, ambient light, unamplified raw data (1e6 × electric current), and the power supplied to the light source. We considered only the first and seventh columns in our analysis, that is, the Unix timestamp and the unamplified raw data. After binding the data chunks for each patient, we synchronized the acquired raw data with the start and end times of the PSG study. Essentially, the unamplified raw data represent a mixture of two signals (the BCG and respiratory effort signals), in addition to noise and motion artifacts caused by frequent body movements.
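The column selection and PSG synchronization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `select_columns` and `merge_and_trim` are hypothetical, and it assumes header-free rows, timestamps expressed in the same unit as the PSG start and end times, and inclusive window bounds.

```python
from typing import Iterable, List, Sequence, Tuple

# (Unix timestamp, unamplified raw value)
Sample = Tuple[float, float]


def select_columns(rows: Iterable[Sequence[str]]) -> List[Sample]:
    """Keep the first and seventh columns (indices 0 and 6) of each CSV row."""
    return [(float(row[0]), float(row[6])) for row in rows]


def merge_and_trim(chunks: Iterable[List[Sample]],
                   psg_start: float, psg_end: float) -> List[Sample]:
    """Concatenate chunks, sort by timestamp, and keep samples in the PSG window.

    Assumption: psg_start and psg_end use the same time unit as the timestamps.
    """
    merged = sorted((s for chunk in chunks for s in chunk), key=lambda s: s[0])
    return [s for s in merged if psg_start <= s[0] <= psg_end]
```

Rows could come from `csv.reader` applied to each decrypted file; sorting after concatenation makes the result independent of the order in which the files are read.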

Vital signs detection
A Chebyshev Type I bandpass filter was applied to the artifact-free data to obtain the BCG and respiratory signals. The cutoff frequencies and passband ripple were set to 2.5-5 Hz with 0.5 dB and 0.01-0.4 Hz with 0.5 dB, respectively. Several approaches to computing the heart rate from BCG signals have been proposed in the literature, including time-domain approaches, frequency-domain approaches, wavelet analysis, and clustering-based approaches [4]. A recent comparative study by Suliman et al. [4] concluded that the wavelet analysis-based approach proposed by Sadek et al. [5] was one of the two high-performing methods in terms of average peak detection rate, average false alarm rate, and average mean absolute error between true and predicted peaks. This task is challenging because the J-peaks of the BCG signal (equivalent to the R-peaks of the ECG signal) are not consistent and vary within and between subjects. For J-peak detection, we used the approach of Sadek et al. [5], which applies the multiresolution analysis of the maximal overlap discrete wavelet transform (MODWT) [6]. This method decomposes the BCG signal into smooth and detail time series components by passing the signal through low-pass and high-pass filters and then selects the component that agrees with the J-peaks. The biorthogonal wavelet basis function Bior3.9 with 4 decomposition levels was used for the analysis, and the fourth-level smooth coefficient was chosen to represent the cardiac cycles. Finally, the J-peaks were located with a peak detector. The same wavelet basis function was used for all patients recruited in the study. Heart rates were measured using a sliding time window of 30 seconds with an overlap of 15 seconds. The ECG signal was used as a reference to detect interbeat intervals (IBIs). For this purpose, we selected the well-known Pan and Tompkins algorithm owing to its reasonable results [7].
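The final peak-detection step can be illustrated with a generic local-maximum detector. This is only a sketch under stated assumptions, not the detector used in the study: the function name `find_peaks_simple` and the default minimum spacing of 0.3 seconds are hypothetical choices made for illustration.

```python
from typing import List, Sequence


def find_peaks_simple(signal: Sequence[float], fs: float,
                      min_distance_s: float = 0.3) -> List[int]:
    """Return indices of local maxima that are at least min_distance_s apart.

    fs is the sampling rate in Hz. When two maxima fall closer together than
    the minimum distance, only the taller one is kept.
    """
    min_gap = max(1, int(round(min_distance_s * fs)))
    peaks: List[int] = []
    for i in range(1, len(signal) - 1):
        # A local maximum rises from the left and does not rise to the right.
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            if peaks and i - peaks[-1] < min_gap:
                if signal[i] > signal[peaks[-1]]:
                    peaks[-1] = i  # replace the previous, smaller peak
            else:
                peaks.append(i)
    return peaks
```

Dividing the returned indices by `fs` gives peak times in seconds, from which intervals between consecutive peaks can be derived.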
Respiratory rate, on the other hand, can be measured directly from the bandpass-filtered data via a peak detector. However, before locating breathing cycles, we first removed the nonlinear trend from the signal by subtracting a third-order polynomial fit. Respiratory rates were calculated using a sliding time window of 30 seconds with an overlap of 15 seconds. The effort signal obtained from the thoracic belt was used as a reference to detect respiratory cycles. Compared with the abdominal effort and airflow (i.e., pressure and thermistor) signals, the thoracic effort signal correlated more strongly with the signal acquired from the optical fiber mat.
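The sliding-window evaluation used for both heart rate and respiratory rate (30-second windows with a 15-second overlap) can be sketched as follows. The source does not state how a rate is derived within each window; this sketch assumes it is computed from the mean interval between consecutive peaks, and the function name `windowed_rates` is hypothetical.

```python
from typing import List, Tuple


def windowed_rates(peak_times: List[float], duration: float,
                   window: float = 30.0, step: float = 15.0
                   ) -> List[Tuple[float, float]]:
    """Return (window_start_s, rate_per_minute) for each full window.

    peak_times must be sorted and given in seconds from the recording start.
    A 30 s window with a 15 s overlap corresponds to a step of 15 s.
    Windows containing fewer than two peaks are skipped.
    """
    rates: List[Tuple[float, float]] = []
    start = 0.0
    while start + window <= duration:
        inside = [t for t in peak_times if start <= t < start + window]
        if len(inside) >= 2:
            mean_interval = (inside[-1] - inside[0]) / (len(inside) - 1)
            rates.append((start, 60.0 / mean_interval))
        start += step
    return rates
```

Applying the same function to the reference peaks (ECG R-peaks or thoracic effort peaks) yields window-aligned pairs that can be compared directly.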

Statistical analysis
All data processing and analysis were performed in Python (version 3.7.6) using PyCharm Professional Edition. Graphical illustrations of the data analysis and evaluation metrics, including, for example, the Pearson correlation coefficient and bar plots with error bars, were produced in Python. The Seaborn (version 0.10.0) Python data visualization library was used to create the Pearson correlation coefficient plots, while RStudio version 1.2.5033 (RStudio Inc) was used to create the Bland-Altman plots. Table 1 shows the error metrics used in our approach along with their mathematical formulas.
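As a rough illustration of the three vital sign error metrics named in the abstract (NMAE, NRMSE, and MAPE), the sketch below expresses each as a percentage. The exact definitions are those given in Table 1; here the normalization of NMAE and NRMSE by the mean of the reference values is an assumption, and the function names are hypothetical.

```python
from math import sqrt
from typing import Sequence


def mape(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent (reference values nonzero)."""
    n = len(reference)
    return 100.0 * sum(abs(r - e) / abs(r) for r, e in zip(reference, estimate)) / n


def nmae(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Mean absolute error divided by the reference mean (assumed normalization)."""
    n = len(reference)
    mae = sum(abs(r - e) for r, e in zip(reference, estimate)) / n
    return 100.0 * mae / (sum(reference) / n)


def nrmse(reference: Sequence[float], estimate: Sequence[float]) -> float:
    """Root mean square error divided by the reference mean (assumed normalization)."""
    n = len(reference)
    rmse = sqrt(sum((r - e) ** 2 for r, e in zip(reference, estimate)) / n)
    return 100.0 * rmse / (sum(reference) / n)
```

For example, a reference heart rate of 60 beats per minute in two windows with estimates of 57 and 63 gives an error of 5% under all three definitions.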

Sensitivity
Sensitivity (Sens) is usually expressed as a proportion and describes the probability that a test will yield a positive result if the disorder is present. It is determined as the number of correct positive predictions divided by the total number of positives [8][9][10]. It is also known as recall or the true positive rate. In our case, it defines the proportion of correctly identified apneic events.

Specificity
Specificity (Spec) is usually expressed as a proportion and describes the probability that a test will yield a negative result if the disorder is not present. It is calculated as the number of correct negative predictions divided by the total number of negatives [8][9][10]. It is also known as the true negative rate. In our case, it defines the proportion of correctly identified non-apneic events.

Accuracy
Accuracy (Acc) is usually expressed as a proportion and describes the fraction of all instances that are correctly classified. It is calculated as the number of all correct predictions divided by the total number of instances in the dataset [8][9][10].
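Sensitivity, specificity, and accuracy can be computed from per-minute labels as in the following sketch. It assumes that each minute is coded 1 for apneic and 0 for non-apneic in both the PSG scoring and the algorithm output; the function names are hypothetical.

```python
from typing import Sequence, Tuple


def confusion_counts(truth: Sequence[int], pred: Sequence[int]
                     ) -> Tuple[int, int, int, int]:
    """Return (TP, TN, FP, FN) for binary labels coded as 0 or 1."""
    tp = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(truth, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def sens_spec_acc(truth: Sequence[int], pred: Sequence[int]
                  ) -> Tuple[float, float, float]:
    """Return (sensitivity, specificity, accuracy) as proportions.

    A metric is NaN when its denominator is zero, for example the
    sensitivity of a recording that contains no apneic minutes.
    """
    tp, tn, fp, fn = confusion_counts(truth, pred)
    nan = float("nan")
    sens = tp / (tp + fn) if (tp + fn) else nan
    spec = tn / (tn + fp) if (tn + fp) else nan
    total = tp + tn + fp + fn
    acc = (tp + tn) / total if total else nan
    return sens, spec, acc
```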

Cohen kappa coefficient
The Cohen kappa coefficient, or kappa statistic, is often adopted to measure inter-annotator agreement. In other words, it quantifies the agreement beyond that expected by chance [11]. Like a correlation coefficient, it can vary from -1 to 1, where 0 represents the amount of agreement that can be expected from random chance and 1 indicates perfect agreement between the raters. According to Cohen [12], kappa results can be interpreted as follows: "values ≤ 0 as indicative of no agreement and 0.01-0.20 as none to slight, 0.21-0.40 as fair, 0.41-0.60 as moderate, 0.61-0.80 as substantial, and 0.81-1.00 as an almost perfect agreement" [13].
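For a binary task, kappa follows from the confusion-matrix counts as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The sketch below is a minimal illustration; the function name `cohen_kappa` is hypothetical, and returning NaN when p_e equals 1 is an assumption made here for the degenerate case.

```python
def cohen_kappa(tp: int, tn: int, fp: int, fn: int) -> float:
    """Cohen kappa for two raters on a binary task, from confusion counts."""
    n = tp + tn + fp + fn
    # Observed agreement: fraction of instances on which both raters agree.
    p_o = (tp + tn) / n
    # Chance agreement: product of the raters' marginal rates, per class.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_e = p_yes + p_no
    if p_e == 1.0:
        return float("nan")  # both raters always give the same single label
    return (p_o - p_e) / (1.0 - p_e)
```

With 20 joint positives, 15 joint negatives, and 15 disagreements (5 and 10), the observed agreement is 0.7, the chance agreement is 0.5, and kappa is 0.4, which falls in the "fair" band quoted above.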

Matthews correlation coefficient
The Matthews correlation coefficient (MCC) is a special case of the Pearson correlation coefficient, that is, it is a cross-tabulation method of calculating the Pearson correlation coefficient between true and predicted values. The value of the coefficient varies between -1 and +1. A coefficient of +1 denotes a perfect classification, 0 a coin-tossing classifier, and -1 a perfect misclassification. It is also known as the phi coefficient and is the only binary classification measure that yields a high score only when the predictor correctly classifies most of the positive instances and most of the negative instances [14].
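In terms of confusion-matrix counts, the MCC is (TP·TN - FP·FN) divided by the square root of (TP+FP)(TP+FN)(TN+FP)(TN+FN). A minimal sketch follows; the function name `mcc` is hypothetical, and returning 0 when the denominator is zero is a common convention adopted here as an assumption.

```python
from math import sqrt


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion counts."""
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0  # convention when any marginal total is empty
    return (tp * tn - fp * fn) / sqrt(denom)
```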

Bland-Altman plot
The Bland-Altman plot [15,16] is a tool to quantify the agreement between two quantitative measurements. This is done by constructing limits of agreement (LoA), which are calculated from the mean and standard deviation of the differences between the two measurements [17]. The plot shows the differences between the two measurements on the y-axis and the averages of the two measurements on the x-axis. It is a favored method for assessing the agreement between two medical devices because devices are unlikely to agree exactly. Most importantly, it estimates how close pairs of measurements are, because small differences between devices are unlikely to influence patient decisions [18].
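The quantities behind a Bland-Altman plot can be computed as in the sketch below. The multiplier of 1.96, which gives the conventional 95% limits of agreement, is an assumption because the text does not state it, and the function name `bland_altman` is hypothetical. The study produced its plots in RStudio; Python is used here only for illustration.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def bland_altman(a: Sequence[float], b: Sequence[float], z: float = 1.96
                 ) -> Tuple[List[float], List[float], float, float, float]:
    """Return (pair means, pair differences, bias, lower LoA, upper LoA).

    The pair means go on the x-axis and the differences on the y-axis.
    The limits of agreement are bias ± z times the sample standard
    deviation of the differences (z = 1.96 is assumed here).
    """
    diffs = [x - y for x, y in zip(a, b)]
    means = [(x + y) / 2.0 for x, y in zip(a, b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation; needs at least 2 pairs
    return means, diffs, bias, bias - z * sd, bias + z * sd
```

Here `a` could hold the sensor-derived rates and `b` the reference rates for the same windows; a bias near zero with narrow limits indicates close agreement.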