Automatic Detection of Hypoglycemic Events From the Electronic Health Record Notes of Diabetes Patients: Empirical Study

Background: Hypoglycemic events are common and potentially dangerous among patients being treated for diabetes. Automatic detection of such events could improve patient care and support population studies. Electronic health records (EHRs) are a valuable resource for detecting these events. Objective: In this study, we aimed to develop a deep-learning–based natural language processing (NLP) system to automatically detect hypoglycemic events from EHR notes. Our model is called the High-Performing System for Automatically Detecting Hypoglycemic Events (HYPE). Methods: Domain experts reviewed 500 EHR notes of diabetes patients to determine whether each sentence contained a hypoglycemic event. We used this annotated corpus to train and evaluate HYPE. We built and evaluated both a classical machine learning model (ie, support vector machines [SVMs]) and state-of-the-art neural network models. Results: We found that the neural network models outperformed the SVM model. The convolutional neural network (CNN) model yielded the highest performance in a 10-fold cross-validation setting: mean precision=0.96 (SD 0.03), mean recall=0.86 (SD 0.03), and mean F1=0.91 (SD 0.03). Conclusions: Despite the challenges posed by a small and highly imbalanced dataset, our CNN-based HYPE system achieved high performance for hypoglycemia detection. HYPE can be used for EHR-based hypoglycemia surveillance and for population studies in diabetes patients.


Recurrent Neural Network
A recurrent neural network (RNN) is a type of neural network commonly used for sequential data. It modifies the traditional feed-forward neural network by adding recurrent connections through a recurrent unit, which allow it to handle variable sequence lengths naturally. The network processes the elements of the input sentence sequentially, one at a time. Let l be the sentence length. At the t-th step (1 ≤ t ≤ l), the network computes a fixed-dimension hidden state h_t = f(x_t, h_{t−1}), where x_t is the t-th row of the input embedding matrix and h_{t−1} is the hidden state from the previous step. The hidden state at the last step, h_l, thus summarizes all the information in the original sequence, and we simply take z = h_l as the final vector representation. A common modification is the bidirectional RNN, in which two hidden vectors h→ and h← are computed by processing the input sequence forward and backward, and the two vectors are concatenated to produce the final representation z = h→ ⊕ h←.
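As a concrete illustration of this recurrence, the following minimal sketch (assuming PyTorch; the embedding size, hidden size, and sentence length are illustrative placeholders, not the settings used in HYPE) encodes a single sentence token by token and keeps the last hidden state as the sentence representation:

import torch
import torch.nn as nn

emb_dim, hidden_dim, sent_len = 100, 128, 20
x = torch.randn(sent_len, emb_dim)        # rows x_1 ... x_l: word embeddings of one sentence

cell = nn.RNNCell(emb_dim, hidden_dim)    # one step of the recurrence h_t = f(x_t, h_{t-1})
h = torch.zeros(1, hidden_dim)            # initial hidden state h_0
for t in range(sent_len):                 # process the sentence one token at a time
    h = cell(x[t].unsqueeze(0), h)        # h_t = f(x_t, h_{t-1})
z = h                                     # final representation z = h_l

For a bidirectional encoder, a second cell is run over the reversed sequence and the two final hidden states are concatenated.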
The long short-term memory (LSTM) unit is the most commonly used recurrent unit in RNNs. It mitigates the vanishing gradient problem of the standard recurrent unit and has therefore been shown to improve performance in NLP applications.
In addition to the hidden state h_{t−1} and the current input vector x_t, the LSTM unit takes an additional cell state vector c_{t−1} from the previous step and produces the next hidden state h_t and cell state c_t. The new states are computed with the standard LSTM gating equations: i_t = σ(W_i x_t + U_i h_{t−1} + b_i), f_t = σ(W_f x_t + U_f h_{t−1} + b_f), o_t = σ(W_o x_t + U_o h_{t−1} + b_o), c̃_t = tanh(W_c x_t + U_c h_{t−1} + b_c), c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t, and h_t = o_t ⊙ tanh(c_t). In our work, we tested both standard and bidirectional RNN networks with LSTM units.
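A hedged sketch of the bidirectional LSTM encoder described above, assuming PyTorch (the layer sizes and batch shape are placeholders rather than the HYPE configuration):

import torch
import torch.nn as nn

emb_dim, hidden_dim = 100, 128
x = torch.randn(1, 20, emb_dim)                 # (batch, sentence length, embedding dim)

bilstm = nn.LSTM(emb_dim, hidden_dim,
                 batch_first=True, bidirectional=True)
_, (h_n, c_n) = bilstm(x)                       # the LSTM tracks both hidden states h and cell states c
z = torch.cat([h_n[0], h_n[1]], dim=-1)         # concatenate forward and backward final hidden states

The concatenated vector z plays the role of the final representation z = h→ ⊕ h← from the previous paragraph.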

Convolutional Neural Network
Convolutional neural networks (CNNs) use convolution layers to extract local features from data with a known, grid-like topology. Convolution is a specialized kind of linear operation. It involves a filter w ∈ ℝ^{h×k}, which is applied to a window of h k-dimensional word embeddings to produce a real-valued feature. For example, a feature c_i is generated from the i-th to the (i + h − 1)-th rows of the embedding matrix by the nonlinear operation c_i = f(w · x_{i:i+h−1} + b).
This operation is applied to every possible window of the matrix to produce a feature vector c = [c_1, c_2, …, c_{l−h+1}].
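The convolution step can be sketched as follows (assuming PyTorch; the window size h, embedding dimension k, and sentence length are illustrative values only):

import torch
import torch.nn as nn

k, h, sent_len = 100, 3, 20                    # embedding dim k, window size h, sentence length l
x = torch.randn(1, sent_len, k)                # word-embedding matrix of one sentence

conv = nn.Conv1d(in_channels=k, out_channels=1, kernel_size=h)   # one filter w (with bias b)
c = torch.relu(conv(x.transpose(1, 2)))        # c_i = f(w · x_{i:i+h-1} + b) for every window
print(c.shape)                                 # (1, 1, l - h + 1): the feature vector c

A full CNN typically applies many such filters in parallel, possibly with several window sizes, and pools each resulting feature vector.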

Temporal Convolutional Neural Network
A temporal convolutional network (TCN) is a recently proposed, more complex convolutional architecture. It combines a 1D fully convolutional network with causal convolutions, which means that the network produces an output of the same length as the input and that no information can leak from the future to the past.
Because the sequence length is unchanged after a TCN layer, we can stack multiple TCN layers to create a larger network, which allows the causal convolution to look back over a history whose size is linear in the depth of the network. To achieve a long effective history while keeping the number of layers small enough for feasible training, we use a dilated convolution, defined as F(s) = Σ_{i=0}^{k−1} f(i) · x_{s−d·i}, where k is the filter size and d is the dilation factor. For d = 1, the dilated operation reduces to a regular convolution; for d > 1 (typically increasing exponentially with depth), it allows the history size to grow exponentially with the network depth.
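One dilated causal convolution layer can be sketched as follows (assuming PyTorch; the kernel size, dilation factor, and channel counts are illustrative, not the HYPE settings). Padding only on the left keeps the operation causal and preserves the sequence length:

import torch
import torch.nn as nn
import torch.nn.functional as F

channels, kernel_size, dilation, sent_len = 64, 3, 2, 20
x = torch.randn(1, channels, sent_len)

conv = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
pad = (kernel_size - 1) * dilation       # pad only on the left so no future positions are used
y = conv(F.pad(x, (pad, 0)))             # y_s depends on x_s, x_{s-d}, ..., x_{s-(k-1)d}
print(y.shape)                           # (1, channels, sent_len): same length as the input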
To mitigate the memory loss and vanishing gradients commonly encountered in very deep neural networks, residual connections are added to the TCN. A residual block contains a branch that skips a certain number of layers: the input x is passed through a transformation ℱ and then added directly to the output, o = Activation(x + ℱ(x)). This modification has repeatedly been shown to benefit very deep neural networks.
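A hedged sketch of such a residual block, o = Activation(x + ℱ(x)), in which ℱ consists of two dilated causal convolutions (assuming PyTorch; the exact composition of ℱ in HYPE may differ):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualTCNBlock(nn.Module):
    def __init__(self, channels, kernel_size=3, dilation=1):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation            # left padding keeps the convolutions causal
        self.conv1 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size, dilation=dilation)

    def forward(self, x):                                  # x: (batch, channels, length)
        out = torch.relu(self.conv1(F.pad(x, (self.pad, 0))))
        out = self.conv2(F.pad(out, (self.pad, 0)))        # out plays the role of the transformation F(x)
        return torch.relu(x + out)                         # o = Activation(x + F(x))

block = ResidualTCNBlock(channels=64, dilation=2)
y = block(torch.randn(1, 64, 20))                          # output length matches the input

Stacking several such blocks with increasing dilation factors yields a deep TCN with a long effective history.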