TY  - JOUR
AU  - Lin, Chin
AU  - Hsu, Chia-Jung
AU  - Lou, Yu-Sheng
AU  - Yeh, Shih-Jen
AU  - Lee, Chia-Cheng
AU  - Su, Sui-Lung
AU  - Chen, Hsiang-Cheng
PY  - 2017
DA  - 2017/11/06
TI  - Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes
JO  - J Med Internet Res
SP  - e380
VL  - 19
IS  - 11
KW  - word embedding
KW  - convolutional neural network
KW  - neural networks (computer)
KW  - natural language processing
KW  - text mining
KW  - data mining
KW  - machine learning
KW  - electronic medical records
KW  - electronic health records
AB  - Background: Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN). Objective: Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes. Methods: We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness. Results: In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically extracted enough concepts to predict the diagnosis codes. Conclusions: Word embedding combined with a CNN showed outstanding performance compared with traditional methods, needing very little data preprocessing. This shows that future studies will not be limited by incomplete dictionaries. A large amount of unstructured information from free-text medical writing will be extracted by automated approaches in the future, and we believe that the health care field is about to enter the age of big data.
SN  - 1438-8871
UR  - http://www.jmir.org/2017/11/e380/
UR  - https://doi.org/10.2196/jmir.8344
UR  - http://www.ncbi.nlm.nih.gov/pubmed/29109070
DO  - 10.2196/jmir.8344
ID  - info:doi/10.2196/jmir.8344
ER  -