TY  - JOUR
AU  - Lin, Chin
AU  - Hsu, Chia-Jung
AU  - Lou, Yu-Sheng
AU  - Yeh, Shih-Jen
AU  - Lee, Chia-Cheng
AU  - Su, Sui-Lung
AU  - Chen, Hsiang-Cheng
PY  - 2017
DA  - 2017/11/06
TI  - Artificial Intelligence Learning Semantics via External Resources for Classifying Diagnosis Codes in Discharge Notes
JO  - J Med Internet Res
SP  - e380
VL  - 19
IS  - 11
KW  - word embedding
KW  - convolutional neural network
KW  - neural networks (computer)
KW  - natural language processing
KW  - text mining
KW  - data mining
KW  - machine learning
KW  - electronic medical records
KW  - electronic health records
AB  - Background: Automated disease code classification using free-text medical information is important for public health surveillance. However, traditional natural language processing (NLP) pipelines are limited, so we propose a method combining word embedding with a convolutional neural network (CNN). Objective: Our objective was to compare the performance of traditional pipelines (NLP plus supervised machine learning models) with that of word embedding combined with a CNN in conducting a classification task identifying International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) diagnosis codes in discharge notes. Methods: We used 2 classification methods: (1) extracting from discharge notes some features (terms, n-gram phrases, and SNOMED CT categories) that we used to train a set of supervised machine learning models (support vector machine, random forests, and gradient boosting machine), and (2) building a feature matrix, by a pretrained word embedding model, that we used to train a CNN. We used these methods to identify the chapter-level ICD-10-CM diagnosis codes in a set of discharge notes. We conducted the evaluation using 103,390 discharge notes covering patients hospitalized from June 1, 2015 to January 31, 2017 in the Tri-Service General Hospital in Taipei, Taiwan. We used the receiver operating characteristic curve as an evaluation measure, and calculated the area under the curve (AUC) and F-measure as the global measure of effectiveness. Results: In 5-fold cross-validation tests, our method had a higher testing accuracy (mean AUC 0.9696; mean F-measure 0.9086) than traditional NLP-based approaches (mean AUC range 0.8183-0.9571; mean F-measure range 0.5050-0.8739). A real-world simulation that split the training sample and the testing sample by date verified this result (mean AUC 0.9645; mean F-measure 0.9003 using the proposed method). Further analysis showed that the convolutional layers of the CNN effectively identified a large number of keywords and automatically extracted enough concepts to predict the diagnosis codes. Conclusions: Word embedding combined with a CNN showed outstanding performance compared with traditional methods, needing very little data preprocessing. This shows that future studies will not be limited by incomplete dictionaries. A large amount of unstructured information from free-text medical writing will be extracted by automated approaches in the future, and we believe that the health care field is about to enter the age of big data.
SN  - 1438-8871
UR  - http://www.jmir.org/2017/11/e380/
UR  - https://doi.org/10.2196/jmir.8344
UR  - http://www.ncbi.nlm.nih.gov/pubmed/29109070
DO  - 10.2196/jmir.8344
ID  - info:doi/10.2196/jmir.8344
ER  - 