Application of Efficient Data Cleaning Using Text Clustering for Semistructured Medical Reports to Large-Scale Stool Examination Reports: Methodology Study

OpenRefine is a free, open source tool that can identify strings of all types and remove duplicates without requiring any programming. OpenRefine provides 2 text clustering methods: key collision methods and nearest neighbor methods. We proposed a data cleaning process that uses both clustering methods in OpenRefine to improve the accuracy of semistructured data. We applied this process to 574,266 stool examination reports from Samsung Medical Center, covering the period 1995 to 2015.
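The two clustering families named above can be illustrated with a minimal Python sketch. It is an approximation, not OpenRefine's own code. The key collision part mimics the idea behind OpenRefine's "fingerprint" keying: trim, lowercase, fold accents, strip punctuation, and sort unique tokens. The nearest neighbor part uses plain Levenshtein distance with a `radius` threshold and compares every pair, omitting OpenRefine's blocking step. The function names, the `radius` default, and the sample strings are hypothetical illustrations, not data or settings from the study.

```python
import re
import unicodedata
from collections import defaultdict

# Hypothetical report strings, for illustration only (not from the study).
SAMPLE_VALUES = [
    "Clonorchis sinensis",
    "clonorchis sinensis ",
    "Sinensis, Clonorchis",
    "Clonorchis sinesis",
    "Not found",
]


def fingerprint(value: str) -> str:
    """Approximate a fingerprint-style key (sketch, not OpenRefine's code)."""
    s = value.strip().lower()
    # Fold accented characters to their closest ASCII form.
    s = unicodedata.normalize("NFKD", s).encode("ascii", "ignore").decode("ascii")
    # Remove punctuation and control characters.
    s = re.sub(r"[^\w\s]", "", s)
    # Sort unique tokens so word order and repeats do not change the key.
    return " ".join(sorted(set(s.split())))


def key_collision_clusters(values):
    """Group values whose keys collide; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def nearest_neighbor_clusters(values, radius=1):
    """Single-link grouping of values within `radius` edits (all pairs, no blocking)."""
    parent = list(range(len(values)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if levenshtein(values[i], values[j]) <= radius:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, v in enumerate(values):
        groups[find(i)].append(v)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    print("key collision:", key_collision_clusters(SAMPLE_VALUES))
    print("nearest neighbor:", nearest_neighbor_clusters(SAMPLE_VALUES, radius=1))
```

On these sample strings, key collision groups the variants that differ only in case, spacing, punctuation, or word order, while nearest neighbor catches the one-letter misspelling that key collision misses. That complementarity is one plausible reason to combine both methods. Note that the all-pairs comparison grows quadratically with the number of distinct values, which is why a blocking step matters at the scale of hundreds of thousands of reports.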

Hyunki Woo, Kyunga Kim, KyeongMin Cha, Jin-Young Lee, Hansong Mun, Soo Jin Cho, Ji In Chung, Jeung Hui Pyo, Kun-Chul Lee, Mira Kang

J Med Internet Res 2019;21(1):e10013