TY  - JOUR
AU  - Zeng, Qing T
AU  - Tse, Tony
AU  - Divita, Guy
AU  - Keselman, Alla
AU  - Crowell, Jon
AU  - Browne, Allen C
AU  - Goryachev, Sergey
AU  - Ngo, Long
PY  - 2007
DA  - 2007/3/14
TI  - Term Identification Methods for Consumer Health Vocabulary Development
JO  - J Med Internet Res
SP  - e4
VL  - 9
IS  - 1
KW  - Consumer health information
KW  - vocabulary
KW  - natural language processing
AB  - Background: The development of consumer health information applications such as health education websites has motivated the research on consumer health vocabulary (CHV). Term identification is a critical task in vocabulary development. Because of the heterogeneity and ambiguity of consumer expressions, term identification for CHV is more challenging than for professional health vocabularies. Objective: For the development of a CHV, we explored several term identification methods, including collaborative human review and automated term recognition methods. Methods: A set of criteria was established to ensure consistency in the collaborative review, which analyzed 1893 strings. Using the results from the human review, we tested two automated methods—C-value formula and a logistic regression model. Results: The study identified 753 consumer terms and found the logistic regression model to be highly effective for CHV term identification (area under the receiver operating characteristic curve = 95.5%). Conclusions: The collaborative human review and logistic regression methods were effective for identifying terms for CHV development.
SN  - 1438-8871
UR  - http://www.jmir.org/2007/1/e4/
UR  - https://doi.org/10.2196/jmir.9.1.e4
UR  - http://www.ncbi.nlm.nih.gov/pubmed/17478413
DO  - 10.2196/jmir.9.1.e4
ID  - info:doi/10.2196/jmir.9.1.e4
ER  - 