%0 Journal Article
%@ 1438-8871
%I Gunther Eysenbach
%V 9
%N 1
%P e4
%T Term Identification Methods for Consumer Health Vocabulary Development
%A Zeng,Qing T
%A Tse,Tony
%A Divita,Guy
%A Keselman,Alla
%A Crowell,Jon
%A Browne,Allen C
%A Goryachev,Sergey
%A Ngo,Long
%+ Harvard Medical School, Decision Systems Group, Brigham and Women's Hospital, Thorn 304, 75 Francis Street, Boston, MA 02115, USA, +1 617 732 7694, qzeng@dsg.harvard.edu
%K Consumer health information
%K vocabulary
%K natural language processing
%D 2007
%7 14.3.2007
%9 Original Paper
%J J Med Internet Res
%G English
%X Background: The development of consumer health information applications such as health education websites has motivated the research on consumer health vocabulary (CHV). Term identification is a critical task in vocabulary development. Because of the heterogeneity and ambiguity of consumer expressions, term identification for CHV is more challenging than for professional health vocabularies. Objective: For the development of a CHV, we explored several term identification methods, including collaborative human review and automated term recognition methods. Methods: A set of criteria was established to ensure consistency in the collaborative review, which analyzed 1893 strings. Using the results from the human review, we tested two automated methods: the C-value formula and a logistic regression model. Results: The study identified 753 consumer terms and found the logistic regression model to be highly effective for CHV term identification (area under the receiver operating characteristic curve = 95.5%). Conclusions: The collaborative human review and logistic regression methods were effective for identifying terms for CHV development.
%M 17478413
%R 10.2196/jmir.9.1.e4
%U http://www.jmir.org/2007/1/e4/
%U https://doi.org/10.2196/jmir.9.1.e4
%U http://www.ncbi.nlm.nih.gov/pubmed/17478413