Academic Journal of Computing & Information Science, 2019, 2(2); doi: 10.25236/AJCIS.010040.
You Yu, Yu Fu
Department of Information Security, Naval University of Engineering, Wuhan 430033, China
Given the complexity of text categorization and search in the era of big data, the diversity of Chinese words, and the need to construct a feature lexicon for text classification and search, this paper designs a word-level method for constructing a feature lexicon. The method learns from existing samples, identifies new words with a conditional random field (CRF) model, judges the importance of each word, divides the words into levels, and assigns a weight to each level, yielding an efficient and accurate feature lexicon. It obtains stable word segmentation results and effectively improves the accuracy of subsequent classification.
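The pipeline summarized above can be illustrated with a minimal Python sketch. It assumes that a CRF character tagger has already been trained and emits one B/M/E/S tag per character (Begin, Middle, or End of a multi-character word, or a Single-character word), a common tagging scheme for Chinese segmentation; training the CRF itself is not shown. The importance measure (smoothed TF-IDF), the number of levels, and the per-level weights are hypothetical choices made for illustration, not the parameters used in the paper, and the function names are likewise hypothetical.

```python
import math
from collections import Counter


def tags_to_words(chars, tags):
    """Join characters into words using B/M/E/S tags, as a CRF tagger might emit them."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":  # single-character word
            if current:  # flush an unterminated word defensively
                words.append(current)
                current = ""
            words.append(ch)
        elif tag == "B":  # start of a multi-character word
            if current:
                words.append(current)
            current = ch
        elif tag == "M":  # inside a multi-character word
            current += ch
        else:  # "E": end of a multi-character word
            words.append(current + ch)
            current = ""
    if current:
        words.append(current)
    return words


def importance_scores(docs):
    """Assumed importance measure: the maximum smoothed TF-IDF of each word over all docs."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))  # document frequency
    scores = {}
    for doc in docs:
        tf = Counter(doc)
        for w, c in tf.items():
            idf = math.log((1 + n) / (1 + df[w])) + 1  # smoothed so no word scores 0
            scores[w] = max(scores.get(w, 0.0), (c / len(doc)) * idf)
    return scores


def build_lexicon(scores, level_weights=(1.0, 0.6, 0.3)):
    """Rank words by score, cut the ranking into equal-size levels, attach a weight per level.

    The three levels and their weights are hypothetical; level 1 is the most important.
    """
    ranked = sorted(scores, key=lambda w: (-scores[w], w))  # ties broken alphabetically
    k = len(level_weights)
    lexicon = {}
    for i, w in enumerate(ranked):
        level = i * k // len(ranked)  # 0-based rank bucket
        lexicon[w] = (level + 1, level_weights[level])
    return lexicon


if __name__ == "__main__":
    chars = list("我爱北京天安门")
    tags = ["S", "S", "B", "E", "B", "M", "E"]  # as a trained CRF tagger might output
    print(tags_to_words(chars, tags))  # ['我', '爱', '北京', '天安门']
```

Ranking by importance and cutting the ranking into equal-size buckets is only one way to divide word levels; fixed thresholds on the score itself would be an equally simple alternative for illustration.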
feature lexicon, word level, word segmentation, CRF
You Yu, Fu Yu. A Method of Constructing Feature Lexicon Based on Word Level. Academic Journal of Computing & Information Science (2019), Vol. 2, Issue 2: 68-77. https://doi.org/10.25236/AJCIS.010040.