Can WordSmith be used to compute statistics on a Chinese corpus?

hd_lwj

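Junior Member
When I use WordList in WordSmith to count words in a Chinese text file, why does the output come out garbled? Doesn't WordSmith support word counts for Chinese?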
 
The thread at http://www.corpus4u.com/forum_view.asp?forum_id=7&view_id=420 covers this question in detail. Tokenizing (word-segmenting) all the Chinese text you want to process in WordSmith is probably the way out.
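If your text is not segmented yet, one simple way to tokenize it is forward maximum matching: at each position, take the longest string found in a dictionary, falling back to a single character. This is a swapped-in illustrative technique, not what WordSmith or the linked thread prescribes, and the toy dictionary `DICT` and the `max_len` value are hypothetical. A real project would use a proper segmenter and lexicon.

```python
def fmm_segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit;
        # a single character is always accepted as a fallback.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only.
DICT = {"汉语", "语料", "语料库", "统计"}

# Join with spaces so a word-based tool can treat each token as a word.
print(" ".join(fmm_segment("统计汉语语料库", DICT)))
```

Greedy longest-match is fast and easy to follow, but it cannot resolve ambiguous segmentations; that trade-off is why it only serves as a sketch here.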
 
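I'm using the People's Daily corpus, which has already been word-segmented and part-of-speech tagged. I wanted to start by using WordList to produce a word-frequency list. I'm on version 4, and the result came out garbled.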
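As a cross-check outside WordSmith, here is a rough sketch that builds a word-frequency list straight from a segmented, tagged file. It assumes the common People's Daily layout (whitespace-separated `word/tag` tokens, a leading line ID such as `19980101-01-001-001/m`, and compounds bracketed like `[中国/ns 政府/n]nt`). Those format details, the `LINE_ID` pattern and the sample lines are assumptions for illustration, so check them against your own copy of the corpus.

```python
import re
from collections import Counter

# Assumed shape of the per-line ID in the People's Daily files.
LINE_ID = re.compile(r"^\d{8}-\d{2}-\d{3}-\d{3}$")


def word_frequencies(lines):
    """Count words in 'word/tag' tokens, ignoring the tags and line IDs."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            word, sep, _tag = token.rpartition("/")
            if not sep:
                continue  # no '/tag' suffix, so not a word/tag token
            # Drop the opening bracket of a compound such as [中国/ns
            word = word.lstrip("[")
            if word and not LINE_ID.match(word):
                counts[word] += 1
    return counts


sample = [
    "19980101-01-001-001/m  迈向/v  充满/v  希望/v  的/u  新/a  世纪/n",
    "19980101-01-001-002/m  新/a  世纪/n  的/u  希望/v",
]
for word, freq in word_frequencies(sample).most_common(3):
    print(word, freq)
```

To read a real file, pass an open text handle with the right encoding, e.g. `open(path, encoding="gb18030")`.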
 
Use Text Converter in Utilities to convert your data from GB into Unicode first.
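Text Converter is the route WordSmith itself offers. If you would rather batch-convert files beforehand, a minimal sketch follows. It assumes that the Unicode form WordSmith 4 reads is UTF-16 little-endian with a byte-order mark, and that the source files are GB2312 or GBK (decoded here with Python's `gb18030` codec, a superset of both). The helper names are hypothetical.

```python
from pathlib import Path


def gb_bytes_to_utf16le(data: bytes, source_encoding: str = "gb18030") -> bytes:
    """Re-encode GB-encoded bytes as UTF-16 little-endian with a BOM."""
    # gb18030 also decodes GB2312 and GBK data; bad bytes raise UnicodeDecodeError.
    text = data.decode(source_encoding)
    # The BOM (FF FE) marks the output as UTF-16 little-endian.
    return b"\xff\xfe" + text.encode("utf-16-le")


def convert_file(src: str, dst: str) -> None:
    """Convert one GB-encoded text file to UTF-16 LE (hypothetical helper)."""
    Path(dst).write_bytes(gb_bytes_to_utf16le(Path(src).read_bytes()))
```

Keeping the byte-level conversion separate from the file I/O makes it easy to test, and line endings pass through unchanged.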
 