清风出袖 Senior Member 2005-12-06 #2 http://www.corpus4u.com/forum_view.asp?forum_id=7&view_id=420 has details about the question you asked. Tokenizing all the Chinese text you want to process in WordSmith is probably the way out.
xujiajin Administrator Staff member 2005-12-06 #4 WordSmith version 4 processes Unicode-encoded Chinese texts reliably.
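hd_lwj Junior Member 2005-12-07 #5 I am using the People's Daily corpus, which has already been word-segmented and part-of-speech tagged. I wanted to start by using CWordList to generate a word frequency list. I used version 4, but the output came out garbled.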
xiaoz Eternal Super Administrator Staff member 2005-12-07 #7 Use the Text Converter in Utilities to convert your data from GB to Unicode first.
xujiajin Administrator Staff member 2005-12-07 #9 "Unicode" here, as accepted by WS4, refers specifically to UTF-16, not UTF-8.
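For anyone who wants to do the GB-to-UTF-16 conversion outside WordSmith, here is a minimal Python sketch. It assumes the source files are GB2312/GBK (decoded with the superset `gb18030` codec) and that WS4 reads UTF-16 little-endian with a byte-order mark, as Windows tools usually do. The function name `gb_to_utf16` and the file names are hypothetical, not part of WordSmith.

```python
import codecs
from pathlib import Path


def gb_to_utf16(src: str, dst: str, src_encoding: str = "gb18030") -> None:
    """Re-encode a GB-encoded text file as UTF-16 little-endian with a BOM.

    Assumption: the input is GB2312/GBK; gb18030 is a superset of both,
    so it decodes either. Line endings are preserved unchanged.
    """
    raw = Path(src).read_bytes()
    # Strict decoding: raises UnicodeDecodeError if the file is not really GB.
    text = raw.decode(src_encoding)
    # Write the little-endian BOM (FF FE) explicitly, then the UTF-16LE body.
    Path(dst).write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    gb_to_utf16("renmin_gb.txt", "renmin_utf16.txt")
```

Writing the BOM explicitly (rather than relying on the plain `utf-16` codec, which uses the machine's native byte order) keeps the output byte order fixed regardless of where the script runs.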