WordSmith 3 can actually handle Chinese

Re: WordSmith 3 can actually handle Chinese

Quoting superyangt, 2005-8-9 20:45:56:
Does anyone know how to process Chinese text with a concordancer?
Thanks.

Please read the following post for clues:
汉语文本处理常见问题及解答FAQ (FAQ on Chinese text processing)
http://www.corpus4u.org/showthread.php?t=699

There are also many other posts on this site that discuss similar topics.
 
Re: WordSmith 3 can actually handle Chinese

Quoting xiaoz, 2005-7-12 22:04:50:
I haven't tried Concordance, but if it is Unicode-based, as 动态语法 suggested, there should be no problem with this tool.

With WordSmith 3, however, only Concord works on segmented Chinese texts. WordList, and with it Cluster and KeyWords, do not work.

[Attached image: 2005071222044071.jpg]


xiaoz, I read this thread carefully this time. Do you mean that there is no solution to the problem? That is, we can't make a word list from Chinese data in WordSmith 3.0?
 
Unfortunately, that is true: there is no way to do it with version 3.

If you convert your Chinese data into Unicode, WS 3 will not recognise it.

You will need to pay 10 pounds to get an upgrade from 3 to 4.
 
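What everyone has been discussing above is really a software encoding problem. My experience is this: don't struggle to process Chinese characters with software that doesn't support Chinese character encodings. Some programs handle them well, and some don't.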
 
I find that a Chinese file tokenized by segtag as mentioned at http://www.corpus4u.com/forum_view.asp?forum_id=8&view_id=557&page=2 doesn't work with WordSmith 3 when I try it in concordancing, while the same file tokenized by ICTCLAS can be recognized by WordSmith 3 in concordancing. Why? Are there any specific reasons for this?
 
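Re: WordSmith 3 can actually handle Chinese

WordSmith 3 can indeed make concordances of Chinese, provided the Chinese text has been segmented. I'd like to ask everyone a related question: can WordSmith 3 count word frequencies in segmented Chinese text? I have tried several times and failed every time, and I don't know why. I am using a licensed copy of WordSmith 3, so the version itself should not be the problem.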
 
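WordSmith 3 can produce concordances from segmented Chinese text, but it cannot count word frequencies or build a Chinese word list. I think this was already mentioned above. Those functions are only available in WordSmith 4.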
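For anyone who only needs the frequency counts, a word list for segmented text can be approximated outside WordSmith. The sketch below is just an illustration and not something proposed in this thread: it assumes the segmenter separates words with whitespace, and the function name `word_frequencies` and the optional `word/TAG` handling are my own hypothetical choices.

```python
from collections import Counter


def word_frequencies(segmented_text, strip_tags=False):
    """Count whitespace-separated tokens in already segmented text.

    strip_tags=True assumes tokens look like word/TAG (an assumption about
    the segmenter's output format) and drops everything after the last slash.
    """
    tokens = segmented_text.split()
    if strip_tags:
        tokens = [token.rsplit("/", 1)[0] for token in tokens]
    return Counter(tokens)


if __name__ == "__main__":
    sample = "我们 研究 语料库 他们 也 研究 语料库"
    # Print the words from most to least frequent.
    for word, freq in word_frequencies(sample).most_common():
        print(word, freq)
```

This does not replace WordList (no sorting options, no keyword comparison); it only shows that the counting itself depends on nothing more than the segmentation.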
 
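Re: WordSmith 3 can actually handle Chinese

Quoting 清风出袖, 2006-3-23 20:21:48:
WordSmith 3 can produce concordances from segmented Chinese text, but it cannot count word frequencies or build a Chinese word list. I think this was already mentioned above. Those functions are only available in WordSmith 4.

That's right. I went back through the thread and saw that someone had already tried it, and it did not work.
Thanks, 清风出袖.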
 
Re: WordSmith 3 can actually handle Chinese

Quoting 清风出袖, 2006-3-23 8:29:59:
I find that a Chinese file tokenized by segtag as mentioned at http://www.corpus4u.com/forum_view.asp?forum_id=8&view_id=557&page=2 doesn't work with WordSmith 3 when I try it in concordancing, while the same file tokenized by ICTCLAS can be recognized by WordSmith 3 in concordancing. Why? Are there any specific reasons for this?

The tokenized output from Segtag or ICTCLAS should be resaved in GB encoding before you make concordances with WS3. For WS4, however, you need to resave the files as Unicode (not UTF-7 or UTF-8) to make concordances and word lists.
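If you would rather script the resaving than do it by hand in an editor, something like the following Python sketch may help. It rests on several assumptions of mine, not on anything confirmed in this thread: that "GB format" can be written with Python's `gbk` codec, that "Unicode" here means what Windows editors call Unicode (UTF-16 little-endian with a byte order mark), and that the segmenter's output is UTF-8. The function names are hypothetical.

```python
import codecs
from pathlib import Path


def resave_as_gb(src, dst, src_encoding="utf-8-sig"):
    """Re-encode a text file as GBK (assumed to stand in for 'GB format')."""
    text = Path(src).read_bytes().decode(src_encoding)
    # Raises UnicodeEncodeError if the text has characters outside GBK.
    Path(dst).write_bytes(text.encode("gbk"))


def resave_as_utf16(src, dst, src_encoding="utf-8-sig"):
    """Re-encode a text file as UTF-16 LE with a BOM (assumed 'Unicode')."""
    text = Path(src).read_bytes().decode(src_encoding)
    Path(dst).write_bytes(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    resave_as_gb("segmented.txt", "segmented_gb.txt")
    resave_as_utf16("segmented.txt", "segmented_utf16.txt")
```

If your segmenter already writes GB-encoded files, pass `src_encoding="gbk"` to `resave_as_utf16` instead of relying on the default.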
 