A Corpus Worker's Toolkit: Corpus Toolbox (0908 update)

xiaoz

Permanent Super Administrator
Staff member
If you want to use the additional clips given in this thread, I suppose the answer is yes, because the default paths are defined that way in these clips.
 

清风出袖

Senior Member
I see, Mr. Xiao! Thanks a lot! But you know, I am a layperson when it comes to computers. What do you mean by "thread"? By the way, what is it called in Chinese? Thanks again to anyone for a kind explanation!
 

xiaoz

Permanent Super Administrator
Staff member
As you can see, postings on a BBS are usually linked by a shared topic, just as coins in ancient China were kept together on a string or thread. So this reply is the 24th posting in this thread, "Announcing A Corpus Worker's Toolkit".
 

动态语法

Administrator
Staff member
Re: Announcing A Corpus Worker's Toolkit

In order to prevent this from happening, the following steps must be followed.

4) Copy the following files to C:\Perl\bin: kwic.pl, kwic_e.pl, segment.pl, wordlist.txt.

5) Copy segmenter.pl to C:\Perl\lib.

If you have followed these steps and still get that error, then we have a problem.
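For anyone who wants to double-check steps 4) and 5) before reporting an error, here is a minimal sketch in Python (not part of the toolkit). The folders and file names are copied from the steps above; the function name `missing_files` is just my own choice for illustration.

```python
from pathlib import Path

# Folders and file names are taken from steps 4) and 5) above.
EXPECTED = {
    r"C:\Perl\bin": ["kwic.pl", "kwic_e.pl", "segment.pl", "wordlist.txt"],
    r"C:\Perl\lib": ["segmenter.pl"],
}

def missing_files(expected=EXPECTED):
    """Return the full paths that are expected but not present as files."""
    missing = []
    for folder, names in expected.items():
        for name in names:
            path = Path(folder) / name
            # Only a file sitting directly in the folder counts;
            # a copy left inside a sub-folder is reported as missing.
            if not path.is_file():
                missing.append(str(path))
    return missing

if __name__ == "__main__":
    problems = missing_files()
    if problems:
        print("Not found (check whether the files ended up in a sub-folder):")
        for p in problems:
            print("  " + p)
    else:
        print("All files from steps 4) and 5) are in place.")
```

If the script lists a file as not found, it is worth checking whether the archive was extracted into a sub-folder of bin or lib instead of the folder itself.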


Quoting xiaoz (2005-8-17 19:38:52):
Should prove a very handy tool! Many thanks.

Any idea what's happening in the following screen dump?

 

动态语法

Administrator
Staff member
Re: Announcing A Corpus Worker's Toolkit

Quoting 清风出袖 (2005-8-17 22:00:12):
Which choice would be better for the WinXP Pro OS?
Don't know. Haven't tried. Can you (or anyone) try it out and give a report back? Thanks.
 

xiaoz

Permanent Super Administrator
Staff member
Thanks, the problem in No. 25 is solved. My fault: I had extracted the files into a sub-folder of bin and lib and forgot to move them.
 

动态语法

Administrator
Staff member
Re: Announcing A Corpus Worker's Toolkit

With this release, I am still not comfortable with a number of things. Among them:

1) T-Score, which has confused me whenever I read the literature. So don't write a dissertation based on the T-Score results produced here!

2) XML. I hope the sample tags are not too far off, even though they are relatively consistent.
 

清风出袖

Senior Member
Many thanks to Mr. Xiao for the kind explanation! What does T-score mean? Could you take the trouble to give a brief introduction to it here, 动态语法? Thanks a lot!
 

xujiajin

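Administrator
Staff member
This all-in-one toolkit is really good, and everyone can also edit their own tag sets to suit their actual needs.
One thing has always bothered me, though: the text in the interface is too small. Now there is finally a solution.
I have changed all the fonts in the interface to Times New Roman 12 pt (equivalent to the Chinese "Small Four" size).
The specific steps are: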


Besides the clipbook, you can change Documents, Toolbar, etc. in the same way.
 

xujiajin

Administrator
Staff member

动态语法

Administrator
Staff member
Re: Announcing A Corpus Worker's Toolkit

Quoting xujiajin (2005-8-18 9:18:52):
Quoting 清风出袖 (2005-8-18 8:46:48):
Many thanks to Mr. Xiao for the kind explanation! What does T-score mean? Could you take the trouble to give a brief introduction to it here, 动态语法? Thanks a lot!
[Repost] A good article on T-Score and Mutual Information
http://www.corpus4u.org/showthread.php?t=58

T-SCORE AND MUTUAL INFORMATION SCORE
http://www.corpus4u.org/showthread.php?t=26
J. Clear says that:

the null hypothesis,
(f(post) * span) * relative_freq(the)
which is
(2579 * 8) * (1 / 20) = 20632 / 20 ≈ 1031.6

In calculating both MI and T-Score, the span is used as a variable. My question (and confusion) is: why choose 8 and not some other number? Is there an optimal number to use?

Ultimately people can choose any number for the span, but in many cases I have found that they just use 1 as the window size, hence the XY method (i.e. X and Y are immediately next to each other). I believe this has been the case for many people, but I also believe that people can do it differently (i.e. by choosing different numbers).
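To make the role of the span concrete, here is a small sketch in Python (not taken from the toolkit). The first function simply restates the null-hypothesis arithmetic quoted above. The second counts how often a collocate actually falls inside the window around a node word. The function names, the toy sentence, and the split of a span of 8 into 4 words on each side are all my own assumptions.

```python
def expected_cooccurrences(node_freq, span, collocate_rel_freq):
    """Expected count of the collocate near the node under the null hypothesis:
    f(node) * span * relative_freq(collocate)."""
    return node_freq * span * collocate_rel_freq

def observed_cooccurrences(tokens, node, collocate, left, right):
    """Count collocate tokens within `left` words before or `right` words
    after each occurrence of the node word."""
    count = 0
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)
        # The window excludes the node position itself.
        window = tokens[start:i] + tokens[i + 1:i + 1 + right]
        count += window.count(collocate)
    return count

# The figures quoted above: f(post) = 2579, span = 8, relative_freq(the) = 1/20.
print(expected_cooccurrences(2579, 8, 1 / 20))

# A made-up toy sentence, only to show how the window size changes the count.
toy = "the post office is near the post box by the old post".split()
print(observed_cooccurrences(toy, "post", "the", left=4, right=4))  # span of 8
print(observed_cooccurrences(toy, "post", "the", left=1, right=0))  # adjacent "XY" case
```

A wider window picks up more co-occurrences, but the expected count under the null hypothesis grows with the span as well, so the choice of span changes both sides of the comparison.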
 

动态语法

Administrator
Staff member
Re: Announcing A Corpus Worker's Toolkit

Quoting 清风出袖 (2005-8-18 8:46:48):
Many thanks to Mr. Xiao for the kind explanation! What does T-score mean? Could you take the trouble to give a brief introduction to it here, 动态语法? Thanks a lot!
See the reference to Jeremy Clear posted above by xujiajin.

My general understanding is that these statistics are designed to distinguish truly interesting collocations from random, superficial ones.
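As a rough illustration only, here is a Python sketch of one common textbook formulation of the two scores, with O the observed and E the expected co-occurrence count: t = (O - E) / sqrt(O) and MI = log2(O / E). I have not checked that the toolkit computes them this way, and the counts below are made up purely for illustration.

```python
import math

def t_score(observed, expected):
    """t = (O - E) / sqrt(O): how far the observed count lies above chance,
    scaled by an estimate of its standard deviation."""
    return (observed - expected) / math.sqrt(observed)

def mutual_information(observed, expected):
    """MI = log2(O / E): how many times more often than chance the pair
    co-occurs, on a log scale."""
    return math.log2(observed / expected)

# Hypothetical counts, not taken from any corpus:
# a very frequent pair that is only modestly above chance,
# and a rare pair that is far above chance.
pairs = {
    "frequent pair": (1500, 1031.6),
    "rare pair": (20, 0.5),
}
for label, (obs, exp) in pairs.items():
    print(label, round(t_score(obs, exp), 2), round(mutual_information(obs, exp), 2))
```

With these made-up numbers the frequent pair gets the higher t-score while the rare pair gets the higher MI, which is why the two measures can rank the same collocations quite differently.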
 

动态语法

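Administrator
Staff member
Re: Announcing A Corpus Worker's Toolkit

There was also a paper posted here a while ago:

汉语搭配定量分析初探 (A Preliminary Quantitative Analysis of Chinese Collocation)
Sun Maosong, Huang Changning, Fang Jie (孙茂松, 黄昌宁, 方捷)
Department of Computer Science and Technology, Tsinghua University

http://www.corpus4u.org/showthread.php?t=56
in which they offer some different perspectives on collocations in Chinese.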
 

动态语法

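Administrator
Staff member
Re: Announcing A Corpus Worker's Toolkit

Quoting xujiajin (2005-8-18 8:55:29):
This all-in-one toolkit is really good, and everyone can also edit their own tag sets to suit their actual needs.
One thing has always bothered me, though: the text in the interface is too small. Now there is finally a solution.
I have changed all the fonts in the interface to Times New Roman 12 pt (equivalent to the Chinese "Small Four" size).
The specific steps are:
-------
Besides the clipbook, you can change Documents, Toolbar, etc. in the same way.
Yes, that's certainly an option. Especially when you are using a large monitor, you can set a larger font for easier viewing.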
 