Re: Announcing A Corpus Worker's Toolkit
Quote from xujiajin, posted 2005-8-18 9:18:52:
Quote from 清风出袖, posted 2005-8-18 8:46:48:
Many thanks for Mr. Xiao's kind explanation! What does T-score mean? Could you take the trouble to give a brief introduction to it here, 动态语法? Thanks a lot!
[Repost] A good article on T-Score and Mutual Information
http://www.corpus4u.org/showthread.php?t=58
T-SCORE AND MUTUAL INFORMATION SCORE
http://www.corpus4u.org/showthread.php?t=26
J. Clear says that:
under the null hypothesis, the expected number of times "the" occurs within the span around "post" is
(f(post) * span) * relative_freq(the)
which is
(2579 * 8) * (1 / 20) = 20632 / 20 = 1031.6
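A minimal Python sketch of the expected-frequency calculation above. It assumes, as the formula does, that the collocate is spread independently of the node word, so each of the `span` slots around every occurrence of the node holds the collocate with probability equal to its relative frequency. The function name is illustrative only.

```python
def expected_cooccurrence(node_freq: int, span: int, collocate_rel_freq: float) -> float:
    """Expected co-occurrences of a collocate within the span around a node word
    under the null hypothesis (no association between the two words)."""
    # node_freq occurrences, each opening `span` slots, each slot filled by the
    # collocate with probability collocate_rel_freq.
    return (node_freq * span) * collocate_rel_freq


# Figures from the quoted example: f(post) = 2579, span = 8, relative_freq(the) = 1/20
e = expected_cooccurrence(2579, 8, 1 / 20)
print(f"{e:.1f}")  # → 1031.6
```

Because the span enters the product directly, doubling it doubles the expected count.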
In calculating both the MI score and the t-score, the span is treated as a
variable. My question (and confusion) is: why choose 8, and not some
other number? Is there an optimal span to use?
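To see where the span enters both measures, here is a small Python sketch using the common textbook formulations MI = log2(O / E) and t = (O - E) / sqrt(O), where O is the observed co-occurrence count and E is the expected count from the formula above. These formulations are an assumption (the thread does not spell them out), and the observed count below is a hypothetical value, not a figure from the article.

```python
import math


def expected_cooccurrence(node_freq: int, span: int, collocate_rel_freq: float) -> float:
    """Expected co-occurrences under the null hypothesis: (f(node) * span) * p(collocate)."""
    return (node_freq * span) * collocate_rel_freq


def mi_score(observed: float, expected: float) -> float:
    """Mutual information: log2 of observed over expected co-occurrences."""
    return math.log2(observed / expected)


def t_score(observed: float, expected: float) -> float:
    """t-score: (observed - expected) divided by the square root of observed."""
    return (observed - expected) / math.sqrt(observed)


observed = 1500  # hypothetical observed count of "the" near "post"
for span in (1, 4, 8):
    e = expected_cooccurrence(2579, span, 1 / 20)
    print(span, round(e, 1), round(mi_score(observed, e), 2), round(t_score(observed, e), 2))
```

In real use the observed count is recounted for each span, since a wider window also catches more co-occurrences; holding it fixed here only isolates how E changes with the span.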
Ultimately, people can choose any number for the span, but
in many cases I have found that they simply use a window size of 1,
hence the XY method (i.e., X and Y are immediately next to each other). I
believe this has been the practice for many people, but I also believe that
people can do it differently (i.e., choose other span sizes).
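A rough Python sketch of how the span choice changes what gets counted. It assumes a span is split into a hypothetical `left`/`right` window around each node occurrence; `left=0, right=1` corresponds to the XY method (immediately adjacent words), and `left=4, right=4` is one way of reading a span of 8. Conventions differ between tools (one-sided vs. two-sided spans, whether windows cross sentence boundaries), so this is illustrative only.

```python
from collections import Counter
from typing import Sequence


def window_collocates(tokens: Sequence[str], node: str, left: int, right: int) -> Counter:
    """Count words within `left` positions before and `right` positions after
    each occurrence of `node` (the node position itself is skipped)."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - left)  # clip the window at the start of the text
        end = min(len(tokens), i + right + 1)  # clip the window at the end of the text
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


tokens = "the post office sent the post to the office".split()
xy = window_collocates(tokens, "post", 0, 1)  # XY method: right-hand neighbour only
wide = window_collocates(tokens, "post", 4, 4)  # span of 8: four words on each side
print(xy)
print(wide)
```

With the XY window "the" is never counted next to "post" in this toy sentence, while the 4+4 window counts it four times, which is why the span has to be reported alongside any MI or t-score.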