Just now I tried to figure out how to use the T-score calculation function in ACWT. I wanted to compare how strongly the words "strong" and "thick" associate with the word "smell", using both the "compute t-score" function and the "compute t-score by the ELE method" function. However, the result seems to depend on the order of input: if I enter "strong" in the first box, then "strong" is more likely to associate with "smell"; if I enter "thick" first, then "thick" is more likely! What is wrong with my calculation?
Result 1 (most of the output I got when I entered "strong" in the top box and then clicked the "compute t-score by ELE method" module):
* 1st node word 'strong', N=7809
* 2nd node word 'thick', N=1984
* Collocate word 'smell', N=13428
* Frequency of 'strong smell'=175
* Frequency of 'thick smell'=2
* No. of words following either 'strong' or 'thick'=1841
* Corpus size = 44300000 words
Result 2 (if I reverse the order of input, the result is different):
* 1st node word 'thick ', N=7809
* 2nd node word 'strong', N=1984
* Collocate word 'smell', N=13428
* Frequency of 'thick smell'=175
* Frequency of 'strong smell'=2
* No. of words following either 'thick ' or 'strong'=1841
* Corpus size = 44300000 words
T-Score by the (ELE) method: t=11.94
Hints: 'thick smell' is 11.94 standard deviations more likely than 'strong smell',
or,
'strong smell' is 11.94 standard deviations less likely than 'thick smell'.
According to Church et al, the confidence threshold should be at least 2.15 instead of 1.65.
T-Score by the (ELE) method: t=11.94
Hints: 'strong smell' is 11.94 standard deviations more likely than 'thick smell',
or,
'thick smell' is 11.94 standard deviations less likely than 'strong smell'.
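For reference, the reported value can be reproduced with a short sketch. It assumes an ELE-smoothed comparison in the style of Church et al.: add 0.5 to each pair frequency, add half the number of distinct following words to each node frequency, then divide the difference of the two smoothed probabilities by the pooled standard deviation. The function and parameter names are only illustrative, and the formula ACWT actually uses is not shown in its output, so this is an assumption that happens to match the 11.94 above.

```python
from math import sqrt


def ele_t_score(f1, n1, f2, n2, v):
    """Hypothetical ELE t-score comparing 'word1 collocate' with 'word2 collocate'.

    f1, f2: frequencies of the two pairs (e.g. 175 and 2)
    n1, n2: frequencies of the two node words (e.g. 7809 and 1984)
    v:      number of distinct words following either node word (e.g. 1841)
    """
    # ELE smoothing (assumed): +0.5 per pair count, +v/2 per node count
    d1 = n1 + v / 2
    d2 = n2 + v / 2
    p1 = (f1 + 0.5) / d1
    p2 = (f2 + 0.5) / d2
    # Variance of each estimated probability, approximated as p / d
    var1 = p1 / d1
    var2 = p2 / d2
    return (p1 - p2) / sqrt(var1 + var2)


# Counts exactly as listed for the 1st and 2nd node word in both results
t_listed = ele_t_score(175, 7809, 2, 1984, 1841)
# The same counts with the two node words actually exchanged
t_swapped = ele_t_score(2, 1984, 175, 7809, 1841)

print(round(t_listed, 2), round(t_swapped, 2))  # roughly 11.94 and -11.94
```

Both listings above show the same counts in the same positions (N=7809 and a pair frequency of 175 for the 1st node word, whichever word that is), so a formula of this shape returns the same t for both. If the counts had moved together with the words, reversing the input order would only flip the sign of t.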
[This post was edited by the author on 2005-09-20 at 14:54:42]