[Help] Why does the Z-score from SARA differ from those calculated by other methods?

I've finally given up trying to work out why the Z-scores calculated by SARA differ from those produced by the method proposed in Yang's book, 'An Introduction to Corpus Linguistics'. I took 动态语法's suggestion ("It's best to find matching raw data from the different corpora and then run the calculations in the same statistics software") and left the issue unsolved.
However, after a few tests, I doubt that the difference is caused by different delimitations of the span. I defined the span S in the clearest possible way (left 5 and right 5, left 4 and right 4, etc.). My tests are listed below:

C': co-occurrence with the node
C: occurrence of the collocate
W: total number of words in BNC
N: occurrence of the node word
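For reference, a commonly used textbook form of the collocation Z-score can be written in the symbols above, with S the span size (number of positions examined around each node token). This is an assumption about what the "proposed method" computes, not a confirmed transcription of Yang's book or of SARA's internal formula:

```latex
\begin{aligned}
p      &= \frac{C}{W - N}          && \text{probability of the collocate at a non-node position} \\
E      &= p \, N \, S              && \text{expected co-occurrences within the span} \\
\sigma &= \sqrt{E \, (1 - p)}      && \text{standard deviation} \\
Z      &= \frac{C' - E}{\sigma}
\end{aligned}
```

Under this form, widening S raises E and therefore lowers Z for a given C', which matches the direction of both columns in the tests below.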
 
I wanted to upload the file but couldn't find a way to do it, so the data are pasted below instead.

Node: anything, N = 27487; collocate: else, C = 19931

Span (left/right)   C'      Z-score (SARA, BNC)   Z-score (proposed method)
5/5                 2295    321.4                 287.9
4/4                 2265    356.1                 315.6
3/3                 2223    405.2                 352.8
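A minimal Python sketch for recomputing the three rows, assuming the textbook formula p = C/(W − N), E = p·N·S, σ = √(E(1 − p)), and assuming S counts positions on both sides (e.g. 10 for left 5/right 5). The function name collocation_z is hypothetical. The post does not give W, so the value below is only a hypothetical placeholder; substitute the exact BNC token count SARA uses before comparing with the figures above.

```python
import math


def collocation_z(c_prime: int, n: int, c: int, w: int, span: int) -> float:
    """Collocation Z-score (assumed textbook form, not SARA's confirmed formula)."""
    # Probability of the collocate at any position not occupied by the node
    p = c / (w - n)
    # Expected co-occurrences within `span` positions around each of the n node tokens
    expected = p * n * span
    # Standard deviation under the binomial approximation
    sd = math.sqrt(expected * (1 - p))
    return (c_prime - expected) / sd


# Figures from the post: node "anything" (N), collocate "else" (C)
N, C = 27487, 19931
# HYPOTHETICAL: W is not stated in the post; replace with the real BNC size
W = 100_000_000

for side, c_prime in [(5, 2295), (4, 2265), (3, 2223)]:
    span = 2 * side  # assumption: S = left + right positions
    z = collocation_z(c_prime, N, C, W, span)
    print(f"left={side} right={side}: Z = {z:.1f}")
```

Varying W and the way S is counted in this sketch is one way to see which input assumption the two sets of figures are most sensitive to.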
 
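I used SARA to compute the Z-scores of some collocates in the BNC and then checked them in CalcZ, a program dedicated to computing Z-scores. The results were far apart. What is going on?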

I have heard that the formula used in SARA may have flaws.
 