How should I interpret this log-likelihood result?

majorlv511 · 2006-10-26

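This compares how one word is used in two corpora (corpus 1 and corpus 2).

For example, the LL value in the attached result is -135.92. How should I interpret it?

Thanks, everyone!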

majorlv511 · 2006-10-26

Re: How should I interpret this log-likelihood result?

Log-likelihood calculator results
Key:
O1 is observed frequency in Corpus 1
O2 is observed frequency in Corpus 2
%1 and %2 values show relative frequencies in the texts.
+ indicates overuse in O1 relative to O2,
- indicates underuse in O1 relative to O2
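None of the posts shows how such a calculator arrives at its score. Below is a minimal Python sketch. It assumes the common two-corpus formula, where expected frequencies come from the two corpus sizes and LL = 2 Σ O ln(O/E). The function name and the example counts are hypothetical, and the thread does not give the corpus sizes. The sketch also shows why the score itself is never negative: the +/- is attached afterwards by comparing relative frequencies (the %1 and %2 columns). Raw counts go in directly because corpus size already enters through the expected frequencies.

```python
import math

def log_likelihood(o1: int, o2: int, c1: int, c2: int) -> tuple[float, str]:
    """Hypothetical helper: LL score plus direction for a word with frequency
    o1 in corpus 1 (c1 tokens) and o2 in corpus 2 (c2 tokens)."""
    # Expected frequencies if the word were equally common in both corpora
    e1 = c1 * (o1 + o2) / (c1 + c2)
    e2 = c2 * (o1 + o2) / (c1 + c2)
    # 0 * ln(0) is treated as 0, so a zero count contributes nothing
    ll = 0.0
    if o1 > 0:
        ll += o1 * math.log(o1 / e1)
    if o2 > 0:
        ll += o2 * math.log(o2 / e2)
    ll *= 2  # always >= 0
    # Direction marker: "+" = overuse in corpus 1, "-" = underuse (ties reported as "-")
    sign = "+" if o1 / c1 > o2 / c2 else "-"
    return ll, sign
```

For example, with made-up counts of 50 and 100 in two corpora of 1,000,000 tokens each, `log_likelihood(50, 100, 1_000_000, 1_000_000)` gives a score of about 16.99 with the sign "-".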

xiaoz · 2006-10-26

Your search word is underused in corpus 1. The difference in its frequency between corpus 1 and corpus 2 is highly significant in statistical terms (p < 0.001).

majorlv511 · 2006-10-26

Re: Re: How should I interpret this log-likelihood result?

Quote from xiaoz:
Under use of your search word in corpus 1. The difference in the frequencies between corpus 1 and corpus 2 is highly significant in statistical terms p<0.001.

But what does the number itself tell me?

Can I say "the LL value (-135.92) means the underuse of the search word in corpus 1 is 135.92, or 135.92%"?

xiaoz · 2006-10-26

The number is called the log-likelihood (LL) score. It is not a percentage or an amount of underuse. The minus sign only marks underuse (see the key above), and the score measures how strong the evidence for a difference is. In your example (a 2 x 2 contingency table, i.e. 1 degree of freedom), the critical value for an observed difference to be statistically significant at p < 0.05 is 3.84. Significance at that level means that a difference this large would arise by chance less than 5% of the time if the two corpora did not really differ.
The other critical values are 6.64 for p < 0.01, 10.83 for p < 0.001 and 15.13 for p < 0.0001.

So in your case (135.92, far above 15.13), the difference is significant at p < 0.0001: the search item is underused in corpus 1 in relation to corpus 2.
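The following small Python sketch shows how an LL score maps onto these thresholds. It assumes, as in the thread, that the score is compared with a chi-square distribution with 1 degree of freedom. For 1 df, the upper tail of that distribution can be computed with the standard library's `math.erfc`. The function names are hypothetical. The sign printed by the calculator is dropped because it only marks direction.

```python
import math

def ll_p_value(ll: float) -> float:
    """p value for an LL score, using chi-square with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)); abs() drops the +/- direction marker
    return math.erfc(math.sqrt(abs(ll) / 2))

def significance_band(ll: float) -> str:
    """Report the strictest conventional level that the score passes."""
    p = ll_p_value(ll)
    for alpha in (0.0001, 0.001, 0.01, 0.05):
        if p < alpha:
            return f"p < {alpha}"
    return "not significant at p < 0.05"

print(significance_band(-135.92))  # prints: p < 0.0001
```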

armstrong · 2006-10-26

Re: How should I interpret this log-likelihood result?

Thanks a lot for your explanation, Dr. Xiao. You are always helpful.

majorlv511 · 2006-10-26

Re: How should I interpret this log-likelihood result?

Many thanks, Dr. Xiao.

But where can I find the critical values for the log-likelihood score? Are they the same as for the chi-square test?

xiaoz · 2006-10-26

Yes, you are right. The distribution table for chi-square tests also applies to LL tests.

You will also find the following links of use if you do not have a distribution table at hand:

http://stattrek.com/Tables/ChiSquare.aspx

http://www.fon.hum.uva.nl/Service/Statistics/ChiSquare_distribution.html
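If neither a printed table nor those pages are at hand, the critical values can also be computed directly. The following Python sketch assumes chi-square with 1 degree of freedom and uses a simple bisection rather than any particular statistics library. The function names are hypothetical.

```python
import math

def chi2_1df_tail(x: float) -> float:
    """P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2))

def critical_value(alpha: float) -> float:
    """Find x with P(X > x) = alpha by bisection; the tail shrinks as x grows."""
    lo, hi = 0.0, 100.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_1df_tail(mid) > alpha:
            lo = mid  # tail area still too large: critical value lies further right
        else:
            hi = mid
    return (lo + hi) / 2

for alpha in (0.05, 0.01, 0.001, 0.0001):
    print(f"p < {alpha}: LL > {critical_value(alpha):.2f}")
```

This prints 3.84, 6.63, 10.83 and 15.14. Printed tables sometimes differ in the last digit because they round or truncate differently, which is why 6.64 and 15.13 are also commonly quoted.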

armstrong · 2006-10-27

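Re: How should I interpret this log-likelihood result?

You can look them up in a chi-square distribution table. If you use Li Shaoshan's 《语言研究中的统计学》 (Statistics in Language Research), the table is on page 221: the value is 3.84 at p = 0.05, 6.64 at p = 0.01 and 10.83 at p = 0.001.
Your value is greater than 100, so the difference is certainly significant well beyond p < 0.001 (indeed beyond p < 0.00001).
If you use Han Baocheng's 《外语教学科研中的统计方法》 (Statistical Methods in Foreign Language Teaching Research), the table is on page 188.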

xujiajin · 2007-04-17

Re: How should I interpret this log-likelihood result?

Yes, pages 188-189 of Han Baocheng's book contain the χ² table.

freddyluo · 2009-05-06

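Re: How should I interpret this log-likelihood result?

What is the basis for choosing the significance level (p value)? Why is it sometimes set at 0.05 and sometimes at 0.01?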

volfer · 2011-02-17

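Re: How should I interpret this log-likelihood result?

Quote from freddyluo:
What is the basis for choosing the significance level (p value)? Why is it sometimes set at 0.05 and sometimes at 0.01?

Setting it at 0.05 means accepting a 5% risk of concluding that there is a difference when the observed difference is really due to chance. Setting it at 0.01 lowers that risk to 1%, so 0.01 is a stricter criterion than 0.05. In most cases, though, 0.05 is generally accepted.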

lliwei5480 · 2011-02-17

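Re: How should I interpret this log-likelihood result?

Could someone explain?
I still don't understand.
1. How do I decide whether an LL value is meaningful? Do I need chi-square for that? If so, do I also need to normalize the data from the two corpora first? If not, where does the threshold value come from? Thanks, everyone. I really can't figure this out, so any advice is welcome.
2. If I need to look the value up in a table, how is it obtained? Below are some figures I counted, and I don't know how to tell whether the two corpora differ. Thanks for your help!
POS        CLEC   BNC      Log-likelihood
as prep.   1100   155419   -215.29
as conj.   1975   263069   -278.31
as adv.    587    54868    -0.00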

volfer · 2011-02-17

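Re: How should I interpret this log-likelihood result?

Quote from lliwei5480:
Could someone explain?
I still don't understand.
1. How do I decide whether an LL value is meaningful? Do I need chi-square for that? If so, do I also need to normalize the data from the two corpora first? If not, where does the threshold value come from? Thanks, everyone. I really can't figure this out, so any advice is welcome.
2. If I need to look the value up in a table, how is it obtained? Below are some figures I counted, and I don't know how to tell whether the two corpora differ. Thanks for your help!
POS        CLEC   BNC      Log-likelihood
as prep.   1100   155419   -215.29
as conj.   1975   263069   -278.31
as adv.    587    54868    -0.00

For the first two items p < 0.001, so the frequencies in the two corpora differ significantly.
For the third item p = 0.986 > 0.05, so there is no significant difference.
The minus signs show that all three are used less in CLEC than in BNC, although the difference for the third item is not significant.
I recommend the log-likelihood calculator developed by Xu Jiajin. It computes the p value directly, so you don't need to look anything up in a table, and it is very intuitive.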
http://www.corpus4u.org/showthread.php?t=4659&highlight=%E5%AF%B9%E6%95%B0%E4%BC%BC%E7%84%B6%E7%8E%87
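volfer's reading of the three rows can be reproduced from the LL column alone. Below is a minimal Python sketch. It assumes chi-square with 1 degree of freedom, as explained earlier in the thread. The row labels are shorthand, and `classify` is a hypothetical name. With the rounded score 0.00 this gives p = 1.0. volfer's p = 0.986 corresponds to an unrounded score of about 0.0003, which the table shows as 0.00.

```python
import math

# LL scores from the CLEC vs BNC table; the minus sign only marks underuse in CLEC
table = {"as prep.": -215.29, "as conj.": -278.31, "as adv.": -0.00}

def classify(ll_scores: dict[str, float], alpha: float = 0.05) -> dict[str, tuple[float, bool]]:
    """Map each item to (p value, significant?) using chi-square with 1 df."""
    results = {}
    for label, ll in ll_scores.items():
        p = math.erfc(math.sqrt(abs(ll) / 2))
        results[label] = (p, p < alpha)
    return results

for label, (p, sig) in classify(table).items():
    print(f"{label:<9} p = {p:.3g}  {'significant' if sig else 'not significant'}")
```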

lliwei5480 · 2011-02-17

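Re: How should I interpret this log-likelihood result?

Thanks, volfer.
I found the calculator, but how should I interpret the sig. column at the end? I have a lot of data and can't trouble you by listing it all, but this affects how I interpret the results. I plan to rank the items by the significance of the difference and discuss the larger differences in detail. I see three elements in the output:

+/-: do these mean over-/underuse?
Does the number of * indicate the significance level?
What does the number after that mean? Is it the p value or the likelihood ratio?
Thanks for your support!
Happy Lantern Festival!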

volfer · 2011-02-17

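Re: How should I interpret this log-likelihood result?

Quote from lliwei5480:
Thanks, volfer.
I found the calculator, but how should I interpret the sig. column at the end? I have a lot of data and can't trouble you by listing it all, but this affects how I interpret the results. I plan to rank the items by the significance of the difference and discuss the larger differences in detail. I see three elements in the output:

+/-: do these mean over-/underuse?
Does the number of * indicate the significance level?
What does the number after that mean? Is it the p value or the likelihood ratio?
Thanks for your support!
Happy Lantern Festival!

The +/- sign shows whether the word is overused or underused in corpus 1 relative to corpus 2. Dr. Xiao has already explained this.
sig. is short for significance; the value shown is the p value.
Asterisks usually mark the significance level: one * means the difference is significant at the 0.05 level, two ** at the 0.01 level, and so on.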
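For lliwei5480's plan of ranking many items, the asterisk convention and the sorting can be sketched together in Python. This assumes the usual extension "and so on" to *** for p < 0.001, and that the score is compared with chi-square at 1 degree of freedom. Whether a given calculator uses exactly these cutoffs is an assumption, and the function names are hypothetical. For the same pair of corpora, a larger LL means a smaller p, so sorting by |LL| orders items by the strength of the statistical evidence. That evidence also grows with raw frequency, so this is not the same as ordering by how large the difference is.

```python
import math

def stars(p: float) -> str:
    """Asterisks: * p < 0.05, ** p < 0.01, *** p < 0.001 (assumed extension)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at 0.05

def rank_items(ll_scores: dict[str, float]) -> list[tuple[str, str, str]]:
    """Sort items by |LL| (largest first) and attach direction and asterisks."""
    ranked = []
    for label, ll in sorted(ll_scores.items(), key=lambda kv: abs(kv[1]), reverse=True):
        p = math.erfc(math.sqrt(abs(ll) / 2))  # chi-square, 1 df
        # copysign also reads the sign of -0.00 correctly
        direction = "-" if math.copysign(1.0, ll) < 0 else "+"
        ranked.append((label, direction, stars(p)))
    return ranked

print(rank_items({"as prep.": -215.29, "as conj.": -278.31, "as adv.": -0.00}))
```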

lliwei5480 · 2011-02-17

Re: How should I interpret this log-likelihood result?

Millions of thanks!
