HH Clark's statistical calculations

miaohy · 2012-04-11

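Does anyone know how he computed the percentages for different variables within the same sample? For example, the ones on p. 82. Thanks.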

qhdjason · 2012-04-12

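This is indeed rather odd. Using an analysis of variance (ANOVA) to compare percentages?
Could the author have coded the data like this: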

uh um
0 1
1 0
0 0
1 1
1 0

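0 means no delay; 1 means a delay. Computed this way, the mean is in fact a percentage. For the uh column:

(0+1+0+1+1) / 5 = 0.6 = 60%

Judging from the degrees of freedom he reports, (1, 342), this is quite likely.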
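A quick Python sketch of this guess: with 0/1 coding, the mean of each column is simply the proportion of 1s. The rows are the hypothetical values from the table above, not Clark's data.

```python
# Hypothetical coding from the table above: (uh, um), 1 = delay, 0 = no delay.
rows = [(0, 1), (1, 0), (0, 0), (1, 1), (1, 0)]

def percent_delayed(codes):
    """The mean of 0/1 codes is the proportion of 1s; scale it to a percentage."""
    return 100 * sum(codes) / len(codes)

uh_codes = [uh for uh, _ in rows]
um_codes = [um for _, um in rows]
print(percent_delayed(uh_codes))  # 60.0
print(percent_delayed(um_codes))  # 40.0
```

Because these percentages are ordinary means, an ANOVA on the 0/1 codes compares them just as it would compare any other means.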

qhdjason · 2012-04-12

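Degrees of freedom of a one-way ANOVA:

Numerator df: k - 1, where k is the number of levels of the independent variable. In the paper it is 1, because the independent variable, the filler, has only two levels: uh and um.

Denominator df: n - k, where n is the sample size. In the paper it is 342, which suggests the author found 344 tokens of uh and um in total in the LL (London-Lund) corpus.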
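The df arithmetic above as a tiny Python sketch. It assumes, as the post does, a classic one-way ANOVA in which every token is one observation; if Clark analysed by-speaker means or used a different test, the inversion would not hold. The function names are made up for illustration.

```python
def oneway_df(k, n):
    """df of a classic one-way ANOVA with k groups and n observations in total."""
    return k - 1, n - k

def n_from_df(df1, df2):
    """Invert oneway_df: recover (k, n) from reported (df1, df2)."""
    k = df1 + 1
    return k, df2 + k

print(n_from_df(1, 342))  # (2, 344)
print(oneway_df(2, 344))  # (1, 342)
```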

Could someone here who is familiar with the LL format check this? The format is too complicated; I really can't make sense of it.

qhdjason · 2012-04-12

The author also tests whether the pauses after uh and um differ significantly in length; that one is easier to explain.

The independent variable is the filler, a categorical variable with two levels: Level 1: uh; Level 2: um.
The dependent variable is the pause length, a continuous variable.

When the independent variable is categorical and the dependent variable is continuous, you use ANOVA; with only one independent variable, a one-way ANOVA. Note that what is compared are the means of the dependent variable (as shown in Figure 2 on p. 83).
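A minimal from-scratch sketch of the classic one-way ANOVA described above, in Python (standard library only). It assumes equal variances, like R's aov() or oneway.test(..., var.equal = TRUE); the pause lengths are hypothetical illustration values, not Clark's or Gries's data.

```python
def oneway_anova(groups):
    """Classic one-way ANOVA (equal variances assumed).

    Returns (F, df_between, df_within), where df_between = k - 1
    and df_within = n - k.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: spread of each value around its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical pause lengths (seconds) after uh and um; illustrative only.
uh_pauses = [0.21, 0.35, 0.28, 0.40, 0.25]
um_pauses = [0.30, 0.95, 0.52, 1.10, 0.61]
print(oneway_anova([uh_pauses, um_pauses]))
```

F is the ratio of between-group to within-group mean squares, so a large F means the group means differ by more than the spread inside each group would lead you to expect.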

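You can try this yourself in the statistical software R. Put the attached data file in your My Documents folder, open R, and enter the following commands:
# read the tab-separated file with a header row (it has columns LENGTH and FILLER)
data <- read.table("03-1_uh(m).txt", sep = "\t", header = TRUE)
# oneway.test() defaults to var.equal = FALSE, i.e. Welch's test
oneway.test(LENGTH ~ FILLER, data = data)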

This gives the following result:
F = 0.1245, num df = 2.000, denom df = 637.164, p-value = 0.883

In a paper you should report:
(1) the F value: 0.1245
(2) the numerator and denominator df: 2; 637 (the filler in my data has three levels, so the numerator df is 2; the paper you cited has two levels, hence 1)
(3) the p-value: 0.883

The data come from Gries's book Statistics for Linguistics with R.

qhdjason · 2012-04-12

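Psycholinguists seem especially fond of ANOVA, all kinds of ANOVA, so much so that they use it even for differences between percentages. Rather amusing.

Some psycholinguistics papers are F values from start to finish. Apparently mastering a single martial art is enough to make your way in the world.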

qhdjason · 2012-04-12

The data in the copy of the article you posted seem to be badly formatted; this version is clearer:

http://www-psych.stanford.edu/~herb/2000s/Clark.FoxTree.02.pdf

miaohy · 2012-04-12

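Thank you so much! I stared at it for ages without figuring out how he computed it. How lucky to run into an expert!

What is this statistical software R? Forgive my ignorance, haha!

Could you share Statistics for Linguistics with R? I need to cram some statistics.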

Haiyang Ai · 2012-04-13

There are lots of discussions about R at http://cos.name.

miaohy · 2012-04-13

Thanks for sharing!

miaohy · 2012-04-14

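Does denom df mean the denominator degrees of freedom? Aren't degrees of freedom always integers? Why does the output show denom df as a decimal?
Also, your data set has 1,000 observations, so why is the denominator df 637?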

qhdjason · 2012-04-14

http://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)

This page says:

"In some complicated settings, such as unbalanced split-plot designs, the sums-of-squares no longer have scaled chi-squared distributions. Comparison of sum-of-squares with degrees-of-freedom is no longer meaningful, and software may report certain fractional 'degrees of freedom' in these cases. Such numbers have no genuine degrees-of-freedom interpretation, but are simply providing an approximate chi-squared distribution for the corresponding sum-of-squares. The details of such approximations are beyond the scope of this page."

I think the mathematical foundations of many statistical methods are very complex, and it takes a systematic course to grasp them. I asked a mathematics teacher at my school. He said statistics is based on calculus and linear algebra, so if you really want to know all the details you'd better read some very technical books; otherwise, just make do with the results.

You won't find the full mathematical explanation in "introductory" books. That's why most books on linguistics and statistics won't answer your question.

qhdjason · 2012-04-14

The R documentation for oneway.test states that the method is based on this article (Welch 1951):

http://www.soph.uab.edu/Statgenetics/People/MBeasley/Courses/Welch1951.pdf

Understanding all the formulas and notation is hard, and perhaps unnecessary for our purposes.
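For anyone curious where the fractional df come from, here is a minimal Python sketch of Welch's heteroscedastic one-way ANOVA, following the formulas in R's oneway.test source as I understand them (var.equal = FALSE is the default). The denominator df is estimated from the group sizes and variances rather than taken as n - k, which is why it is neither an integer nor 997 for 1,000 observations in three groups. The group values are made up, not Gries's data set, and the p-value is left out because it needs the F distribution (e.g. scipy.stats.f.sf), which the standard library lacks.

```python
from statistics import mean, variance

def welch_anova(groups):
    """Welch's (1951) one-way ANOVA without assuming equal variances.

    Each group needs at least 2 values and a nonzero variance.
    Returns (F, df1, df2); df2 is usually not an integer.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # weights n_i / s_i^2
    sum_w = sum(w)
    m_w = sum(wi * mi for wi, mi in zip(w, m)) / sum_w  # weighted grand mean
    tmp = sum((1 - wi / sum_w) ** 2 / (ni - 1) for wi, ni in zip(w, n)) / (k ** 2 - 1)
    f = sum(wi * (mi - m_w) ** 2 for wi, mi in zip(w, m)) / ((k - 1) * (1 + 2 * (k - 2) * tmp))
    return f, k - 1, 1 / (3 * tmp)

# Hypothetical pause lengths for three fillers with unequal spread.
uh = [0.21, 0.35, 0.28, 0.40, 0.25]
um = [0.30, 0.95, 0.52, 1.10, 0.61, 0.44]
silence = [0.33, 0.29, 0.31, 0.36]
f, df1, df2 = welch_anova([uh, um, silence])
print(f"F = {f:.4f}, num df = {df1}, denom df = {df2:.3f}")
```

Groups with larger variance get smaller weights, so a noisy group contributes less information and pulls the denominator df down.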

qhdjason · 2012-04-14

In the attached paper, the degrees of freedom of the t-test (p. 129) are fractional. Perhaps we should state the software and algorithm we used in our own papers.
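Assuming that paper's t-test was Welch's unequal-variance t-test (a guess, since the paper isn't quoted here), its fractional df would come from the Welch-Satterthwaite approximation. A minimal Python sketch with made-up data; with two groups, R's oneway.test gives an F equal to this t squared, with the same denominator df.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's two-sample t statistic and its Welch-Satterthwaite df."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical samples with unequal spread; illustrative only.
a = [0.21, 0.35, 0.28, 0.40, 0.25]
b = [0.30, 0.95, 0.52, 1.10, 0.61, 0.44]
t, df = welch_t(a, b)
print(f"t = {t:.3f}, df = {df:.2f}")
```

With equal group sizes and equal variances the df reduce to n1 + n2 - 2; otherwise they fall somewhere below that and are generally not whole numbers.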

miaohy · 2012-04-17

OK. Got it.
