Some free online Chinese corpora

#22
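Newcomer asking the experts for help: which method should I use to test whether the difference between two means is significant?

I know the means of two samples and want to find out whether the difference between the two values is statistically significant. Which method should I use? Is it the one-sample t-test? Any advice would be appreciated!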
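
For readers with the same question: a one-sample t-test compares a single sample mean against a fixed reference value, so two sample means are normally compared with a two-sample (independent) t-test instead. The means alone are not enough, because the test also needs each sample's standard deviation and size. The sketch below assumes those summary statistics are available and uses Welch's version of the test, which does not assume equal variances. The function name and the numbers in the example call are hypothetical and serve only as an illustration.

```python
import math


def welch_t_from_stats(mean1, sd1, n1, mean2, sd2, n2):
    """Return Welch's t statistic and its approximate degrees of freedom.

    Inputs are summary statistics only: the mean, sample standard
    deviation, and size of each of two independent samples.
    """
    # Squared standard error contributed by each sample.
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    # t statistic: difference of means over the combined standard error.
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics, for illustration only.
t, df = welch_t_from_stats(10.0, 2.0, 25, 9.0, 2.0, 25)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The resulting t value is then compared with the critical value of the t distribution for the computed degrees of freedom. If SciPy is installed, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` runs the same test from summary statistics and also returns a p-value.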
 
#24
Re: Some free online Chinese corpora

I have checked all of the following websites, but some of them are not accessible from my computer, and I don't know why. If you know what's wrong, please tell me.


Thanks



Peking University Modern Chinese Corpus


http://ccl.pku.edu.cn/ccl_corpus/xiandaihanyu/ (not available)


Peking University Ancient Chinese Corpus


http://ccl.pku.edu.cn/ccl_corpus/jse....jsp?dir=gudai



the two above may be available here


http://ccl.pku.edu.cn:8080/ccl_corpus/index.jsp


CCL corpus of the Center for Chinese Linguistics, PKU (北京大学汉语语言学研究中心)


Divided into Modern Chinese and Ancient Chinese. Size: 838,803,906 bytes



Xiamen University corpora (registration required but free)


http://xmuoec.com/gb/hanyu/hanyu/data/corpus/index.htm (not available)



Beijing Language and Culture University corpus


http://202.112.195.8:8089/ccir_login?input=* (not available)



PFR People's Daily corpus (01/1998)


http://bowland-files.lancs.ac.uk/cor...s/pdcorpus.htm (not available)



PH corpus (Xinhua newswire data 1990-1991)


http://bowland-files.lancs.ac.uk/cor...s/phcorpus.htm (not available)



People's Daily 2000 corpus


http://bowland-files.lancs.ac.uk/cor...00/default.htm (not available)


some related information here


http://www.lancs.ac.uk/fass/projects/corpus/pdc2000/default.htm



Sinica corpus of early Chinese (近代汉语标记语料库, Tagged Corpus of Early Mandarin Chinese, Academia Sinica)


http://www.sinica.edu.tw/Early_Mandarin/



Sheffield Corpus of Chinese for Diachronic Linguistic Study


http://www.shef.ac.uk/scc/



Academia Sinica Balanced Corpus of Modern Chinese (中央研究院现代汉语平衡语料库)


http://www.sinica.edu.tw/SinicaCorpus/

 
#25
Re: Some free online Chinese corpora

Sorry, I have just found that most of the questions about the availability of those websites were already settled in the replies above.
C-ers, please refer to all of those messages.
 

xujiajin

Administrator
Staff member
#30
Re: Some free online Chinese corpora

Torch Corpus: Texts of Recent Chinese (2013 summer edition)

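  The Torch2009 corpus is a corpus of Modern Chinese created jointly by 115 teachers, master's students, and doctoral students from more than 64 universities across China, who took part in collecting and proofreading the texts. The corpus contains 1,087,619 words and 1,703,635 characters (on average, one word corresponds to about 1.57 Chinese characters). The great majority of the texts in the corpus were published in 2009. The name Torch is an abbreviation of Texts Of Recent CHinese. We hope that a new version of the corpus can be released every few years on a similar model, so that the ongoing development of Modern Chinese can be studied. We therefore hope the corpus will become a series, with Torch2009 as its first member. The word "torch" also reflects our hope that the series will be handed on like a flame passed from torch to torch (薪火相传) and will continue without interruption.
  Together with the previously created Crown and CLOB corpora (see: http://icame.uib.no/ij37/Pages_175-184.pdf ), this corpus forms a set of English-Chinese comparable corpora that can be used for English-Chinese contrastive research.
  The Crown, CLOB, and Torch2009 corpora can all be searched online through the BFSU CQPweb corpus platform (http://124.193.83.252/cqp/ ).
  A distinctive feature of this corpus is that it is "jointly built and shared" (共建共享). It is a new kind of corpus, completed jointly by more than a hundred teachers and students and shared free of charge with its creators and with the wider community of corpus researchers.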

Download Torch 2009 at http://ishare.iask.sina.com.cn/f/37898721.html

Download Crown and CLOB at http://ishare.iask.sina.com.cn/f/35671465.html