Some free online Chinese corpora

Posted by xiaoz on 2005-06-16 in the "汉语语料库" (Chinese Corpora) forum

  1. Re: Some free online Chinese corpora

    Is there a Chinese SMS (text-message) corpus available?
     
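  2. Newcomer asking the experts for help: which method should be used to test whether the difference between two means is significant?

    The means of two samples are known. Which method should I use to find out whether there is a significant difference between the two values? Is it the one-sample t-test? Please advise!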
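    A note on the question above: a one-sample t-test compares a single sample mean against a fixed reference value. For two independent samples the usual choice is a two-sample t-test, and the means alone are not enough: each sample's standard deviation and size are needed as well. The sketch below computes Welch's (unequal-variance) t statistic and its approximate degrees of freedom from such summary statistics. The function name `welch_t` and all numbers are made up for illustration, and independence of the two samples is an assumption.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees of
    freedom, computed from summary statistics of two independent samples."""
    v1 = sd1 ** 2 / n1  # squared standard error of the first mean
    v2 = sd2 ** 2 / n2  # squared standard error of the second mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics (mean, standard deviation, sample size).
t, df = welch_t(5.2, 1.1, 30, 4.6, 1.3, 28)
print(f"t = {t:.3f}, df = {df:.1f}")
```

    The resulting t value is then compared against a t distribution with `df` degrees of freedom to obtain a p-value. If SciPy is installed, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` should perform the same test and return the p-value directly.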
     
  3. Re: Some free online Chinese corpora

    Very useful, thanks! But why can't I open the Peking University corpus?
     
  4. Re: Some free online Chinese corpora

    I have checked all of the following websites, but some are not available from my computer, and I don't know why. If you know what's wrong, please tell me.


    Thanks



    Peking University Modern Chinese Corpus

    http://ccl.pku.edu.cn/ccl_corpus/xiandaihanyu/ (not available)


    Peking University Ancient Chinese Corpus

    http://ccl.pku.edu.cn/ccl_corpus/jse....jsp?dir=gudai


    The two above may be available here:

    http://ccl.pku.edu.cn:8080/ccl_corpus/index.jsp

    北京大学汉语语言学研究中心 (Center for Chinese Linguistics, PKU) CCL corpus

    Divided into Modern Chinese and Ancient Chinese. Size: 838,803,906 bytes


    Xiamen University corpora (registration required but free)

    http://xmuoec.com/gb/hanyu/hanyu/data/corpus/index.htm (not available)


    Beijing Language and Culture University corpus

    http://202.112.195.8:8089/ccir_login?input=* (not available)


    PFR People's Daily corpus (01/1998)

    http://bowland-files.lancs.ac.uk/cor...s/pdcorpus.htm (not available)


    PH corpus (Xinhua newswire data 1990-1991)

    http://bowland-files.lancs.ac.uk/cor...s/phcorpus.htm (not available)


    People's Daily 2000 corpus

    http://bowland-files.lancs.ac.uk/cor...00/default.htm (not available)

    Some related information here:

    http://www.lancs.ac.uk/fass/projects/corpus/pdc2000/default.htm


    Sinica corpus of early Chinese (近代汉语标记语料库, the Academia Sinica tagged corpus of Early Mandarin)

    http://www.sinica.edu.tw/Early_Mandarin/


    Sheffield Corpus of Chinese for Diachronic Linguistic Study

    http://www.shef.ac.uk/scc/


    Academia Sinica Balanced Corpus of Modern Chinese (中央研究院现代汉语平衡语料库)

    http://www.sinica.edu.tw/SinicaCorpus/
     
  5. Re: Some free online Chinese corpora

    Sorry, I have since found that most of the questions about the availability of those websites have already been settled in the replies above.
    C-ers, please refer to all the messages.
     
  6. waynereed

    waynereed Junior Member

    Re: Some free online Chinese corpora

    CERNET users may also try the following URL for Lu Wei's corpora:
    http://luwei.2288.org/
     
  7. Re: Some free online Chinese corpora

    Your link can't be opened now either :(
     
  8. xujiajin

    xujiajin Administrator, Staff Member

    Re: Some free online Chinese corpora

    Torch Corpus: Texts of Recent Chinese (2013 summer edition)

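      The Torch2009 corpus is a corpus of Modern Chinese built jointly by 115 teachers, master's students, and doctoral students from more than 64 universities across China, who collected and proofread the texts. It contains 1,087,619 words and 1,703,635 characters (on average, one word corresponds to about 1.57 Chinese characters). The great majority of the texts were published in 2009. The name Torch is an acronym of Texts Of Recent CHinese. We hope the corpus can be released in a new version every few years along similar lines, so that the ongoing development of Modern Chinese can be tracked. We therefore hope it will become a series, with Torch2009 as its first member. The word "torch" also reflects our hope that the series will be passed on like a flame from torch to torch (薪火相传) and continue without interruption.
      Together with the previously built Crown and CLOB corpora (see: http://icame.uib.no/ij37/Pages_175-184.pdf ), it forms a set of English-Chinese comparable corpora that can be used for English-Chinese contrastive research.
      Crown, CLOB, and Torch2009 can all be searched online through the BFSU CQPweb corpus platform (http://124.193.83.252/cqp/ ).
      A prominent feature of the corpus is that it is "jointly built and shared" (共建共享). It is a new type of corpus, completed by more than a hundred teachers and students working together and shared free of charge with its builders and the wider community of corpus researchers.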
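      As a quick check of the size figures quoted above, the characters-per-word ratio can be recomputed from the two counts. This is only a sketch: the helper name `chars_per_word` is made up for illustration, and the counts are copied from the description rather than measured from the corpus files.

```python
def chars_per_word(word_count, char_count):
    """Average number of Chinese characters per segmented word."""
    return char_count / word_count


# Counts as quoted in the Torch2009 description.
TORCH2009_WORDS = 1_087_619
TORCH2009_CHARS = 1_703_635

ratio = chars_per_word(TORCH2009_WORDS, TORCH2009_CHARS)
print(f"{ratio:.2f} characters per word")
```

      The same helper could be applied to word and character counts obtained from any other segmented Chinese corpus for comparison.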

    Download Torch 2009 at http://ishare.iask.sina.com.cn/f/37898721.html

    Download Crown and CLOB at http://ishare.iask.sina.com.cn/f/35671465.html
     
  9. Re: Some free online Chinese corpora

    I need these corpora as training samples at the moment. How do I use them?
     
  10. Re: Some free online Chinese corpora

    Why can't I access these websites?