Free Chinese & E-C parallel corpora (Chinese / English-Chinese corpora)

xiaoz

Q: Are there free Chinese corpus resources?
A: Yes. A number of Chinese and English-Chinese (E-C) parallel corpora are freely available. Most of them can be concordanced online; only a few can be downloaded to your own machine.

Academia Sinica Balanced Corpus of Modern Chinese
http://www.sinica.edu.tw/SinicaCorpus/

Peking University Modern Chinese Corpus
http://ccl.pku.edu.cn/ccl_corpus/xiandaihanyu/

Xiamen University Chinese corpus (registration required but free)
http://xmuoec.com/gb/hanyu/hanyu/data/corpus/index.htm

Xiamen University E-C parallel corpus (registration required but free)
http://xmuoec.com/gb/hanyu/hanyu/data/ec-corpus/logon1.asp

Beijing Language and Culture University corpus
http://202.112.195.8:8089/ccir_login?input=*

Lancaster Corpus of Mandarin Chinese
http://bowland-files.lancs.ac.uk/corplang/cgi-bin/conc.pl

Leeds Chinese corpus
http://corpus.leeds.ac.uk/query-zh.html

PFR People's Daily corpus (01/1998)
http://bowland-files.lancs.ac.uk/corplang/pdcorpus/pdcorpus.htm

PH corpus (Xinhua newswire data 1990-1991)
http://bowland-files.lancs.ac.uk/corplang/phcorpus/phcorpus.htm

People's Daily 2000 corpus
http://bowland-files.lancs.ac.uk/corplang/pdc2000/default.htm

Peking University Ancient Chinese Corpus
http://ccl.pku.edu.cn/ccl_corpus/jsearch/index.jsp?dir=gudai

Peking University E-C parallel corpus (not working online?)
http://www.icl.pku.edu.cn/icl_groups/parallel/concordance.asp

Sinica corpus of early Chinese
http://www.sinica.edu.tw/Early_Mandarin/

Sheffield Corpus of Chinese for Diachronic Linguistic Study
http://www.shef.ac.uk/scc/
 
The fifth link in the list above no longer works:
Beijing Language and Culture University corpus
http://202.112.195.8:8089/ccir_login?input=*
 
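Institute of Computational Linguistics, Peking University (北京大学计算语言学研究所): E-C parallel corpus
http://www.icl.pku.edu.cn/icl_groups/parallel/concordance.asp
Online searching indeed does not work.

It returns the error: "The server encountered an error while processing the URL. Please contact the system administrator."

However, the Chinese and English XML versions of Jiang Zemin's report to the 16th National Congress of the Communist Party of China can be downloaded from http://www.icl.pku.edu.cn/icl_groups/parallel/download.htm.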
 
Their scripts have not been working for a couple of months, and Bai Xiaojing has promised to let me know when they are fixed.

Beijing Language and Culture University corpus word search system (北京语言文化大学的语料库词语检索系统):
http://202.112.195.8:8089/ccir_login?input=*

The asterisk is part of the URL, so copy the whole URL, asterisk included, into the browser.

This is a pretty cool Chinese corpus.

See also
http://www.corpus4u.com/forum_view.asp?forum_id=10&view_id=972
 
http://www.gmw.cn/03rili/jbcx_gb.htm
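Guangming Daily (光明日报), Life Times (生活时报), China Reading Weekly (中华读书报), Digest News (文摘报), Informatization Weekly (信息化周刊), Book Digest magazine (《书摘》杂志), Bolan Qunshu magazine (《博览群书》杂志)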
 
Wenzhou Spoken Corpus (温州口语语料库)
A new online searchable corpus of spoken Wenzhou, about 150,000 words:

http://corpora.tapor.ualberta.ca/wenzhou/