The PKU 863 Chinese-English Parallel Corpus

xiaoz · 2005-11-14

The PKU Chinese-English Parallel Corpus was developed under the 863 Project by the Institute of Computational Linguistics at Peking University. The corpus consists of over 200,000 aligned sentence pairs taken from quality bilingual texts (3,066,435 English words and 2,874,462 Chinese words), covering a range of genres and domains including government white papers, official documents, news texts, essays, speech scripts, literary texts and academic prose, as well as texts on politics, law, tourism, the food industry, economics and business. The majority of the texts are taken from established bilingual websites, while some were digitised using OCR.

The PKU 863 corpus has now been converted to Unicode and tagged with part-of-speech information, using CLAWS for English (C7 tagset) and ICTCLAS for Chinese. It can be accessed via the online parallel concordancer (for internal use only).

http://www.ling.lancs.ac.uk/corplang/863parallel/

tiger · 2005-11-14

for internal use only again...

[This post was edited by the author on 2005-11-14 at 22:49:20]

laohong · 2005-11-14

yeah, any possibility for Corpus4U members to try it out?

刘语料 · 2005-11-14

It's a pity that the PKU 863 Chinese-English Parallel Corpus is for internal use only.

xiaoz · 2005-11-14

Sorry, I cannot make the full release available for obvious reasons, but here is a trial version which shows you the first 100 concordance lines and gives you the total counts.

http://www.ling.lancs.ac.uk/corplang/cgi-bin/863trial.pl
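
For anyone curious what "first 100 concordances plus total counts" amounts to, here is a rough sketch in Python. It is not the actual implementation behind the trial page; the sample sentence pairs, the function name `parallel_concordance` and the `limit` parameter are all made up for illustration, and it searches plain text on the English side only, ignoring the part-of-speech tags.

```python
import re
from typing import List, Tuple

# Hypothetical toy data: (English sentence, aligned Chinese sentence).
PAIRS: List[Tuple[str, str]] = [
    ("The corpus is aligned at sentence level.", "该语料库在句子层面对齐。"),
    ("This is a news text.", "这是一篇新闻文本。"),
    ("The corpus is tagged for part of speech.", "该语料库已作词性标注。"),
]


def parallel_concordance(
    pairs: List[Tuple[str, str]], pattern: str, limit: int = 100
) -> Tuple[int, List[Tuple[str, str]]]:
    """Search the English side with a regex.

    Returns the total number of matching sentence pairs and
    only the first `limit` of them, as a trial interface might.
    """
    regex = re.compile(pattern, re.IGNORECASE)
    hits = [(en, zh) for en, zh in pairs if regex.search(en)]
    return len(hits), hits[:limit]


if __name__ == "__main__":
    total, shown = parallel_concordance(PAIRS, r"\bcorpus\b", limit=1)
    print(f"{total} matching pairs, showing {len(shown)}")
    for en, zh in shown:
        print(en)
        print(zh)
```

The point of returning the total separately is that the user still sees how frequent a pattern is even when the displayed lines are capped.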

laohong · 2005-11-14

Really appreciated. The trial version is good enough for most of us.

清风出袖 · 2005-11-14

Thanks a lot for this new resource!
