

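I thought I understood what these two terms meant. Prof. Wang Kefei repeatedly used 平行语料库 (parallel corpus) in his articles, but when the work was published as a book (2004), the name was changed to 双语对应库 (bilingual corresponding corpus), even though the research was the same. Today I read xiaoz's article (in the online journal 《中国英语教育》, http://www.sinoss.com/portal/webgate/CmdArticleShow?articleID=1148 ), which uses 对应库, 平行库 and 可比库, and now I am a little confused.
Re: “对应库” and “平行库”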

The terminology in this area is indeed quite confusing, and translating it into another language complicates it further.

Parallel corpus: To me and many others, this is a corpus composed of source texts plus their translations (e.g. my Babel English-Chinese parallel corpus). For some, however, this term refers to what I would call a comparable corpus, and they use "translation corpus" for what I call a "parallel corpus".

Comparable corpus: composed of native-language data collected from different languages using the same sampling criteria and techniques to ensure similar balance and coverage (e.g. LCMC vs. FLOB).

Comparative corpus: similar to a comparable corpus, but used for comparing different varieties of the same language (e.g. ICE).

Translational corpus: consists of translated language (e.g. the Translational English Corpus (TEC) hosted at the University of Manchester).

Some bi-directional parallel corpora (e.g. ENPC) have used the same sampling criteria to collect source texts in both languages, plus their translations into the other language. Parallel corpora of this kind are a sort of "comparable-parallel corpus", but basically I consider them comparable corpora.

Prof. Wang's corpus is of this nature. Before the corpus was completed and included comparable source texts, he used "parallel corpus" in his papers. By the time the book was published, the source texts had been balanced for comparability, and because the book also contains a chapter based on truly comparable corpora, the term "bilingual comparable corpus" was used in the book title.
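The distinctions above can be made concrete by looking at what each kind of corpus contains. The following is a minimal Python sketch, not a description of how any corpus mentioned here is actually stored: the `Text` record, its `genre` labels and the rule that "same sampling criteria" means identical per-category text counts are all simplifying assumptions made for illustration. Comparative corpora (varieties of one language, as in ICE) are not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Text:
    """One text in a multilingual collection (hypothetical record layout)."""
    text_id: str
    language: str                    # e.g. "en", "zh"
    genre: str                       # sampling-frame category (hypothetical labels)
    source_id: Optional[str] = None  # id of the source text if this is a translation


def genre_profile(texts: List[Text]) -> Dict[str, int]:
    """Count how many texts fall into each sampling-frame category."""
    profile: Dict[str, int] = defaultdict(int)
    for t in texts:
        profile[t.genre] += 1
    return dict(profile)


def classify(texts: List[Text]) -> str:
    """Label a collection as translational, parallel, comparable or comparable-parallel."""
    translations = [t for t in texts if t.source_id is not None]
    # Translational corpus: translated language only, no source texts.
    if translations and len(translations) == len(texts):
        return "translational"

    # Group native (non-translated) texts by language.
    natives: Dict[str, List[Text]] = defaultdict(list)
    for t in texts:
        if t.source_id is None:
            natives[t.language].append(t)

    # Simplifying assumption: "same sampling criteria" = identical category counts.
    profiles = [genre_profile(group) for group in natives.values()]
    comparable = len(profiles) >= 2 and all(p == profiles[0] for p in profiles)

    # Parallel: at least one translation whose source text is also in the collection.
    ids = {t.text_id for t in texts}
    parallel = any(t.source_id in ids for t in translations)

    if parallel and comparable:
        return "comparable-parallel"
    if parallel:
        return "parallel"
    if comparable:
        return "comparable"
    return "unclassified"


if __name__ == "__main__":
    en1 = Text("en1", "en", "fiction")
    zh1 = Text("zh1", "zh", "fiction")
    en1_zh = Text("en1-zh", "zh", "fiction", source_id="en1")
    zh1_en = Text("zh1-en", "en", "fiction", source_id="zh1")
    print(classify([en1, en1_zh]))               # parallel
    print(classify([en1, zh1]))                  # comparable
    print(classify([en1, zh1, en1_zh, zh1_en]))  # comparable-parallel
    print(classify([en1_zh]))                    # translational
```

With one English source text and its Chinese translation, `classify` returns "parallel"; adding a Chinese source text of the same category plus its English translation turns the collection into a "comparable-parallel" one, which is the ENPC-style design described above.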
Re: “对应库” and “平行库”

I see what you mean now. So 可比库 in the paper you mentioned above refers to the "bi-directional parallel corpora" sampled using the same criteria.
Thank you, Dr. Xiao. I really appreciate your reply, which clears up my confusion about these terms.

Re: “对应库” and “平行库”

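I still don't quite understand: should "comparable" be rendered as 可比 or 类比? In 《双语对应语料库研制与开发》 (p. 7) it is 类比, whereas in 《翻译研究词典》 (p. 32; chief translator Tan Zaixi, revised by Wang Kefei) it is 可比, and Liao Qiyi's "语料库与翻译研究" (Foreign Language Teaching and Research, 2000, no. 9) also uses 可比. Any advice would be appreciated!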
Re: “对应库” and “平行库”

Generally, there are three types of corpora involving more than one language:
• Type A: Source texts plus translations, e.g. Canadian Hansard (cf. Brown, Lai & Mercer, 1991), CRATER (cf. McEnery & Oakes, 1995).
• Type B: Monolingual subcorpora designed using the same sampling frame, e.g. the Aarhus corpus of contract law (cf. Faber & Lauridsen, 1991).
• Type C: A combination of A and B, e.g. the ENPC (cf. Johansson & Hofland, 1994), the EMILLE corpus.
Here, does Type B ("Monolingual subcorpora designed using the same sampling frame") mean monolingual subcorpora built on the same sampling frame (相同抽样框架的单语子库)? Please advise. Thanks.
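One common reading of "sampling frame" in corpus design is the list of text categories a corpus draws on, together with how much material each category contributes. Read that way, "monolingual subcorpora designed using the same sampling frame" would be separate one-language subcorpora that follow one shared list of categories and proportions, with none of their texts being translations of each other. A minimal Python sketch of that reading follows; the category names, the 30/50/20 proportions and the 2% tolerance are invented for illustration and are not taken from the Aarhus corpus or any other corpus cited above.

```python
from typing import Dict, List, Tuple

# Hypothetical sampling frame: target share of the total word count per category.
FRAME: Dict[str, float] = {"press": 0.3, "fiction": 0.5, "academic": 0.2}

Sample = Tuple[str, int]  # (category, word count) for one text sample


def category_shares(samples: List[Sample]) -> Dict[str, float]:
    """Return each category's share of a subcorpus's total word count."""
    total = sum(words for _, words in samples)
    if total == 0:
        return {}
    shares: Dict[str, float] = {}
    for category, words in samples:
        shares[category] = shares.get(category, 0.0) + words / total
    return shares


def follows_frame(samples: List[Sample], frame: Dict[str, float],
                  tolerance: float = 0.02) -> bool:
    """True if the subcorpus covers exactly the frame's categories, each within `tolerance`."""
    shares = category_shares(samples)
    if set(shares) != set(frame):
        return False
    return all(abs(shares[c] - frame[c]) <= tolerance for c in frame)


def is_type_b(subcorpora: Dict[str, List[Sample]], frame: Dict[str, float],
              tolerance: float = 0.02) -> bool:
    """Two or more monolingual subcorpora (keyed by language), each built to the same frame."""
    return len(subcorpora) >= 2 and all(
        follows_frame(samples, frame, tolerance) for samples in subcorpora.values()
    )


if __name__ == "__main__":
    en = [("press", 3000), ("fiction", 5000), ("academic", 2000)]
    zh = [("press", 6000), ("fiction", 10000), ("academic", 4000)]
    skewed = [("press", 8000), ("fiction", 1000), ("academic", 1000)]
    print(is_type_b({"en": en, "zh": zh}, FRAME))      # True
    print(is_type_b({"en": en, "zh": skewed}, FRAME))  # False
```

The two subcorpora need not be the same size: the Chinese one above is twice as large as the English one but matches the frame's proportions. The skewed subcorpus fails because its 80% press share is far from the frame's 30%.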