What is a “balanced corpus”?

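Today I came across the term “balanced corpus” (平衡语料库). At first I took it for “parallel corpus” (平行语料库), but then I saw that the English equivalent is balanced corpus, so the two are not the same thing. What, then, are the characteristics of a balanced corpus?
Thanks.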
 
Re: What is a “balanced corpus”?

Definition given by A Glossary of Corpus Linguistics

balanced corpus: A corpus that contains texts from a wide range of different language genres and text domains, so that, for example, it may include both spoken and written, and public and private texts. Balanced corpora are sometimes referred to as reference, general or core corpora.

P. 18
 
Re: What is a “balanced corpus”?

The Balanced Corpus of Contemporary Written Japanese (BCCWJ) is Japan's first 100-million-word balanced corpus. It consists of three subcorpora (publication subcorpus, library subcorpus, and special-purpose subcorpus) and covers a wide range of text registers, including books in general, magazines, newspapers, governmental white papers, best-selling books, an internet bulletin board, a blog, school textbooks, minutes of the National Diet, publicity newsletters of local governments, laws, and poetry verses.
 
Re: What is a “balanced corpus”?

BCCWJ is a balanced corpus of one hundred million words of contemporary written Japanese. BCCWJ is one of the components of KOTONOHA. It is probably the most important of all the KOTONOHA component corpora, because it is the written register of the contemporary Japanese that is the greatest focus of interest for language researchers as well as the general public. It is also the contemporary written language that has the greatest applicability to such applications as dictionaries and teaching materials.
 
Re: What is a “balanced corpus”?

Compilation of a hundred million word balanced corpus named Balanced Corpus of Contemporary Written Japanese (BCCWJ) is underway at the National Institute for Japanese Language. This corpus is a component of the KOTONOHA super-corpus that covers the full range of modern Japanese from the middle of the 19th century up to the present.
 