What is a corpus?


The word "corpus", derived from the Latin word meaning "body", may be used to refer to any text in written or spoken form. In modern linguistics, however, the term refers to large collections of texts, presented in machine-readable form, that represent a sample of a particular variety or use of a language or languages. Other definitions, broader or stricter, exist. See, for example, the definition in the book "Corpus Linguistics" by Tony McEnery and Andrew Wilson, or read more about different kinds of corpora in the Systematic Dictionary of Corpus Linguistics.

Computer-readable corpora can consist of raw text only, i.e. plain text with no additional information. Many corpora, however, have been provided with some kind of linguistic information, here called markup or annotation.
The terms "corpus markup" and "annotation" have sometimes been used interchangeably in the corpus literature, though I choose to maintain a distinction between the two.
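To make the contrast between raw text and text carrying linguistic information concrete, here is a minimal sketch in Python. The `word_TAG` format, the tag names, and the helper functions `parse_annotated` and `strip_annotation` are illustrative assumptions of mine, not the scheme of any particular corpus.

```python
# Raw corpus text: plain text with no additional information.
raw_text = "The cat sat on the mat."

# Hypothetical annotated version: each token carries a part-of-speech tag.
# The word_TAG format and the tag names are assumptions for illustration.
annotated_text = "The_DET cat_NOUN sat_VERB on_PREP the_DET mat_NOUN ._PUNCT"


def parse_annotated(text):
    """Split whitespace-separated 'word_TAG' items into (word, tag) pairs."""
    pairs = []
    for item in text.split():
        # rpartition splits on the LAST underscore, so the tag is always
        # the final piece even if the word itself contains an underscore.
        word, _, tag = item.rpartition("_")
        pairs.append((word, tag))
    return pairs


def strip_annotation(text):
    """Recover the bare tokens by discarding the tags."""
    return [word for word, _ in parse_annotated(text)]


tokens = parse_annotated(annotated_text)
nouns = [word for word, tag in tokens if tag == "NOUN"]
```

With the annotation in place, a query such as "all nouns" becomes a simple filter over the tags; with raw text alone, the same query would first require some form of linguistic analysis.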
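So far, Shanghai Foreign Language Education Press and Foreign Language Teaching and Research Press have between them brought four original English-language monographs and collections on corpus linguistics into China. They are: 1) John Sinclair's Corpus, Concordance, Collocation; 2) Graeme Kennedy's An Introduction to Corpus Linguistics; 3) Douglas Biber, Susan Conrad, and Randi Reppen's Corpus Linguistics; and 4) the collection of papers Using Corpora for Language Research, edited by Jenny Thomas and Mick Short. One could also count the monumental Longman Grammar of Spoken and Written English compiled by Biber and colleagues, but strictly speaking it is only a product made with the help of a corpus, not a theoretical work on corpus linguistics.
1) is a fairly early book (1991) that, drawing mainly on the COBUILD project, studies English collocation by means of concordancing. Its scope is rather narrow, so it is not the focus of this article. 4) is a collection of papers and is not the focus either. This article concentrates on 2), 3), and a book I have only just got hold of: English Corpus Linguistics: An Introduction by Charles Meyer, published by Cambridge University Press in 2002.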
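For readers new to the term, concordancing means listing every occurrence of a word together with its surrounding context, so that recurring neighbours (collocates) become visible. Below is a minimal keyword-in-context sketch in Python; the function name `kwic`, the `width` parameter, and the sample sentence are my own assumptions for illustration and do not reproduce any tool used in the COBUILD project.

```python
import re


def kwic(text, keyword, width=20):
    """Return one keyword-in-context line per occurrence of keyword."""
    lines = []
    # Match the keyword as a whole word, ignoring case.
    pattern = r"\b" + re.escape(keyword) + r"\b"
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines


# Hypothetical sample text, not taken from any real corpus.
sample = "She made a strong cup of tea. He prefers strong coffee, not weak tea."
for line in kwic(sample, "strong"):
    print(line)
```

Lining the keyword up in one column is what lets a reader scan down the page and notice which words tend to appear immediately to its left or right.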
Corpus Linguistics at Work
Lexis in Contrast
Small Corpus
Corpora in Applied Linguistics
Tony McEnery. 1996. Corpus Linguistics.
Stubbs, Michael. 1996. Text and Corpus Analysis
International Journal of Corpus Linguistics
Register variation has recently been a hot topic in corpus linguistics.
Corpus annotation
Learner corpus on Computer
Would you please also give a systematic recommendation of Chinese books on corpus linguistics?
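1) John Sinclair's Corpus, Concordance, Collocation; 2) Graeme Kennedy's An Introduction to Corpus Linguistics; 3) Douglas Biber, Susan Conrad, and Randi Reppen's Corpus Linguistics; and 4) the collection Using Corpora for Language Research, edited by Jenny Thomas and Mick Short. I have seen these four books at Optics Valley Book City in Wuhan; the other books do not seem to be available there. Friends in Wuhan, if you have any information on where to find them, please do share!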
Re: What is a corpus?

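Quoting hancunxin, 2005-7-1 17:18:02:
1) John Sinclair's Corpus, Concordance, Collocation; 2) Graeme Kennedy's An Introduction to Corpus Linguistics; 3) Douglas Biber, Susan Conrad, and Randi Reppen's Corpus Linguistics; and 4) the collection Using Corpora for Language Research, edited by Jenny Thomas and Mick Short. I have seen these four books at Optics Valley Book City in Wuhan; the other books do not seem to be available there. Friends in Wuhan, if you have any information on where to find them, please do share!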


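He Anping. 语料库语言学与英语教学 (Corpus Linguistics and English Teaching). 2004. Foreign Language Teaching and Research Press.
Yang Huizhong (chief editor); Wei Naixing et al. (compilers). 语料库语言学导论 (An Introduction to Corpus Linguistics). Monograph. 2002. Shanghai Foreign Language Education Press.
Huang Changning and Li Juanzi. 语料库语言学 (Corpus Linguistics). Monograph. 2002. I think it is from the Commercial Press. It is a very thin booklet, and it leans more toward the technical side, because the authors are computational linguistics researchers in Beijing with an engineering background.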



Re: What is a corpus?

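Quoting hancunxin, 2005-7-4 12:51:09:

Since you are in a foreign languages department, I would really suggest that you not read the Chinese books. In fact, introductory books on corpus linguistics generally do not involve much technical content and are fairly easy to follow.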
Re: What is a corpus?

A linguistic corpus is a large set of authentic examples of language use. The examples may be in text or audio form.