The American National Corpus (ANC) will be a carefully designed corpus of 100 million
words of American written and spoken language that generally follows the framework of the
British National Corpus. The ANC project will provide both a standard format for text encoding
and a format for different types of corpus annotation (e.g., parts of speech, rhetorical
features), as well as different versions of the same type of annotation (e.g., multiple
part-of-speech taggings). As the only widely available large corpus of spoken and written American
English containing a variety of registers, the ANC will represent a synchronic slice of
American English across many registers. The First Release of the ANC, described in this article,
is a preview of the corpus and a chance for researchers to contribute feedback on format
and related issues, while allowing them access to data rather than waiting until the entire corpus
is completed.
http://forum.corpus4u.org/upload/forum/2005053103402860.pdf
[This post was edited by the author on 2005-05-31 at 03:40:36]