Evolution and present situation of corpus research in China
Zhiwei Feng
Institute of Applied Linguistics, China
International Journal of Corpus Linguistics 11:2 (2006), 173–207.
ISSN 1384-6655 / E-ISSN 1569-9811 © John Benjamins Publishing Company.
Abstract:
In this paper, the author describes in detail the development and present state of corpus linguistics in China, covering early corpora, large-scale authentic text corpora, national corpora, speech corpora, bilingual corpora, and corpora of China's minority languages. The paper also introduces the main techniques for processing these corpora: automatic word segmentation of Chinese text, automatic part-of-speech (PoS) tagging, automatic tagging of phrase structure, and automatic alignment of bilingual corpora. Taken together, it offers a bird's-eye view of corpus linguistics in China. Finally, the author discusses several open problems in current corpus research, including the standardization of corpus specifications, the sharing of language resources, and intellectual property rights.
Keywords:
corpus; large-scale & authentic text; speech corpora; bilingual corpora; corpora of minority languages in China; automatic word segmentation; automatic PoS tagging; automatic tagging of phrase structure; automatic alignment of bilingual corpora.