New Corpora: The third-generation Brown corpus (CROWN2021) and its siblings

xujiajin

Dear colleagues,

We are proud to announce the release of CROWN2021, a new Brown-type corpus of American English, together with six comparable corpora of Catalan, Danish, German, Farsi/Persian, Finnish and Italian. Dozens of similar corpora will follow in the next few months.

CROWN2021 is a balanced, one-million-word Brown family corpus of American English, containing texts published in 2021. It was developed under the leadership of Prof. Jiajin Xu, and the texts were collected by Mingchen Sun and 12 other graduate students at Beijing Foreign Studies University (BFSU). CROWN2021 serves as an updated language resource for present-day written American English, and as a reference corpus for contrastive studies of diachronic variation (with Brown, Frown, Crown), regional variation (with LOB, FLOB, CLOB) and cross-linguistic comparison (with LCMC, ToRCH family corpora, GLOBE family corpora).

Users can access the online version of CROWN2021 and other BFSU-made Brown family corpora at the BFSU CQPweb Corpus Portal (http://114.251.154.212/cqp/). Both the user ID and the password are "test".

KEY INFORMATION

Project leader: Jiajin Xu of the National Research Centre for Foreign Language Education (NRCFLE), BFSU
Text collectors: Mingchen Sun (359 texts), Yagang Chen (47 texts), Shujuan Deng (21 texts), Tingyan Zhangchen (19 texts), Meijia Hao (15 texts), Xingke Lv (13 texts), Jiaxi Shen (5 texts), Yuanyuan Lin (4 texts), Junyu Mao (4 texts), Xinzhi Yang (4 texts), Zinuo Zuo (4 texts), Xinkai Deng (3 texts), Ruotong Zha (2 texts)
Time of compilation: April 2022 - October 2022
Size: Approximately one million words
Language: Contemporary American English
Number of texts/samples: 500 samples of 2,000+ words each (short texts are pieced together to form one 2,000-word sample, but are saved separately and marked with A, B, C, etc. in the filenames)
Sampling strategy: The Brown Corpus model (see: http://korpus.uib.no/icame/manuals/BROWN/INDEX.HTM)
Period: The texts were published in 2021.
Released in: November 2022
POS TagSet: The BNC Basic (C5) Tagset
POS Tagger: TreeTagger
Lemmatiser: TreeTagger
Sentence Segmenter: spaCy
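
Because short texts are saved as separate files marked with A, B, C, etc., users working with the raw files may want to regroup the parts into their 2,000-word samples. The following is only a minimal sketch of one way to do this in Python. The filename scheme it assumes (a sample ID such as "A01", an optional part letter, and a ".txt" extension) is hypothetical and may not match the actual CROWN2021 filenames; the function names are likewise illustrative.

```python
import re
from collections import defaultdict

# ASSUMPTION: a hypothetical filename scheme, e.g. "A01.txt" for a whole
# sample and "B02A.txt", "B02B.txt" for the parts of a pieced-together sample.
# Adjust the pattern to the real filenames before use.
PART_PATTERN = re.compile(r"^(?P<sample>[A-Z]\d{2})(?P<part>[A-Z]?)\.txt$")


def group_parts(filenames):
    """Map each sample ID to its filenames, sorted by part letter."""
    samples = defaultdict(list)
    for name in filenames:
        match = PART_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed scheme
        samples[match.group("sample")].append(name)
    return {sample: sorted(parts) for sample, parts in samples.items()}


def sample_word_counts(texts_by_file):
    """Sum whitespace-separated token counts over all parts of each sample."""
    grouped = group_parts(texts_by_file)
    return {
        sample: sum(len(texts_by_file[name].split()) for name in parts)
        for sample, parts in grouped.items()
    }
```

Given a dictionary mapping filenames to text content, sample_word_counts returns one word count per sample, which makes it easy to check that each regrouped sample reaches roughly 2,000 words. Counting whitespace-separated tokens is a simplification and will not exactly match counts based on the TreeTagger tokenisation.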

How to cite:
Mingchen Sun, Jiajin Xu et al. 2022. The CROWN2021 Corpus. National Research Centre for Foreign Language Education, Beijing Foreign Studies University

Related work:
Xu, Jiajin & Maocheng Liang. 2013. A tale of two C's: Comparing English varieties with Crown and CLOB (The 2009 Brown family corpora). ICAME Journal 37: 175-183.

Jiajin Xu
Professor
Beijing Foreign Studies University
 