The deGLOBE Corpus (V1.0)
INTRODUCTION
The deGLOBE Corpus (Version 1.0) is a balanced collection of contemporary German written texts, totaling one million words. The text samples in the corpus were gathered and cleaned up by Guying Zhou and four students of German, namely Zhe Shu, Yingming Song, Yu Sun and Liang Xu, at Beijing Foreign Studies University (BFSU), China. The online version of the deGLOBE Corpus is available on the BFSU CQPweb Corpus Portal (http://114.251.154.212/cqp/). The user ID and passcode are both "test".
KEY INFORMATION
Project leader: Guying Zhou of the National Research Centre for Foreign Language Education (NRCFLE), BFSU
Text collectors: Yingming Song of the National Research Centre for Foreign Language Education (NRCFLE), BFSU; Zhe Shu, Yu Sun and Liang Xu of the School of German Studies, BFSU
Time of compilation: September 2021 – September 2022
Size: Approximately one million words
Language: Contemporary German
Number of texts/samples: 500 samples of 2000+ words each (short texts are combined to form one 2000-word sample, but are saved as separate files marked with A, B, C, etc. in the filenames)
Period: The texts were published between 2010 and 2022.
Released in: September 2022
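The note under "Number of texts/samples" says that short texts forming one sample are saved separately and marked with A, B, C, etc. in the filenames. The sketch below shows one way such fragments might be regrouped into samples. The filename pattern used here (a category letter, a two-digit sample number, an optional fragment letter and a `.txt` extension) is purely an assumption for illustration; the actual naming scheme of the deGLOBE files is not documented in this text, and `group_fragments` is a hypothetical helper.

```python
import re
from collections import defaultdict

# ASSUMPTION: hypothetical filename scheme such as "A02B.txt"
# (category letter + two-digit sample number + optional fragment letter).
# Adjust the pattern to the real deGLOBE filenames before use.
FILENAME = re.compile(r"^(?P<sample>[A-R]\d{2})(?P<part>[A-Z]?)\.txt$")


def group_fragments(filenames):
    """Map each sample ID to the sorted list of files that make it up."""
    samples = defaultdict(list)
    for name in sorted(filenames):
        match = FILENAME.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed scheme
        samples[match.group("sample")].append(name)
    return dict(samples)


# Made-up filenames, for illustration only.
files = ["A01.txt", "A02A.txt", "A02B.txt", "A02C.txt",
         "K07A.txt", "K07B.txt", "notes.md"]
grouped = group_fragments(files)
for sample_id, parts in grouped.items():
    print(sample_id, parts)
```

Grouping by sample ID rather than by file keeps counts comparable with the 500-sample design, since a sample split into fragments A, B and C is then counted once.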
BACKGROUND
On 29 December 2021, Jiajin Xu launched the GLOBE (Global Languages Out of BFSU Expertise) Corpus project, an initiative that aims to collect present-day written texts in all 101 languages taught at BFSU. The sampling frame of the Brown Corpus was followed to make the multilingual GLOBE corpus family comparable to the Brown family corpora. The immediate intended application of the GLOBE corpora is corpus-based dictionary compilation. The first batch of the corpora covers about 30 languages.
Table 1. Text categories in the GLOBE Corpus.
(Adapted from https://varieng.helsinki.fi/CoRD/corpora/BROWN/basic.html)
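Genre group | Genre subgroup | Category | Content of category | No. of texts
I. Informative prose (374) | Press (88) | A | Reportage | 44
I. Informative prose (374) | Press (88) | B | Editorial | 27
I. Informative prose (374) | Press (88) | C | Review | 17
I. Informative prose (374) | General prose (206) | D | Religion | 17
I. Informative prose (374) | General prose (206) | E | Skills, trades and hobbies | 36
I. Informative prose (374) | General prose (206) | F | Popular lore | 48
I. Informative prose (374) | General prose (206) | G | Belles lettres, biographies, essays | 75
I. Informative prose (374) | General prose (206) | H | Miscellaneous | 30
I. Informative prose (374) | Learned (80) | J | Science | 80
II. Imaginative prose (126) | Fiction (126) | K | General fiction | 29
II. Imaginative prose (126) | Fiction (126) | L | Mystery and detective fiction | 24
II. Imaginative prose (126) | Fiction (126) | M | Science fiction | 6
II. Imaginative prose (126) | Fiction (126) | N | Adventure and Western | 29
II. Imaginative prose (126) | Fiction (126) | P | Romance and love story | 29
II. Imaginative prose (126) | Fiction (126) | R | Humour | 9
Total | | | | 500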
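The arithmetic behind Table 1 can be checked with a short script. The following is a minimal sketch that transcribes the per-category counts from the table and recomputes the subgroup, genre-group and corpus totals. The variable and function names are illustrative choices, not part of any deGLOBE tooling, and the word estimate simply assumes the stated minimum of 2000 words per sample.

```python
# Per-category text counts transcribed from Table 1.
SAMPLING_FRAME = {
    "Press": {"A": 44, "B": 27, "C": 17},
    "General prose": {"D": 17, "E": 36, "F": 48, "G": 75, "H": 30},
    "Learned": {"J": 80},
    "Fiction": {"K": 29, "L": 24, "M": 6, "N": 29, "P": 29, "R": 9},
}
# Subgroups that belong to "I. Informative prose"; Fiction is the only
# subgroup of "II. Imaginative prose".
INFORMATIVE = ("Press", "General prose", "Learned")


def subgroup_totals(frame):
    """Sum the category counts inside each genre subgroup."""
    return {name: sum(cats.values()) for name, cats in frame.items()}


totals = subgroup_totals(SAMPLING_FRAME)
informative_total = sum(totals[name] for name in INFORMATIVE)
imaginative_total = totals["Fiction"]
corpus_total = informative_total + imaginative_total

# ASSUMPTION: every sample has exactly the minimum of 2000 words,
# so this is a lower bound on the corpus size.
min_words = corpus_total * 2000

print(totals)
print(informative_total, imaginative_total, corpus_total, min_words)
```

Under these assumptions the subgroup sums match the figures given in parentheses in the table (88, 206, 80 and 126), and 500 samples of at least 2000 words are consistent with the stated size of approximately one million words.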
The deGLOBE Corpus is a sub-project of the BFSU-funded GLOBE Corpus projects (Ref. 2022SYLZD015 and 2022SYLPY004), whose principal investigator is Prof. Jiajin Xu at the National Research Centre for Foreign Language Education, BFSU.
Please cite: Zhou, Guying et al. 2022. The construction of the deGLOBE Corpus. Yuliaoku Yuyanxue (Corpus Linguistics) 9(2).