A List of Corpora of Learner English

laohong


CLC (Cambridge Learner Corpus)
http://www.cambridge.org/elt/corpus/learner_corpus.htm

CLEC (Chinese Learner English Corpus)
http://langbank.engl.polyu.edu.hk/corpus/clec.html

HKUST (Hong Kong University of Science and Technology)
- currently around 30 million words.
- texts written by university students (mostly Cantonese speakers studying Engineering, Science and Business)
- mostly untimed assignments from EFL courses (mostly 400-100 words), plus about a million words of school-leaving exams
- around 200,000 words are POS-tagged with CLAWS
- the error taxonomy and tags used are loosely described in a revised version of John Milton's PhD thesis - see http://repository.ust.hk/dspace/handle/1783.1/1055
- the corpus was used in developing materials and syllabi, including writing tools such as an interactive grammar guide, a learners' online concordancer and a number of 'blended' (online+classroom) courses. These were also informed by comparing the learner corpus with the writing of students who took the Cambridge General Studies examination.

ICLE (International Corpus of Learner English)
http://cecl.fltr.ucl.ac.be/research.html

JEFLL (Japanese EFL Learner)
http://leo.meikai.ac.jp/~tono/

JPU (Janus Pannonius University Corpus)
- corpus blog: http://joeandco.blogspot.com (221 scripts are available there and free to use; thematic search is available via the labels)
- Jozsef Horvath's PhD dissertation http://www.geocities.com/writing_site/thesis.

LLC (Longman Learners' Corpus)
http://www.longman-elt.com/dictionaries/corpus/learners.html

MELD (Montclair Electronic Language Database)
http://www.chss.montclair.edu/linguistics/MELD/

Polish Learner English Corpus
http://pelcra.ia.uni.lodz.pl/corpora_en.php

SILS (School of International Liberal Studies at Waseda University)
http://www.f.waseda.jp/vicky/learner/index.html

TeleNex Student Corpus
http://www.telenex.hku.hk/telec/smain/sintro/intro.htm

USE (Uppsala Student English Project)
http://www.engelska.uu.se/use.html

COMET Project
- corpus of Learner English and Learner German (as well as Italian and Spanish), collected at the University of São Paulo, Brazil
- http://www.fflch.usp.br/dlm/comet/comaprend.html
- article on the corpus: http://www.fflch.usp.br/dlm/comet/artigos/A%20multilingual%20learner%20corpus%20in%20Brazil.pdf


ISLE
Approx. 20 minutes of speech (per speaker) from 23 German and 23 Italian intermediate learners of English. Each speaker recorded sentences from several blocks of differing types (reading simple sentences, using minimal pairs, giving answers to multiple choice questions). The prompts were of varying perplexities. For more details, see:

Menzel, W.; Atwell, E.; Bonaventura, P.; Herron, D.; Howarth, P.; Morton, R.; Souter, C. The ISLE Corpus of non-native spoken English. In Proc. LREC 2000, vol. 2, pp. 957-964. European Language Resources Association. 2000. http://www.comp.leeds.ac.uk/eric/menzel00lrec.pdf

Atwell, Eric; Howarth, Peter; Souter, Clive. The ISLE corpus: Italian and German spoken learner's English. ICAME Journal, vol. 27, pp. 5-18. 2003. http://www.comp.leeds.ac.uk/eric/atwell03icamej.pdf
 