A List of Corpora of Learner English
CLC (Cambridge Learner Corpus)
http://www.cambridge.org/elt/corpus/learner_corpus.htm
CLEC (Chinese Learner English Corpus)
http://langbank.engl.polyu.edu.hk/corpus/clec.html
HKUST (Hong Kong University of Science and Technology)
- currently around 30 million words.
- texts written by university students (mostly Cantonese speakers studying Engineering, Science and Business)
- mostly untimed assignments from EFL courses (400-100 words mostly), plus about a million words of school-leaving exams
- around 200,000 words are POS-tagged with CLAWS
- the error taxonomy and tags used are loosely described in a revised version of John Milton's PhD thesis - see http://repository.ust.hk/dspace/handle/1783.1/1055
- the corpus was used to develop materials and syllabi, including writing tools (an interactive grammar guide and an online concordancer for learners) and a number of 'blended' (online + classroom) courses. These were also informed by comparing the learner corpus with the writing of students who took the Cambridge General Studies examination.
ICLE (International Corpus of Learner English)
http://cecl.fltr.ucl.ac.be/research.html
JEFLL (Japanese EFL Learner Corpus)
http://leo.meikai.ac.jp/~tono/
JPU (Janus Pannonius University Corpus)
- corpus blog: http://joeandco.blogspot.com (221 scripts are available there and free to use; thematic search is available via the labels)
- Jozsef Horvath's PhD dissertation: http://www.geocities.com/writing_site/thesis
LLC (Longman Learners' Corpus)
http://www.longman-elt.com/dictionaries/corpus/learners.html
MELD (Montclair Electronic Language Database)
http://www.chss.montclair.edu/linguistics/MELD/
Polish Learner English Corpus
http://pelcra.ia.uni.lodz.pl/corpora_en.php
SILS (School of International Liberal Studies at Waseda University)
http://www.f.waseda.jp/vicky/learner/index.html
TeleNex Student Corpus
http://www.telenex.hku.hk/telec/smain/sintro/intro.htm
USE (Uppsala Student English Project)
http://www.engelska.uu.se/use.html
COMET Project
- corpus of learner English and learner German (as well as Italian and Spanish), collected at the University of São Paulo, Brazil
- http://www.fflch.usp.br/dlm/comet/comaprend.html
- article on the corpus: http://www.fflch.usp.br/dlm/comet/artigos/A%20multilingual%20learner%20corpus%20in%20Brazil.pdf
ISLE
Approx. 20 minutes of speech per speaker from 23 German and 23 Italian intermediate learners of English. Each speaker recorded sentences from several blocks of different types (reading simple sentences, reading minimal pairs, answering multiple-choice questions). The prompts varied in perplexity. For more details, see:
Menzel, W.; Atwell, E.; Bonaventura, P.; Herron, D.; Howarth, P.; Morton, R.; Souter, C. The ISLE Corpus of non-native spoken English. In Proceedings of LREC 2000, vol. 2, pp. 957-964. European Language Resources Association, 2000. http://www.comp.leeds.ac.uk/eric/menzel00lrec.pdf
Atwell, E.; Howarth, P.; Souter, C. The ISLE corpus: Italian and German spoken learners' English. ICAME Journal, vol. 27, pp. 5-18, 2003. http://www.comp.leeds.ac.uk/eric/atwell03icamej.pdf