What data for a learner corpus?


Staff member
In the Preface (p. xvii) to Learner English on Computer (ed. S. Granger, Longman 1998), G. Leech appears to think that a learner corpus consists of "the data that the language learners produce more or less naturalistically - in non-test, non-classroom conditions." Most existing learner corpora of English, including the Cambridge Learner Corpus and a number of learner corpora produced in China and Japan, are composed almost exclusively of data produced in test conditions, which is not "naturalistic" enough by this definition. However, learners do not have many naturalistic opportunities to USE the language they are learning. In addition, apart from spoken data from natural conversation (as opposed to retelling stories or repeating sentences), much of the writing that learners produce in their own time cannot really reflect their level of proficiency.

What data do you think should be included in a learner corpus?