McEnery, T. and Wilson, A. 2001. Corpus Linguistics (2nd ed.). Edinburgh: Edinburgh University Press.
(This is the book on which the course will be based.)
Biber, D., Conrad, S. and Reppen, R. 1998. Corpus Linguistics: Investigating Language Structure and Use. Cambridge: Cambridge University Press.
(An alternative introductory textbook.)
further reading
Some slightly more advanced reading, which can complement and deepen the material covered in the textbook.
McEnery, T., Xiao, R. and Tono, Y. 2006. Corpus-Based Language Studies: An Advanced Resource Book. London: Routledge.
course description
Corpus Linguistics is the study of language through the use of corpora, i.e. large archives of linguistic data (such as texts and speech transcriptions), which are usually analysed using computers. Over the past decades, Corpus Linguistics has emerged as an important methodology in the study of languages, primarily because it has helped to place linguistic theory on a sounder empirical footing. This course aims to introduce participants to the general field of Corpus Linguistics and its methods, and to a number of topics that have benefited from the study of linguistic corpora. In so doing, it will motivate them to assess critically some of the basic assumptions in this field, in comparison to those made in more theoretically oriented work.

The lectures for this course are divided into two main parts. The first covers the fundamentals of the corpus-based methodology, its origins, and the tools involved. The second looks at specific examples of how corpora can be used in linguistic research.

Part I: Foundational Issues
What is a corpus? A brief history of corpus linguistics, with particular reference to the relationship between corpus-based methodologies and the generative paradigm in linguistic theory.
Types of corpora (reference corpora, parallel corpora, web corpora, multilingual corpora, etc.), including specific examples of existing corpora (e.g. the British National Corpus, the Maltese Language Resource Server).
Issues in the construction and design of linguistic corpora, specifically:
Representativeness and balance in corpus design;
Levels of annotation (markup) of a corpus;
Linguistic annotation of a corpus.
Methods for conducting linguistic analysis using corpora:
Some fundamentals of statistical analysis (frequencies, distributions, etc.);
An introduction to some useful software for corpus analysis.
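As a small illustration of the kind of frequency analysis mentioned above, the following Python sketch counts word frequencies in a toy text and normalises them per 1,000 tokens. It is only a minimal example under stated assumptions: the function names (tokenize, frequency_table), the sample sentence, and the very naive tokenisation scheme are all invented here for illustration and are not part of the course materials or of any particular corpus tool.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens (a deliberately naive scheme)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_table(tokens):
    """Return (word, raw count, frequency per 1,000 tokens), most frequent first."""
    counts = Counter(tokens)
    total = len(tokens)
    return [(word, count, 1000 * count / total)
            for word, count in counts.most_common()]


# Hypothetical toy "corpus"; a real corpus would be read from files.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)
table = frequency_table(tokens)

# Print the three most frequent words with raw and normalised frequencies.
for word, count, per_thousand in table[:3]:
    print(f"{word}\t{count}\t{per_thousand:.1f}")
```

Normalising raw counts (here per 1,000 tokens) is what makes frequencies comparable across corpora of different sizes; with real corpora a larger base, such as per million words, is commonly used.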
Part II: Applications
This part of the course will consider how the use of electronic corpora has contributed to various sub-fields of linguistics and computational linguistics.

Throughout the course, emphasis will be placed on practical work. Participants will be encouraged to engage in hands-on exercises, using software to exploit corpora in order to investigate linguistic phenomena.


