HK Corpus of Conversational English
Facilitating a description of intercultural
conversations: the Hong Kong Corpus of
Conversational English
Winnie Cheng and Martin Warren
Department of English, The Hong Kong Polytechnic University
1 Background
Spoken corpora are harder for the researcher to compile than written ones, and
the time needed to fully transcribe spoken data, to say nothing of the
additional expense involved, has inevitably made large spoken corpora far rarer
than written corpora.
And yet, if we are to further unravel the intricacies of spoken discourse, we need
to work with large, representative corpora of spoken discourses. The two largest
corpora of the English language, the Bank of English and the British National
Corpus, devote approximately 10 per cent of their corpora to spoken English.
This is not to say that spoken discourse has been forgotten or ignored by corpus
linguists. The London-Lund Corpus, for example, has been an invaluable source
of data for many years, and projects such as COLT (see e.g. Andersen and Stenström
1996) and CANCODE (see e.g. Carter and McCarthy 1997; McCarthy and
Carter 1997) are evidence of more recent initiatives to compile spoken corpora.
Conversations are the most prevalent form of discourse, accounting for more