Bergen Corpus of London Teenage Language (COLT)


The Bergen Corpus of London Teenage Language (COLT)

COLT is the first large English corpus focusing on the speech of teenagers. It was collected in 1993 and consists of the spoken language of 13- to 17-year-old teenagers from different boroughs of London. The complete corpus, half a million words, has been orthographically transcribed and word-class tagged, and is a constituent of the British National Corpus.

[This post was edited by the author on 2005-06-15 at 01:18:18]
I have collected some 20 hours of Chinese urban teenage talk. I will post the basic info about the corpus when I finish all the transcription and annotation.
My project, Spoken Chinese of Urban Teenagers (SCOUT), is part of Prof. Gu's project, so I have to get permission from him before sharing it with the public. Additionally, there are many privacy and ethical issues to be dealt with in such spoken data, so now is not the right time to make SCOUT publicly available.

[This post was edited by the author on 2005-06-21 at 15:44:14]