Re: Early findings from the BNC2014 project
There are some more details about the initial BNC14 results here. I'll copy and paste them, as I know Google is often not accessible from China:
You may have noticed a number of newspaper articles and interviews reporting on the initial findings of the BNC14 spoken #corpus. See the end of this post for a list.
I had a few questions about what I read, and Robbie Love, one of the researchers on the project, kindly agreed to answer them.
Is the figure of 155 instances per million for marvellous correct? I found 31.8 in BNCweb.
Robbie: The figure of 155 per million was an error which I spotted too late in the media campaign to change. In the demographic component of the spoken BNC, marvellous has a raw frequency of 155 (rather than a relative frequency of 155 per million). The relative frequency is closer to your figure, at 38.4 per million. Nonetheless, the finding that marvellous is among the ten words whose relative frequency has decreased most drastically between the demographic spoken BNC (1990s) and our sample (2010s) still holds, so when I spotted the error I didn't think it was worth changing the number; most people were concerned with the words more than the exact numbers. For the purposes of generating interest in the story, its place in the top ten, rather than its exact relative frequency, was what mattered to me. But thanks for spotting this. You're the only one (so far)!
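To make the raw vs. relative frequency distinction concrete, here is a small Python sketch of my own (not from the BNC team): a relative frequency per million is simply the raw count divided by the corpus size in words, multiplied by 1,000,000. The function names are hypothetical, and the size of the demographic component is not stated above; it is only back-calculated from Robbie's two figures (155 raw, 38.4 per million), so treat it as an approximation.

```python
def per_million(raw_count: int, corpus_size: int) -> float:
    """Normalise a raw frequency to occurrences per million words."""
    return raw_count / corpus_size * 1_000_000


def implied_corpus_size(raw_count: int, per_million_freq: float) -> float:
    """Invert per_million(): the corpus size at which raw_count yields per_million_freq."""
    return raw_count / per_million_freq * 1_000_000


if __name__ == "__main__":
    # Figures quoted by Robbie: raw frequency 155, relative frequency 38.4 per million.
    size = implied_corpus_size(155, 38.4)
    print(f"Implied corpus size: about {size / 1_000_000:.2f} million words")
    # Round-trip check: 155 hits in a corpus of that size is ~38.4 per million.
    print(f"{per_million(155, round(size)):.1f} per million")
```

The point is that a raw count only becomes comparable across corpora once it is normalised; 155 hits mean something very different in a corpus of roughly 4 million words than in the 1,913,151-word 2010s sample.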
How many tokens make up the approximately 200 recordings that have been used?
Robbie: The sample we used to generate the 2010s figures is a set of 219 recordings made in 2012. These contain the speech of 184 speakers from a variety of areas across England, and come to a total of 1,913,151 words, just short of 2 million. It's not demographically balanced, but rather a sample of everything we could collect at the time, so we're not looking to use it for anything more than frequency-based "how has language changed" questions until we collect the rest of the corpus, which we aim to bring to a total of at least 10 million words.
There was a tweet saying you are looking for 'native' speakers. Could you clarify this?
Robbie: We're looking for speakers of British/Northern Irish nationality whose first language is British English. So they may not have been born in the UK, or indeed they may not have lived in the UK for a long period of time. The important thing is that they are a UK national and that the first language variety they acquired is British English.
Final comments by Robbie: Once the dust has finally settled from the first wave of media interest, we’ll likely put up a summary on the CASS blog. Until then, I would be grateful if you could put a little call for participation on your G+ corpus linguistics community, which looks like a fun page. Information about the project, and how to participate, can be found here:
Importantly, we’re encouraging speakers to email email@example.com to participate in the project.
Thanks to Robbie for taking the time to answer these questions. If you feel like contributing recordings, do get in touch with him. Although there is monetary compensation for your efforts, what is more attractive is that the corpus will be made available to the public.
One final point: most of the media reports focused on the "American English is taking over" angle, but if you look at COCA, for example, the same patterns (a rise in awesome and a decline in marvelous, note the single l) can be seen in US English.
Media report list:
No longer marvellous... now everything is awesome: How Britons are using more American words because traditional English is in decline
From marvellous to awesome: how spoken British English has changed
Awesome! How American words are changing the way we speak
‘Pussy cat’, ‘marvellous’ and ‘cheerio’ – just some of the words that no longer exist in our vocab
British people use more American words than ever due to rise of the digital age
Cheerio Marvellous... You're No Longer Awesome
BBC Radio 4 - English language becomes more Americanised (starts at 27:58)
Why marvellous isn't awesome anymore