Thanks, everyone, but I don't have the author's contact details, so I don't know exactly how he built this corpus... I took a close look yesterday: the portion of material he selected comes to nearly 500,000 words, not 1 million. The paper compares the idioms in Liu's (2003) three corpora of spoken American English with those in Friends: "Liu uses three corpora containing transcribed spoken language: (a) Barlow's (2000) Corpus of Spoken, Professional American English (hereafter Professional); (b) a corpus of spoken American media English (Liu, 2002) (hereafter Media), and (c) Simpson & Mendis (2002) Michigan Corpus of Academic Spoken English (hereafter MICASE)." Is anyone familiar with the four corpora mentioned above (Liu, Barlow, Media, MICASE)? Where can I get access to them?
This may help you understand.
The actual analysis of the Spoken Corpus is just beginning, but in this relatively short time span a number of important facts have already come to light. The use of idioms, for example, is far more widespread in spoken language than it is in written language. An idiom such as "flash in the pan", which already occurs several times in the Spoken Corpus, doesn't appear at all in the 30 million word.