Recommended: the newest and largest American English corpus, the Corpus of Contemporary American English (COCA)

#1
The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. It was created by Mark Davies of Brigham Young University in 2008, and it is now used by tens of thousands of users every month (linguists, teachers, translators, and other researchers). COCA is also related to other large corpora that we have created or modified, including the British National Corpus (our architecture and interface), the 100 million word TIME Corpus (1920s-2000s), and the new 400 million word Corpus of Historical American English (COHA; 1810-2009).
The corpus contains more than 410 million words of text and is equally divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words each year from 1990-2010 and the corpus is also updated once or twice a year (the most recent texts are from Summer 2010). Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language (see the 2010 article in Literary and Linguistic Computing).
The interface allows you to search for exact words or phrases, wildcards, lemmas, parts of speech, or any combination of these. You can also search for surrounding words (collocates) within a ten-word window (e.g. all nouns somewhere near faint, all adjectives near woman, or all verbs near feelings), which often gives good insight into the meaning and use of a word.
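For readers who want to see what a collocate search does, here is a minimal sketch run on a local list of tokens. It does not touch the COCA site; the function name `collocates`, the toy sentence, and the choice to count the window on each side of the node word are all assumptions made for illustration.

```python
from collections import Counter

def collocates(tokens, node, window=10):
    """Count the words that occur within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the node word itself
                counts[tokens[j]] += 1
    return counts

# Toy sentence, invented for illustration (not corpus data).
tokens = "she felt faint and a faint smile crossed her face".split()
result = collocates(tokens, "faint", window=2)
```

With a part-of-speech tagged text, the same loop could keep only nouns or adjectives, which is roughly what a query like "all nouns near faint" asks for.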
The corpus also allows you to easily limit searches by frequency and compare the frequency of words, phrases, and grammatical constructions, in at least two main ways:
  • By genre: comparisons between spoken, fiction, popular magazines, newspapers, and academic, or even between sub-genres (or domains), such as movie scripts, sports magazines, newspaper editorial, or scientific journals
  • Over time: compare different years from 1990 to the present time
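Comparing raw hit counts across genres or years is only fair when the sub-corpora are the same size, so comparisons are usually made on a rate per million words. The sketch below shows that arithmetic; the counts and section sizes are toy numbers invented for illustration, not actual COCA figures, and `per_million` is a hypothetical helper name.

```python
def per_million(count, total_words):
    """Normalize a raw hit count to occurrences per million words."""
    return count * 1_000_000 / total_words

# Toy numbers, invented for illustration; NOT real COCA counts or sizes.
genre_sizes = {"spoken": 80_000_000, "academic": 80_000_000}
hits = {"spoken": 4_000, "academic": 800}

rates = {genre: per_million(hits[genre], genre_sizes[genre]) for genre in hits}
```

The same function works for a by-year comparison: replace the genre keys with years and use the word total for each year.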
You can also easily carry out semantically-based queries of the corpus. For example, you can contrast and compare the collocates of two related words (little/small, democrats/republicans, men/women) to determine the difference in meaning or use between these words. You can find the frequency and distribution of synonyms for nearly 60,000 words, compare their frequency in different genres, and use these word lists as part of other queries. Finally, you can easily create your own lists of semantically-related words, and then use them directly as part of the query.
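One simple way to contrast the collocates of two related words is to compare how often each collocate appears with one word versus the other. The post does not say how COCA scores this, so the smoothed ratio below is an assumption, and the counts are toy values invented for illustration.

```python
from collections import Counter

def contrast(coll_a, coll_b, min_ratio=2.0):
    """Return collocates that are at least `min_ratio` times more frequent
    with word A than with word B, using add-one smoothing."""
    distinctive = {}
    for word, a in coll_a.items():
        b = coll_b.get(word, 0)
        ratio = (a + 1) / (b + 1)
        if ratio >= min_ratio:
            distinctive[word] = ratio
    return distinctive

# Toy collocate counts, invented for illustration (not corpus data).
little = Counter({"girl": 9, "bit": 7, "town": 2})
small = Counter({"town": 8, "business": 6, "girl": 1})

typical_of_little = contrast(little, small)
typical_of_small = contrast(small, little)
```

Running the comparison in both directions gives one list of collocates for each word, which is the kind of side-by-side view the little/small example describes.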

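The corpus is available at http://www.americancorpus.org, and American English word frequency data is at http://www.wordfrequency.info
If you find this useful, please bump the thread.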
 
#2
Re: Recommended: the newest and largest American English corpus, the Corpus of Contemporary American English (COCA)

Hello, do you know about the other American corpus, the ANC? How can I get word frequency counts from it online? Thanks! :p
 
#4
Re: Recommended: the newest and largest American English corpus, the Corpus of Contemporary American English (COCA)

COCA is the best thing I stumbled upon today, through a Google search for "superfection". Thanks so much for sharing. I will spend a lot more time learning how to use this site effectively in my quest to master English once and for all.
 
#5
Re: Recommended: the newest and largest American English corpus, the Corpus of Contemporary American English (COCA)

Can this corpus be bought on disc in China, and how much does it cost? Thanks!
 

xujiajin

Administrator
Staff member
#6
Re: Recommended: the newest and largest American English corpus, the Corpus of Contemporary American English (COCA)

This corpus can only be searched online; it cannot be purchased in China or abroad.
 
#7
Re: Recommended: the newest and largest American English corpus, the Corpus of Contemporary American English (COCA)

Thanks. In that case, are there any other corpora that can be purchased on disc?
 
#9
Re: Recommended: the newest and largest American English corpus, the Corpus of Contemporary American English (COCA)

Thanks for sharing! Can it be accessed online through a software interface?
 