☆Where to find free English corpora?

xujiajin

Administrator
Staff member
Where can I find large English corpora that are free to use?
1. BNC-World Simple Search
http://thetis.bl.uk/lookup.html
Note that no more than 50 hits are displayed, each with a fixed amount of context.
All you want to know about the BNC
http://www.corpus4u.org/showthread.php?t=220

2. FLOB and Frown are one million word balanced corpora for modern written British and American English (the 1991 updates of LOB and Brown).
They are now freely available (only on the corpus4u site) for searching using our WebConc
http://www.ling.lancs.ac.uk/corplang/cgi-bin/FreiburgConC.pl
If you have a copy of Xaira, you can also access FLOB by opening the corpus in your Xaira client, using the following as the corpus name:
http://www.ling.lancs.ac.uk/corplang/flob/flob.xcorpus

3. Brown, LOB, BNC sampler
Here are a few links for searching corpora online, including well-known monolingual corpora such as Brown, LOB, and the BNC sampler, as well as some parallel English-Chinese corpora.
English: http://www.edict.com.hk/concordance/WWWConcappE.htm
English: http://www.lextutor.ca/concordancers/concord_e.html
Parallel: http://www.edict.com.hk/concordance/paralleltexts/

4. Collins Cobuild Corpus Concordance Sampler
http://www.collins.co.uk/Corpus/CorpusSearch.aspx
The Collins WordbanksOnline English corpus is composed of 56 million words of contemporary written and spoken text. To get a flavour of the type of linguistic data that a corpus like this can provide, you can type in some simple queries here and get a display of concordance lines from the corpus. The query syntax allows you to specify word combinations, wildcards, part-of-speech tags, and so on.

 
5. New BNC interface - VIEW:
http://view.byu.edu/

6. Samples (about 2 million words) from the British National Corpus: both written and spoken
The Brown Corpus
and many others - native, learner...
Go to http://www.lextutor.ca/concordancers/concord_e.html

If you have your own corpus data but don't have a concordancer, then...
Go to http://www.lextutor.ca/concordancers/text_concord/

There you can browse for and upload large files from your own machine.
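For readers who would rather work offline, what a concordancer does can be sketched in a few lines of Python. This is only a minimal illustration of keyword-in-context (KWIC) display, not how the lextutor tool is implemented; the function name `kwic`, its parameters, and the sample text are made up for this example. The `max_hits` cap mirrors the kind of hit limit some online interfaces impose.

```python
import re

def kwic(text, keyword, width=30, max_hits=50):
    """Return keyword-in-context lines for a plain-text corpus (hypothetical helper)."""
    # Whole-word, case-insensitive match of the search term.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # Collapse line breaks and runs of spaces so the context reads cleanly.
    flat = " ".join(text.split())
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()].rstrip()
        right = flat[match.end():match.end() + width].lstrip()
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right}")
        if len(lines) >= max_hits:
            break
    return lines

sample = "The cat sat on the mat. Another cat slept.\nNo dog here, but a Cat again."
hits = kwic(sample, "cat", width=15)
for line in hits:
    print(line)
```

To run it on your own data, read a file into a string first, for example `kwic(open("mycorpus.txt", encoding="utf-8").read(), "corpus")`, where `mycorpus.txt` is a placeholder file name.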

[This post was edited by xujiajin on 2005-08-05 at 21:21:27]
 
Trigrams in FLOB (frequency above 20)
http://www.corpus4u.org/showthread.php?t=286
Bigrams in FLOB (frequency above 100)
http://www.corpus4u.org/showthread.php?t=285
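If you want comparable lists for your own texts, a frequency-filtered n-gram list can be sketched as follows. This is a rough illustration, not the procedure used to produce the FLOB lists above: the tokenizer is a simple regular expression, n-grams are allowed to cross sentence boundaries, and the names `ngram_counts` and `frequent_ngrams` are invented for this example.

```python
import re
from collections import Counter

def ngram_counts(text, n):
    """Count word n-grams in a text, using a deliberately simple tokenizer."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # zip over n shifted copies of the token list yields consecutive n-grams.
    return Counter(zip(*(tokens[i:] for i in range(n))))

def frequent_ngrams(text, n, min_freq):
    """Return (n-gram, count) pairs whose count is strictly above min_freq."""
    counts = ngram_counts(text, n)
    return [(" ".join(gram), c) for gram, c in counts.most_common() if c > min_freq]

toy = "a b a b a b c"
print(frequent_ngrams(toy, 2, 1))
```

For a corpus the size of FLOB, you would call `frequent_ngrams(text, 3, 20)` for trigrams and `frequent_ngrams(text, 2, 100)` for bigrams to apply the same thresholds as the lists linked above.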

BNC wordlist
To extract keywords from your corpora, you will need a reference wordlist. The 100 million word BNC World Edition is an ideal reference corpus. You can download a word list based on the BNC World Edition here:

Users of WordSmith version 3.0: http://www.ling.lancs.ac.uk/coprlang/cbls/zipfiles/bnc.zip

Users of WordSmith version 4.0: http://www.lexically.net/downloads/version4/downloading BNC.htm
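To make the role of the reference wordlist concrete, here is a small sketch of keyword extraction using the log-likelihood statistic, one commonly used keyness measure. It is not a description of what WordSmith computes internally; the function names and the toy frequency counts are made up for illustration, and both wordlists are assumed to be non-empty word-to-frequency mappings.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness for a word occurring a times in a study corpus
    of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(study, reference, top=10):
    """Rank words that are relatively more frequent in the study wordlist."""
    c = sum(study.values())
    d = sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]

# Toy wordlists standing in for your corpus and a reference list such as the BNC one.
study = Counter({"corpus": 30, "the": 60, "word": 10})
reference = Counter({"corpus": 5, "the": 600, "word": 95, "of": 300})
for word, score in keywords(study, reference):
    print(word, round(score, 2))
```

In this toy data "the" has the same relative frequency in both lists, so it is not reported as a keyword, while "corpus" ranks first because it is far more frequent in the study list than the reference list would predict.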
 
7. MICASE
http://www.hti.umich.edu/m/micase/
There are currently 152 transcripts (totaling 1,848,364 words) available at the site.

8. CLEC online concordancing
http://www.clal.org.cn/corpus/EngSearchEngine.aspx
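CLEC (the Chinese Learner English Corpus) contains more than one million words of English produced by five types of Chinese learners: secondary school students, College English Test Band 4 and Band 6 students, and lower-year and upper-year English majors. Learner errors are annotated throughout. The aim is to observe the features of each group's English and the errors they make, to describe Chinese learner English fairly precisely using both quantitative and qualitative methods, and to provide useful feedback for English teaching in China.
For an introduction to the corpus, its error tagset and some statistics, see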
http://www.clal.org.cn/baseinfo/achievement/Achievement1.htm

For a more general discussion, see Yang's keynote conference lecture at
lc.ust.hk/~centre/conf2001/keynote/subsect4/yang.pdf
 
If you have questions about the corpora or wordlists listed above, try the site's own search engine on the first page.
 
According to the news on the official site of ANC:

Search the ANC On-line is Coming Soon

Search the ANC On-line - Mark Davies of Brigham Young University is setting up a web-based interface to access the ANC 1st release data, similar to his on-line BNC search engine.
http://americannationalcorpus.org/index.html

[This post was edited by the author on 2005-08-21 at 16:51:08]
 
15. Advanced Search - Explore N-Grams from the British National Corpus
http://pie.usna.edu
Grams
N-Grams Simple
N-Grams Advanced
Phrase-Frames
Char-Grams
POS-Grams

Documentation
Getting Started Tutorial
FAQ
Conventions
Part of Speech Tags
Fused Forms & Multiword Units
Character Entity Remapping
 
http://view.byu.edu/
Features at VIEW interface
 