Haiyang Ai
Administrator
http://americannationalcorpus.org/frequency.html
The files total 160 MB and are in Windows ZIP format.
These are preliminary word frequency counts for the first release of the ANC. The counts will be refined as texts are added and our part-of-speech tagger(s) are fine-tuned. The data is divided into three sets of counts: the entire first release, the spoken texts, and the written texts.
In addition, three versions of the bigram counts are provided:
Sorted by frequency
Sorted by first word of the bigram
Sorted by second word of the bigram
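
For readers unfamiliar with bigram counts, the three orderings listed above can be illustrated with a minimal Python sketch. It uses a toy token list, not ANC data, and the tokenization, tie-breaking rules, and the names `bigram_counts`, `by_frequency`, `by_first`, and `by_second` are assumptions made for illustration. It does not reflect how the ANC files were actually produced or formatted.

```python
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent token pairs (bigrams) in a token sequence."""
    return Counter(zip(tokens, tokens[1:]))


# Toy input for illustration only; not taken from the ANC.
tokens = "the cat saw the dog and the cat ran".split()
counts = bigram_counts(tokens)

# Sorted by frequency, highest first; ties broken alphabetically (an assumption).
by_frequency = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

# Sorted by first word of the bigram, then by second word.
by_first = sorted(counts.items(), key=lambda kv: (kv[0][0], kv[0][1]))

# Sorted by second word of the bigram, then by first word.
by_second = sorted(counts.items(), key=lambda kv: (kv[0][1], kv[0][0]))

for (first, second), n in by_frequency:
    print(f"{n}\t{first} {second}")
```

The same counts back all three lists; only the sort key changes, which is presumably why the ANC ships them as three separate files for lookup convenience.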