Corpus-based English word-frequency lists

xiaoz

Permanent Super Administrator
Staff member
Corpus-based English word-frequency lists:

  • Extended Version of A General Service List of English Words by Michael West (Longman, 1953) with semantic-field categories added for all entries. The General Service List was a pioneering corpus-based frequency list, and has been out of print for a number of years. It remains the only available such list to provide frequencies for words in particular senses (word-in-sense frequencies) rather than raw word frequencies (or frequencies of words according to word-class). Thus, with regard to the word 'able', for example, the General Service List gives the frequency of 'able' in the sense of 'having the ability to' as well as its frequency in the sense of 'competent, skilled'. I have taken the semantic-field categories from the Longman Lexicon of Contemporary English (LLCE) by Tom McArthur (Longman, 1981). For a list of the semantic-field categories used in the LLCE click here.

    The Extended Version of A General Service List is currently in Excel format, allowing for both sorting and extraction of materials according to the following categories: A. Line numbering (order in which entries appear in the printed version of the General Service List); B. Headword as given in the General Service List; C. Lemmatized headword, i.e. standard dictionary-type headword; D. McArthur category; E. Word-class; F. Word count 1, as given in the General Service List; G. Word count 2, 'raw' word count without additional information given in the General Service List; H. Percentage scores for occurrences of words in a particular sense in the General Service List; I. Word-in-sense frequency; J. Source of information (given throughout as GenSerList); K. Meaning (as given in the General Service List). Further information regarding these categories can be obtained here. I thank the Institute of Education, University of London, copyright holders of the General Service List of English Words, for permission to reproduce this material. Further information about the organisation of the General Service List can be found in the Introduction to the printed version of the List. The work of scanning the printed version of the General Service List and converting the material to Excel format was done by Nimish Shah.
  • Word Frequencies in Spoken and Written English by Geoffrey Leech, Paul Rayson and Andrew Wilson (Longman, 2001) with semantic-field categories added for all entries. This list is based on the British National Corpus. Information about the corpus and electronic versions of frequency lists derived from it can be found at: http://www.comp.lancs.ac.uk/computing/research/ucrel/bncfreq/. I have taken the semantic-field categories from the Longman Lexicon of Contemporary English by Tom McArthur (Longman, 1981). For a list of the semantic-field categories used in the LLCE click here.

    The Extended Version of Rank Frequency List: Spoken English is currently in Excel format, allowing for both sorting and extraction of materials according to the following categories: A. Rank frequency order; B. Non-lemmatized head (as given in Leech et al.); C. Lemmatized headword, i.e. standard dictionary-type headword; D. McArthur category; E. Word-class; F. Rounded frequency per million words in speech; G. Log likelihood; H. Rounded frequency per million words in writing; I. Source of information (given throughout as WoFreSpoWriEng). I thank Geoffrey Leech, Paul Rayson and Andrew Wilson, copyright holders of Word Frequencies in Spoken and Written English, for permission to reproduce their material.
Cited from http://www.languages.salford.ac.uk/staff/dickins.php
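
For anyone who exports the General Service List sheet and wants to work with it outside Excel, here is a rough Python sketch of how the word-in-sense frequency (column I) might relate to the raw count (column G) and the sense percentage (column H). It assumes column I is simply the raw count multiplied by the percentage, which the description above suggests but does not state outright. The class name, field names, category codes and numbers are all invented for illustration and are not taken from the list itself.

```python
from dataclasses import dataclass


@dataclass
class SenseEntry:
    headword: str         # column C: lemmatized headword
    category: str         # column D: McArthur category (placeholder codes here)
    raw_count: int        # column G: 'raw' word count
    sense_percent: float  # column H: percentage of occurrences in this sense
    meaning: str          # column K: meaning as given in the list


def word_in_sense_frequency(entry: SenseEntry) -> float:
    """Assumed relation: raw count scaled by the percentage for this sense."""
    return entry.raw_count * entry.sense_percent / 100.0


def by_category(entries, category):
    """Extract one semantic-field category, most frequent sense first."""
    selected = [e for e in entries if e.category == category]
    return sorted(selected, key=word_in_sense_frequency, reverse=True)


# Invented values for illustration only, not figures from the General Service List.
entries = [
    SenseEntry("able", "CAT1", 1000, 80.0, "having the ability to"),
    SenseEntry("able", "CAT2", 1000, 20.0, "competent, skilled"),
    SenseEntry("can", "CAT1", 5000, 50.0, "be able to"),
]

for e in by_category(entries, "CAT1"):
    print(e.headword, "|", e.meaning, "|", word_in_sense_frequency(e))
```

The same filter-then-sort pattern should carry over to the spoken/written sheet, using the per-million frequency columns as the sort key instead.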
 

armstrong

Senior Member
Re: Corpus-based English word-frequency lists

Huge thanks, Dr. Xiao.
Could you kindly provide us with the frequency lists for the written part and the spoken part of the BNC respectively? I badly need them.
 