
I made a wordlist for a sample of the BNCS and was surprised to find that it had about 300,000 tokens and no contracted forms such as "it's", "don't", "we're" and so on. I opened it in Word, searched for those contracted forms, and found none. I do not have the BNCS, just the sample. I searched online and found a wordlist for the BNC that includes a lot of those forms, but I could not find any wordlist for the BNCS. Can you tell me where I can find one?
Re: Help: BNCS wordlist

The following message was posted by Dr. Xiao. The original link is this:

[Download] BNC Word List for use with WordSmith


To extract keywords from your corpora, you will need a reference wordlist. The 100 million word BNC World Edition is an ideal reference corpus. You can download a word list based on the BNC World Edition here:

Users of WordSmith version 3.0: http://www.ling.lancs.ac.uk/coprlang/cbls/zipfiles/bnc.zip

Users of WordSmith version 4.0: http://www.lexically.net/downloads/version4/downloading BNC.htm

Hope you find it useful.
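
For readers wondering how a reference wordlist is actually used, here is a minimal sketch of a log-likelihood keyness score, one of the statistics commonly used for keyword extraction. The function name and the example counts are made up for illustration, and this is not necessarily the exact computation WordSmith performs.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of one word (hypothetical helper).

    freq_study / freq_ref: occurrences of the word in the study and
    reference corpora; size_study / size_ref: total tokens in each.
    """
    total = size_study + size_ref
    combined = freq_study + freq_ref
    # Frequencies expected if the word were equally common in both corpora.
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented counts: 50 hits in a 1,000-token study text,
# 100 hits in a 10,000-token reference corpus.
score = log_likelihood(50, 1000, 100, 10000)
```

A higher score means the word's frequency in your corpus differs more strongly from its frequency in the reference corpus; a word with the same relative frequency in both scores zero.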
Re: Help: BNCS wordlist

Right after posting, I realized I had got it wrong: you were asking for a wordlist of the BNCS, not the BNC. :D
Re: Help: BNCS wordlist

You cannot find contracted forms in the wordlist because each one is stored as two words, so searching for the full form matches nothing: e.g. "didn't" consists of two tokens, "did" and "n't". In the BNC, all such words are split and annotated separately (e.g. won't -> wo + n't).
Re: Help: BNCS wordlist
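
To make the splitting concrete, here is a rough sketch of this kind of tokenization. The clitic list and the function names are my own assumptions for illustration; it is not the actual tokenizer used to build the BNC.

```python
import re

# Endings split off as separate tokens (illustrative, incomplete list).
_CLITIC = re.compile(r"^(.+?)(n't|'s|'re|'ve|'ll|'d|'m)$", re.IGNORECASE)

def split_contraction(word):
    """Split a contracted form into stem + clitic, e.g. won't -> wo + n't."""
    match = _CLITIC.match(word)
    if match:
        return [match.group(1), match.group(2)]
    return [word]

def tokenize(text):
    """Very simple word tokenizer that separates contracted forms."""
    tokens = []
    # A word is a run of letters, optionally followed by an apostrophe part.
    for word in re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text):
        tokens.extend(split_contraction(word))
    return tokens

tokens = tokenize("We won't say it's done")
```

In a wordlist built from tokens like these, "won't" never appears as a single entry. To count contractions, look up the clitic entries themselves ("n't", "'s", "'re" and so on) instead.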

There were no contracted forms when I opened the BNCS in Word. The BNCS I have is free of annotation and is just a sample of about 300,000 tokens. Is that too small to use as a reference corpus?