Thank you, volfer, for your patient guidance. I understand your point about using a larger reference corpus to identify the topics of a smaller corpus. I just found this passage on keyness on p. 125 of P. Baker's Using Corpora in Discourse Analysis: "Using WordSmith, it is possible to compare the frequencies in one wordlist against another in order to determine which words occur statistically more often in wordlist A when compared with wordlist B and vice versa. Then all of the words that do occur more often than expected in one file when compared to another are compiled together into another list, called a keyword list."
He then compares two subcorpora, anti-hunting and pro-hunting, and produces a table (with keyness and p values), after which he says: "The first part shows words which occur more frequently in the anti-hunt speeches when compared to the pro-hunt speeches, while the opposite is true for the second part of the list."
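To check my understanding of how such a two-part list could be produced, here is a minimal sketch in Python. It assumes the two-cell log-likelihood form that is often used for keyness; I have not checked which statistic Baker's table actually uses, and WordSmith's own computation may differ. The function names, the threshold of 3.84 (the chi-square critical value for p < 0.05 with one degree of freedom), and the toy token lists are all my own assumptions for illustration.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Log-likelihood for a word seen a times in corpus A and b times in corpus B."""
    # Expected counts if the word were equally likely in both corpora.
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    if a > 0:
        score += a * math.log(a / expected_a)
    if b > 0:
        score += b * math.log(b / expected_b)
    return 2 * score


def keyword_lists(tokens_a, tokens_b, threshold=3.84):
    """Return (words key in A, words key in B), each sorted by descending score."""
    freq_a, freq_b = Counter(tokens_a), Counter(tokens_b)
    total_a, total_b = len(tokens_a), len(tokens_b)
    key_a, key_b = [], []
    for word in set(freq_a) | set(freq_b):
        a, b = freq_a[word], freq_b[word]
        score = log_likelihood(a, b, total_a, total_b)
        if score < threshold:
            continue  # difference not large enough to count as key
        # Assign the word to whichever corpus uses it relatively more often.
        if a / total_a > b / total_b:
            key_a.append((word, score))
        else:
            key_b.append((word, score))
    key_a.sort(key=lambda item: item[1], reverse=True)
    key_b.sort(key=lambda item: item[1], reverse=True)
    return key_a, key_b


# Made-up toy data, not taken from Baker's corpus.
anti = ["cruel"] * 10 + ["the"] * 90
pro = ["freedom"] * 10 + ["the"] * 90
key_anti, key_pro = keyword_lists(anti, pro)
```

In this sketch neither subcorpus plays the role of a larger reference corpus; each simply serves as the comparison set for the other, which is how I read the two-part table.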
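Earlier in the book Baker does mention the reference corpus, in the same terms you used. So does keyness work at two levels: one obtained by comparing a subcorpus against a larger reference corpus, and one obtained by comparing two subcorpora with each other? What I have in mind is the latter, that is, a comparison between two subcorpora. Is it then still feasible to look at the words that have an unusual frequency in text A relative to text B, and use them to show how the topics of the two texts differ? I hope volfer can help clear this up as well. Thank you!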