Comparability of corpora

#1
I recently came across this article:
Bolton, K., Nelson, G., & Hung, J. (2002). A corpus-based study of connectors in student writing: Research from the International Corpus of English in Hong Kong (ICE-HK). International Journal of Corpus Linguistics, 7(2), 165-182.
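1. The authors argue that the ideal reference corpus for comparison with learner (ESL/EFL) essays is published academic writing. In their words: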
Again, we would challenge that assumption, and would instead argue that a better set of control data would be provided by a corpus of published academic writing in English. The target norm in academic writing, for both ‘native’ and ‘non-native’ students is better defined as academic writing itself, and the best texts for comparison are clearly those already published in international English-language academic journals.
I have my doubts about this. I wonder what everyone thinks; any advice would be much appreciated!
2. The authors argue that connector frequencies should be calculated not on the basis of word count but on the basis of the sentence:
We would thus argue that it is illogical to base the calculation of such a ratio on the fundamental unit of the word, and in the present study, it was crucially decided to adopt the sentence as the basic unit of analysis...
...Therefore the frequency of connectors per 100,000 words, as presented by Granger and Tyson, is, we contend, not an appropriate measure of connector frequency. In all cases, our frequencies per sentence are multiplied by 1,000, in order to eliminate very low figures.
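To make the difference between the two measures concrete, here is a minimal Python sketch that computes both rates for one text: connectors per 100,000 words (the Granger and Tyson measure) and connectors per sentence multiplied by 1,000 (the Bolton et al. measure). The connector list `CONNECTORS`, the function names, and the regex-based tokenizer and sentence splitter are hypothetical simplifications for illustration, not the procedures used in either study.

```python
import re

# Hypothetical, deliberately small connector list for illustration only;
# it is NOT the inventory used by Bolton et al. or by Granger and Tyson.
CONNECTORS = {"however", "therefore", "moreover", "furthermore", "thus", "besides"}


def tokenize(text):
    """Lower-cased word tokens (runs of letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def connector_rates(text):
    """Return (hits, words, sentences, per 100,000 words, per 1,000 sentences)."""
    words = tokenize(text)
    sentences = split_sentences(text)
    hits = sum(1 for w in words if w in CONNECTORS)
    # Word-based normalization (Granger and Tyson style).
    per_100k_words = hits * 100_000 / len(words)
    # Sentence-based normalization, scaled by 1,000 (Bolton et al. style).
    per_1k_sentences = hits * 1_000 / len(sentences)
    return hits, len(words), len(sentences), per_100k_words, per_1k_sentences


SAMPLE = (
    "The results were clear. However, the sample was small. "
    "Therefore, we repeated the test. Moreover, we added a second group."
)

if __name__ == "__main__":
    print(connector_rates(SAMPLE))
```

On this four-sentence, 20-word sample the same three connectors give 15,000 per 100,000 words but 750 per 1,000 sentences. Lengthening the sentences without adding connectors would lower the first figure and leave the second unchanged, which is the distortion the authors object to.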
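Drawing on other literature, I would argue that in studying learner language it is better to calculate frequencies per T-unit rather than per sentence. I would welcome your comments.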
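The T-unit alternative can be sketched the same way. Reliable T-unit segmentation requires syntactic analysis, so this minimal sketch assumes the analyst has already segmented the text by hand and passes in a list of T-units; the connector list and the function name `connectors_per_1k_tunits` are hypothetical.

```python
import re

# Hypothetical connector list, for illustration only.
CONNECTORS = {"however", "therefore", "moreover", "furthermore", "thus", "besides"}


def connectors_per_1k_tunits(t_units):
    """Count connectors and normalize per 1,000 T-units.

    t_units: list of strings, each one T-unit (a main clause plus any
    attached subordinate clauses) already segmented by the analyst.
    Automatic T-unit segmentation is NOT attempted here.
    """
    hits = sum(
        1
        for unit in t_units
        for w in re.findall(r"[a-z']+", unit.lower())
        if w in CONNECTORS
    )
    return hits, len(t_units), hits * 1_000 / len(t_units)


# Two sentences, but three T-units: the first sentence contains
# two coordinated main clauses, so it is split into two T-units.
SAMPLE_T_UNITS = [
    "The sample was small,",
    "and therefore we repeated the test.",
    "However, the second group behaved differently.",
]

if __name__ == "__main__":
    print(connectors_per_1k_tunits(SAMPLE_T_UNITS))
```

Here the two connectors yield about 667 per 1,000 T-units, whereas the same text counted by sentence (two sentences) yields 1,000 per 1,000 sentences. Each coordinated main clause counts as its own T-unit, which is where the two bases diverge.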
 
顶部