Highest TTR, yet the lowest STTR: why?

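I ran the three corpora below through WordSmith. Why does LCMC(G), which has the highest TTR, have the lowest STTR? How can this be explained? The first two corpora are translated Chinese and the last one is original (non-translated) Chinese. Does anyone know? Thanks!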
 

Attachment: 360截图20140721221706346.jpg
Re: Highest TTR, yet the lowest STTR: why?

standardised type/token ratio: One problem when calculating
the type/token ratio in a corpus is that the larger
the corpus, the lower the type/token ratio is likely to be.
This is because high frequency words like "the" tend to be
repeated whereas the probability of new types of words
appearing will always decrease, the larger the corpus
size. Therefore, the type/token ratio tends to reveal more
about corpus size than lexical repetition or uniqueness.
For example, the Freiburg–LOB Corpus of British
English (FLOB) Corpus is made up of fifteen files of
different sizes. From Table 9, it can be seen that the
larger the file, the smaller the type/token ratio tends
to be.
Baker, P., Hardie, A., & McEnery, T. (2006). A Glossary of Corpus Linguistics. Edinburgh: Edinburgh University Press, pp. 150-151.
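
To make the size effect in the quoted entry concrete, here is a minimal Python sketch. It assumes STTR is computed the way I understand WordSmith to do it: TTR is calculated separately for each consecutive chunk of 1,000 running words, the trailing partial chunk is ignored, and the chunk values are averaged. The helper names `ttr` and `sttr` and the two toy corpora are made up for illustration; they are not WordSmith's own code and not the corpora in the attachment.

```python
import random


def ttr(tokens):
    """Type/token ratio as a percentage: distinct words / running words."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)


def sttr(tokens, chunk_size=1000):
    """Mean TTR over consecutive full chunks of `chunk_size` tokens.

    Assumption: the trailing partial chunk is dropped, and a text shorter
    than one chunk gets 0.0.
    """
    n_chunks = len(tokens) // chunk_size
    if n_chunks == 0:
        return 0.0
    ratios = [
        ttr(tokens[i * chunk_size:(i + 1) * chunk_size])
        for i in range(n_chunks)
    ]
    return sum(ratios) / len(ratios)


# Toy data (hypothetical): words are drawn uniformly at random.
rng = random.Random(0)

# A small corpus with a small vocabulary.
small_vocab = [f"w{i}" for i in range(500)]
small = rng.choices(small_vocab, k=2_000)

# A much larger corpus with a richer vocabulary.
large_vocab = [f"w{i}" for i in range(2_000)]
large = rng.choices(large_vocab, k=50_000)

for name, corpus in (("small", small), ("large", large)):
    print(f"{name}: tokens={len(corpus)} "
          f"TTR={ttr(corpus):.2f} STTR={sttr(corpus):.2f}")
```

In this toy setup the small corpus has the higher raw TTR simply because it is short, while the large corpus has the higher STTR because each 1,000-word chunk draws on a richer vocabulary. That is the same pattern as in the question: raw TTR mostly reflects corpus size, so only the STTR values are comparable across corpora of different sizes.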
 
Re: Highest TTR, yet the lowest STTR: why?

Thanks, I get it now! If the data are correct, I think this means that translated Chinese has a higher STTR than native Chinese.
 