Re: The TTR is the highest, so why is the STTR the lowest?
standardised type/token ratio: One problem when calculating
the type/token ratio in a corpus is that the larger
the corpus, the lower the type/token ratio is likely to be.
This is because high frequency words like "the" tend to be
repeated whereas the probability of new types of words
appearing will always decrease, the larger the corpus
size. Therefore, the type/token ratio tends to reveal more
about corpus size than lexical repetition or uniqueness.
For example, the Freiburg–LOB Corpus of British
English (FLOB) Corpus is made up of fifteen files of
different sizes. From Table 9, it can be seen that the
larger the file, the smaller the type/token ratio tends
to be.
Baker, P., Hardie, A., & McEnery, T. (2006). A glossary of corpus linguistics. Edinburgh: Edinburgh University Press, pp. 150-151.
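
To make the size effect concrete, here is a rough Python sketch of both measures. It assumes the common way of standardising: cut the text into consecutive chunks of a fixed number of tokens (1,000 is the usual default), compute the TTR of each full chunk, and average them. The function names, the naive regex tokenizer, and the fallback for texts shorter than one chunk are my own choices for illustration, not taken from the glossary or from any particular tool, so the numbers will not match your software exactly.

```python
import re


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def ttr(tokens):
    # Plain type/token ratio: distinct word forms divided by running words.
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def sttr(tokens, chunk_size=1000):
    # Standardised TTR: mean TTR over consecutive full chunks of chunk_size tokens.
    # The leftover tokens after the last full chunk are ignored.
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    full_chunks = len(tokens) // chunk_size
    if full_chunks == 0:
        # Assumption: a text shorter than one chunk falls back to the plain TTR.
        return ttr(tokens)
    ratios = [
        ttr(tokens[i * chunk_size:(i + 1) * chunk_size])
        for i in range(full_chunks)
    ]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    sample = tokenize("the cat sat on the mat and the dog sat on the log")
    longer = sample * 50  # same vocabulary, fifty times as many tokens
    print("TTR  short:", ttr(sample), " long:", ttr(longer))
    print("STTR short:", sttr(sample, 13), " long:", sttr(longer, 13))
```

Repeating the same sentence fifty times leaves the number of types unchanged while the token count grows, so the plain TTR collapses, whereas the chunk-averaged value stays where it was. That is why a small file can have the highest TTR of a set and still the lowest STTR: the two figures are measuring over different spans of text.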