Highest TTR, so why the lowest STTR?

此祈彼福 · 2014-07-21

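I ran the three corpora below through WordSmith. Why does LCMC(G), which has the highest TTR, have the lowest STTR? How can this be explained? The first two are translated Chinese and the last is original (non-translated) Chinese. Does anyone know? Thanks!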

清风出袖 · 2014-07-25

Re: Highest TTR, so why the lowest STTR?

standardised type/token ratio: One problem when calculating
the type/token ratio in a corpus is that the larger
the corpus, the lower the type/token ratio is likely to be.
This is because high frequency words like "the" tend to be
repeated whereas the probability of new types of words
appearing will always decrease, the larger the corpus
size. Therefore, the type/token ratio tends to reveal more
about corpus size than lexical repetition or uniqueness.
For example, the Freiburg–LOB Corpus of British
English (FLOB) Corpus is made up of fifteen files of
different sizes. From Table 9, it can be seen that the
larger the file, the smaller the type/token ratio tends
to be.
Baker, P., Hardie, A., & McEnery, T. (2006). A glossary of corpus linguistics. Edinburgh: Edinburgh University Press, pp. 150-151.
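To make the point in the quotation concrete, here is a minimal Python sketch of the two measures. It rests on assumptions that are not from the thread: the text is already segmented into a list of word tokens (for Chinese that means word segmentation has been done beforehand), STTR is taken as the mean TTR over consecutive full chunks of a fixed size, and a trailing partial chunk is dropped. The function names `ttr` and `sttr` and the toy corpora are made up for illustration, and the chunk size of 4 is only there to keep the example tiny; WordSmith's own settings and exact procedure may differ.

```python
from typing import Sequence


def ttr(tokens: Sequence[str]) -> float:
    """Plain type/token ratio, as a percentage of the whole text."""
    if not tokens:
        return 0.0
    return 100.0 * len(set(tokens)) / len(tokens)


def sttr(tokens: Sequence[str], chunk_size: int = 1000) -> float:
    """Mean TTR over consecutive full chunks of `chunk_size` tokens.

    Assumption: a trailing partial chunk is ignored, and a text shorter
    than one chunk falls back to the plain TTR.
    """
    n_chunks = len(tokens) // chunk_size
    if n_chunks == 0:
        return ttr(tokens)
    ratios = [
        ttr(tokens[i * chunk_size:(i + 1) * chunk_size])
        for i in range(n_chunks)
    ]
    return sum(ratios) / len(ratios)


# Hypothetical toy corpora (not the corpora from the question).
small_repetitive = ["a", "b"] * 4                    # 8 tokens, 2 types
large_varied = [f"w{i % 40}" for i in range(400)]    # 400 tokens, 40 types

# Whole-text TTR favours the small corpus simply because it is small...
print(ttr(small_repetitive), ttr(large_varied))          # 25.0 10.0
# ...while the chunk-based measure favours the more varied large corpus.
print(sttr(small_repetitive, 4), sttr(large_varied, 4))  # 50.0 100.0
```

In this toy case the small corpus has the higher TTR but the lower STTR, which is the same pattern as in the question: plain TTR is pulled down by corpus size, while the chunk-based figure compares like with like.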

此祈彼福 · 2014-07-26

Re: Highest TTR, so why the lowest STTR?

Thanks, I get it now. If the data are correct, I think it means that translated Chinese has a higher STTR than native Chinese.

pengwentao · 2014-07-27

Re: Highest TTR, so why the lowest STTR?

Hello, could you tell me how to run searches on CLEC?
