A corpus-based empirical study on inter-textual vocabulary growth*
Fan Fengxiang
Foreign Language Department, Dalian Maritime University, Dalian, China
Abstract:
This paper examines inter-textual vocabulary growth. Eight sets of samples, totalling 8,001,000 words, were drawn randomly from the British National Corpus. The results show that, for texts of 500 and 2,000 words totalling about 1,000,000 words, inter-textual vocabulary growth behaves as follows: up to 60,000 cumulative word tokens it follows a cubic curve, and beyond that point it approximates a power curve. The parameters of the inter-textual vocabulary growth model were obtained empirically from the eight sets of samples; with these parameters, both inter-textual vocabulary growth and the new vocabulary that a text produces can be estimated.
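As an illustration of the quantities named in the abstract, the following is a minimal sketch of how an inter-textual vocabulary growth curve could be tabulated from a sequence of texts: cumulative word tokens, cumulative word types, and the new types each text contributes. The function name `vocabulary_growth`, the toy texts, and the lowercase whitespace tokenisation are assumptions made for illustration only; they are not the procedure or data used in the study, and no model fitting is shown.

```python
from typing import Iterable, List, Tuple


def vocabulary_growth(texts: Iterable[str]) -> List[Tuple[int, int, int]]:
    """Return (cumulative_tokens, cumulative_types, new_types) for each text."""
    seen = set()            # word types observed in all texts so far
    cumulative_tokens = 0   # running count of word tokens
    curve = []
    for text in texts:
        # Assumption: lowercase whitespace tokenisation, not the paper's scheme.
        tokens = text.lower().split()
        cumulative_tokens += len(tokens)
        # New vocabulary this text produces: types not seen in earlier texts.
        new_types = set(tokens) - seen
        seen |= new_types
        curve.append((cumulative_tokens, len(seen), len(new_types)))
    return curve


# Hypothetical toy texts, standing in for corpus samples.
sample_texts = [
    "the ship left the port",
    "the port was busy",
    "a ship entered the busy port",
]
curve = vocabulary_growth(sample_texts)
for tokens, types, new in curve:
    print(tokens, types, new)
```

Plotting cumulative types against cumulative tokens for a real sample set would give the kind of curve to which cubic and power functions are fitted.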