What ArthurW is referring to is probably Xiaofei Lu's Lexical Complexity Analyzer: https://sites.psu.edu/xxl13/lca/
Two tools by Kristopher Kyle are also worth a look:
The Tool for the Automatic Analysis of Lexical Diversity (TAALED)
The Tool for the Automatic Analysis of Lexical Sophistication (TAALES)
https://www.linguisticanalysistools.org/tools.html
The core statistical method in multi-dimensional register analysis is factor analysis. For instructions on how to run it, see "Principal component analysis (PCA) and factor analysis (FA) data and codes" (https://corpus.bfsu.edu.cn/PCA_FA0608.zip) on the page https://corpus.bfsu.edu.cn/info/1084/1873.htm.
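For a related study, see: Xu Jiajin & Li Jialei, 2022, Register evolution in written Chinese over the past century (近百年汉语书面语的语域演变研究), Foreign Languages and Their Teaching (外语与外语教学) (4): 76-86.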
Thanks to ArthurW for sharing. For MSTTR and MATTR, see pages 66 and 67 of https://cran.r-project.org/web/packages/koRpus/koRpus.pdf.
MSTTR (Mean Segmental Type-Token Ratio):
(1) The text is split into fixed-length segments (e.g., 100 words per segment).
(2) For each segment, the TTR is calculated as the ratio of unique words...
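The two-step MSTTR procedure above can be sketched in plain Python, alongside MATTR (moving-average TTR), which instead slides a fixed-size window forward one token at a time and averages the TTR of every window. This is a minimal illustration, not koRpus's implementation: the function names `ttr`, `msttr` and `mattr` are hypothetical, the input is assumed to be an already tokenized list (no case folding or punctuation filtering), and discarding an incomplete final segment in MSTTR is an assumed convention that may differ from the tool you use.

```python
from typing import Dict, Sequence


def ttr(tokens: Sequence[str]) -> float:
    """Plain type-token ratio: distinct tokens (types) / total tokens."""
    return len(set(tokens)) / len(tokens)


def msttr(tokens: Sequence[str], segment_size: int = 100) -> float:
    """Mean Segmental TTR: mean TTR over consecutive, non-overlapping segments.

    Assumption: a trailing segment shorter than segment_size is discarded.
    """
    n_full = len(tokens) // segment_size
    if n_full == 0:
        raise ValueError("text is shorter than one segment")
    ratios = [
        ttr(tokens[i * segment_size:(i + 1) * segment_size])
        for i in range(n_full)
    ]
    return sum(ratios) / n_full


def mattr(tokens: Sequence[str], window_size: int = 100) -> float:
    """Moving-Average TTR: mean TTR over every window of window_size tokens,
    shifting the window forward by one token each step."""
    n = len(tokens)
    if n < window_size:
        raise ValueError("text is shorter than one window")
    # Token counts for the current window, so each shift is O(1).
    counts: Dict[str, int] = {}
    for tok in tokens[:window_size]:
        counts[tok] = counts.get(tok, 0) + 1
    total = len(counts) / window_size
    for start in range(1, n - window_size + 1):
        # Drop the token leaving the window on the left.
        out_tok = tokens[start - 1]
        counts[out_tok] -= 1
        if counts[out_tok] == 0:
            del counts[out_tok]
        # Add the token entering the window on the right.
        in_tok = tokens[start + window_size - 1]
        counts[in_tok] = counts.get(in_tok, 0) + 1
        total += len(counts) / window_size
    return total / (n - window_size + 1)


if __name__ == "__main__":
    toy = "the cat sat on the mat the end".split()
    print(msttr(toy, segment_size=4))  # segments: 4/4 and 3/4 -> 0.875
    print(mattr(toy, window_size=4))   # five windows -> 0.9
```

The toy input uses tiny segment and window sizes only so the arithmetic is easy to follow; with real texts the 100-token size from the example above is the usual starting point. MSTTR uses each token once, while MATTR reuses tokens across overlapping windows, so it does not depend on where segment boundaries happen to fall.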
The workshop is jointly organised by the English Department of the University of Freiburg and the Institut für Deutsche Sprache (IDS) in Mannheim and, as a scoping workshop, designed to explore the major empirical, methodological and conceptual challenges facing our research community. Although...