Re: Another question about word frequency counting
Thanks a lot!
But so far the number of tokens I have counted in COLSEC (753169) is still much larger than the figure given in the book by 杨惠中 and 卫乃兴 (723299).
I have excluded the tags in <> and []. Are there any other symbols I should exclude?
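
For reference, here is a rough Python sketch of the kind of count described above. It strips anything inside <> and [] and then counts the remaining word tokens. The tag pattern, the definition of a token, and the names count_tokens and leftover_symbols are all assumptions made for illustration. They are not the counting rules used in the book, which are not known here.

```python
import re
from collections import Counter

# Assumption: a tag is anything enclosed in <...> or [...], without nesting.
TAG_RE = re.compile(r"<[^<>]*>|\[[^\[\]]*\]")
# Assumption: a token is a run of letters/digits, optionally joined by
# internal apostrophes or hyphens (e.g. "don't", "mm-hm").
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def count_tokens(text):
    """Count word tokens after removing <...> and [...] tags."""
    stripped = TAG_RE.sub(" ", text)
    return len(TOKEN_RE.findall(stripped))


def leftover_symbols(text):
    """Tally whatever remains once tags and word tokens are removed."""
    stripped = TAG_RE.sub(" ", text)
    rest = TOKEN_RE.sub(" ", stripped)
    return Counter(rest.split())


# Made-up sample line, not taken from COLSEC.
sample = "<sp1> well [laugh] I don't think so ... <pause> mm-hm"
print(count_tokens(sample))
print(leftover_symbols(sample))
```

Running leftover_symbols over the whole corpus would list any symbols that survive tag removal, which may show what is still being counted as a token by a simple whitespace split.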