Quoting csli's post of 2006-1-21 23:37:19:
Perhaps this passage from the WordSmith Tools 4 Help will be useful to you:
"If a text is 1,000 words long, it is said to have 1,000 "tokens". But a lot of these words will be repeated, and there may be only say 400 different words in the text. "Types", therefore, are the different words."
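In other words, punctuation is not counted. Interestingly, under the default settings numbers do not count as words, whereas under custom settings they can be counted as words.
So this is not the same concept as the "character count" we usually talk about.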
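To make the distinction concrete, here is a rough Python sketch of counting tokens and types. It is only an illustration and not WordSmith's actual algorithm: the function name `count_tokens_and_types`, the `include_numbers` switch, the regular expressions, and the choice to ignore case are all my own assumptions.

```python
import re

def count_tokens_and_types(text, include_numbers=False):
    """Return (tokens, types) for text, ignoring punctuation.

    Hypothetical helper, not WordSmith's implementation.
    """
    # A word is a run of letters, optionally joined by apostrophes or hyphens.
    pattern = r"[^\W\d_]+(?:['-][^\W\d_]+)*"
    if include_numbers:
        # Assumption: a number is a run of digits with optional . or , groups.
        pattern += r"|\d+(?:[.,]\d+)*"
    # Assumption: case is ignored, so "The" and "the" are one type.
    tokens = [t.lower() for t in re.findall(pattern, text)]
    return len(tokens), len(set(tokens))

sample = "In 2006, the cat saw the other cat."
print(count_tokens_and_types(sample))                        # (7, 5)
print(count_tokens_and_types(sample, include_numbers=True))  # (8, 6)
print(len(sample))  # character count, a different measure altogether
```

With numbers excluded, the sample has 7 tokens but only 5 types, because "the" and "cat" each occur twice. Turning numbers on adds "2006" to both counts. Neither figure matches the character count, which also includes spaces and punctuation.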