Quoting 动态语法's post of 2005-7-4 6:11:55:
The general problem behind this post is how your lemmatization is done. If it is based on an existing corpus, then cases like 'impairs' being treated as the root form will occur. If, as XIAOZ suggested, one starts from a word list (preferably a well-known one), this problem can be avoided. On the other hand, you may end up with a lot of zero counts, because some of the words on the list will not show up in your corpus.
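To make the trade-off concrete, here is a rough Python sketch. Everything in it is made up for illustration: the tiny lemma list, the toy sentence, and in particular the "shortest attested form" rule, which is only my guess at how a corpus-driven headword could be picked, not how any particular tool does it.

```python
from collections import Counter

# Hypothetical lemma list (headword -> inflected forms); illustrative only,
# not taken from any published word list.
LEMMA_LIST = {
    "impair": ["impair", "impairs", "impaired", "impairing"],
    "repair": ["repair", "repairs", "repaired", "repairing"],
}


def count_by_lemma(tokens, lemma_list):
    """List-driven counting: every headword on the list gets a count, even 0."""
    form_to_lemma = {
        form: lemma for lemma, forms in lemma_list.items() for form in forms
    }
    counts = Counter({lemma: 0 for lemma in lemma_list})
    for tok in tokens:
        lemma = form_to_lemma.get(tok.lower())
        if lemma is not None:
            counts[lemma] += 1
    return counts


def corpus_headword(tokens, forms):
    """Assumed corpus-driven rule: use the shortest form attested in the corpus."""
    seen = {tok.lower() for tok in tokens}
    attested = [form for form in forms if form in seen]
    return min(attested, key=len) if attested else None


# Toy corpus: the base form 'impair' never occurs, and no form of 'repair' does.
tokens = "Smoking impairs lung function and impaired healing".split()

counts = count_by_lemma(tokens, LEMMA_LIST)
for lemma in LEMMA_LIST:
    print(lemma, counts[lemma], corpus_headword(tokens, LEMMA_LIST[lemma]))
```

With this toy corpus the corpus-driven rule picks 'impairs' as the headword because 'impair' itself is never attested, while the list-driven count keeps 'impair' as the headword but reports zero occurrences for 'repair'.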