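Question 2: To find out what proportion of all occurrences of the word head in a corpus belong to a given part of speech, such as head used as a verb, you first have to POS-tag the corpus. Likewise, to extract items at the conceptual level, you first have to annotate the corpus at that level. Of course, the computational cost of doing that is very high. It is therefore less work to extract them using features such as collocations, semantic networks, and the surrounding context: if you can tell river bank from money bank by context in a text, you can pull out the instances of bank that mean a financial institution. Naturally, how well this works will vary from word to word.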
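To make the first point concrete, here is a minimal sketch of the proportion calculation once a corpus has been POS-tagged. The tiny hand-tagged token list and the generic tag labels (NOUN, VERB, and so on) are assumptions for illustration only; they are not CLAWS output, and the helper name `pos_proportions` is made up.

```python
from collections import Counter

def pos_proportions(tagged_tokens, word):
    """Return the share of each POS tag among all occurrences of `word`."""
    counts = Counter(tag for tok, tag in tagged_tokens if tok.lower() == word)
    total = sum(counts.values())
    # Empty dict if the word never occurs, to avoid dividing by zero.
    return {tag: n / total for tag, n in counts.items()} if total else {}

# Hypothetical hand-tagged data standing in for a real tagged corpus.
tagged = [
    ("Her", "PRON"), ("head", "NOUN"), ("hurts", "VERB"),
    ("They", "PRON"), ("head", "VERB"), ("north", "ADV"),
    ("The", "DET"), ("head", "NOUN"), ("of", "ADP"), ("department", "NOUN"),
    ("Use", "VERB"), ("your", "PRON"), ("head", "NOUN"),
]

props = pos_proportions(tagged, "head")
print(props)
```

With real data the token list would come from the tagger's output, and the tag labels would follow whatever tagset that tagger uses.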
I guess xiaoz is actually referring to USAS, not Wmatrix.
USAS only differentiates the semantic categories of linguistic items, predominantly lexical items; it does not necessarily distinguish the different senses of a single lemma.
Yes. Wmatrix is a web interface for CLAWS and USAS.
Unfortunately, USAS does not make a distinction between the two meanings of bank in the following test example: "There is a bank on the right bank of the river". Both instances of bank are tagged as I1 (Money generally).
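For comparison, here is a rough sketch of the context-based approach suggested earlier in the thread, run on the same test sentence. This is not how USAS works; the cue-word lists, the window size of 3, and the function name `label_bank` are all assumptions chosen for illustration.

```python
# Hypothetical cue words; a real list would come from collocation data.
MONEY_CUES = {"money", "account", "loan", "deposit", "cash"}
RIVER_CUES = {"river", "shore", "stream", "water"}

def label_bank(tokens, window=3):
    """Label each 'bank' token by cue words found within `window` tokens."""
    labels = []
    for i, tok in enumerate(tokens):
        if tok.lower() != "bank":
            continue
        context = {t.lower() for t in tokens[max(0, i - window): i + window + 1]}
        has_river = bool(context & RIVER_CUES)
        has_money = bool(context & MONEY_CUES)
        if has_river and not has_money:
            sense = "river"
        elif has_money and not has_river:
            sense = "money"
        else:
            # No cue, or conflicting cues: leave the token undecided.
            sense = "unresolved"
        labels.append((i, sense))
    return labels

tokens = "There is a bank on the right bank of the river".split()
labels = label_bank(tokens)
print(labels)  # [(3, 'unresolved'), (7, 'river')]
```

The second bank is picked up as the river sense because river falls inside its window, while the first bank has no cue word nearby and stays unresolved. That illustrates the point that this kind of extraction depends heavily on the word and on the context available.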