How do I match two or more nouns with nltk?

ZJUray · 2016-06-27

For example, with pattern = """ NP: {(<NN.*>){2,}} """ or pattern = """ NP: {(<NN.*>){2,5}} """, calling
nltk.RegexpParser(pattern) raises an error saying Illegal chunk pattern. Why is that?

qhdjason · 2016-06-29

For <NN.*>{2,}, try:

Code:

pattern = r"""NP: {<NN.*><NN.*>+}"""
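One way to see why the pattern repeats <NN.*> is to compare it with a plain <NN.*>+ on a few tokens. The tokens and tags below are picked by hand for illustration, so no tokenizer or tagger model is needed. The + on its own also chunks a single noun, while the extra leading <NN.*> requires at least two:

```python
import nltk

# Hypothetical hand-tagged tokens, so no tokenizer or tagger model is needed.
tagged = [("limiting", "VBG"), ("entry", "NN"), ("to", "TO"),
          ("the", "DT"), ("UK", "NNP"), ("government", "NN")]

def np_chunks(pattern):
    """Return the words of every NP chunk that the grammar finds in `tagged`."""
    tree = nltk.RegexpParser(pattern).parse(tagged)
    return [[word for word, _ in st.leaves()]
            for st in tree.subtrees(filter=lambda t: t.label() == "NP")]

one_or_more = np_chunks(r"NP: {<NN.*>+}")        # also chunks a lone noun
two_or_more = np_chunks(r"NP: {<NN.*><NN.*>+}")  # needs at least two nouns
print(one_or_more)
print(two_or_more)
```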

For <NN.*>{2,5}, try:

Code:

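# Rules run in order, and a later rule cannot match tokens an earlier rule
# already chunked, so the longest run has to come first.
pattern = r"""NP: {<NN.*><NN.*><NN.*><NN.*><NN.*>}
                  {<NN.*><NN.*><NN.*><NN.*>}
                  {<NN.*><NN.*><NN.*>}
                  {<NN.*><NN.*>}
"""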

Ugly, but it works.
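A possibly tidier alternative for {2,5}, offered as a suggestion rather than something from the thread: since curly braces delimit the chunk rule itself, a {2,5} quantifier inside one is presumably what the parser rejects, whereas ? and + after a tag are accepted. Two required nouns followed by three optional ones should then cover runs of two to five in a single rule. The seven-noun input below is made up for testing:

```python
import nltk

# Two required nouns plus up to three optional ones: a run of 2 to 5 nouns.
pattern = r"NP: {<NN.*><NN.*><NN.*>?<NN.*>?<NN.*>?}"
cp = nltk.RegexpParser(pattern)

# Hypothetical hand-tagged input: a lone noun, a verb, then a run of seven nouns.
tagged = [("data", "NN"), ("is", "VBZ")] + [("n%d" % i, "NN") for i in range(7)]
tree = cp.parse(tagged)

# Number of tokens in each NP chunk, left to right.
sizes = [len(st.leaves())
         for st in tree.subtrees(filter=lambda t: t.label() == "NP")]
print(sizes)
```

Matching is greedy, so a run longer than five nouns is split into consecutive chunks (here one of five and one of two), which is also what a literal {2,5} quantifier would do.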

For example, run the following:

Code:

import nltk
# The tokenizer and tagger models must be installed first, e.g. via nltk.download().
sent = nltk.word_tokenize("Again, it depends on whether the UK government decides to introduce a work permit system of the kind that currently applies to non-EU citizens, limiting entry to skilled workers in professions where there are shortages.")
tagged_sent = nltk.pos_tag(sent)
pattern = r"""NP: {<NN.*><NN.*>+}"""
cp = nltk.RegexpParser(pattern)
print(cp.parse(tagged_sent))  # print(...) form runs on both Python 2 and 3

You get:

Code:

(S
  Again/RB
  ,/,
  it/PRP
  depends/VBZ
  on/IN
  whether/IN
  the/DT
  (NP UK/NNP government/NN)
  decides/VBZ
  to/TO
  introduce/VB
  a/DT
  (NP work/NN permit/NN system/NN)
  of/IN
  the/DT
  kind/NN
  that/IN
  currently/RB
  applies/VBZ
  to/TO
  (NP non-EU/NNP citizens/NNS)
  ,/,
  limiting/VBG
  entry/NN
  to/TO
  skilled/JJ
  workers/NNS
  in/IN
  professions/NNS
  where/WRB
  there/EX
  are/VBP
  shortages/NNS
  ./.)
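To use the chunks afterwards, the NP subtrees can be pulled out of the tree, or the whole tree can be flattened into IOB triples with nltk.chunk.tree2conlltags. The sketch below reuses a slice of the tagged sentence above, typed in by hand so that no tagger model is needed:

```python
import nltk
from nltk.chunk import tree2conlltags

# A slice of the tagged sentence above, entered by hand (no tagger needed).
tagged = [("a", "DT"), ("work", "NN"), ("permit", "NN"), ("system", "NN"),
          ("of", "IN"), ("the", "DT"), ("kind", "NN")]
tree = nltk.RegexpParser(r"NP: {<NN.*><NN.*>+}").parse(tagged)

# Each NP chunk joined into a plain string.
phrases = [" ".join(word for word, _ in st.leaves())
           for st in tree.subtrees(filter=lambda t: t.label() == "NP")]
print(phrases)  # ['work permit system']

# One (word, POS tag, IOB chunk tag) triple per token.
iob = tree2conlltags(tree)
print(iob)
```

The lone "kind/NN" stays outside any chunk, so it gets the IOB tag "O".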

ZJUray · 2016-06-29

Yeah, I figured it out yesterday, just like you said. Thanks anyway~
