How do I match two or more nouns when using nltk?

#1
For example, with pattern = """ NP: {(<NN.*>){2,}} """ or pattern = """ NP: {(<NN.*>){2,5}} """, calling
nltk.RegexpParser(pattern) raises an error in both cases: "Illegal chunk pattern". Why does this happen?
 
#2
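For <NN.*>{2,}, try:
Code:
pattern = r"""NP: {<NN.*><NN.*>+}"""
For <NN.*>{2,5}, try one rule per length, longest first. The rules are applied in order, so a two-noun rule placed first would chunk the first pair of a longer run and leave the rest behind:
Code:
pattern = r"""NP: {<NN.*><NN.*><NN.*><NN.*><NN.*>}
                  {<NN.*><NN.*><NN.*><NN.*>}
                  {<NN.*><NN.*><NN.*>}
                  {<NN.*><NN.*>}
"""
Ugly, but it works. :)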
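
If typing the rules out by hand gets tedious, the same one-rule-per-length grammar can be generated. This is only a sketch: `bounded_noun_grammar` and its parameters are hypothetical names, not part of nltk, and it lists the longest rule first on the assumption that rules are tried in order, so that a shorter rule does not split a longer run.

```python
def bounded_noun_grammar(min_len=2, max_len=5, label="NP", tag="<NN.*>"):
    """Build a chunk grammar matching runs of min_len..max_len noun tags."""
    if not 1 <= min_len <= max_len:
        raise ValueError("need 1 <= min_len <= max_len")
    # Longest rule first, so a long run is not split by a shorter rule.
    rules = ["{" + tag * n + "}" for n in range(max_len, min_len - 1, -1)]
    # Continuation rules are indented to line up under the first one.
    indent = " " * (len(label) + 2)
    return label + ": " + ("\n" + indent).join(rules)


grammar = bounded_noun_grammar(2, 5)
print(grammar)
```

The resulting string can be passed to nltk.RegexpParser(grammar) in place of the hand-written pattern.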

Run the following code:
Code:
import nltk
sent = nltk.word_tokenize("Again, it depends on whether the UK government decides to introduce a work permit system of the kind that currently applies to non-EU citizens, limiting entry to skilled workers in professions where there are shortages.")
tagged_sent = nltk.pos_tag(sent)
pattern = r"""NP: {<NN.*><NN.*>+}"""
cp = nltk.RegexpParser(pattern)
print(cp.parse(tagged_sent))
You get:

Code:
(S
  Again/RB
  ,/,
  it/PRP
  depends/VBZ
  on/IN
  whether/IN
  the/DT
  (NP UK/NNP government/NN)
  decides/VBZ
  to/TO
  introduce/VB
  a/DT
  (NP work/NN permit/NN system/NN)
  of/IN
  the/DT
  kind/NN
  that/IN
  currently/RB
  applies/VBZ
  to/TO
  (NP non-EU/NNP citizens/NNS)
  ,/,
  limiting/VBG
  entry/NN
  to/TO
  skilled/JJ
  workers/NNS
  in/IN
  professions/NNS
  where/WRB
  there/EX
  are/VBP
  shortages/NNS
  ./.)
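
To get only the multi-noun chunks out of a tree like this, rather than the whole parse, one option is to filter its subtrees by label. A minimal sketch, assuming NLTK 3 (where trees expose `label()`); the short pre-tagged token list is made up for illustration so that no tokenizer or tagger models are needed.

```python
import nltk

# Pre-tagged tokens (illustrative), so pos_tag and its models are not required.
tagged = [("the", "DT"), ("UK", "NNP"), ("government", "NN"),
          ("introduces", "VBZ"), ("a", "DT"),
          ("work", "NN"), ("permit", "NN"), ("system", "NN")]

cp = nltk.RegexpParser(r"""NP: {<NN.*><NN.*>+}""")
tree = cp.parse(tagged)

# Keep only the NP subtrees and join the words of each one.
noun_chunks = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_chunks)
```

The same filter should work unchanged on the tree returned for the full sentence above.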
 
#3
Yeah, I figured it out yesterday, just as you said. Thanks anyway~
 