How to match two or more nouns when using NLTK?

Posted by ZJUray on 2016-06-27 in the "Programming and Tool Development" forum

  1. For example, with pattern = """ NP: {(<NN.*>){2,}} """ or pattern = """ NP: {(<NN.*>){2,5}} """,
    calling nltk.RegexpParser(pattern) raises "Illegal chunk pattern" in both cases. What is the cause?
     
  2. For <NN.*>{2,}, try:
     pattern = r"""NP: {<NN.*><NN.*>+}"""
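    For <NN.*>{2,5}, try:
    pattern = r"""NP: {<NN.*><NN.*><NN.*><NN.*><NN.*>}
                      {<NN.*><NN.*><NN.*><NN.*>}
                      {<NN.*><NN.*><NN.*>}
                      {<NN.*><NN.*>}
    """
    
    Ugly, but it works. :) Keep the longest rule first: the rules in one stage are applied in order, so a two-noun rule listed first would split a three-noun run such as "work permit system" into a two-noun chunk plus a leftover noun.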

    Run the following code:
    import nltk

    # word_tokenize and pos_tag need NLTK's tokenizer and tagger data;
    # if they are missing, fetch them once with nltk.download().
    sent = nltk.word_tokenize("Again, it depends on whether the UK government decides to introduce a work permit system of the kind that currently applies to non-EU citizens, limiting entry to skilled workers in professions where there are shortages.")
    tagged_sent = nltk.pos_tag(sent)
    # Two or more consecutive noun tags (NN, NNS, NNP, NNPS) form one NP chunk
    pattern = r"""NP: {<NN.*><NN.*>+}"""
    cp = nltk.RegexpParser(pattern)
    print(cp.parse(tagged_sent))
    
    You get:

    (S
      Again/RB
      ,/,
      it/PRP
      depends/VBZ
      on/IN
      whether/IN
      the/DT
      (NP UK/NNP government/NN)
      decides/VBZ
      to/TO
      introduce/VB
      a/DT
      (NP work/NN permit/NN system/NN)
      of/IN
      the/DT
      kind/NN
      that/IN
      currently/RB
      applies/VBZ
      to/TO
      (NP non-EU/NNP citizens/NNS)
      ,/,
      limiting/VBG
      entry/NN
      to/TO
      skilled/JJ
      workers/NNS
      in/IN
      professions/NNS
      where/WRB
      there/EX
      are/VBP
      shortages/NNS
      ./.)
    
     
    Last edited: 2016-06-29
    ZJUray liked this.
  3. Yeah, I figured it out yesterday, just like you said. Thanks anyway~
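The {2,5} workaround in answer 2 writes one chunk rule per length by hand. A small helper can write that kind of grammar for any range. This is only a sketch: `noun_run_grammar` is a hypothetical name, not part of NLTK, and putting the longest rule first rests on the assumption that the rules of a stage are applied one after another.

```python
def noun_run_grammar(min_len, max_len, label="NP"):
    """Write out a chunk grammar that mimics <NN.*>{min_len,max_len}.

    Hypothetical helper (not part of NLTK). It emits one chunk rule per
    length, longest first, assuming the rules of a stage are applied in
    order, so a shorter rule listed first would split longer runs.
    """
    if not 1 <= min_len <= max_len:
        raise ValueError("expected 1 <= min_len <= max_len")
    rules = ["{" + "<NN.*>" * n + "}" for n in range(max_len, min_len - 1, -1)]
    # The first rule shares the "NP:" line; the rest are indented under it.
    indent = " " * (len(label) + 2)
    return label + ": " + ("\n" + indent).join(rules)


grammar = noun_run_grammar(2, 5)
print(grammar)
# cp = nltk.RegexpParser(grammar)  # then cp.parse(tagged_sent) as in answer 2
```

The generated string can be passed to `nltk.RegexpParser` in place of the hand-written one. For the open-ended {2,} case there is no finite list to generate, and `<NN.*><NN.*>+` from answer 2 already covers it.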
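To see what {<NN.*><NN.*>+} picks out, here is a pure-Python sketch with no NLTK dependency. It finds runs of two or more consecutive tokens whose tag starts with "NN", applied to the tagged sentence from the output in answer 2. It illustrates the rule's effect on this input and is not NLTK's implementation; `noun_runs` is a hypothetical helper.

```python
from itertools import groupby


def noun_runs(tagged, min_len=2):
    """Return runs of >= min_len consecutive tokens whose tag starts with 'NN'.

    Hypothetical helper illustrating {<NN.*><NN.*>+}; it does not use NLTK.
    """
    runs = []
    for is_noun, group in groupby(tagged, key=lambda pair: pair[1].startswith("NN")):
        words = [word for word, _ in group]
        if is_noun and len(words) >= min_len:
            runs.append(words)
    return runs


# The tagged sentence from the parser output in answer 2, as word/TAG tokens.
tagged_text = (
    "Again/RB ,/, it/PRP depends/VBZ on/IN whether/IN the/DT UK/NNP "
    "government/NN decides/VBZ to/TO introduce/VB a/DT work/NN permit/NN "
    "system/NN of/IN the/DT kind/NN that/IN currently/RB applies/VBZ to/TO "
    "non-EU/NNP citizens/NNS ,/, limiting/VBG entry/NN to/TO skilled/JJ "
    "workers/NNS in/IN professions/NNS where/WRB there/EX are/VBP "
    "shortages/NNS ./."
)
tagged = [tuple(tok.rsplit("/", 1)) for tok in tagged_text.split()]
print(noun_runs(tagged))
# [['UK', 'government'], ['work', 'permit', 'system'], ['non-EU', 'citizens']]
```

The three runs match the three NP chunks in the parser output. Single nouns such as "kind" and "entry" stay unchunked, just as they do there.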
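Why the order of the enumerated {2,5} rules matters: as I understand NLTK's regexp chunker, the rules in one stage are applied one after another, and each rule only chunks tags that are still unchunked. The sketch below is a simplified model of that for a single run of noun tags, not NLTK code. In it, each fixed-length rule chunks as many groups of its size as fit, left to right. `chunk_run` and `regex_chunks` are hypothetical names. The model compares shortest-first and longest-first orderings with a plain {2,5} regular expression.

```python
import re


def chunk_run(run_len, rule_sizes):
    """Model applying fixed-length rules (<NN.*> repeated `size` times), in
    the given order, to one run of run_len noun tags.

    Returns (chunk sizes, nouns left unchunked). Simplified model, not NLTK.
    """
    chunks = []
    free = run_len  # unchunked nouns: always one stretch at the right end
    for size in rule_sizes:
        count, free = divmod(free, size)  # leftmost, non-overlapping matches
        chunks.extend([size] * count)
    return chunks, free


def regex_chunks(run_len):
    """Chunk sizes a single greedy {2,5} pattern gives on the same run."""
    return [len(m) for m in re.findall("N{2,5}", "N" * run_len)]


for n in (3, 4, 7):
    print(n, chunk_run(n, [2, 3, 4, 5]), chunk_run(n, [5, 4, 3, 2]), regex_chunks(n))
```

In this model, longest-first matches the {2,5} regex for every run length. Shortest-first turns a run of three into a pair plus a leftover noun, and a run of four into two pairs.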