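Hello everyone! I'm a beginner and there is a lot I don't understand yet. While tagging with TreeTagger, I found that if the text has not been prepared properly (that is, if it is not one sentence per line), the output looks like this: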
ara NP Sara
Crewe NP Crewe
, , ,
a DT a
pupil NN pupil
at IN at
Miss NP Miss
Minchin NP Minchin
's POS 's
London NP London
school NN school
, , ,
is VBZ be
left VVN leave
in IN in
poverty NN poverty
when WRB when
her PP$ her
father NN father
dies VVZ die
, , ,
but CC but
is VBZ be
later RBR later
rescued VVD rescue
by IN by
a DT a
mysterious JJ mysterious
benefactor NN benefactor
. SENT .
Sara NP Sara
Once RB once
on IN on
With text that has one sentence per line, the tagging comes out fine, as shown here:
The_DT Pool_NN of_IN Tears_NP Alice_NP was_VBD beginning_VVG to_TO get_VV very_RB tired_JJ of_IN sitting_VVG by_IN her_PP$ sister_NN on_IN the_DT bank_NN ,_, and_CC of_IN having_VHG nothing_NN to_TO do_VV :_: once_RB or_CC twice_RB she_PP had_VHD peeped_VVN into_IN the_DT book_NN her_PP$ sister_NN was_VBD reading_VVG ,_, but_CC it_PP had_VHD no_DT pictures_NNS or_CC conversations_NNS in_IN it_PP ,_, and_CC where_WRB is_VBZ the_DT use_NN of_IN a_DT book_NN ,_, thought_VVD Alice_NP ,_, without_IN pictures_NNS or_CC conversations_NNS ?_SENT
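My guess is that this happens because the TXT file is not one sentence per line, so I'd like to ask: what is a quick way to reformat English text so that each sentence is on its own line?
If that is not the cause, what else could be making TreeTagger's tagging come out garbled?
Thanks in advance!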
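
For the one-sentence-per-line step, one possible approach is a small Python script. This is only a minimal sketch, not a TreeTagger feature: the function name `one_sentence_per_line` is hypothetical, and the splitting rule (break after `.`, `!` or `?` when the next word starts with a capital letter or a quote) is a naive assumption that will split wrongly after abbreviations such as "Mr." or "Dr.".

```python
import re

# Naive sentence boundary (an assumption, not a full sentence tokenizer):
# whitespace that follows ., ! or ? and precedes a capital letter or a quote.
_SENTENCE_BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=[A-Z"\'])')


def one_sentence_per_line(text: str) -> str:
    """Return `text` with hard line breaks removed and one sentence per line."""
    # Collapse every run of whitespace (including newlines) into one space,
    # so sentences that were wrapped across lines are joined again.
    flat = " ".join(text.split())
    # Split at the naive boundaries and drop any empty pieces.
    sentences = [s for s in _SENTENCE_BOUNDARY.split(flat) if s]
    return "\n".join(sentences)
```

To use it, read the TXT file into a string, pass it to `one_sentence_per_line`, and write the result to a new file before running the tagger. For real corpora a trained sentence splitter would likely be more reliable than this regular expression.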