Stanford Parser online

Re: Stanford Parser online

How can I process a text and output the result?
Thank you.

For English text:

At the DOS prompt, change to the directory where the parser is located, then type the line below:

lexparser.bat input.txt >output.txt

Then press Enter to get your result.

For Chinese text:

First, you need to segment the input text (search for ICTCLAS in this forum if you don't have a segmenter). That is, convert 今天真热。 to 今天 真 热 。

Then save the segmented text in a GB encoding such as GB2312 or GBK (not UTF-8, which is what the GUI/Windows version uses).

Next, create a bat file by copying and pasting the following lines (between the rows of equal signs) into Notepad, and save it as lexparserCh.bat in the same folder as your parser program:

=============================
@echo off
:: Runs the Chinese PCFG parser on one or more files, printing trees only
:: usage: lexparserCh fileToParse
java -server -mx800m -cp "stanford-parser.jar;" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "penn,typedDependenciesCollapsed" chineseFactored.ser.gz %1
=============================

Finally, go to the directory where the parser is located, and type the line below:

lexparserCh.bat inputCh.txt >outputCh.txt

Then press Enter to get your result.
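
If your editor cannot save in a GB encoding, the encoding step above can also be done with a short script. This is only a minimal sketch in Python and not part of the parser: the function name convert_to_gb is made up for illustration, the input is assumed to be UTF-8, and GB18030 is assumed as the target because it is backward-compatible with GB2312.

```python
from pathlib import Path


def convert_to_gb(src, dst, src_encoding="utf-8", dst_encoding="gb18030"):
    """Re-encode a text file and return the number of characters written."""
    text = Path(src).read_text(encoding=src_encoding)
    # Notepad may put a byte-order mark at the start of UTF-8 files; drop it.
    text = text.lstrip("\ufeff")
    # Write raw bytes so the target encoding is exactly what we asked for.
    Path(dst).write_bytes(text.encode(dst_encoding))
    return len(text)
```

For example, convert_to_gb("inputCh_utf8.txt", "inputCh.txt") would produce the file that lexparserCh.bat reads (inputCh_utf8.txt is a hypothetical name for your segmented UTF-8 file).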
 
Re: Stanford Parser online

Thanks, Dr. Hong. The following are a parsed English text and a parsed Chinese text.

(ROOT
(S
(NP
(NP (NNS Scores))
(PP (IN of)
(NP (NNS properties))))
(VP (VBP are)
(PP (IN under)
(NP (JJ extreme) (NN fire) (NN threat)))
(SBAR (IN as)
(S
(NP (DT a) (JJ huge) (NN blaze))
(VP (VBZ continues)
(S
(VP (TO to)
(VP (VB advance)
(PP (IN through)
(NP
(NP (NNP Sydney) (POS 's))
(JJ north-western) (NNS suburbs))))))))))
(. .)))

nsubj(are-4, Scores-1)
prep_of(Scores-1, properties-3)
amod(threat-8, extreme-6)
nn(threat-8, fire-7)
prep_under(are-4, threat-8)
mark(continues-13, as-9)
det(blaze-12, a-10)
amod(blaze-12, huge-11)
nsubj(continues-13, blaze-12)
advcl(are-4, continues-13)
aux(advance-15, to-14)
xcomp(continues-13, advance-15)
poss(suburbs-20, Sydney-17)
amod(suburbs-20, north-19)
prep_through(advance-15, suburbs-20)

(ROOT
(S
(NP (NNP Fires))
(VP (VBP have)
(ADVP (RB also))
(VP (VBN shut)
(PRT (RP down))
(NP
(NP (DT the) (JJ major) (NN road)
(CC and)
(NN rail) (NNS links))
(PP (IN between)
(NP (NNP Sydney)
(CC and)
(NNP Gosford))))))
(. .)))

nsubj(shut-4, Fires-1)
aux(shut-4, have-2)
advmod(shut-4, also-3)
prt(shut-4, down-5)
det(road-8, the-6)
amod(road-8, major-7)
dobj(shut-4, road-8)
nn(links-11, rail-10)
conj_and(road-8, links-11)
prep_between(road-8, Sydney-13)
conj_and(Sydney-13, Gosford-15)

(ROOT
(S
(NP
(NP (DT The) (JJ promotional) (NN stop))
(PP (IN in)
(NP (NNP Sydney))))
(VP (VBD was)
(NP (NN everything)
(S
(VP (TO to)
(VP (VB be)
(VP (VBN expected)
(PP (IN for)
(NP
(NP (DT a) (NNP Hollywood) (NN blockbuster))
(: -)
(NP
(NP (NNS phalanxes))
(PP (IN of)
(NP
(NP (NNS photographers))
(, ,)
(NP (DT a) (NN stretch) (NN limo)))))))
(PP
(PP (TO to)
(NP
(NP (DT a) (NN hotel))
(PP (IN across)
(NP (DT the) (NNP Quay)))))
(: -)
(CC but)
(PP (IN with)
(NP (CD one) (NN difference))))))))))
(. .)))

det(stop-3, The-1)
amod(stop-3, promotional-2)
nsubj(everything-7, stop-3)
prep_in(stop-3, Sydney-5)
cop(everything-7, was-6)
aux(expected-10, to-8)
auxpass(expected-10, be-9)
infmod(everything-7, expected-10)
det(blockbuster-14, a-12)
nn(blockbuster-14, Hollywood-13)
prep_for(expected-10, blockbuster-14)
dep(blockbuster-14, phalanxes-16)
prep_of(phalanxes-16, photographers-18)
det(limo-22, a-20)
nn(limo-22, stretch-21)
appos(photographers-18, limo-22)
det(hotel-25, a-24)
prep_to(with-31, hotel-25)
det(Quay-28, the-27)
prep_across(hotel-25, Quay-28)
num(difference-33, one-32)
prep_with(expected-10, difference-33)

(ROOT
(S
(NP
(NP (DT A) (NN line-up))
(PP (IN of)
(NP (NNS masseurs))))
(VP (VBD was)
(VP (VBG waiting)
(S
(VP (TO to)
(VP (VB take)
(NP (DT the) (NNS media))
(PP (IN in)
(NP (NN hand))))))))
(. .)))

det(line-2, A-1)
nsubj(waiting-6, line-2)
prep_of(line-2, masseurs-4)
aux(waiting-6, was-5)
aux(take-8, to-7)
xcomp(waiting-6, take-8)
det(media-10, the-9)
dobj(take-8, media-10)
prep_in(take-8, hand-12)

(ROOT
(S
(NP (NNP Never))
(VP (VBZ has)
(NP
(NP (DT the) (NN term))
(SBAR
(S (`` ``)
(S
(VP (VBG massaging)
(NP (DT the) (NNS media))))
('' '')
(VP (VBD seemed)
(ADJP (RB so) (JJ accurate)))))))
(. .)))

nsubj(has-2, Never-1)
det(term-4, the-3)
dobj(has-2, term-4)
dep(accurate-12, massaging-6)
det(media-8, the-7)
dobj(massaging-6, media-8)
cop(accurate-12, seemed-10)
advmod(accurate-12, so-11)
rcmod(term-4, accurate-12)



(ROOT
(IP
(PP (P 随着)
(NP (NN 住房) (NN 制度) (NN 改革)))
(PU ,)
(NP
(CP
(IP
(VP
(ADVP (AD 越来越))
(VP (VA 多))))
(DEC 的))
(NP (NN 城镇) (NN 居民)))
(VP
(VP (VV 拥有)
(NP
(DNP
(NP (PN 自己))
(DEG 的))
(NP (NN 房屋))))
(PU ,)
(CC 而且)
(VP
(ADVP (AD 大量))
(VP (VV 集中)
(PP (P 在)
(LCP
(NP (NN 住宅) (NN 小区))
(LC 内))))))
(PU 。)))

prep(拥有-11, 随着-1)
nmod(改革-4, 住房-2)
nmod(改革-4, 制度-3)
pobj(随着-1, 改革-4)
advmod(多-7, 越来越-6)
rcmod(居民-10, 多-7)
cpm(多-7, 的-8)
nmod(居民-10, 城镇-9)
nsubj(拥有-11, 居民-10)
assmod(房屋-14, 自己-12)
assm(自己-12, 的-13)
dobj(拥有-11, 房屋-14)
cc(拥有-11, 而且-16)
advmod(集中-18, 大量-17)
ccomp(拥有-11, 集中-18)
prep(集中-18, 在-19)
nmod(小区-21, 住宅-20)
lobj(内-22, 小区-21)
plmod(在-19, 内-22)

(ROOT
(IP
(NP
(DNP
(NP (NN 物) (NN 权) (NN 法))
(DEG 的))
(DP (DT 这)
(QP (CD 一)))
(NP (NN 规定)))
(PU ,)
(VP (VV 回答) (AS 了)
(NP
(NP
(ADJP (JJ 广大))
(NP (NN 群众)))
(DNP
(PP (P 关于)
(IP (PU “)
(VP
(LCP
(IP
(NP (NT 70年))
(NP
(ADJP (JJ 大))
(NP (NN 限)))
(VP (VV 到期)))
(LC 后))
(PU ,)
(NP
(DNP
(NP (PN 我们))
(DEG 的))
(NP (NN 住房)))
(ADVP (AD 怎么))
(VP (VV 办)))
(PU ”)))
(DEG 的))
(NP (NN 疑问))))
(PU 。)))

nmod(法-3, 物-1)
nmod(法-3, 权-2)
assmod(规定-7, 法-3)
assm(法-3, 的-4)
det(规定-7, 这-5)
det(这-5, 一-6)
nsubj(回答-9, 规定-7)
asp(回答-9, 了-10)
amod(群众-12, 广大-11)
nmod(疑问-28, 群众-12)
assmod(疑问-28, 关于-13)
tcomp(到期-18, 70年-15)
amod(限-17, 大-16)
nsubj(到期-18, 限-17)
tclaus(后-19, 到期-18)
lccomp(办-25, 后-19)
assmod(住房-23, 我们-21)
assm(我们-21, 的-22)
dobj(办-25, 住房-23)
advmod(办-25, 怎么-24)
clmpd(关于-13, 办-25)
assm(关于-13, 的-27)
dobj(回答-9, 疑问-28)
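
In case anyone wants to work with the dependency lines in output like the above, here is a rough Python sketch that turns each relation(governor-N, dependent-M) line into a tuple and skips the bracketed tree lines. The names Dependency, parse_dependency and parse_dependencies are my own, not anything provided by the parser, and the pattern only assumes the line layout visible in the pasted output.

```python
import re
from typing import List, NamedTuple, Optional


class Dependency(NamedTuple):
    relation: str
    governor: str
    governor_index: int
    dependent: str
    dependent_index: int


# relation(governor-N, dependent-M); the words themselves may contain hyphens.
_DEP_RE = re.compile(r"([^()\s]+)\((.+)-(\d+), (.+)-(\d+)\)")


def parse_dependency(line: str) -> Optional[Dependency]:
    """Parse one dependency line; return None for tree lines and blanks."""
    m = _DEP_RE.fullmatch(line.strip())
    if m is None:
        return None
    rel, gov, gov_i, dep, dep_i = m.groups()
    return Dependency(rel, gov, int(gov_i), dep, int(dep_i))


def parse_dependencies(text: str) -> List[Dependency]:
    """Collect every dependency line found in a block of parser output."""
    deps = (parse_dependency(line) for line in text.splitlines())
    return [d for d in deps if d is not None]
```

Calling parse_dependencies on the contents of output.txt or outputCh.txt gives a list you can filter, for instance by relation name, to pull out all nsubj pairs.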
 
Re: Stanford Parser online

But this thread is mainly about the Stanford Parser, not about the tagger.

It's quite similar, actually. Anyway, first create a bat file by copying and pasting the following lines (between the rows of equal signs) into Notepad:

=============================
@echo off
:: To tag a file using the pre-trained bidirectional model
:: usage: postagger.bat (reads input.txt and writes output.txt in the current folder)
java -mx300m -classpath postagger-2006-05-21.jar edu.stanford.nlp.tagger.maxent.MaxentTagger -model wsj3t0-18-bidirectional/train-wsj-0-18 -file input.txt >output.txt
=============================

Next, save it as a plain text file named postagger.bat in the same folder as your Stanford POS Tagger program.

Then save an English text as input.txt in the same folder as the tagger and postagger.bat.

Finally, go to the folder containing the tagger, postagger.bat and input.txt, and double-click postagger.bat to get your result file, output.txt.

To tag another file, simply rename output.txt and change the content of input.txt.
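
If renaming files by hand gets tedious, the same command can be driven from a small script that takes the file names as arguments. The following is just a sketch in Python under a few assumptions: java is on your PATH, the script is run from the tagger folder, and the jar name, class name and model path are exactly those in the bat file above. The function names tagger_command and tag_file are invented for illustration.

```python
import subprocess

# Values copied from postagger.bat; adjust them if your copy differs.
TAGGER_JAR = "postagger-2006-05-21.jar"
TAGGER_CLASS = "edu.stanford.nlp.tagger.maxent.MaxentTagger"
MODEL = "wsj3t0-18-bidirectional/train-wsj-0-18"


def tagger_command(input_file, jar=TAGGER_JAR, model=MODEL):
    """Build the same java command line that postagger.bat runs."""
    return ["java", "-mx300m", "-classpath", jar, TAGGER_CLASS,
            "-model", model, "-file", input_file]


def tag_file(input_file, output_file):
    """Run the tagger and redirect its standard output to output_file."""
    with open(output_file, "wb") as out:
        subprocess.run(tagger_command(input_file), stdout=out, check=True)
```

For example, tag_file("news1.txt", "news1.tagged.txt") would tag one file without touching input.txt or output.txt (both file names here are hypothetical).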

Good luck!
 
Re: Stanford Parser online

Thanks again, Dr. Hong.
The following is a tagged text.

Zidane/NNP remains/VBZ best-loved/JJ French/JJ despite/IN head-butt/NN
Zinedine/NNP Zidane/NNP remains/VBZ France/NNP 's/POS best-loved/JJ personality/NN despite/IN his/PRP$ head-butt/NN against/IN Italy/NNP 's/POS Marco/NNP Materazzi/NNP in/IN the/DT 2006/CD soccer/NN World/NNP Cup/NNP final/JJ ,/, a/DT survey/NN showed/VBD on/IN Saturday/NNP ./.
Zidane/NNP came/VBD first/RB in/IN a/DT ranking/NN of/IN France/NNP 's/POS Top/NNP 50/CD personalities/NNS ,/, beating/VBG ex-tennis/NN champion/NN Yannick/NNP Noah/NNP who/WP came/VBD in/IN the/DT second/JJ place/NN ,/, and/CC leaving/VBG singers/NNS Charles/NNP Aznavour/NNP and/CC Johnny/NNP Hallyday/NNP ,/, as/IN well/RB as/IN actor/NN Gerard/NNP Depardieu/NNP behind/IN ./.
The/DT footballer/NN had/VBD already/RB come/VBN first/RB in/IN a/DT comparable/JJ survey/NN by/IN pollster/NN Ifop/NNP six/CD months/NNS ago/RB ./.
Socialist/JJ presidential/JJ candidate/NN Segolene/NNP Royal/NNP came/VBD in/IN 23rd/CD place/NN ,/, up/RB from/IN 49th/CD in/IN last/JJ July/NNP ,/, and/CC her/PRP$ likely/JJ conservative/NN challenger/NN for/IN next/JJ year/NN 's/POS election/NN ,/, Nicolas/NNP Sarkozy/NNP ,/, came/VBD 42nd/NNP in/IN the/DT Ifop/NNP poll/NN ./.
Sarkozy/NNP had/VBD not/RB appeared/VBN on/IN the/DT list/NN in/IN July/NNP ./.
Both/DT Royal/NNP and/CC Sarkozy/NNP were/VBD overtaken/VBN by/IN television/NN star/NN Nicolas/NNP Hulot/NNP ,/, who/WP has/VBZ threatened/VBN to/TO run/VB for/IN president/NN unless/IN mainstream/NN politicians/NNS do/VBP more/JJR for/IN the/DT environment/NN ./. He/PRP came/VBD third/JJ in/IN the/DT poll/NN of/IN 1,064/CD people/NNS ./.
Zidane/NNP was/VBD shown/VBN a/DT red/JJ card/NN 10/CD minutes/NNS before/IN the/DT end/NN of/IN extra/JJ time/NN in/IN the/DT July/NNP 9/CD final/JJ after/IN head/NN butting/VBG Materazzi/NNP in/IN the/DT chest/NN ./. That/DT ensured/VBD he/PRP missed/VBD the/DT penalty/NN shootout/NN that/WDT decided/VBD the/DT match/NN in/IN Italy/NNP 's/POS favor/NN ./.
The/DT incident/NN inspired/VBD the/DT summer/NN hit/VBD ``/`` Coup/NNP de/IN Boule/NNP ''/'' -LRB-/-LRB- Head/NNP Butt/NNP -RRB-/-RRB- in/IN France/NNP ,/, and/CC an/DT Italian/JJ designer/NN has/VBZ come/VBN up/RP with/IN a/DT line/NN of/IN sweatshirts/NNS with/IN two/CD stick/NN figures/NNS depicting/VBG the/DT incident/NN printed/VBN on/IN the/DT back/NN ./.
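
For anyone who wants to count tags or build a word list from output like this, here is a small Python sketch that splits the word/TAG tokens into pairs. It assumes tokens are separated by whitespace and that the tag follows the last slash, as in the lines above; parse_tagged and tag_counts are names I made up.

```python
from collections import Counter


def parse_tagged(text):
    """Split 'word/TAG word/TAG ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last slash only, so a word containing "/" stays whole.
        word, sep, tag = token.rpartition("/")
        if sep:  # ignore anything that has no slash at all
            pairs.append((word, tag))
    return pairs


def tag_counts(text):
    """Count how often each tag occurs in a tagged text."""
    return Counter(tag for _, tag in parse_tagged(text))
```

tag_counts(open("output.txt").read()).most_common(10) would then list the ten most frequent tags in the tagger's result file.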
 
Re: Stanford Parser online

It seems that both GoTagger and the Stanford tagger use the same tagset, the Penn Treebank tagset (the one the Brill tagger also uses).
The following is the same text tagged with GoTagger.
Zidane/NNP remains/VBZ best-loved/JJ French/JJ despite/IN head-butt/JJ
Zinedine/NNP Zidane/NNP remains/VBZ France's/NNP best-loved/JJ personality/NN despite/IN his/PRP$ head-butt/JJ against/IN Italy's/NNP Marco/NNP Materazzi/NNP in/IN the/DT 2006/CD soccer/NN World/NNP Cup/NNP final,/VBG a/DT survey/NN showed/VBD on/IN Saturday./NNP
Zidane/NNP came/VBD first/JJ in/IN a/DT ranking/NN of/IN France's/NNP Top/JJ 50/CD personalities,/NN beating/VBG ex-tennis/NN champion/NN Yannick/NNP Noah/NNP who/WP came/VBD in/IN the/DT second/JJ place,/NN and/CC leaving/VBG singers/NNS Charles/NNP Aznavour/NNP and/CC Johnny/NNP Hallyday,/NNP as/NNP well/RB as/IN actor/NN Gerard/NNP Depardieu/NNP behind./JJ /NN
The/DT footballer/NN had/VBD already/RB come/VB first/JJ in/IN a/DT comparable/JJ survey/NN by/IN pollster/NN Ifop/NNP six/CD months/NNS ago./RB /VBP
Socialist/NNP presidential/JJ candidate/NN Segolene/NNP Royal/NNP came/VBD in/IN 23rd/CD place,/NN up/IN from/IN 49th/JJ in/IN last/JJ July,/NNP and/CC her/PRP$ likely/JJ conservative/JJ challenger/NN for/IN next/JJ year's/NNS election,/VBP Nicolas/NNP Sarkozy,/NNP came/VBD 42nd/JJ in/IN the/DT Ifop/NNP poll./CD /NN
Sarkozy/NNP had/VBD not/RB appeared/VBN on/IN the/DT list/NN in/IN July./NNP /NN
Both/DT Royal/NNP and/CC Sarkozy/NNP were/VBD overtaken/VBN by/IN television/NN star/NN Nicolas/NNP Hulot,/NNP who/WP has/VBZ threatened/VBN to/TO run/VB for/IN president/NN unless/IN mainstream/NN politicians/NNS do/VBP more/JJR for/IN the/DT environment./JJ He/PRP came/VBD third/JJ in/IN the/DT poll/NN of/IN 1,064/CD people./CD /NN
Zidane/NNP was/VBD shown/VBN a/DT red/JJ card/NN 10/CD minutes/NNS before/IN the/DT end/NN of/IN extra/JJ time/NN in/IN the/DT July/NNP 9/CD final/JJ after/IN head/NN butting/VBG Materazzi/NNP in/IN the/DT chest./VBN That/DT ensured/VBD he/PRP missed/VBD the/DT penalty/NN shootout/NN that/IN decided/VBN the/DT match/NN in/IN Italy's/NNP favour./CD /NN
The/DT incident/NN inspired/VBD the/DT summer/NN hit/VBD "Coup/NN de/FW Boule"/NNP (Head/NNP Butt)/NNP in/IN France,/NNP and/CC an/DT Italian/JJ designer/NN has/VBZ come/VBN up/IN with/IN a/DT line/NN of/IN sweatshirts/NNS with/IN two/CD stick/NN figures/NNS depicting/VBG the/DT incident/NN printed/VBN on/IN the/DT back./CD
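
To see where the two taggers disagree, the two outputs can be lined up word by word. The sketch below is in Python and uses the standard difflib module to align the word sequences, then reports words that received different tags. The function names are my own. Note that GoTagger leaves punctuation attached to words (final,/VBG), so those tokens do not align with the Stanford tokens and are simply skipped.

```python
from difflib import SequenceMatcher


def split_tagged(line):
    """Return (word, tag) pairs from 'word/TAG' tokens, splitting on the last slash."""
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        if sep:
            pairs.append((word, tag))
    return pairs


def tag_disagreements(line_a, line_b):
    """List (word, tag_a, tag_b) for aligned words whose tags differ."""
    a, b = split_tagged(line_a), split_tagged(line_b)
    words_a = [w for w, _ in a]
    words_b = [w for w, _ in b]
    matcher = SequenceMatcher(a=words_a, b=words_b, autojunk=False)
    diffs = []
    for block in matcher.get_matching_blocks():
        # Each block is a run of identical words in both outputs.
        for k in range(block.size):
            word, tag_a = a[block.a + k]
            tag_b = b[block.b + k][1]
            if tag_a != tag_b:
                diffs.append((word, tag_a, tag_b))
    return diffs
```

Given the two headline lines (the Stanford one first), tag_disagreements returns [('head-butt', 'NN', 'JJ')].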
 
Re: Stanford Parser online

Oh, I see. Thanks, Laohong, you are always helpful and sincere. I have learnt a lot from you.
 
Re: Stanford Parser online

How do I use the Stanford Classifier? I tried, but the two executable JAR files won't run. Why?
 
Re: Stanford Parser online

You guys are asking a multi-million-dollar question. Why do you need a classifier like that?
 
Re: Stanford Parser online

Hi Laohong, I followed your instructions and the English text was tagged successfully. Do you know how to tag a Chinese file? I searched the Stanford NLP website but couldn't find the relevant information. I would really appreciate it if you could help me sort this out.
 
Re: Stanford Parser online

The POS tagger was trained on English texts, though it is said that you can train it to tag Chinese texts. However, that may be difficult for many of us. It would be better to use ICTCLAS_Win.exe to tag your Chinese texts. You can download it under "NLP Tools" in my online storage at:

http://corpuslaohong.ys168.com/
Password: corpus4u
Please leave a message there after you get it.

If you do want to tag Chinese texts with Stanford tools, the Stanford Parser can also produce POS information for Chinese texts. See my instructions on parsing a Chinese text with the Stanford Parser in the earlier posts.
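
To get word/TAG text out of the parser's bracketed trees, one option is to pick out the (TAG word) leaves with a regular expression. This is only a sketch in Python, assuming the Penn-style tree output shown in the earlier posts; tree_to_tagged is a name I made up, not a parser feature.

```python
import re

# A leaf looks like "(TAG word)" with no nested brackets inside it.
_LEAF_RE = re.compile(r"\(([^()\s]+) ([^()\s]+)\)")


def tree_to_tagged(tree_text):
    """Return 'word/TAG ...' for every (TAG word) leaf, in sentence order."""
    leaves = _LEAF_RE.findall(tree_text)
    return " ".join(f"{word}/{tag}" for tag, word in leaves)
```

Applied to a tree containing (NP (NN 住房) (NN 制度) (NN 改革)), it yields 住房/NN 制度/NN 改革/NN for that phrase.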
 