Stanford Parser online

laohong

#2
Re: Stanford Parser online

To follow up on Dr. Xu's recommendation, here is a list of corpus tools developed by the Stanford NLP Group:

The Stanford Parser
Java implementations of probabilistic natural language parsers, both highly optimized PCFG and dependency parsers, and a lexicalized PCFG parser.

Online parser demo at: http://josie.stanford.edu:8080/parser/

Download the full package (requires Java 5, i.e. JDK 1.5, to run) at: http://nlp.stanford.edu/downloads/StanfordParser-2006-06-11.tar.gz


The Stanford POS Tagger
A Java implementation of a maximum-entropy part-of-speech (POS) tagger

Download Stanford Tagger version 2006-05-21 (requires JDK 1.5.0 or above to run) at: http://nlp.stanford.edu/software/postagger-2006-05-21.tar.gz


The Stanford Named Entity Recognizer
A Java implementation of a Conditional Random Field sequence model, together with well-engineered features for Named Entity Recognition.

Download Stanford Named Entity Recognizer version 1.0 at: http://nlp.stanford.edu/software/stanford-ner-2007-01-29.tar.gz


Stanford Chinese Word Segmenter
A Java implementation of a CRF-based Chinese Word Segmenter

Download Stanford Chinese Segmenter version 2006-05-11 (requires JDK 1.5.0 or above to run) at: http://nlp.stanford.edu/software/StanfordChineseSegmenter-2006-05-11.tar.gz


The Stanford Classifier
A Java implementation of conditional log-linear model classification (a.k.a. maximum entropy or multiclass logistic regression models)

Download Stanford Classifier version 1.0 at: http://nlp.stanford.edu/software/StanfordClassifier.tar.gz


Tregex and Tsurgeon
A Java implementation of a Tgrep2-style utility for matching patterns in trees, and a tree-transformation utility built on top of this matching language.

Download Tregex version 1.2 at: http://nlp.stanford.edu/software/tregex.tar.gz


For more information about these tools, visit http://nlp.stanford.edu/software/index.shtml
 

laohong

#3

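This parser package is pretty good and handles both Chinese and English; the only drawback is that the results can't be saved. I wonder whether Dr. Xu has tried it.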
 


armstrong

#4

When I load the parser, it shows "Could not load the parser. Out of memory". Why?
The tagger cannot be used, either.

I have installed the JDK.
 

laohong

#5

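At last I can output the parser's results directly to a text file. I just tested batch-processing the following four sentences (the first four sentences of Professor Wei's condolence message for Sinclair):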
We are shocked to hear that Professor John Sinclair has left us. Undoubtedly, the 13th of March 2007 is a saddest day to the world linguistics, Corpus Linguistics in particular. The gap left by the departure of this innovative thinker and distinguished linguist will be felt in the hearts of the researchers working along the lines he has set. In deepest sorrow, we, linguists at Shanghai Jiao Tong University, China, found that we cannot express with words our gratitude and respect to John.

The output was:
Parsed 94 words in 4 sentences (13.73 wds/sec; 0.58 sents/sec).
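
For a quick check of those throughput figures, here is a minimal Python sketch (the regular expression is an assumption about this one summary format, not a documented interface). Both rates imply just under 7 seconds of parsing time for the four sentences, at an average of 23.5 words per sentence.

```python
import re

# Matches a summary line such as
# "Parsed 94 words in 4 sentences (13.73 wds/sec; 0.58 sents/sec)."
SUMMARY = re.compile(r"Parsed (\d+) words in (\d+) sentences "
                     r"\(([\d.]+) wds/sec; ([\d.]+) sents/sec\)")

def elapsed_seconds(line):
    """Return the parsing time implied by the word rate and by the sentence rate."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("not a parser summary line: %r" % line)
    words, sents = int(m.group(1)), int(m.group(2))
    wps, sps = float(m.group(3)), float(m.group(4))
    return words / wps, sents / sps

by_words, by_sents = elapsed_seconds(
    "Parsed 94 words in 4 sentences (13.73 wds/sec; 0.58 sents/sec).")
print("%.2f s, %.2f s" % (by_words, by_sents))   # 6.85 s, 6.90 s
```

The two figures differ slightly only because the printed rates are rounded.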

Here are the detailed tree results for each sentence, to share with everyone:

We are shocked to hear that Professor John Sinclair has left us.
(ROOT
  (S
    (NP (PRP We))
    (VP (VBP are)
      (ADJP (JJ shocked)
        (S
          (VP (TO to)
            (VP (VB hear)
              (SBAR (IN that)
                (S
                  (NP (NNP Professor) (NNP John) (NNP Sinclair))
                  (VP (VBZ has)
                    (VP (VBN left)
                      (NP (PRP us)))))))))))
    (. .)))

nsubj(shocked-3, We-1)
cop(shocked-3, are-2)
aux(hear-5, to-4)
xcomp(shocked-3, hear-5)
complm(left-11, that-6)
nn(Sinclair-9, Professor-7)
nn(Sinclair-9, John-8)
nsubj(left-11, Sinclair-9)
aux(left-11, has-10)
ccomp(hear-5, left-11)
dobj(left-11, us-12)



Undoubtedly, the 13th of March 2007 is a saddest day to the world linguistics, Corpus Linguistics in particular.
(ROOT
  (S
    (ADVP (RB Undoubtedly))
    (, ,)
    (NP
      (NP (DT the) (NN 13th))
      (PP (IN of)
        (NP (NNP March) (CD 2007))))
    (VP (VBZ is)
      (NP
        (NP (DT a) (JJ saddest) (NN day))
        (PP (TO to)
          (NP
            (NP (DT the) (NN world) (NNS linguistics))
            (, ,)
            (NP
              (NP (NNP Corpus) (NNP Linguistics))
              (PP (IN in)
                (NP (NN particular))))))))
    (. .)))

advmod(day-11, Undoubtedly-1)
det(13th-4, the-3)
nsubj(day-11, 13th-4)
prep_of(13th-4, March-6)
num(March-6, 2007-7)
cop(day-11, is-8)
det(day-11, a-9)
amod(day-11, saddest-10)
det(linguistics-15, the-13)
nn(linguistics-15, world-14)
prep_to(day-11, linguistics-15)
nn(Linguistics-18, Corpus-17)
appos(linguistics-15, Linguistics-18)
prep_in(Linguistics-18, particular-20)


The gap left by the departure of this innovative thinker and distinguished linguist will be felt in the hearts of the researchers working along the lines he has set.
(ROOT
  (S
    (NP
      (NP
        (NP (DT The) (NN gap))
        (VP (VBN left)
          (PP (IN by)
            (NP
              (NP (DT the) (NN departure))
              (PP (IN of)
                (NP (DT this) (JJ innovative) (NN thinker)))))))
      (CC and)
      (NP (VBN distinguished) (NN linguist)))
    (VP (MD will)
      (VP (VB be)
        (VP (VBN felt)
          (PP (IN in)
            (NP
              (NP (DT the) (NNS hearts))
              (PP (IN of)
                (NP (DT the) (NNS researchers)))))
          (S
            (VP (VBG working)
              (PRT (RP along))
              (NP
                (NP (DT the) (NNS lines))
                (SBAR
                  (S
                    (NP (PRP he))
                    (VP (VBZ has)
                      (VP (VBN set)))))))))))
    (. .)))

det(gap-2, The-1)
nsubjpass(felt-16, gap-2)
partmod(gap-2, left-3)
det(departure-6, the-5)
prep_by(left-3, departure-6)
det(thinker-10, this-8)
amod(thinker-10, innovative-9)
prep_of(departure-6, thinker-10)
amod(linguist-13, distinguished-12)
conj_and(gap-2, linguist-13)
aux(felt-16, will-14)
auxpass(felt-16, be-15)
det(hearts-19, the-18)
prep_in(felt-16, hearts-19)
det(researchers-22, the-21)
prep_of(hearts-19, researchers-22)
partmod(felt-16, working-23)
prt(working-23, along-24)
det(lines-26, the-25)
dobj(working-23, lines-26)
nsubj(set-29, he-27)
aux(set-29, has-28)
rcmod(lines-26, set-29)


In deepest sorrow, we, linguists at Shanghai Jiao Tong University, China, found that we cannot express with words our gratitude and respect to John.
(ROOT
  (S
    (PP (IN In)
      (NP (JJS deepest) (NN sorrow)))
    (, ,)
    (NP
      (NP (PRP we))
      (, ,)
      (NP
        (NP (NNS linguists))
        (PP (IN at)
          (NP
            (NP (NNP Shanghai) (NNP Jiao) (NNP Tong) (NNP University))
            (, ,)
            (NP (NNP China)))))
      (, ,))
    (VP (VBD found)
      (SBAR (IN that)
        (S
          (NP (PRP we))
          (VP (MD can) (RB not)
            (VP (VB express)
              (PP (IN with)
                (NP (NNS words)))
              (NP
                (NP (PRP$ our) (NN gratitude)
                  (CC and)
                  (NN respect))
                (PP (TO to)
                  (NP (NNP John)))))))))
    (. .)))

amod(sorrow-3, deepest-2)
prep_in(found-16, sorrow-3)
nsubj(found-16, we-5)
appos(we-5, linguists-7)
nn(University-12, Shanghai-9)
nn(University-12, Jiao-10)
nn(University-12, Tong-11)
prep_at(linguists-7, University-12)
appos(University-12, China-14)
complm(express-21, that-17)
nsubj(express-21, we-18)
aux(express-21, can-19)
neg(express-21, not-20)
ccomp(found-16, express-21)
prep_with(express-21, words-23)
poss(gratitude-25, our-24)
dobj(express-21, gratitude-25)
conj_and(gratitude-25, respect-27)
prep_to(gratitude-25, John-29)
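
Once the output is saved to a text file, the typed dependencies are easy to pull out for counting. A minimal Python sketch, assuming one dependency per line in the relation(governor-index, dependent-index) form shown above; the function name is hypothetical:

```python
import re
from collections import Counter

# relation(governor-index, dependent-index), e.g. "prep_of(13th-4, March-6)"
DEP = re.compile(r"^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$")

def parse_dependencies(lines):
    """Yield (relation, governor, gov_index, dependent, dep_index) tuples.

    Lines that are not typed dependencies (sentences, tree lines, blanks)
    do not match the pattern and are skipped.
    """
    for line in lines:
        m = DEP.match(line.strip())
        if m:
            rel, gov, gi, dep, di = m.groups()
            yield rel, gov, int(gi), dep, int(di)

output = """Undoubtedly, the 13th of March 2007 is a saddest day to the world linguistics, Corpus Linguistics in particular.
advmod(day-11, Undoubtedly-1)
det(13th-4, the-3)
nsubj(day-11, 13th-4)
prep_of(13th-4, March-6)
num(March-6, 2007-7)
""".splitlines()

deps = list(parse_dependencies(output))
relations = Counter(d[0] for d in deps)   # how often each relation occurs
print(deps[3])   # ('prep_of', '13th', 4, 'March', 6)
```

Feeding the lines of a whole saved output file into parse_dependencies gives relation counts for the entire text.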
 

laohong

#6

Unfortunately, the tree structures look messy once posted here, so it's better to download the text file in the attachment and read that.
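
If the attached file is not at hand, a flattened tree can be re-indented locally. The Python sketch below reads one bracketed tree and prints it with two spaces per level, keeping phrases that contain only (TAG word) pairs on one line. This is a simple layout chosen for the sketch, not a reproduction of the parser's exact formatting; the function names are hypothetical.

```python
import re

def tokenize(text):
    # Brackets become separate tokens; labels and words are runs of other characters.
    return re.findall(r"\(|\)|[^\s()]+", text)

def read_tree(tokens):
    """Build nested [label, child, ...] lists from one bracketed tree."""
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])          # start a new constituent
        elif tok == ")":
            node = stack.pop()        # constituent finished
            stack[-1].append(node)
        else:
            stack[-1].append(tok)     # label or word
    return stack[0][0]

def is_preterminal(node):
    return isinstance(node, list) and len(node) == 2 and isinstance(node[1], str)

def flat(node):
    if isinstance(node, str):
        return node
    return "(" + " ".join(flat(child) for child in node) + ")"

def pretty(node, depth=0):
    """Return indented lines; a phrase made only of (TAG word) pairs stays on one line."""
    pad = "  " * depth
    children = node[1:]
    if all(isinstance(c, str) or is_preterminal(c) for c in children):
        return [pad + flat(node)]
    lines = [pad + "(" + node[0]]
    for child in children:
        if isinstance(child, str):
            lines.append(pad + "  " + child)
        else:
            lines.extend(pretty(child, depth + 1))
    lines[-1] += ")"                  # close this phrase on its last line
    return lines

flattened = "(ROOT (S (NP (PRP We)) (VP (VBP are) (ADJP (JJ shocked))) (. .)))"
print("\n".join(pretty(read_tree(tokenize(flattened)))))
```

A whole tree copied from this thread, line breaks included, also works as input, since tokenize ignores whitespace.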
 

laohong

#8

Quote: I have the same problem as armstrong. Why?
Not sure what exactly caused the problem for you and armstrong, as the error didn't occur here. Maybe you could tell us exactly what happened with your files.

Have you successfully loaded the parser window by double-clicking "lexparser-gui.bat", "stanford-parser-2006-06-11.jar" or "stanford-parser.jar"? If not, you may need to check whether your JDK is installed correctly.

If the parser window loads, try typing in ONLY one or two sentences as a test. If you want to open an existing text file, it must be in UTF-8 encoding. It's a good idea to start with a text of only a few sentences. Then load the parser file (englishFactored.ser.gz for English text; chineseFactored.ser.gz for Chinese text). It can take a while to load, as the parser's own documentation notes:

"The current version of the parser requires Java 5 (JDK1.5 or above). The parser also requires plenty of memory (a minimum of 100Mb to run as a PCFG parser on sentences up to 40 words in length; typically around 500Mb of memory to be able to parse similarly long typical-of-newswire sentences using the factored model). "

Once the parser file is loaded, click a sentence in the text window and it will be highlighted in yellow. Then click Parse, and you should see the result in the output window within a second or so.
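
Since the GUI needs UTF-8 input, a text saved in another encoding must be converted first. A minimal Python sketch; the default source encoding GBK is an assumption (common for files saved on Simplified Chinese Windows), so change it to match your files:

```python
def to_utf8(src, dst, source_encoding="gbk"):
    """Re-encode a text file as UTF-8 (written without a byte-order mark).

    source_encoding is an assumption: GBK is common for files saved on
    Simplified Chinese Windows; use the encoding your file really has.
    """
    with open(src, "r", encoding=source_encoding) as f:
        text = f.read()
    with open(dst, "w", encoding="utf-8") as f:
        f.write(text)
```

For example, to_utf8("mytext.txt", "mytext-utf8.txt") writes a UTF-8 copy that can then be opened in the parser window (both file names are placeholders).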
 

laohong

#10

Quote: The same parser loading error on my computer. :(
Maybe your computer's RAM is not large enough (mine is 1 GB). Alternatively, try it from the command line with "lexparser.bat input.txt > output.txt".
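
To process several files in one go and keep the results as text, the command above can be wrapped in a loop. A Python sketch, assuming lexparser.bat is in the current folder and is started through cmd /c; parse_files and the .parsed.txt naming are hypothetical:

```python
import subprocess
from pathlib import Path

# Assumption: on Windows, lexparser.bat (from the parser package) is run via cmd /c.
PARSER_CMD = ["cmd", "/c", "lexparser.bat"]

def parse_files(inputs, out_dir, command=PARSER_CMD):
    """Run the parser once per input file, like "lexparser.bat input.txt > output.txt".

    Each result is saved as <out_dir>/<input name>.parsed.txt.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for path in map(Path, inputs):
        target = out_dir / (path.stem + ".parsed.txt")
        with open(target, "wb") as out:
            # stdout goes to the file; check=True raises if the command fails
            subprocess.run(list(command) + [str(path)], stdout=out, check=True)
        outputs.append(target)
    return outputs
```

For example, parse_files(["a.txt", "b.txt"], "results") would leave results\a.parsed.txt and results\b.parsed.txt. Running the files one at a time also keeps memory use down.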

How much memory do I need to parse very long sentences?

Memory usage by the parser depends on a number of factors:

Memory usage expands roughly with the square of the sentence length. You may wish to set -maxLength to skip long sentences.

The factored parser requires several times as much memory as just running the PCFG parser, since it runs 3 parsers.

The command-line version of the parser currently loads the whole of a file into memory before parsing any of it. If your file is extremely large, splitting it into multiple files and parsing them sequentially will reduce memory usage.

A 64-bit application requires more memory than a 32-bit application (Java uses lots of pointers).

A larger grammar or POS tag set requires more memory than a smaller one.
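
Following the -maxLength and file-splitting advice above, a large text can be prepared before parsing. A minimal Python sketch, assuming the input already has one sentence per line with space-separated words; split_for_parsing, per_file, and the partNNN.txt names are hypothetical. Sentences over max_len tokens are set aside here rather than passed to the parser.

```python
from pathlib import Path

def split_for_parsing(src, out_dir, per_file=200, max_len=40):
    """Split a one-sentence-per-line UTF-8 file into smaller files.

    Returns (list of chunk paths, list of skipped long sentences).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    kept, skipped = [], []
    for line in Path(src).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue                      # ignore blank lines
        if len(line.split()) > max_len:
            skipped.append(line)          # too long: set aside
        else:
            kept.append(line)
    chunks = []
    for start in range(0, len(kept), per_file):
        chunk = out_dir / ("part%03d.txt" % (start // per_file + 1))
        chunk.write_text("\n".join(kept[start:start + per_file]) + "\n",
                         encoding="utf-8")
        chunks.append(chunk)
    return chunks, skipped
```

Each chunk can then be parsed separately, and the skipped list shows which sentences still need attention.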

Below are some statistics for 32-bit operation with the supplied englishPCFG and englishFactored grammars. We have parsed sentences as long as 234 words, but you need lots of RAM and patience.

Sentence length (words)   PCFG      Factored
20                        50 MB     250 MB
50                        125 MB    600 MB
100                       350 MB    2100 MB
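
For rough planning, the table can be interpolated. The figures grow faster than linearly (from 20 to 100 words, PCFG memory rises 7 times and factored memory 8.4 times) but not as a clean square, so this Python sketch simply interpolates linearly between the measured lengths. That choice is an assumption, and the function refuses to extrapolate:

```python
# Memory figures (MB) from the table above: 32-bit, supplied English grammars.
MEMORY_MB = {
    "PCFG": [(20, 50), (50, 125), (100, 350)],
    "Factored": [(20, 250), (50, 600), (100, 2100)],
}

def estimate_mb(length, model="PCFG"):
    """Linearly interpolate memory use for a sentence of `length` words."""
    points = MEMORY_MB[model]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= length <= x1:
            return y0 + (y1 - y0) * (length - x0) / (x1 - x0)
    raise ValueError("length %d is outside the measured range 20-100" % length)

print(estimate_mb(35))                       # 87.5
print(round(estimate_mb(40, "Factored")))    # 483
```

The 40-word factored estimate of about 483 MB is in line with the "around 500Mb" quoted earlier in this thread.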
 

xujiajin

#11

My memory is 1 GB too. I reinstalled Java JDK 1.5 and again failed to load any of the parsers.
 

xujiajin

#12

I tried command-line mode; the error log said there was no "server", and something called JVM.dll was missing.
 

laohong

#13

I'm using the latest version of the JDK (1.6.0), which can be downloaded from:
https://sdlc1e.sun.com/ECom/EComActionServlet;jsessionid=E40CF9A34D7396FED90FF9BEF430941F

If the above link doesn't work, look for it on this website: http://java.sun.com/javase/downloads/index.jsp and click JDK 6 to go to the download page.

The file is about 53.16 MB in size for Windows Offline Installation, Multi-language version.

You may want to uninstall all previous versions of the JDK or JRE first, then install this latest version. After the installation completes, copy the folder named "server" from C:\Program Files\Java\jdk1.6.0\jre\bin\ to C:\Program Files\Java\jre1.6.0\bin\.

Then you need to set your Java path:

Right-click the My Computer icon and choose Properties > Advanced > Environment Variables. In the System Variables box, find Path, click Edit, and add the following line (typed in English input mode, so the semicolons are half-width) to the end of the existing text:

;C:\Program Files\Java\jdk1.6.0\bin;C:\Program Files\Java\jre1.6.0\bin

Finally, reboot your machine and try the parser again. The "missing server jvm.dll" problem should then be resolved.
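
To check whether the copy worked, a small Python sketch can look for jvm.dll under the JRE's bin folder. It assumes the JRE 6 layout described in this post (bin\server\jvm.dll after the copy, and bin\client\jvm.dll by default); check_jvm_dlls is a hypothetical helper:

```python
from pathlib import Path

def check_jvm_dlls(jre_home):
    """Report which of bin/server/jvm.dll and bin/client/jvm.dll exist.

    jre_home is a JRE folder, e.g. the jre1.6.0 folder mentioned above
    (the expected layout is an assumption).
    """
    home = Path(jre_home)
    return {kind: (home / "bin" / kind / "jvm.dll").is_file()
            for kind in ("server", "client")}
```

After copying the folder, check_jvm_dlls(r"C:\Program Files\Java\jre1.6.0") should report "server" as True; if it does not, the copy went to the wrong place.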
 
#16

Quote: It sucks. Registration required for downloading. I gave up.
I just downloaded JDK 1.6, and it did not require registration.
The parser can be loaded, but when I input a sentence of only ten Chinese characters, it begins to parse and then shows "error, perhaps the sentence is too long". Why?

Thanks to laohong for the clear instructions.
 
#18

Yes, I have the same problem: when I try to parse a very short sentence, the parser warns "perhaps the sentence is too long".
 

laohong

#19

Please make sure you do the following:

1. Put the parser program package in a folder whose name contains no Chinese characters;

2. Start the program by clicking lexparser-gui.bat (it's quite easy to get an "out of memory" error if you start it from either of the two jar files);

3. Load chineseFactored.ser.gz for Chinese text, and preferably do nothing else while waiting for the parser to load completely; otherwise you may get the "out of memory" error;

4. Under the Language tab, choose "Tokenized Simplified Chinese (utf-8)";

5. Input your Chinese sentence with a space between words, e.g. 赵 先生 是 个 大学 老师 。 他 很 喜欢 写 文章 。 ("Mr. Zhao is a university teacher. He likes writing articles.")
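
Steps 1 and 5 can be checked with a small Python sketch. The CJK character ranges are an approximation, the path check covers the whole folder path to be safe, and word segmentation itself is assumed to come from elsewhere (for example the Stanford Chinese Word Segmenter listed at the top of this thread); the function names are hypothetical:

```python
import re

# Approximate CJK ideograph ranges (Extension A, Unified, Compatibility).
CJK = re.compile("[\u3400-\u4dbf\u4e00-\u9fff\uf900-\ufaff]")

def path_has_chinese(path):
    """Step 1: True if any Chinese character appears in the folder path."""
    return CJK.search(path) is not None

def to_tokenized_input(words):
    """Step 5: join already-segmented words with single spaces."""
    return " ".join(w.strip() for w in words if w.strip())

print(path_has_chinese(r"D:\tools\stanford-parser"))   # False
line = to_tokenized_input(["赵", "先生", "是", "个", "大学", "老师", "。"])
```

Here line holds "赵 先生 是 个 大学 老师 。", ready to paste into the parser window.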

Good luck, guys!
 
#20

Dr. Hong,
I followed your steps and got the result, but it can only process one sentence at a time. How can I process a whole text and save the results to a file?
Thank you.
 