BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

xujiajin

Administrator
Staff member
This GUI (Graphical User Interface) version of the Stanford POS Tagger was developed by Mr. Yunlong Jia, and designed by Dr. Jiajin Xu and Mr. Yunlong Jia.

This tagger automatically assigns part-of-speech information to each word in the loaded English plain text(s). By default, the tagged output is saved after execution in the same directory as the source text(s), with the extension *.tag. To tag the texts, import your raw texts and then choose one of the pre-loaded tagging models.

The tagger uses the Penn Treebank tag set, which is described in Treebank POS tagset.pdf in the program folder.

Please note that the tool requires Java 1.5+ to be installed before you can analyze any texts.
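
For anyone unsure whether a suitable Java is already installed, here is a minimal Python sketch of one way to check. It is not part of the tool; the function names are hypothetical, and it assumes that `java` is on the PATH and prints its usual `version "..."` banner.

```python
import re
import shutil
import subprocess


def parse_java_version(banner):
    """Extract (major, minor) from the text printed by `java -version`.

    Returns None if no version string of the form version "X.Y..." is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', banner)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2) or 0)


def java_meets_requirement(minimum=(1, 5)):
    """Return True if a `java` found on PATH reports at least `minimum`."""
    if shutil.which("java") is None:
        return False
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    version = parse_java_version(result.stderr + result.stdout)
    return version is not None and version >= minimum
```

Calling `java_meets_requirement()` returns False when no `java` executable is found at all, which usually means the runtime still needs to be installed.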

More information about the Stanford POS tagger is available at http://nlp.stanford.edu/software/tagger.shtml.

Please cite the program as:

Xu, Jiajin & Yunlong Jia. (2011). BFSU Stanford POS Tagger: A Graphical Interface Windows Version. Beijing: National Research Center for Foreign Language Education, Beijing Foreign Studies University.

DOWNLOAD
http://www.fleric.org.cn/pub/soft/BFSU_Stanford_POS_Tagger1.1.2.rar

Also downloadable at http://ishare.iask.sina.com.cn/f/13470938.html


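Over the Spring Festival, WilliamJia took the trouble to write a part-of-speech tagging tool, and we are sharing it with everyone right away.
The tagger itself was developed by the Stanford NLP Group, but its user interface is not very friendly, so we built a Windows interface for it. Please install the Java Runtime Environment before running it.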
http://www.java.com/zh_CN/download/

http://www.xdowns.com/soft/6/56/2007/Soft_37451.html
 
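Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

Thank you, Dr. Xu, for sharing this so promptly!

Thanks to both of you for your hard work and for uploading so many useful things to the forum. Downloaded and saved.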
 
Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

I wonder how accurate it is.
It does not seem to recognize "corpora": the word is tagged only as NN rather than NNS.
 
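Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

I went through a good deal of literature and could not find any reported accuracy figures. Judging from actual tagging output, errors are rare.

Also, the Stanford NLP Group is a top-tier research institution, so its products are fairly trustworthy. The accuracy should be comparable to TreeTagger's, and the two use the same tag set.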
 
Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

Both doctors have contributed so many good things to this forum. Thank you both!
 
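Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

Thanks for sharing! For the new year, the Year of the Rabbit, I wish Dr. Xu a flourishing career and Yunlong progress by leaps and bounds in his studies!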
 
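Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

Hello, Dr. Xu! I downloaded your POS tagging tool and built two tiny corpora that I would like to tag, but it seems to need tagging models. Could you tell me where to find them? I looked for a long time without success...
I am preparing to write a paper in this area and have never worked with corpora before... Thank you! :)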
 
Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

The models are included in the archive.

First load your texts with [Choose Texts], then use [Select Tagger]. Either of the two models will work; I usually pick bidirectional-distsim-wsj-0-18.
 
Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

Quote: "I wonder how accurate it is.
It does not seem to recognize 'corpora': the word is tagged only as NN rather than NNS."

http://nlp.stanford.edu/software/pos-tagger-faq.shtml

In applications, we nearly always use the left3words-wsj-0-18.tagger model, and we suggest you do too. It's nearly as accurate (96.97% accuracy vs. 97.32% on the standard WSJ22-24 test set) and is an order of magnitude faster.
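
For readers unsure what these figures measure: tagging accuracy is normally the share of tokens whose predicted tag matches a hand-checked ("gold") tag. The Python sketch below illustrates that calculation only. The function names and the four-token example are made up for illustration and are not taken from the FAQ or the WSJ test set.

```python
def token_accuracy(gold_tags, predicted_tags):
    """Fraction of tokens whose predicted tag equals the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences must have the same length")
    if not gold_tags:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


def errors_per_10k(accuracy):
    """Expected number of mis-tagged tokens per 10,000 tokens."""
    return round((1 - accuracy) * 10_000)


# Hypothetical sentence "The corpora are large", with the plural noun
# mis-tagged as a singular noun (NN instead of NNS).
gold = ["DT", "NNS", "VBP", "JJ"]
predicted = ["DT", "NN", "VBP", "JJ"]
print(f"{token_accuracy(gold, predicted):.2%}")  # 3 of 4 correct -> 75.00%
```

By the same arithmetic, 96.97% versus 97.32% corresponds to roughly 303 versus 268 mis-tagged tokens per 10,000, so an occasional error such as a plural tagged NN is consistent with either model.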
 
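Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

Thanks to WilliamJia for the additional information. That clears things up and is reassuring: the accuracy can reach 97.32%. Everyone is also advised to use the left3words-wsj-0-18.tagger model. In practice, the two models are about equally accurate.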
 
Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

Notes on the new version:


BFSU Stanford POS Tagger: A Graphical Interface Windows Version

About

This GUI (Graphical User Interface) version of the Stanford POS Tagger was developed by Mr. Yunlong Jia, and designed by Dr. Jiajin Xu and Mr. Yunlong Jia.

This tagger automatically assigns part-of-speech information to each word in the loaded English text(s), and generates output texts in any of three formats: word_POS, word/POS, and XML. The tagged output is saved in the same directory as the source text(s).
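
As an illustration of how the word_POS and word/POS formats can be read back by a script, here is a minimal Python sketch. It assumes that tokens are separated by whitespace and that the tag follows the last separator in each token; the function name is hypothetical, and the XML format is not covered because its layout is not shown in this thread.

```python
def parse_tagged_line(line, separator="_"):
    """Split a tagged line into (word, tag) pairs.

    Use separator="_" for word_POS output and separator="/" for word/POS.
    Splitting on the LAST separator keeps words that themselves contain
    the separator intact (for example "1/2/CD" with "/").
    """
    pairs = []
    for token in line.split():
        word, sep, tag = token.rpartition(separator)
        if not sep:
            # No separator in this token: keep it and mark the tag as missing.
            pairs.append((token, None))
        else:
            pairs.append((word, tag))
    return pairs


print(parse_tagged_line("The_DT corpora_NNS are_VBP large_JJ"))
print(parse_tagged_line("The/DT corpora/NNS", separator="/"))
```

Splitting from the right is a deliberate choice here: splitting on the first separator would break tokens whose word part contains an underscore or a slash.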

To tag the texts, you need to choose one of the pre-loaded tagging models after you import your raw texts. The left3words-wsj-0-18.tagger model is recommended: it achieves an accuracy of 96.97% and is an order of magnitude faster. The other model, the bidirectional one, is slightly more accurate (97.32%) but slower (see also "Is your tagger slow?" at http://nlp.stanford.edu/software/pos-tagger-faq.shtml).

The tagger uses the Penn Treebank tag set, which is described in Treebank POS tagset.pdf in the program folder.

Please note that the tool requires Java 1.5+ to be installed before you can process any texts.

More information about the Stanford POS tagger is available at http://nlp.stanford.edu/software/tagger.shtml.

Please cite the program as:

Xu, Jiajin & Yunlong Jia. (2011). BFSU Stanford POS Tagger: A Graphical Interface Windows Version. Beijing: National Research Center for Foreign Language Education, Beijing Foreign Studies University.

BFSU Stanford POS Tagger: A Graphical Interface Windows Version is freeware. The software comes on an “as is” basis, and the authors will accept no liability for any damage that results from using the software.
 
Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

I tested the two texts in the samples folder. The left3words-wsj-0-18.tagger model ran 2 times faster (5 seconds faster) than the bidirectional-distsim-wsj-0-18.tagger model.
 
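Re: Free POS tagging tool BFSU Stanford POS Tagger 1.0: A Graphical Interface Windows Version

Quote: "The models are included in the archive.

First load your texts with [Choose Texts], then use [Select Tagger]. Either of the two models will work; I usually pick bidirectional-distsim-wsj-0-18."

Thank you, Dr. Xu. I will try it right away... hehe..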
 
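Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

I have only just started using corpora and am rather lost...

After I open the interface, why can none of the settings on the right be clicked? Also, why can't I select txt files when I choose texts?

Thank you!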
 
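Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

One more question: I noticed that the files in the models folder all have the extension .tagger or .prop. What kind of format is this? What should I open them with? And how would I generate files in this format?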
 
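Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

Quote: "One more question: I noticed that the files in the models folder all have the extension .tagger or .prop. What kind of format is this? What should I open them with? And how would I generate files in this format?"
Those are trained models, and you should not modify them.
You can train your own model on your own corpus.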
 
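Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

I now know how to open the txt documents, and the tagging seems to have worked: .tag files were generated. How do I open them?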
 
Re: BFSU Stanford POS Tagger: A Graphical Interface Windows Version (free part-of-speech tagging tool)

Oh! They can be set to open with Notepad. Thanks! I will come back if I have more questions!
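
Because a .tag file opens in Notepad, it is plain text and can also be read by a short script. The Python sketch below counts how often each tag occurs in one such file. It assumes the word_POS output format and UTF-8 encoding; the function name and the file name in the usage line are hypothetical.

```python
from collections import Counter
from pathlib import Path


def tag_frequencies(tag_file, separator="_", encoding="utf-8"):
    """Count how often each POS tag occurs in a tagged text file."""
    text = Path(tag_file).read_text(encoding=encoding)
    counts = Counter()
    for token in text.split():
        # The tag is whatever follows the last separator in the token.
        _, sep, tag = token.rpartition(separator)
        if sep:  # skip tokens that carry no tag
            counts[tag] += 1
    return counts


# Hypothetical usage, once a tagged file exists:
# for tag, n in tag_frequencies("sample1.tag").most_common(10):
#     print(tag, n)
```

If the file was saved in the word/POS format instead, pass `separator="/"`.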
 