http://corpus.sjtu.edu.cn/

QTAG 3.0 README

The program is contained in the executable jar file qtag.jar, the
English resource file is BLT.dat, and it is described in BLT.txt.

Use it as follows:
- input.txt is a plain text file (it can contain SGML mark-up) which
has already been tokenised (i.e. words and punctuation are separated by spaces).

java -jar qtag.jar BLT.dat < input.txt > output.txt

- output.txt will now contain the tagged text.
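
Since the tagger expects input that is already tokenised, a small pre-processing step may help. The following is only a minimal sketch in Python, not part of QTAG: the helper name `tokenise` and the regular expression are assumptions, and QTAG's own conventions (for example for contractions or hyphenated words) may differ. It keeps SGML tags as single tokens and separates punctuation from words with spaces.

```python
import re

# Hypothetical helper, not shipped with QTAG.
# Alternatives, in order: an SGML tag, a word (optionally with internal
# apostrophes or hyphens), or any single non-space punctuation character.
TOKEN_RE = re.compile(r"<[^>]+>|\w+(?:['-]\w+)*|[^\w\s]")

def tokenise(text: str) -> str:
    """Return the text with tokens separated by single spaces."""
    return " ".join(TOKEN_RE.findall(text))

if __name__ == "__main__":
    print(tokenise("The cat sat on the mat."))
```

The result could be written to input.txt and then passed to the tagger with the command shown above.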

How to create your own resource files:
- you need a pre-tagged corpus for training the tagger.
The format of this corpus has to be as follows:
* one token followed by its tag (separated by spaces)
* one pair per line
For example:
The det
cat noun-sing
sat verb-past
on prep
the det
mat noun-sing
. punct

- run the LexiconCreator to generate a resource file from this:
java -cp qtag.jar qtag.LexiconCreator catparse.dat < cat.txt

- use it with "java -jar qtag.jar catparse.dat < mytext.in > mytext.out";
the tagged output looks like this:

<w pos="det">The</w>
<w pos="noun-sing">cat</w>
<w pos="verb-past">sat</w>
<w pos="prep">on</w>
<w pos="det">the</w>
<w pos="noun-sing">mat</w>
<w pos="punct">.</w>
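
The tagged output shown above can be turned back into the "token tag" training format with a few lines of Python. This is a sketch under assumptions: it assumes one <w> element per line, exactly as in the example, and it does not handle any character escaping the tagger might apply. The function names `parse_tagged` and `to_training_lines` are hypothetical and not part of QTAG.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line shape, copied from the README example: <w pos="TAG">TOKEN</w>
W_RE = re.compile(r'<w pos="([^"]*)">(.*?)</w>')

def parse_tagged(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (token, tag) pairs; lines that do not match are skipped."""
    pairs = []
    for line in lines:
        match = W_RE.fullmatch(line.strip())
        if match:
            pairs.append((match.group(2), match.group(1)))
    return pairs

def to_training_lines(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Format pairs as 'token tag', one pair per line."""
    return [f"{token} {tag}" for token, tag in pairs]

if __name__ == "__main__":
    sample = ['<w pos="det">The</w>', '<w pos="noun-sing">cat</w>']
    print("\n".join(to_training_lines(parse_tagged(sample))))
```

Output that has been corrected by hand could be converted this way and fed to the LexiconCreator as a new training corpus.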

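QTAG needs a Java runtime (a Java virtual machine) to run; TOSCA/LOB runs under DOS and needs some configuration. Read the instructions included in the package carefully and everything will be clear.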
 
The link I provided in post #15 is temporarily closed. I will reopen it after some further tidying up. Sorry!

[This post was edited by the author on 2005-11-14 at 17:15:25]
 
Re: http://corpus.sjtu.edu.cn/

I have only just seen this thread, which is embarrassing.
I still cannot open that URL, which is a pity.
 
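The page cannot be displayed
The page you are looking for is currently unavailable. The web site might be experiencing support problems, or you may need to adjust your browser settings.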
 