QTAG 3.0 README
The program is contained in the executable jar file qtag.jar. The
English resource file is BLT.dat, and it is described in BLT.txt.
Use it as follows:
- input.txt is a plain text file (it can contain SGML mark-up) which
is tokenised (i.e. words and punctuation are separated by spaces).
java -jar qtag.jar BLT.dat < input.txt > output.txt
- output.txt will now contain the tagged text.
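If the input text is not yet tokenised, a small script can prepare it. The following is only a rough sketch, not part of QTAG: the function name tokenise and the regular expression are assumptions made for illustration. It keeps SGML tags whole and puts a space between words and punctuation marks. Note that it also splits contractions such as "don't" into three tokens, which may not be what the resource file expects.

```python
import re

# Hypothetical helper, not part of QTAG.
# A token is an SGML tag kept whole, a run of word characters,
# or a single punctuation character.
TOKEN_RE = re.compile(r"<[^>]+>|\w+|[^\w\s]")


def tokenise(text):
    """Return text with tokens separated by single spaces."""
    return " ".join(TOKEN_RE.findall(text))


print(tokenise("The cat sat on the mat."))
# prints: The cat sat on the mat .
```

The result of tokenise can be written to input.txt, one line of text per line, before running the tagger as shown above.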
How to create your own resource files:
- you need a pre-tagged corpus for training the tagger.
The format of this corpus has to be as follows:
* one token followed by its tag (separated by spaces)
* one pair per line
For example:
The det
cat noun-sing
sat verb-past
on prep
the det
mat noun-sing
. punct
- run the LexiconCreator to generate a resource file from this:
java -cp qtag.jar qtag.LexiconCreator catparse.dat < cat.txt
- use it with "java -jar qtag.jar catparse.dat < mytext.in > mytext.out"; the output looks like this:
<w pos="det">The</w>
<w pos="noun-sing">cat</w>
<w pos="verb-past">sat</w>
<w pos="prep">on</w>
<w pos="det">the</w>
<w pos="noun-sing">mat</w>
<w pos="punct">.</w>
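The two line formats shown above (the "token tag" training corpus and the <w pos="..."> output) are easy to check and read with a short script. This is a minimal sketch under assumptions: the names read_corpus and parse_tagged are hypothetical, blank lines in the corpus are skipped (the README does not say whether they are allowed), and the output is assumed to have one <w> element per line, exactly as in the example.

```python
import re

# Matches one output line such as <w pos="det">The</w> (assumed layout).
W_RE = re.compile(r'<w pos="([^"]*)">(.*?)</w>')


def read_corpus(lines):
    """Check the training format: one 'token tag' pair per line."""
    pairs = []
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(
                "line %d: expected 'token tag', got %r" % (number, line)
            )
        pairs.append((fields[0], fields[1]))
    return pairs


def parse_tagged(lines):
    """Turn tagged output lines back into (token, tag) pairs."""
    pairs = []
    for line in lines:
        match = W_RE.search(line)
        if match:
            pairs.append((match.group(2), match.group(1)))
    return pairs
```

For example, read_corpus(open("cat.txt")) would raise a ValueError on the first malformed line before the file is passed to the LexiconCreator, and parse_tagged(open("mytext.out")) would return pairs such as ("The", "det").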
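Qtag needs a Java runtime environment to run; TOSCA/LOB runs under DOS and needs some configuration. Read the included instructions carefully and everything will be clear.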