The program is contained in the executable jar file qtag.jar. The
English resource file is BLT.dat, which is described in BLT.txt.
Use it as follows:
- input.txt is a plain text file (it can contain SGML mark-up) which
has been tokenised (i.e. words and punctuation are separated by spaces).
java -jar qtag.jar BLT.dat < input.txt > output.txt
- output.txt will now contain the tagged text.
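If your text is not yet tokenised, a small script can do a rough job of it.
The following is only a sketch and not part of qtag: the function name
tokenise and the token pattern are assumptions made for illustration. It
treats an SGML tag, a word (optionally with internal apostrophes or
hyphens), or a single punctuation mark as one token, which may not match
the tokenisation the BLT.dat resource file was trained on.

```python
import re

# Assumed token pattern (not taken from qtag itself):
#   1. an SGML tag such as <p> or </p>, kept as one token
#   2. a word, optionally with internal apostrophes or hyphens
#   3. any single character that is neither a word character nor whitespace
TOKEN_PATTERN = re.compile(r"<[^>]+>|\w+(?:['-]\w+)*|[^\w\s]")


def tokenise(text: str) -> str:
    """Return the text with all tokens separated by single spaces."""
    return " ".join(TOKEN_PATTERN.findall(text))
```

Calling tokenise("The cat sat on the mat.") gives "The cat sat on the mat ."
and the result can be written to input.txt before running the tagger.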
How to create your own resource files:
- you need a pre-tagged corpus for training the tagger.
The format of this corpus has to be as follows:
* one token followed by its tag (separated by spaces)
* one pair per line
For example:
The det
cat noun-sing
sat verb-past
on prep
the det
mat noun-sing
. punct
- run the LexiconCreator to generate a resource file from this:
java -cp qtag.jar qtag.LexiconCreator catparse.dat < cat.txt
- use it with "java -jar qtag.jar catparse.dat < > mytext.out" (put the
name of your tokenised input file after the "<"); the output looks like this:
<w pos="det">The</w>
<w pos="noun-sing">cat</w>
<w pos="verb-past">sat</w>
<w pos="prep">on</w>
<w pos="det">the</w>
<w pos="noun-sing">mat</w>
<w pos="punct">.</w>
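The tagged output can be read back into token/tag pairs, for example to
check it or to turn corrected output into a new training corpus. This is a
sketch under assumptions, not part of qtag: the names parse_tagged and
to_training_format are made up here, and it assumes every output line
holds exactly one <w pos="...">...</w> element as in the sample above.
Whether qtag escapes special characters inside tokens is not covered.

```python
import re

# Assumption: one <w pos="TAG">TOKEN</w> element per line, as in the sample.
W_ELEMENT = re.compile(r'<w pos="([^"]*)">(.*?)</w>')


def parse_tagged(lines):
    """Collect (token, tag) pairs from tagger output lines."""
    pairs = []
    for line in lines:
        match = W_ELEMENT.search(line)
        if match:  # lines without a <w> element are skipped
            tag, token = match.groups()
            pairs.append((token, tag))
    return pairs


def to_training_format(pairs):
    """Render pairs as 'token tag' lines, one pair per line."""
    return "\n".join(f"{token} {tag}" for token, tag in pairs)
```

For example, parse_tagged(open("mytext.out")) returns a list of pairs, and
passing that list to to_training_format yields text in the corpus format
that the LexiconCreator expects.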
Qtag requires a Java runtime environment; TOSCA/LOB runs under DOS and needs some setup. Read the included instructions carefully and everything will become clear.