Re: UAM CorpusTool: Text Annotation for the 21st Century...and it's free
The current MMAX Project Wizard just sucks, and you should not waste your time on it; that's why we developed our own scheme designer instead. It doesn't matter if you don't have the scheme designer: you can still open the scheme file in the Scheme folder with a text editor and code it manually. As my earlier example shows, simply adding next="xxxxxx" to a value makes the scheme hierarchical (please refer to <value id="value_26" name = "anaphoric" next="anaphoric_type"/>).
As far as exporting XML is concerned, MMAX has no such feature, because many layers may be annotated over the same base data, and even within a single layer there are embedded annotations, which standard XML has no way to handle (let's discuss this properly in another thread; I'll send a paper on this to the AAACL conference soon). Basically, there is a Basedata folder, which holds the word list file, and a Markables folder, which holds a bundle of markable files (if you have several schemes). Each annotated markable file is structured as below:
<?xml version="1.0"?>
<!DOCTYPE markables SYSTEM "markables.dtd">
<markables xmlns="www.eml.org/NameSpaces/coref">
<markable id="markable_404" span="word_3112..word_3113" refer_to="empty" coref_chain="set_262" />
<markable id="markable_266" span="word_2371..word_2372" coref_type="ident" refer_to="markable_145" minimal="levels" coref_chain="set_325" />
<markable id="markable_527" span="word_3584..word_3587" coref_type="ident" refer_to="empty" minimal="upregulation" coref_chain="set_401" />
........
</markables>
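Because the markables only store word ids, reading them back means resolving each span against the Basedata word list. The Python sketch below does that and groups the mentions by coref_chain. It assumes a words file of the form <words><word id="word_1">...</word>...</words> and spans written as "word_a..word_b" (optionally several comma-separated fragments), and it expands a range by position in the word list rather than by the numeric suffix. The sample data and function names are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Default namespace declared on <markables> in the file above.
NS = "{www.eml.org/NameSpaces/coref}"

# Hypothetical Basedata word list (layout assumed for illustration).
WORDS_XML = """<words>
  <word id="word_1">the</word>
  <word id="word_2">IL-17</word>
  <word id="word_3">levels</word>
  <word id="word_4">rose</word>
  <word id="word_5">these</word>
  <word id="word_6">levels</word>
</words>"""

# Hypothetical markable file in the format shown above.
MARKABLES_XML = """<?xml version="1.0"?>
<markables xmlns="www.eml.org/NameSpaces/coref">
  <markable id="markable_1" span="word_1..word_3" refer_to="empty" coref_chain="set_1" />
  <markable id="markable_2" span="word_5..word_6" coref_type="ident" refer_to="markable_1" coref_chain="set_1" />
  <markable id="markable_3" span="word_4" />
</markables>"""


def load_words(words_xml):
    """Return (id, token) pairs in base-data order."""
    return [(w.get("id"), w.text or "") for w in ET.fromstring(words_xml).iter("word")]


def expand_span(span, word_ids):
    """Expand 'word_a..word_b' (comma-separated fragments allowed) into word ids."""
    position = {wid: i for i, wid in enumerate(word_ids)}
    ids = []
    for fragment in span.split(","):
        start, _, end = fragment.strip().partition("..")
        end = end or start  # a single-word span has no '..'
        ids.extend(word_ids[position[start]:position[end] + 1])
    return ids


def coref_chains(markables_xml, words):
    """Group the surface text of each markable by its coref_chain value."""
    word_ids = [wid for wid, _ in words]
    text = dict(words)
    chains = defaultdict(list)
    for m in ET.fromstring(markables_xml).iter(NS + "markable"):
        chain = m.get("coref_chain")
        if chain:  # markables without a chain (like markable_3) are skipped
            tokens = expand_span(m.get("span"), word_ids)
            chains[chain].append(" ".join(text[t] for t in tokens))
    return dict(chains)


print(coref_chains(MARKABLES_XML, load_words(WORDS_XML)))
# {'set_1': ['the IL-17 levels', 'these levels']}
```

Grouping on coref_chain gives each coreference set directly; following refer_to instead would give the pairwise antecedent links.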
The word spans in each markable file map onto the word list in the Basedata folder (this is the advantage of stand-off annotation). With a converter (not bundled with MMAX; we developed our own), we can easily get the standard XML result below:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="coref-table.xsl"?>
<DOC>
<articleinfo>
<bibliomisc>MEDLINEMC_1064895</bibliomisc>
</articleinfo>
<s> Increased <COREF ID="1" MIN="production"> interleukin-17 production </COREF> via a phosphoinositide 3-kinase/Akt and nuclear factor κB-dependent pathway in <COREF ID="2" MIN="patients"> patients with <COREF ID="3"> rheumatoid arthritis </COREF> </COREF> </s>
<s> Inflammatory mediators have been recognized as being important in <COREF ID="4" MIN="pathogenesis"> the pathogenesis of <COREF ID="5" REF="3" TYPE="ident"> rheumatoid arthritis (RA) </COREF> </COREF> . <COREF ID="6"> Interleukin (IL)-17 </COREF> is an important regulator of immune and inflammatory responses, including the induction of <COREF ID="7" MIN="cytokines"> proinflammatory cytokines </COREF> and <COREF ID="8" MIN="resorption"> osteoclastic bone resorption </COREF> .
......
</DOC>
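Our converter is not shown here, but its core idea can be sketched in Python: sort the markables so that outer spans open first, number them, and write opening and closing COREF tags around the tokens. This only works when spans nest properly; two spans that cross each other cannot both become inline elements, so the sketch refuses them. The input format (a token list plus markables with inclusive start/end token indices already resolved from the Basedata) and the function name to_inline are assumptions, not MMAX features.

```python
from xml.sax.saxutils import escape, quoteattr


def to_inline(tokens, markables):
    """Render stand-off markables as nested inline <COREF> elements.

    tokens:    word strings in base-data order
    markables: dicts with hypothetical keys 'id', 'start', 'end' (inclusive
               token indices) and optional 'refer_to', 'coref_type', 'min'
    """
    # Outer spans first: earlier start, then longer span.
    order = sorted(markables, key=lambda m: (m["start"], -m["end"]))
    # Inline XML needs proper nesting, so reject spans that cross each other.
    for a in order:
        for b in order:
            if a["start"] < b["start"] <= a["end"] < b["end"]:
                raise ValueError(f"{a['id']} and {b['id']} cross; they cannot be nested")
    coref_id = {m["id"]: str(n) for n, m in enumerate(order, start=1)}
    out = []
    for i, token in enumerate(tokens):
        for m in order:
            if m["start"] == i:
                attrs = [("ID", coref_id[m["id"]])]
                ref = m.get("refer_to")
                # Links to "empty" or to markables outside this batch are dropped.
                if ref and ref in coref_id:
                    attrs.append(("REF", coref_id[ref]))
                if m.get("coref_type"):
                    attrs.append(("TYPE", m["coref_type"]))
                if m.get("min"):
                    attrs.append(("MIN", m["min"]))
                out.append("<COREF " + " ".join(f"{k}={quoteattr(v)}" for k, v in attrs) + ">")
        out.append(escape(token))
        # Close innermost elements first (reverse of opening order).
        for m in reversed(order):
            if m["end"] == i:
                out.append("</COREF>")
    return " ".join(out)


print(to_inline(
    ["patients", "with", "rheumatoid", "arthritis"],
    [{"id": "m_a", "start": 0, "end": 3, "min": "patients"},
     {"id": "m_b", "start": 2, "end": 3}],
))
# <COREF ID="1" MIN="patients"> patients with <COREF ID="2"> rheumatoid arthritis </COREF> </COREF>
```

This reproduces the nesting of the second and third COREF elements in the first sentence of the sample. Crossing spans would need either a stand-off format or splitting the annotation into separate documents.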
...That's a very nice application, but for general purposes the user would want the basic XML file of the coding results as a first step. I wouldn't worry too much about the query aspect at this stage.
For simple annotation of just a few files, it's surely too early to talk about how to query the annotation results. But once more than a dozen files have been annotated, a query package is definitely needed.