Re: [Download] NooJ: a linguistic development environment
Dear colleagues,
We are pleased to announce the release of NooJ v2.0.
NooJ is a linguistic engineering development platform that allows
linguists and NLP developers to formalize various levels of
linguistic phenomena and to build various NLP applications. The
software, its manual and linguistic resources can be downloaded free
of charge from www.nooj4nlp.net.
Besides a number of enhancements to the interface (syntax coloring,
linguistic resource management, etc.) and to the free linguistic
resources included with it, v2.0 contains:
-- A new corpus processor that applies a typical NooJ linguistic
query to a corpus made of 10,000+ texts in a few minutes.
-- A more robust dictionary compiler. For instance, it compiles the
Hungarian dictionary that describes the equivalent of a list of 120+
million word forms in a few hours (it takes a few minutes to compile
the English dictionary).
-- A new linguistic engine that better integrates the morphological
and syntactic levels of analysis via new operations on variables. Its
most visible enhancements are its two types of constraints:
<$N=:N+Hum> checks that the linguistic unit stored in variable $N
matches the query <N+Hum> (any NooJ query is valid to the right of
the operator "=:")
<$N$Nb="p"> checks that the value of the lexical property "Nb" of the
linguistic unit stored in $N is equal to "p"
<$N$Nb=$A$Nb> checks that the values of the lexical property "Nb" of
the linguistic units $N and $A are equal
Lexical properties can be set either in each dictionary entry
(e.g. "+Nb=p") or in the properties' definition file (via a rule such
as "Nb = s + p;").
-- When variables are not explicitly set, NooJ links them to the
corresponding lexical symbols. For instance, $N will be linked to the
nearest symbol <N> and $N$Vsup will encode the value of the property
VSup for the noun. Series of variables that are set in a loop can be
retrieved with the series' variable symbol "$$". For instance, the
series of adjectives that occur to the left of a noun can be accessed
with the symbol $$A.
-- The Machine Translation engine now allows checks on
recursively defined linguistic units. For instance,
<$N$ZH$Cl=$A$ZH$Cl> checks that the classifiers of the Chinese
translation of a noun and an adjective match.
-- noojapply includes the new dictionary and corpus processors; it
parses texts in which text units are delimited with XML tags (such as
<p> or <s>).
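
For readers who have not used NooJ, the three constraint forms listed
above (<$N=:N+Hum>, <$N$Nb="p"> and <$N$Nb=$A$Nb>) can be modeled
roughly as follows. This is only a toy sketch in Python, not NooJ's
engine or its API: the names Unit, matches and check are hypothetical,
and a linguistic unit is assumed to be nothing more than a category, a
set of features and a table of lexical properties.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """Hypothetical stand-in for a linguistic unit stored in a variable."""
    form: str
    category: str                                 # e.g. "N" or "A"
    features: set = field(default_factory=set)    # e.g. {"Hum"}
    props: dict = field(default_factory=dict)     # e.g. {"Nb": "p"}


def matches(unit, query):
    """Toy version of a query such as 'N+Hum': category plus required features."""
    category, *required = query.split("+")
    return unit.category == category and all(f in unit.features for f in required)


def check(constraint, variables):
    """Evaluate one of the three constraint shapes against bound variables."""
    body = constraint.strip("<>")
    if "=:" in body:
        # <$N=:N+Hum>: the unit in $N must match the query on the right
        var, query = body.split("=:", 1)
        return matches(variables[var.lstrip("$")], query)
    left, right = body.split("=", 1)
    lvar, lprop = left.lstrip("$").split("$")
    lvalue = variables[lvar].props.get(lprop)
    if right.startswith('"'):
        # <$N$Nb="p">: the property must equal a literal value
        return lvalue == right.strip('"')
    # <$N$Nb=$A$Nb>: the same property must agree across two units
    rvar, rprop = right.lstrip("$").split("$")
    return lvalue is not None and lvalue == variables[rvar].props.get(rprop)


# Assumed example bindings: a plural human noun and a plural adjective.
variables = {
    "N": Unit("enfants", "N", {"Hum"}, {"Nb": "p"}),
    "A": Unit("jeunes", "A", set(), {"Nb": "p"}),
}
for c in ('<$N=:N+Hum>', '<$N$Nb="p">', '<$N$Nb=$A$Nb>'):
    print(c, check(c, variables))
```

In this sketch the lexical property "Nb" is stored per unit, which
corresponds to setting it in each dictionary entry (e.g. "+Nb=p"); the
properties' definition file is not modeled.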
Max Silberztein