Does anybody have a near-comprehensive list of English general nouns?
How can we extract the headwords in WordNet?
http://torvald.aksis.uib.no/corpora/1995-3/0064.html
Re: general nouns
Patrick Cassidy (micra@tigger.jvnc.net)
Wed, 2 Aug 1995 01:13:41 -0400 (EDT)
Gunter Lorenz writes:
> Could anybody please provide pointers on the following:
> I am working on the class of _general_nouns_ at the moment; superordinates
> such as people, issue, problem, thing etc. that can function as "labels"
> for pieces of discourse - either anaphorically or cataphorically.
>
> Does anybody know of a thesaurus-like list of these kind of labels or of
> otherwise reasonable categorisations?
. . .
>
> Gunter Lorenz, M.A. Tel: 0(+49)-821-598-751
> Didaktik des Englischen Fax: 0(+49)-821-598-5501
> Universitaet Augsburg
> Universitaetsstr. 10 Gunter.Lorenz@phil.uni-augsburg.de
> D-86135 Augsburg
Mark Lauer's recent answer to a similar question is also relevant
to this query. The WordNet thesaurus Mark mentioned contains a hierarchy,
and any of the 1000-2000 higher nodes could be considered a
_general noun_ designating an area of discourse. The 1043 headwords of
the 1911 Roget could be viewed similarly.
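As a rough illustration of what picking the "higher nodes" of a hierarchy might look like, the sketch below walks a hypernym tree breadth-first from its root and keeps every node within a chosen depth. The small HYPONYMS table and the max_depth cutoff are invented for the example and are not WordNet data; with a real copy of WordNet the same walk would presumably start at the noun root and follow its hyponym links.

```python
from collections import deque

# Hypothetical toy hierarchy (parent -> children); NOT real WordNet data.
HYPONYMS = {
    "entity": ["object", "abstraction"],
    "object": ["artifact", "organism"],
    "abstraction": ["attribute", "relation"],
    "artifact": ["tool"],
    "organism": ["person"],
    "tool": ["hammer"],
}


def higher_nodes(root, max_depth, hyponyms=HYPONYMS):
    """Return nodes at depth <= max_depth, breadth-first from root."""
    found = []
    queue = deque([(root, 0)])
    seen = {root}
    while queue:
        node, depth = queue.popleft()
        found.append(node)
        if depth == max_depth:
            continue  # do not descend below the cutoff
        for child in hyponyms.get(node, []):
            if child not in seen:  # a node may have several parents
                seen.add(child)
                queue.append((child, depth + 1))
    return found


if __name__ == "__main__":
    print(higher_nodes("entity", 2))
```

Raising max_depth trades generality for coverage; a cutoff that yields somewhere between 1000 and 2000 nodes would have to be found by experiment on the real hierarchy.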
An additional hierarchy available for consideration is the one
I have been preparing by modifying the 1911 Roget. It is
not as complete as WordNet, but has a larger number of defined
semantic relations. This modified Roget has been named the FACTOTUM
Semantic Network. It is still at an early stage of development,
and the present version will certainly have numerous
inconsistencies and errors. However, it should already
be more useful than the 1911 Roget, since (1) it has been
supplemented with several thousand words not in the 1911 Roget;
(2) its structure is better defined, and we have a parser that can
extract the logical structure from the plain text; (3) the hierarchy,
though not yet a fully accurate inheritance hierarchy, is better
organized for inheritance purposes than is the Roget arrangement;
and (4) it has over 10,000 semantic links (other than hierarchical)
relating the words of each main entry to each other. These links
draw on a list of over 160 semantic relations, which are being
defined as needed to mark explicitly the relations between words
that the Roget only implies by juxtaposing them within a
main entry. At this point, this semantic network has about 2,000
main entries, most of which might also be viewed as _general nouns_.
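To make the idea of typed, non-hierarchical links concrete, here is a minimal sketch of one way such links could be held in memory. Everything in it is hypothetical: the Link class, the sample words, and the relation labels are invented for illustration and do not reflect the actual FACTOTUM file format or its list of relations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    source: str    # a word within a main entry
    relation: str  # relation label; the names used below are invented
    target: str    # the word it is related to


# Hypothetical sample; neither the words nor the relation names
# come from the FACTOTUM files.
LINKS = [
    Link("teacher", "AGENT_OF", "teaching"),
    Link("school", "PLACE_OF", "teaching"),
    Link("lesson", "PART_OF", "teaching"),
    Link("pupil", "AGENT_OF", "learning"),
]


def links_by_relation(links):
    """Count how many links use each relation label."""
    return Counter(link.relation for link in links)


def related_to(word, links):
    """Return (relation, other word) pairs for links touching `word`."""
    pairs = []
    for link in links:
        if link.source == word:
            pairs.append((link.relation, link.target))
        elif link.target == word:
            pairs.append((link.relation, link.source))
    return pairs
```

Keeping the relation as an explicit label on each link is what distinguishes this from a plain thesaurus entry, where relatedness is implied only by the words appearing together.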
The main purpose of the present phase of modifying the Roget
is to find the set of semantic relations that Roget's authors considered
significant enough to warrant viewing words as "related".
I suspect that this list will form a minimum set necessary to
provide logical definitions of words adequate for human-level
language understanding. It is understood, of course, that "definitions"
of words in terms of other words must be grounded at some point in
"primitive" concepts which must be "defined" in some other way, e.g.
by procedural code for constructing a logical representation of
a discourse.
This FACTOTUM Semantic Network is copyrighted, but a
zipped ASCII version of the text is available for examination by
anonymous ftp.
ftp to styx.ios.com
(if this node does not respond, try ios.com)
log in as anonymous
change directory to pub/users/micra
(full path is /home/ftp/pub/usrs/micra)
set mode to binary.
get files fsn.zip, readme.fsn, and fsn_doc.asc
The "readme.fsn" file will have general information about the
files in this directory, and the restrictions placed on their use.
More information about the semantic network and semantic relations
will be found in "fsn_doc.asc" and in the unzipped files "relation.asc"
and "fsn1.asc" (header portion).
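For anyone who would rather script the retrieval, the ftp steps given earlier might be sketched with Python's standard ftplib as follows. The host, directory, and file names are copied from this message; the host dates from 1995 and is quite possibly no longer reachable, and the function name fetch_files and the dest_dir parameter are my own inventions.

```python
import os
from ftplib import FTP

HOST = "styx.ios.com"          # from the message; may well be offline now
DIRECTORY = "pub/users/micra"  # from the message
FILES = ["fsn.zip", "readme.fsn", "fsn_doc.asc"]


def fetch_files(ftp, directory, names, dest_dir="."):
    """Download `names` from `directory` over an already open FTP session."""
    ftp.cwd(directory)
    saved = []
    for name in names:
        path = os.path.join(dest_dir, name)
        with open(path, "wb") as out:
            # retrbinary transfers in binary (image) mode
            ftp.retrbinary("RETR " + name, out.write)
        saved.append(path)
    return saved


def main():
    with FTP(HOST) as ftp:  # connects on construction
        ftp.login()         # no arguments means anonymous login
        fetch_files(ftp, DIRECTORY, FILES)


if __name__ == "__main__":
    main()
```

Passing the session into fetch_files, rather than opening it inside, keeps the download logic testable without a live server.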
In spite of its very primitive state, this semantic network
is being made available at this time to contribute to current
discussions of the possibility of finding some common ground
among general ontologies being prepared by different groups.
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
To another query about thesauruses, Mark Lauer replied:
> You have at least a couple of options:
>
> 1) Use WordNet, a freely available lexical taxonomy consisting
> of small synonym sets (about 4 words in each) linked by various
> semantic relations (ISA, HAS_PART, etc). This was developed
> by George Miller (1990) and associates. It contains around 167,000
> word senses, including nouns, verbs, adjectives and adverbs.
> ftp://clarity.princeton.edu/pub/wordnet/wn1.5unix.tar.gz.a
> 2) Use Roget's 1911 Thesaurus from Project Gutenberg consisting
> of 1043 categories, each containing nouns, verbs, adjectives, adverbs
> and phrases. There are an average of 34 single-word nouns in each.
> This was entered by Patrick Cassidy of Micra Inc. The file is in human
> readable form and so requires quite a bit of massaging to get a machine
> tractable version. I have done this already for the nouns (see Lauer, 1995;
> Resnik, 1995), and if anyone would like to use my version, please email me
> and I will try to get back to you as soon as I can.
> ftp://mrcnext.cso.uiuc.edu/etext/etext91/roget13a.txt
>
> Hope this is of use to people out there.
>
> Best wishes,
> Mark Lauer
> Microsoft Institute
> Sydney, Australia
>
> Miller, G. (1990) WordNet: An On-line Lexical Database.
> In International Journal of Lexicography, Vol. 3(4).
>
> Lauer, M. (1995) Corpus Statistics Meet
> The Compound Noun: Some Empirical Results.
> In Proceedings of the 33rd Annual Meeting
> of the Association for Computational Linguistics,
> Cambridge, MA.
>
> Resnik, P. (1995) Disambiguating Noun Groupings
> with Respect to WordNet Senses.
> In Proceedings of the Third Workshop on Very Large Corpora,
> Cambridge, MA.
>
>
(End information from Mark Lauer)
				