A LOGICAL APPROACH TO COMPUTATIONAL CORPUS LINGUISTICS
Torbjörn Lager, 1995 (PhD thesis)
GOTHENBURG MONOGRAPHS IN LINGUISTICS 14
The purpose of this thesis is to build a corpus theory development environment, and to discuss its design, use, and implementation. The proposed system is based on a logical approach to computational corpus linguistics where sentences of logic are used to express statements about texts and logical inference is used to manipulate these sentences in order to analyse the texts.
The thesis demonstrates the remarkable ease with which the functionalities needed in a corpus system can be implemented when based upon adequate means of representing, querying, and reasoning. The proposed system implements hand coding, searching, concordancing, parsing, counting, tabling, collocating, automatic part of speech tagging, lemmatizing, excerpting, interpreting, treebanking, explanation, and various kinds of learning.
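As a rough illustration of the idea of stating facts about a text and deriving searches, concordances, and counts from them, consider the minimal sketch below. It is not taken from the thesis, whose prototype is built on logic programming. It is written in Python, and the fact format `("word", position, form)`, the sample sentence, and the function names `positions`, `concordance`, and `counts` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical fact base: each tuple reads as the statement word(Position, Form).
# The sample sentence and the fact format are assumptions for illustration only.
text = "the cat saw the dog".split()
facts = {("word", i, w) for i, w in enumerate(text)}


def positions(form):
    """Answer the query word(P, form): every position P where form occurs."""
    return sorted(p for (_, p, w) in facts if w == form)


def concordance(form, width=1):
    """Derive keyword-in-context triples (left, keyword, right) from the facts."""
    by_pos = {p: w for (_, p, w) in facts}
    lines = []
    for p in positions(form):
        left = [by_pos[i] for i in range(p - width, p) if i in by_pos]
        right = [by_pos[i] for i in range(p + 1, p + 1 + width) if i in by_pos]
        lines.append((" ".join(left), form, " ".join(right)))
    return lines


def counts():
    """Count how many word facts exist for each word form."""
    return Counter(w for (_, _, w) in facts)


if __name__ == "__main__":
    print(positions("the"))
    print(concordance("the"))
    print(counts()["the"])
```

In this sketch, searching, concordancing, and counting are all defined over the same set of facts, which loosely mirrors the abstract's point that a shared representation keeps the individual functions small.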
By linking all this functionality into a common representational framework characterised by high expressive power, declarativity, and explicit reasoning strategies, and by embedding the whole concept in a particular philosophical and methodological context, including an ontology of text, an analysis of the notion of theory, an explication of the notion of truth, and other foundational issues, we arrive at an interactive system which is multi-functional and general, yet simple, consistent, and highly usable.
Apart from being interesting from a practical point of view, the development of such a system raises intriguing philosophical and methodological questions: What is a corpus text? What is a corpus theory? What does it mean to develop a corpus theory? What does it mean for a corpus theory to be true about a corpus text? What is the link between the truth of such a theory and its usefulness for natural language processing purposes? These and related questions are discussed in the thesis.
The system exists in a prototype implementation and the thesis contains numerous examples from this implementation in action.
KEY WORDS: Corpus linguistics, Corpus tools, Grammar, Grammar development, Logic programming
Download the book/thesis (1.5 MB):
http://www.ling.gu.se/~lager/Thesis.pdf