oscar3
Senior Member
Corpus linguistics is essentially a technology, but like many technologies,
it may have, at least potentially, considerable consequences.
After all, the telescope transformed astronomy, the X-ray machine
radicalized medicine, the tape recorder impelled the advance of sociolinguistics
and the study of oral discourse, the video recorder
advanced the study of small-group interactions, and the spectrograph
(and similar devices) consolidated the development of instrumental
phonetics. Corpus linguistics technology requires a computer that can
store a collection of text files (the corpus) and then apply software to
those files to produce frequency lists, lists of key words, and, most
importantly, strings of words showing which words co-occur (or collocate)
with others. The text files in a corpus may consist entirely of
written texts (as in the Helsinki Corpus of English Texts), entirely of
transcriptions of speech (as in MICASE, the Michigan Corpus of
Academic Spoken English), or of both (as in the Bank of English).1
These corpora are typically constructed on certain principles that lead
to appropriate sampling, and they can vary greatly in size. The Bank
of English corpus, an earlier stage of which underpinned the important
corpus-based COBUILD English Language Dictionary,2 is huge
and, at the time of writing, rapidly approaching 400 million words;
small specialized corpora, especially those devoted to single genres,
such as research articles or university lectures, can be orders of magnitude
smaller. The pros and cons of large diffuse corpora and small
narrow ones are a matter of current debate.
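The core outputs the passage names (frequency lists and collocation counts) are simple to illustrate. Below is a minimal Python sketch of that kind of processing, not a real corpus tool: the tokenizer, the window size, and the toy corpus string are all illustrative assumptions, and packages such as NLTK or dedicated concordancers do this far more carefully.

```python
import re
from collections import Counter

def tokenize(text):
    # Crude lowercase word tokenization; real corpus software
    # handles punctuation, clitics, and markup much more carefully.
    return re.findall(r"[a-z']+", text.lower())

def frequency_list(tokens):
    # A word frequency list: the most basic corpus-software output.
    return Counter(tokens)

def collocates(tokens, node, window=4):
    # Count words co-occurring with `node` within +/- `window` tokens,
    # a simple stand-in for the collocation displays described above.
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], lo) if j != i)
    return counts

# Toy "corpus" for illustration only.
corpus = "the cat sat on the mat and the cat saw the dog"
toks = tokenize(corpus)
print(frequency_list(toks).most_common(3))
print(collocates(toks, "cat").most_common(3))
```

Keyword lists, the third output mentioned, would additionally compare these frequencies against a reference corpus to find words that are unusually common in the target corpus.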
http://forum.corpus4u.org/upload/forum/2005082322565756.pdf