Sunrise asked -
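What is natural language data? Does data collected in a situation set up by the researcher count as natural data, or is there a specific name for data of this kind?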
As my answer is too long to send as a message, I am pasting it here:
There are three types of linguistic data: introspective data, elicited data and corpus data. See my chapter below for details: http://eprints.lancs.ac.uk/61/
Elicited data lies on the continuum between introspective and corpus data. Such data can be closer to either pole. If the researcher designs a set of sentences in an experiment to invite grammaticality judgements from informants, the data is not natural; likewise, if a corpus is composed of sentences or words read out by testers (for use in developing speech recognition applications, for example), the "corpus" is not natural.
In the case of CHILDES, children were recorded in their daily activities or when they were instructed to tell stories from pictures. This type of data is closer to natural language production, as the informants were not aware of which linguistic features they were being tested on, if they were aware that they were being tested at all.