On natural language data

xiaoz

Sunrise asked:

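What counts as natural language data? Does data collected in a situation set up by the researcher count as natural data, or is there a specific term for such data?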

As my answer is too long to send as a private message, I am pasting it here:

There are three types of linguistic data: introspective data, elicited data and corpus data. See my chapter below for details: http://eprints.lancs.ac.uk/61/

Elicited data lies on the continuum between introspective and corpus data. Such data can be closer to either pole. If the researcher designs a set of sentences in an experiment to invite grammaticality judgements from informants, the data is not natural; if a corpus is composed of sentences or words read out by testers (for use in developing speech recognition applications, for example), the "corpus" is not natural.
In the case of CHILDES, children were recorded in their daily activities or when they were instructed to tell stories from pictures. This type of data is closer to natural language production, as the informants were not aware of which linguistic features they were being tested on, if they were aware that they were being tested at all.
 
Surreptitiously recorded dyadic conversations can be claimed to be among the most natural discourse there is.

Most of the data in SCCSD BJ-500 were recorded surreptitiously.
http://www.ddyyx.com/corpus.htm
 
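Suppose the researcher sets up a situation: a mother and child are placed in a room with four boxes, but their conversation is spontaneous. Can recordings like this also be called natural data?
Also, could someone translate the names of the three types of data into Chinese?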
 
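Is there any specific basis for this?
Could Mr. xujiajin help translate the three types of data that xiaoz mentioned? One is 内省语料 (introspective data); how should the other two be translated?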
 
I see now: data collected in a specifically designed situation is probably what xiaoz calls elicit data, but how should that be translated?
 
The term is elicited data: 诱导式语料, or 诱导获得的语料 (data obtained through elicitation)
corpus data: 语料库语料

In fact, xiaoz's three types of data can be viewed as two blocks: introspection vs. elicitation + corpora.
However, elicited data can also be included in a corpus, so in a sense corpus data includes elicited data.
 