How to write a thesis based on spoken data?

zhmabel

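Junior Member
Dear experts, especially the regulars who post so much on this forum: I am about to start my master's thesis, but I have not been very diligent in my studies over the past few years, so I now have no idea how to proceed and cannot even settle on a proposal topic. My supervisor works on corpora and very much hopes that their students will do research on spoken language, but I still do not know which aspects of spoken language could yield a thesis. So I am asking all of you: please give me a hand!!!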

[This post was edited by xujiajin on 2005-09-10 at 21:08:01]
 
1. Do you have ready-made spoken data at hand?
2. Which specific point does your supervisor want you to work on?
 
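Research on spoken corpus data is very hard work. Of course, it is much better if you already have the data, especially text scripts that have been fully transcribed and proofread.

That said, a typical master's thesis can be a small-scale study, for example of a few interviews or talk shows from radio or television.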
 
MICASE is worth using: http://www.lsa.umich.edu/eli/micase/index.htm
MICASE consists mainly of academic spoken English.
 
I am using the demographically sampled component of the BNC (ca. 4 million words).
 
But that part of the BNC, let alone the entire BNC, is not available to ordinary users.
 
The British Library server provides a 20-day free trial per username via SARA. When one username expires, register another. 40 days is enough for data collection for an MA project, right?
 
Most research on spoken corpora is based primarily on transcripts. If we want to get the most out of the speech in its actual context, we have to go back to the sound files/recordings.
 
This is what distinguishes text-/transcript-based research from audio-based research. Most recently, multimodal information (text, audio, video, gesture, etc.) has been integrated and analyzed simultaneously.
 
Indeed, much extra-linguistic information is lost in transcription. That's why the transcripts of some existing audio/video corpora are time-stamped and linked to the digital audio/video chunks (e.g. the LDC Santa Barbara corpus, and the SCOTS corpus http://www.scottishcorpus.ac.uk/).

However, a spoken corpus composed of transcripts properly marked up with the most common extra-linguistic features can be used for many purposes in linguistic inquiry.
 
Another option is to annotate the audio directly, marking up phonetic, phonological, and paralinguistic features together with the verbatim orthographic transcript. This kind of annotation can be done well in Praat (www.praat.org).
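
To make the idea of time-aligned annotation concrete, here is a minimal Python sketch. It is only an illustration and does not read or write Praat's own file format. The Interval class, the tier contents, and the tone labels "rise" and "fall" are all made up for this example. Each tier is a list of labelled time spans, and the labels on one tier are looked up by time overlap with another.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    """One labelled stretch of the recording (hypothetical structure)."""
    start: float  # seconds from the beginning of the recording
    end: float
    label: str


def overlapping(tier, start, end):
    """Return the intervals in `tier` that overlap the span [start, end)."""
    return [iv for iv in tier if iv.start < end and iv.end > start]


def tones_for_word(word_tier, tone_tier, word):
    """For every occurrence of `word`, collect the tone labels that overlap it."""
    results = []
    for w in word_tier:
        if w.label == word:
            labels = [t.label for t in overlapping(tone_tier, w.start, w.end)]
            results.append((w.start, w.end, labels))
    return results


# Invented example data: an orthographic tier and a tone tier.
words = [
    Interval(0.00, 0.42, "yes"),
    Interval(0.42, 0.90, "I"),
    Interval(0.90, 1.35, "know"),
]
tones = [
    Interval(0.00, 0.42, "rise"),
    Interval(0.42, 1.35, "fall"),
]

print(tones_for_word(words, tones, "yes"))
```

Because every label carries its start and end time, any hit in the transcript can be traced back to the matching stretch of the recording.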

[This post was edited by the author on 2005-06-20 at 10:02:25]
 
But a spoken corpus annotated with phonological, phonetic, and prosodic features is still composed of enriched "transcripts". In linguistic analysis, as opposed to language engineering (e.g. speech recognition and speech synthesis), corpora of such transcripts are usually sufficient for most purposes.
 
When we come to the analysis of situational features of linguistic facts, especially the interactive attitudinal features, tone of voice and other prosodic features have to be taken into account.

Take "yes" for example: "yes" with a rising tone, with a falling tone, and in some cases with a reluctant (prolonged, first falling then rising) tone carries rather different communicative content.


[This post was edited by the author on 2005-06-21 at 00:59:39]
 
True. That's why most spoken corpora are marked up with pauses (including their duration), events, and vocals, while some corpora of transcripts are even annotated with prosodic features such as tones and prolonged pronunciations. These are all transcribed corpora. Unless you are dealing with audio/video recordings directly, you are using transcribed spoken corpora.
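
As a rough illustration of how such markup can be used, the Python sketch below pulls pause durations out of a transcript and then strips the tags to recover the plain words. The tag shapes used here (<pause dur="..."/> and <event desc="..."/>) are an assumption made for the example. Real corpora each have their own conventions, so the patterns would need adjusting to the corpus manual.

```python
import re

# Assumed markup for this sketch only: <pause dur="N"/> with N in seconds.
PAUSE_RE = re.compile(r'<pause dur="(\d+(?:\.\d+)?)"/>')
# Any angle-bracket tag, used to strip markup from the running text.
TAG_RE = re.compile(r"<[^>]+>")


def pause_durations(transcript):
    """Return the duration of every marked pause, in order of appearance."""
    return [float(d) for d in PAUSE_RE.findall(transcript)]


def strip_markup(transcript):
    """Remove all tags and normalise whitespace, leaving only the words."""
    text = TAG_RE.sub(" ", transcript)
    return " ".join(text.split())


# Invented sample utterance.
sample = 'well <pause dur="2"/> yes <event desc="laugh"/> I suppose <pause dur="0.5"/> so'

print(pause_durations(sample))
print(strip_markup(sample))
```

Keeping the marked-up version and deriving the plain text from it means that the pause and event information stays available for later analysis.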
 
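Internationally, there is the ICAME Corpus Collection CD-ROM, which contains many corpora. You can search for it on Google.

In China, only CLEC is currently on sale; the CD-ROM comes with the book by Yang Huizhong and Gui Shichun.

There are some Chinese corpora that can be searched online; see the pinned post in the "Chinese Corpora" board for details.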
 
I'm doing a contrastive study of English and Chinese relative clauses for my MA degree. I wanted to use some intuition-based data, but my tutor suggested that corpus-based data would be better ("more authoritative, more reliable"). The problem is that I have never used a corpus before and don't know how to use one. I bought a book on corpus linguistics but found it of little help. Perhaps I'm not that technology-minded. Would anyone give me a hand?
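
For anyone in the same position, the most basic way of using a corpus is a concordance (keyword in context, KWIC) search. Below is a minimal Python sketch of one. The function name kwic, the sample sentences, and the choice of search words are all invented for illustration, and a dedicated concordancer would normally do this job.

```python
import re


def kwic(text, pattern, width=30):
    """Return (left context, match, right context) for each regex match."""
    hits = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


# Invented sample text; in practice this would be read from a corpus file.
sample = (
    "The book that I bought was of little help. "
    "The tutor who advised me prefers corpus data."
)

# Candidate relativizers only: every hit still has to be checked by hand.
for left, hit, right in kwic(sample, r"\b(?:that|who|which)\b", width=20):
    print(f"{left:>20} [{hit}] {right}")
```

A word search like this over-matches, since "that" is also a demonstrative and a complementizer, so the concordance lines are a starting point for manual sorting rather than a finished count of relative clauses.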
 