Help request: looking for a fairly large spoken Chinese corpus with audio/video

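Dear experts and scholars, I registered as a member just last week and am writing to ask for help. I have just started my PhD project, which analyzes spoken discourse in Mandarin Chinese, and I need data with audio files (video would be even better). I found the PolyU Corpus of Spoken Chinese, which has a sound track, but it is too small to meet my needs. I hope to get some information and guidance here. Thank you very much!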

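PS: I was very, very saddened by the news of Professor Xiao Zhonghua's (肖忠华) passing. May he rest in peace in heaven and watch over us as we carry on the work of corpus linguistics.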
 
The Lancaster Los Angeles Spoken Chinese Corpus (http://www.lancaster.ac.uk/fass/projects/corpus/LLSCC/) has about 1 million words. If that is still too small, you could try creating your own corpus from existing data such as TV shows or movie transcripts. If you're interested in natural spoken interaction, the dataset will necessarily be small, considering the amount of work that transcription involves.
 
Thank you very much, Dr. Ai. May I follow up with two more questions regarding your advice?
1. If I use TV shows to create my own corpus, do I need to consider the copyright of those shows, given that the transcripts, audio, and video are all protected?
2. The link to LLSCC works, but I cannot see where to log in to the corpus. As I recall, the data in this corpus is not freely available to the public. I do not mind if there is a charge. Do you know how to get access to it?

And last, do you know what CHA files are and how to open such files?

Many thanks. Wishing you a happy Year of the Monkey.
 
To answer your first question: if you're only using the corpus you created for your own private use, you should be fine. However, if you intend to publish or distribute the corpus, you will need to deal with the copyright issue. You might contact Professor Tao for details regarding the availability of the LLSCC corpus. I believe CHA files can be opened with the CLAN software (http://childes.psy.cmu.edu/clan/).
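Since CHA files are plain text in the CHAT transcription format, they can also be read with a short script. The following is only a minimal sketch, assuming the usual CHAT layout: header lines begin with `@`, speaker lines begin with `*` plus a speaker code and a colon, dependent tiers begin with `%`, and continuation lines are indented. The function name `parse_chat_utterances` and the sample text are made up for illustration, and timing marks or other annotations inside utterances are not handled.

```python
def parse_chat_utterances(text):
    """Return (speaker, utterance) pairs from CHAT-formatted text."""
    utterances = []
    current = None  # [speaker, text] while we are on a speaker (main) tier
    for line in text.splitlines():
        if line.startswith("*"):
            # Main tier: "*SPK:<tab>utterance"
            speaker, _, content = line[1:].partition(":")
            current = [speaker.strip(), content.strip()]
            utterances.append(current)
        elif line.startswith(("\t", " ")) and current is not None:
            # Indented line continuing the previous speaker tier
            current[1] += " " + line.strip()
        else:
            # Header (@), dependent tier (%), or blank line
            current = None
    return [(speaker, content) for speaker, content in utterances]


# Hypothetical sample, not taken from any real corpus.
sample = (
    "@UTF8\n"
    "@Begin\n"
    "@Participants:\tA Speaker, B Speaker\n"
    "*A:\t你 吃 了 吗 ?\n"
    "%com:\tgreeting\n"
    "*B:\t吃 了 ,\n"
    "\t你 呢 ?\n"
    "@End\n"
)

for speaker, content in parse_chat_utterances(sample):
    print(speaker, content)
```

For a real file, the text could be loaded with `open(path, encoding="utf-8").read()` and passed to the same function; CLAN remains the safer choice for anything beyond a quick look at the utterances.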
 
First, I must apologize for my ignorance, Dr. Ai. I am not sure which Professor Tao you are recommending I contact about the LLSCC corpus. Is it Prof. Hongyin Tao?
 