Two questions: Wordsmith error & reference corpus

xiaoz

Permanent Super Administrator
Staff member
Two questions from angieyxl at 01:58 PM 2007-02-16

first question:
When building an index wordlist, I ran into an error that said: general protection fault in module wordlist.exe at 001E:034C. I tried N times and it just didn't work. I'm totally at a loss. Maybe WordSmith 3.0 itself is malfunctioning or something.

2nd question:
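Is my self-built corpus too small?
I am currently analysing the linguistic features of messages from an English-language academic discussion group. I collected all the discussion posts from the site into a corpus (a rather grand name for it), totalling only 40,000 words.
I would like to do part of the analysis with WordSmith. The main aim is to analyse the features that academic discussion shows when it is conducted through the internet as a medium. Is this corpus a bit too small?
A rather tough question concerns the reference corpus. Making a keyword list requires a reference corpus. If I want to bring out the features that come from the internet being the medium rather than a traditional medium, what would be a good reference corpus? (I am new to WordSmith, and weak on corpora in general.) I thought of using traditional academic papers as the reference, but that does not seem right, because published written language is bound to differ from academic discussion. The best reference would be colloquium data, but with my current abilities I simply cannot find a corpus of that kind. So I would like to hear your opinion. Do you think a corpus this small is workable? Which kind of corpus would be a good reference corpus?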
 
Re: Two questions: Wordsmith error & reference corpus

Re Q1: Does WST crash when you make a wordlist of other data? Or is it fine with other corpora and only crashes with this particular corpus? There are some "cracked" versions out there which crash easily. Do you have a correct copy of WST?

Re Q2: A corpus of 40K tokens is a bit too small to provide convincing evidence.

The choice of reference corpus depends on your purpose. To highlight the features of the language used in forum discussions, I think a reference corpus of paper-based publications should be OK.
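
For anyone who wants to see what a keyword comparison against a reference corpus boils down to, here is a minimal Python sketch of log-likelihood keyness. It is only an illustration and not WordSmith's own code: the function names (tokenize, log_likelihood, keywords) and the very crude tokenizer are assumptions made for this example, and a real study would need proper tokenization and a significance cut-off.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Very crude tokenizer (an assumption for this sketch): lowercase letter runs."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a = frequency in the study corpus, b = frequency in the reference corpus,
    c = size of the study corpus, d = size of the reference corpus (in tokens).
    """
    # Expected frequencies if the word were equally common in both corpora.
    e1 = c * (a + b) / (c + d)
    e2 = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll


def keywords(study_tokens, ref_tokens, top=10):
    """Return the top positive keywords of the study corpus as (word, score) pairs."""
    study = Counter(study_tokens)
    ref = Counter(ref_tokens)
    c, d = len(study_tokens), len(ref_tokens)
    scored = []
    for word, a in study.items():
        b = ref.get(word, 0)
        # Keep only words that are relatively more frequent in the study corpus.
        if a / c > b / d:
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

To use it, read the forum messages and the reference texts into two strings, pass each through tokenize, and call keywords(study_tokens, ref_tokens). Words with the highest scores are the ones most characteristic of the forum data relative to the chosen reference, which is why the choice of reference corpus matters so much.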
 
Re: Two questions: Wordsmith error & reference corpus

It's Chinese New Year today. Happy New Year to all!

thanks
Re Q1: Does WST crash when you make a wordlist of other data?
Yes, it did crash with other corpora, so I know why.

Re Q2: A corpus of 40K tokens is a bit too small to provide convincing evidence.

Umm, that's what I am worried about. Maybe I should download more messages from other scholarly discussion lists, but at present I am aware of only a few of them. Would anyone kindly recommend more scholarly discussion lists?
thanks
 