The History and Current State of Corpus Research in China

A great article. Professor Feng is really impressive.
http://www.china-language.gov.cn/jgsz/jss/images/feng/feng.htm
 
Yes, this counts as a fairly comprehensive survey article. Feng can also be considered one of the authorities in this field.
 
I also wonder how many of the corpora mentioned are publicly available.
 
Many of the corpora mentioned are kept in their compilers' closets, accessible only to the very few people who built them.

[This post was edited by the author on 2005-06-21 at 15:23:37]
 
A huge waste of resources.
Funding organisations in the UK are more likely to support corpus-building projects that make their outputs publicly available.
 
Much of the research here at home is research for research's sake.

[This post was edited by the author on 2005-06-21 at 22:11:11]
 
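Actually, it is not that scholars are unwilling to do good scholarship or don't care about it. Objectively, the pressures of daily life are so great that people have many worries about making ends meet. I think China is certainly not short of clever minds, but I feel that life's pressures are enough to leave everyone gasping for breath.

Seen this way, some of these problems become understandable.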
 
Xaira is a good program for Chinese and any other language encoded in XML.

The Chinese Academy of Sciences and many other research institutes do have robust tools for Chinese segmentation, annotation and concordancing.
 
Re: The History and Current State of Corpus Research in China

One thing I have always wanted to do is combine overseas financial resources with manpower in China to develop large, publicly available corpus resources. I hope this forum can facilitate such endeavors in the future.
 
Cheap manpower does not necessarily produce cheap resources. Some of the resources offered by the Chinese Linguistic Data Consortium are very expensive, even though it claims to be a non-profit organization.

http://www.chineseldc.org/resourse.asp
 
Re: The History and Current State of Corpus Research in China

If the overseas finance is strong enough - and utilizing mainland labor is one way to reduce costs - the resources could be made (more or less) free.

So far this has been just a dream.
 
Re: The History and Current State of Corpus Research in China

Quoting 动态语法 (2005-7-5 6:10:45):
One thing I have always wanted to do is combine overseas financial resources with manpower in China to develop large, publicly available corpus resources. I hope this forum can facilitate such endeavors in the future.

I would volunteer to be part of the dream team. This dream is worth looking forward to.
 
Re: The History and Current State of Corpus Research in China

Quoting 动态语法 (2005-7-7 11:45:53):
If the overseas finance is strong enough - and utilizing mainland labor is one way to reduce costs - the resources could be made (more or less) free.

So far this has been just a dream.

Agreed.
A corpus can only have a life when it becomes an open resource: either free (a license may apply) or available at a reasonable cost.
 
I agree that most corpus linguistics resources are not available to Chinese learners, and it is even harder to obtain the latest research output, whether from overseas or at home. This is one reason why China keeps lagging behind in linguistic research.
 
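Under China's conditions, people's conservatism and stinginess with resources is understandable. A few years ago, if you wanted to do linguistic research in China, ordinary researchers could rarely get to see even a few classic works of linguistics, so we saw plenty of second-hand material repackaged by domestic experts. On the one hand, we should thank these experts: they gave us some basic knowledge of linguistics, even though their books contain many ideas they themselves never fully understood. On the other hand, this scarcity of resources also built their reputations. In 1995 a colleague of mine went to Guangdong University of Foreign Studies (广外) to photocopy an original edition of a linguistics book. When he brought it back and studied it carefully, he found that the underlined passages closely matched the content of a book published by a teacher at the very university where he had made the copy. At that time, or a little earlier, how many people would have thought of, or been able to find, a place where original books could be photocopied? Perhaps, subconsciously, we all really do hope that resources stay somewhat scarce, while each of us hopes to monopolize them.

Of course, that is probably not the whole story. In China today intellectual property is still not well protected, both in terms of government measures and in public awareness. Serious problems also exist among the people who create intellectual wealth, so the caution of original creators is understandable: once something is released, who knows what will happen? Everything is too unpredictable. This is a reality we cannot change in the short term, and the arrival of corpora as a resource in China will certainly not change it by itself. oscar3 hopes that, with corpus4u leading the way, this situation will change, and that at least within a certain circle the pace of change will be a little faster. oscar3 is full of confidence about this.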

[This post was edited by the author on 2005-08-04 at 18:51:30]

[This post was edited by the author on 2005-08-04 at 19:16:25]
 