
I also wonder how many of the corpora mentioned are publicly available.
Many of the corpora mentioned are kept in the closet, accessible only to the handful of people who compiled them.

[This post was edited by the author on 2005-06-21 at 15:23:37]
A huge waste of resources.
Funding organisations in the UK are more likely to support corpus building projects that make outcomes publicly available.
Much of the research here at home is research for research's sake.

[This post was edited by the author on 2005-06-21 at 22:11:11]

Xaira is a good program for corpora in Chinese or any other language, as long as the texts are encoded in XML.

The Chinese Academy of Sciences and many other research institutes do have robust tools for Chinese segmentation, annotation and concordancing.

One thing I have always thought to do would be to combine the financial resources overseas with the manpower in China to develop some publicly available large corpus resources. I hope this forum can facilitate such endeavors in the future.
Cheap manpower may not necessarily produce cheap resources. Some of the resources offered by the Chinese Linguistic Data Consortium are very expensive, even though they claim to be a non-profit-making organization.

If the overseas finance is strong enough - and utilizing mainland labor is one way to reduce costs - the resources could be made (more or less) free.

So far this has been just a dream.

Quoting the post by 动态语法 at 2005-7-5 6:10:45:
One thing I have always thought to do would be to combine the financial resources overseas with the manpower in China to develop some publicly available large corpus resources. I hope this forum can facilitate such endeavors in the future.

I would volunteer to be part of the dream team. This dream can come true.

Quoting the post by 动态语法 at 2005-7-7 11:45:53:
If the overseas finance is strong enough - and utilizing mainland labor is one way to reduce costs - the resources could be made (more or less) free.

So far this has been just a dream.

A corpus only comes to life when it becomes an open resource, either free (a license may apply) or available at a reasonable cost.
I agree that most corpus linguistics resources are not available to learners in China, and it is even harder to obtain the latest research outcomes, whether from overseas or at home. That is also one reason why China keeps lagging behind in linguistic research.

[This post was edited by the author on 2005-08-04 at 18:51:30]

[This post was edited by the author on 2005-08-04 at 19:16:25]