Download: Corpora of the Four Great Classical Novels

laohong

Administrator
Staff member
{Thanks to “清风出袖” for the pointer!} This resource originally comes from the Chinese Linguists website (中国语言学家网).

Download: Corpora of the Four Great Classical Novels

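[ Author: National High-Tech Research and Development Program (863 Program) Reposted from: National High-Tech Research and Development Program (863 Program) Hits: 534 Updated: 2004-11-1 Entered by: admin ]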

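Download page for the corpora of the four great classical novels:
http://www.chinalinguist.net/Article_Show.asp?ArticleID=135
(After downloading, open the file in Microsoft Access and search for the content you need.)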


If that page does not open, here are the direct download links:

Romance of the Three Kingdoms (三国演义) http://ling.ccnu.edu.cn/ylk/sgyy.mdb

Dream of the Red Chamber (红楼梦) http://ling.ccnu.edu.cn/ylk/hlm.mdb

Journey to the West (西游记) http://ling.ccnu.edu.cn/ylk/xyj.mdb

Water Margin (水浒传) http://ling.ccnu.edu.cn/ylk/shz.mdb


If the downloads still fail, you can get the files from the Corpus4U Gmail account.
 
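That one also needs to be converted from HTML to plain text, though. The Access version above seems to be more cleanly formatted, and every paragraph has an ID, which makes lookup much easier.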
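For anyone who does need the HTML-to-plain-text step, here is a minimal sketch using only the Python standard library. It assumes the source pages put each paragraph in a `<p>` element, which may not match the actual files. The names `ParagraphExtractor`, `html_to_paragraphs` and `search` are made up for this illustration. It also numbers the paragraphs, to imitate the per-paragraph IDs of the Access version.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element, dropping all markup."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []   # finished paragraphs, in document order
        self._buffer = None    # text pieces of the open <p>, or None outside one

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()      # a new <p> implicitly closes an unclosed one
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def _flush(self):
        if self._buffer is not None:
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)
        self._buffer = None

    def close(self):
        super().close()
        self._flush()          # keep a trailing paragraph that was never closed


def html_to_paragraphs(html):
    """Return [(paragraph_id, text), ...] with IDs starting at 1."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return list(enumerate(parser.paragraphs, start=1))


def search(paragraphs, keyword):
    """Return the (id, text) pairs whose text contains the keyword."""
    return [(pid, text) for pid, text in paragraphs if keyword in text]
```

Read the HTML file with the right encoding first (for example `open(path, encoding="gb2312")` if the page is in GB2312), then pass the string to `html_to_paragraphs`.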
 
Thank u Lao Hong, U r really terrific! But could u recommend one or two POS taggers for Chinese?
 
Re: Download: Corpora of the Four Great Classical Novels

Quoting xusun575 (2006-2-18 21:22:29):
Thank u Lao Hong, U r really terrific! But could u recommend one or two POS taggers for Chinese?


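Here are three Chinese word segmentation and POS tagging tools that I have used (I will compare the three in a separate post later). Trial versions of all of them can be downloaded online: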

1. NEUCSP is a Chinese tokeniser/tagger developed by the NLP Lab at Northeastern University, Shenyang, China. It does not handle XML tags and sometimes fails on large corpora, so the filter processes large corpora by splitting them into separate documents. The input encoding is GB2312; NEUCSP is available for Windows only.

http://www.nlplab.cn/

2. ICTCLAS (Institute of Computing Technology, Chinese Lexical Analysis System) is an integrated Chinese lexical analysis system that uses an approach based on a multi-layer HMM. ICTCLAS includes word segmentation, part-of-speech tagging and unknown-word recognition. Its segmentation precision is 97.58% (result from a recent official evaluation in the national 973 project). The recall of unknown words recognized using role tagging exceeds 90%; in particular, the recall of Chinese person names reaches nearly 98%. The speed of word segmentation and POS tagging is 31.5 KB/s.

http://www.nlp.org.cn/project/project.php?proj_id=6

3. CKIP Autotag
The CKIP (Chinese Knowledge and Information Processing) group is a research team formed by the Institute of Information Science and the Institute of Linguistics of Academia Sinica in 1986. Its purpose is to establish a fundamental research environment for Chinese natural language processing.

http://rocling.iis.sinica.edu.tw/CKIP/parser.htm
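Since NEUCSP expects GB2312 input and can fail on large corpora, it may help to prepare the text before tagging. The sketch below is just one way to do that: it splits a text on line boundaries into smaller documents and encodes each one as GB2312. The function name `split_into_documents` and the default `max_bytes` value are my own assumptions. I don't know NEUCSP's actual size limit, so tune it by trial.

```python
def split_into_documents(text, max_bytes=1_000_000, encoding="gb2312"):
    """Split text on line boundaries into encoded chunks of at most max_bytes.

    max_bytes is a hypothetical limit, not a documented NEUCSP value.
    Characters that the encoding cannot represent are replaced with '?'.
    A single line longer than max_bytes becomes its own oversized chunk.
    """
    documents = []      # finished chunks, as bytes
    current = []        # encoded lines of the chunk being built
    current_size = 0    # size in bytes of the chunk being built

    for line in text.splitlines(keepends=True):
        encoded = line.encode(encoding, errors="replace")
        # Start a new chunk if this line would push the current one over the limit.
        if current and current_size + len(encoded) > max_bytes:
            documents.append(b"".join(current))
            current, current_size = [], 0
        current.append(encoded)
        current_size += len(encoded)

    if current:
        documents.append(b"".join(current))
    return documents
```

Each returned chunk can be written out with `open(path, "wb")` as a separate input file. Splitting on line boundaries keeps a sentence from being cut in half, as long as the sentence does not itself span several lines.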
 