A Corpus Worker's Toolkit: Corpus Toolkit, 0908 update

动态语法

Administrator
Staff member
Re: A Corpus Worker's Toolkit: Corpus Toolkit, 0819 update

Quoting xujiajin (2005-8-24 11:44:05):
Readme file (Simplified Chinese version); Readme file (Traditional Chinese version)


Most grateful to Dr Xu!
 

刘语料

Banned user
But I cannot find the following files: TK_Start.clb, 01_TextUtl.clb, 02_WdL_Conc.clb, 03_DiscTag.clb, 04_*.Trans.clb, 05_Links.clb, kwic.pl, kwic_e.pl, segment.pl, and wordlist.txt.
Could someone tell me how to get these files?
Thank you!
 

动态语法

Administrator
Staff member
Re: A Corpus Worker's Toolkit: Corpus Toolkit, 0819 update

The files in post #3 on the first page are now up to date as well.
 

动态语法

Administrator
Staff member
Re: A Corpus Worker's Toolkit: Corpus Toolkit, 0907 update

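Updates in the 2005-09-08 version:

- Calculators for common text statistics:
(1) Mutual information (MI), in the form given by Church et al.
(2) T-score, in the form given by Church et al. (two variants: with and without variable adjustment)
(3) Mutual information (MI), in the form given by BNCweb
(4) MI3, in the form given by BNCweb
(5) Z-score, in the form given by BNCweb
(6) Log-likelihood for a pair (X, Y), in the form given by BNCweb
(7) Log-likelihood (single item, two corpora), in the form given by Paul Rayson
(8) A link to the online Chi Square (X2) calculator by Cathy Ball, Georgetown University
- Added a text-splitting tool (uses split.pl).
- Added links for text encoding conversion (GB/Big5<->UTF8/UNICODE).
- Added a tool that inserts spaces into Chinese text (uses cmu_seg.pl & ldc.dic). The spaces can also be removed again.
- Improved the reliability of the concordance tool.
- Added Du Bois's latest transcription system for spoken language (August 2005 version).
- Added a multifunctional semantic profile (语义体态) analysis system (uses kwic_l.pl).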
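
For readers who have not met the statistics in the calculator list, here is a minimal Python sketch of four of them: MI, MI3, the T-score, and the two-corpus log-likelihood. This is not ACWT's own code, and the formulas ACWT actually uses are not shown in this thread. The function and parameter names are hypothetical, the formulas are the commonly cited simplified forms, and the sketch ignores any window-size or variable adjustment that a particular tool may apply.

```python
import math


def mutual_information(f_xy, f_x, f_y, n):
    """MI in bits: log2 of observed over expected co-occurrence frequency.

    f_xy: co-occurrence frequency of X and Y; f_x, f_y: frequencies of X
    and Y; n: corpus size in tokens. Requires f_xy > 0.
    """
    expected = f_x * f_y / n
    return math.log2(f_xy / expected)


def mi3(f_xy, f_x, f_y, n):
    """MI3: like MI, but the observed frequency is cubed, which reduces
    MI's bias towards rare pairs."""
    expected = f_x * f_y / n
    return math.log2(f_xy ** 3 / expected)


def t_score(f_xy, f_x, f_y, n):
    """T-score: (observed - expected) / sqrt(observed). Requires f_xy > 0."""
    expected = f_x * f_y / n
    return (f_xy - expected) / math.sqrt(f_xy)


def log_likelihood(a, b, c, d):
    """Log-likelihood for one item compared across two corpora.

    a, b: frequency of the item in corpus 1 and corpus 2;
    c, d: total size of corpus 1 and corpus 2.
    A term with an observed frequency of 0 contributes nothing.
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in corpus 1
    e2 = d * (a + b) / (c + d)  # expected frequency in corpus 2
    total = 0.0
    for observed, expected in ((a, e1), (b, e2)):
        if observed > 0:
            total += observed * math.log(observed / expected)
    return 2 * total


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(mutual_information(8, 16, 16, 1024))
    print(mi3(8, 16, 16, 1024))
    print(t_score(8, 16, 16, 1024))
    print(log_likelihood(20, 0, 1000, 1000))
```

Computing the expected frequency separately in each function keeps the three collocation measures directly comparable: they differ only in how they weigh the observed count against that expectation.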

Upgrade guide (Simplified Chinese):
http://forum.corpus4u.org/upload/forum/2005091015253151.pdf

Download files for upgrade only:
http://forum.corpus4u.org/upload/forum/2005091808294721.zip

---------------------------------------

For new users (not upgrading):

User guide (Simplified Chinese):
http://forum.corpus4u.org/upload/forum/2005091015211831.pdf

User Guide (English):
http://forum.corpus4u.org/upload/forum/2005091015100987.pdf

All files needed by new users (download the whole package):
http://forum.corpus4u.org/upload/forum/2005091808305014.zip

-----------
Website for ACWT:
http://www.humnet.ucla.edu/alc/chinese/ACWT/ACWT.htm

[This post was edited by the author on 2005-09-10 at 15:31:36]
 

动态语法

Administrator
Staff member
Re: A Corpus Worker's Toolkit: Corpus Toolkit, 0907 update

Thanks for the positive feedback. I hope it's useful to as many corpus workers
as possible.
 

xujiajin

Administrator
Staff member
One of its biggest strengths is that corpus workers can DIY their own clips in this toolkit.
 

动态语法

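Administrator
Staff member
Re: A Corpus Worker's Toolkit: Corpus Toolkit, 0907 update

Quoting xujiajin (2005-9-9 10:27:38):
One of its biggest strengths is that corpus workers can DIY their own clips in this toolkit.
Indeed. I wrote quite a few of the tools as "examples", and you are free to adapt them or add to them to suit your own needs.
If you study how some of the entries are written, you will find that "programming" them yourself is not very hard.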
 

xujiajin

Administrator
Staff member
I personally think Discourse Tagging and Transcription are the best part of the toolkit.
 

xujiajin

Administrator
Staff member
Can you post a copy of the 2005 version of the Du Bois transcription system?

It would be better to start a new thread for that, maybe in the "Corpus Markup & Annotation语料库标注" section.
 

动态语法

Administrator
Staff member
Re: A Corpus Worker's Toolkit: Corpus Toolkit, 0907 update

Quoting xujiajin (2005-9-9 22:11:37):
I personally think Discourse Tagging and Transcription are the best part of the toolkit.
Yes, I am having fun with these too. In fact I am in the middle of developing
another tagging scheme, which may 'surprise' some of you here. It will come out
soon.
 

动态语法

Administrator
Staff member
Re: A Corpus Worker's Toolkit: Corpus Toolkit, 0907 update

Quoting xujiajin (2005-9-9 22:26:44):
Can you post a copy of the 2005 version of the Du Bois transcription system?

It would be better to start a new thread for that, maybe in the "Corpus Markup & Annotation语料库标注" section.
Yes, that's in the plan. I'm too busy with ACWT these days. But it will be
posted soon (have already got the permission from Jack Du Bois to share his
unpublished system).
 