Yacsi: Another ICTCLAS 2012 GUI

iCasino

Regular member
[Download] Yacsi: An ICTCLAS2012 GUI

The link below hosts all historical versions of the YACSI word segmentation tool.
http://ishare.iask.sina.com.cn/f/24241229.html


Hi all,

I've developed another graphical user interface for ICTCLAS (version 2012), named Yacsi. To get the program running, download ICTCLAS2012-SDK-0101.rar from http://www.nlpir.org (the page is http://www.nlpir.org/?action-viewnews-itemid-14 as of March 5, 2012) and put Yacsi 0.9.exe into the folder unpacked from the rar file. That's all.

Its major feature is that it keeps the tagged corpus files in their original folder hierarchy, so that files in different folders do not overwrite each other if they happen to share the same name.
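
The idea of mirroring the folder hierarchy can be sketched as follows. This is a minimal illustration in Python, not Yacsi's actual code; the function names and the `corpus` and `tagged` folder names are hypothetical.

```python
from pathlib import Path


def tagged_path(src_file: Path, corpus_root: Path, tagged_root: Path) -> Path:
    """Map a corpus file to its output location, keeping its relative sub-path."""
    relative = src_file.relative_to(corpus_root)
    return tagged_root / relative


def plan_outputs(corpus_root: Path, tagged_root: Path) -> dict[Path, Path]:
    """Pair every .txt file under corpus_root with a mirrored output path."""
    return {
        src: tagged_path(src, corpus_root, tagged_root)
        for src in sorted(corpus_root.rglob("*.txt"))
    }
```

Because the relative sub-path is preserved, `corpus/a/news.txt` and `corpus/b/news.txt` map to `tagged/a/news.txt` and `tagged/b/news.txt` rather than to one shared `news.txt`.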

I'd like to thank Dr. Zhang Huaping for making this possible in the first place, and Robert Young for showing how easy it is to work with ICTCLAS back in 2008.

Yacsi is freeware. It was tested on Windows XP, but I cannot guarantee that it is free of errors. Bug reports are heartily welcome.

Regards,
iCasino
Email: liangpingwu@126.com
 

Attachments

  • Yacsi 0.9.GIF
    18.4 KB · Views: 17
Re: [Download] Yacsi: An ICTCLAS2012 GUI

Thanks a lot, iCasino, and I hope it will work for a long time. In use, I find that it works well, but the segmented units are not separated by spaces.
 
Re: [Download] Yacsi: An ICTCLAS2012 GUI

Thanks a lot! It's good freeware. But could you describe how accurate it is in terms of POS tagging?
 
Re: [Download] Yacsi: An ICTCLAS2012 GUI

Thanks a lot, iCasino, and I hope it will work for a long time.
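Dr Zhang said:
"To make things convenient for users, starting from this release (ICTCLAS2012-SDK-u0106.rar), the dll to be called is always named ICTCLAS2011.dll and will no longer change. Ordinary users only need to replace the dll and the corresponding .user license file; there is no need to recompile their own programs to stay compatible with the new version of the segmenter."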

So I guess we still need to update the dll and .user file periodically to keep up with the pace of ICTCLAS. Luckily, this is not a big deal for most users. Or we can stick to the old trick of adjusting the system time.

What's exciting about the new dll (since u0101) is that it can handle UTF-8 and traditional Chinese directly, so it might cut down the time spent converting between encodings (although Yacsi hasn't taken advantage of that, since the u0101 version is not mature enough to handle UTF-8).

In use, I find that it works well, but the segmented units are not separated by spaces.
Sorry, can you give me an example? I am not quite sure I have understood the problem.
 
Re: [Download] Yacsi: An ICTCLAS2012 GUI

Thanks a lot! It's good freeware. But could you describe how accurate it is in terms of POS tagging?

Dr Zhang Huaping said recently (2010?) that the improvement in newer versions of ICTCLAS lies in speed rather than accuracy (around 98%). I guess that is an inherent limitation of statistical POS tagging, plus the fact that tagging Chinese is difficult even for human experts. His present focus is on industrial deployment rather than on perfecting linguistic theory.
 
Re: [Download] Yacsi: An ICTCLAS2012 GUI

Sorry, can you give me an example? I am not quite sure I have understood the problem.


I mean that after POS tagging, there are no spaces between the words.

For example:
Original: 国务院总理温家宝3月5日在十一届全国人大五次会议上作政府工作报告时提出,要加快转变经济发展方式,推进经济结构战略性调整。

After POS tagging: 国务院/n 总理/n 温家宝/n 3月/t 5日/t 在/p 十一/m 届/q 全国人大/n 五/m 次/q 会议/n 上/f 作/v 政府/n 工作/v 报告/n 时/n 提出/v ,/w 要/v 加快/v 转变/v 经济/n 发展/v 方式/n ,/w 推进/v 经济/n 结构/n 战略性/n 调整/v 。/w
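
For anyone post-processing output in this `word/tag` form, here is a minimal Python sketch that splits a tagged line into (word, tag) pairs and can rebuild a space-separated, tag-free line. It assumes tokens are separated by whitespace and that the tag follows the last slash; the function names are hypothetical and are not part of Yacsi or ICTCLAS.

```python
def parse_tagged(line: str) -> list[tuple[str, str]]:
    """Split 'word/tag word/tag ...' into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the LAST slash so a word that itself contains '/' survives.
        word, sep, tag = token.rpartition("/")
        if sep and word:
            pairs.append((word, tag))
        else:
            # No usable tag found: keep the token as an untagged word.
            pairs.append((token, ""))
    return pairs


def spaced_words(line: str) -> str:
    """Return only the words, separated by single spaces."""
    return " ".join(word for word, _tag in parse_tagged(line))
```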
 
Re: Yacsi: Another ICTCLAS 2012 GUI

Hi all,

Two bugs have been fixed and one feature added in Yacsi 0.93. Please consider upgrading to the newer version.

Regards,

iCasino


History:
2012/03/05 Yacsi 0.9
2012/03/06 Yacsi 0.92
2012/03/11 Yacsi 0.93
Bugs fixed in 0.93:
a negative time span was reported for segmentation when the corpus was large or contained many files;
the user dictionary failed to unload when it was switched off.
Feature added in 0.93:
the corpus and tagged-output paths are remembered automatically for later use.
 
Re: Yacsi: Another ICTCLAS 2012 GUI

Hi all,
One bug has been fixed and one feature added in Yacsi 0.94. Thanks for your suggestions.

Regards,
iCasino


History:
2012/03/05 Yacsi 0.9
2012/03/06 Yacsi 0.92
2012/03/11 Yacsi 0.93
2012/03/14 Yacsi 0.94
Bug fixed in 0.94:
After clicking "查看结果" (View Results), the left bar of the opened window covered the tagged files.
Feature added in 0.94:
UTF-8 support. ICTCLAS-SDK-0101 is known to have some problems interpreting UTF-8 encoded files, but if your text files are clean, the results can sometimes be perfect. I added this feature to help people who wish to submit bugs to Dr Zhang Huaping, so that in the near future we might have a more stable ICTCLAS. (Note: it is reported that SOME UTF-8-related bugs have been fixed in ICTCLAS-SDK-0106.)
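
While the engine's UTF-8 handling is unreliable, one workaround is to convert UTF-8 files to GBK before segmentation. The Python sketch below assumes the engine accepts GBK input, which is an assumption and not something confirmed in this thread; the function name is hypothetical and this is not how Yacsi itself is implemented.

```python
def utf8_to_gbk(data: bytes) -> bytes:
    """Re-encode UTF-8 bytes as GBK (assumed here to be the engine's input encoding)."""
    # 'utf-8-sig' also strips a leading byte-order mark if one is present.
    text = data.decode("utf-8-sig")
    # Characters with no GBK equivalent are replaced with '?' instead of raising.
    return text.encode("gbk", errors="replace")
```

A stray byte-order mark is one plausible reason a file may not count as "clean", so stripping it before conversion is a cheap precaution.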
 

Attachments

  • Yacsi 0.94.GIF
    20.3 KB · Views: 2
  • Yacsi 0.94.rar
    80.7 KB · Views: 135
Re: Yacsi: Another ICTCLAS 2012 GUI

Hi all,

Two features have been added in Yacsi 0.95. Thanks for your interest and suggestions.

I have also updated Yacsi 0.93 by fixing the "查看结果" (View Results) problem, for those who always see beauty in simplicity; the updated build is called Yacsi 0.93+.

As always, suggestions are heartily welcome.

Regards,
iCasino

Revision History:
2012/03/05 Yacsi 0.9
2012/03/06 Yacsi 0.92
2012/03/11 Yacsi 0.93
2012/03/14 Yacsi 0.94
2012/03/18 Yacsi 0.95

Features added in 0.95:
The user dictionary can be specified by the user and is saved automatically for later use, which is useful when working on multiple projects;
You may choose whether or not to include the files in sub-folders for segmentation. (Previous versions always descended into sub-folders.)
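
The sub-folder option amounts to choosing between a flat and a recursive file search. A minimal Python sketch, assuming the corpus consists of .txt files; the function and parameter names are hypothetical and not taken from Yacsi:

```python
from pathlib import Path


def corpus_files(root: Path, include_subfolders: bool) -> list[Path]:
    """List the .txt files to segment, optionally descending into sub-folders."""
    pattern = "*.txt"
    # rglob walks the whole tree; glob only looks at the top-level folder.
    found = root.rglob(pattern) if include_subfolders else root.glob(pattern)
    return sorted(p for p in found if p.is_file())
```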

Note: the granularity control (切分粒度, segmentation granularity) is disabled because I could find no difference between switching it on and off. (You can check with ICTCLAS2012_demo.exe. Please let me know if this observation is wrong.)
 

Attachments

  • Yacsi 0.95.GIF
    20.4 KB · Views: 1
  • Yacsi 0.95.rar
    83.6 KB · Views: 47
  • Yacsi 0.93+.rar
    78.9 KB · Views: 16
Re: Yacsi: Another ICTCLAS 2012 GUI

Yacsi 0.95+

One bug fixed:
the program froze when segmenting large numbers of files (no real harm in terms of linguistic analysis, just a poor user experience)
 

Attachments

  • Yacsi 0.95+.rar
    84.8 KB · Views: 96
Re: Yacsi: Another ICTCLAS 2012 GUI

Hi all,

Yacsi 0.96 is out with two features added and one bug fixed.

The previous Yacsi 0.95+ turned out to be a premature release that did not report problems correctly (sorry for that, guys). So if you are a perfectionist, please consider this newer version for peace of mind (if only temporarily :)).


Regards,
iCasino

Revision History:

2012/03/05 Yacsi 0.9
2012/03/06 Yacsi 0.92
2012/03/11 Yacsi 0.93
2012/03/14 Yacsi 0.94
2012/03/18 Yacsi 0.95
2012/03/31 Yacsi 0.96
Features added in 0.96:
Multi-threading has been added to solve the problem of the GUI freezing;
The user dictionary can be loaded from any folder on your computer, not just the directory containing Yacsi.exe.
Bug fixed:
The generated configuration file Yacsi.xml can now be viewed in a standard web browser.
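
The usual way multi-threading cures a frozen GUI is to move the long-running work onto a worker thread and pass progress back through a queue that the interface polls. Below is a minimal, GUI-agnostic Python sketch of that pattern. It is not Yacsi's code: `segment_one` stands in for whatever call segments a single file, and all names are hypothetical.

```python
import queue
import threading


def segment_in_background(files, segment_one, progress: queue.Queue) -> threading.Thread:
    """Run segment_one(name) for each file on a worker thread.

    Progress messages go onto the queue so the GUI thread stays responsive.
    """
    def worker():
        for name in files:
            try:
                segment_one(name)
                progress.put(("done", name))
            except Exception as exc:
                # Report the failure and carry on with the remaining files.
                progress.put(("error", name, str(exc)))
        progress.put(("finished",))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

The GUI would then poll `progress` from a timer callback instead of blocking on the segmentation call.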

 
Re: Yacsi: Another ICTCLAS 2012 GUI

YACSI works pretty well. The only problem is that it generates only a "character list" of the segmented texts (see below). It would be great if a "word list" could also be generated automatically. Many thanks!

字频_tagged.txt

, , 95
的 , 86
。 , 51
国 , 43
我 , 37
在 , 33
是 , 27
们 , 26
不 , 23
美 , 21
为 , 21
一 , 20
民 , 20
他 , 20
选 , 16
大 , 16
以 , 16
政 , 15
...
 
Re: Yacsi: Another ICTCLAS 2012 GUI

Thanks for your interest. But I don't recall YACSI being able to generate a "character list". Can you give me more details about your problem?

Also, YACSI is not intended to be all-inclusive. There are already many good tools for making a frequency list of "words": AntConc and MyZiCiFreq字词频率统计工具.exe (http://www.cncorpus.org/download.aspx?n=MyZiCiFreqTool.rar) do this job very well, to name just two. My philosophy is not to reinvent the wheel unless absolutely necessary.
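
For readers who would rather script it, a word frequency list can also be built from the tagged output in a few lines. The Python sketch below assumes the whitespace-separated `word/tag` format shown earlier in this thread; the function name is hypothetical and the code has nothing to do with the tools named above.

```python
from collections import Counter


def word_frequencies(tagged_text: str) -> Counter:
    """Count words in 'word/tag word/tag ...' text, ignoring the tags."""
    counts = Counter()
    for token in tagged_text.split():
        # The tag follows the last slash; tokens without a usable tag count as-is.
        word, sep, _tag = token.rpartition("/")
        counts[word if sep and word else token] += 1
    return counts
```

`word_frequencies(text).most_common()` then gives the words sorted by descending count.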
 