Is there any software that only segments Chinese text, without tagging it?

Thanks a lot, laohong & Dr. Xiao.

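ICTCLAS is good, but I found that it only accepts ASCII-encoded text, not Unicode.
As for the Chinese segmenter and annotation tool: when I click the bat file, the window flashes for about a second and then disappears. I don't know why.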
 
Here is how you can use the segment.zip version (http://www.mandarintools.com/download/segment.zip):

1. You should unzip the 5 files into one folder, say, C:\Temp\New.

2. Before using the tool, make sure your computer has ActivePerl installed. Simply check your C drive and C:\Program Files to see whether there is a directory named Perl. If not, go to this page to download the latest version (http://aspn.activestate.com/ASPN/Downloads/ActivePerl). It's best to choose the Windows MSI version to install.

3. Use Notepad to open your input Chinese text and save it as GB text. For example, I saved your message above in a text file named input.txt. Then put the file in the same folder as the tool.

4. Then use Notepad to open the Segment.bat file, change only "%1" to input.txt, and save it.

5. Now double-click Segment.bat to run it. When it's done, you'll find a new file named input.seg in the folder. Open input.seg with Notepad to see the result. Here is the test output for the text of your message above.

ICTCLAS 不错 。 发现 它 只 吃 ASCI 编码 , 不吃 Unicode. Chinese segmenter and annotation tool, 单击 bat 文件 , 不知 为何 只 闪 1 秒 就 消失了 ?
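If you would rather script steps 3 and 5 than do them by hand in Notepad, here is a minimal Python sketch. It is not part of segment.zip. It assumes that "GB text" means GB2312 (Python's "gb2312" codec) and that the .seg output is written in the same encoding with tokens separated by spaces, as in the sample above. The function names convert_to_gb and read_tokens are made up for illustration.

```python
from pathlib import Path


def convert_to_gb(src, dst, src_encoding="utf-8"):
    """Re-encode a text file as GB2312 (assumed meaning of "GB text")."""
    # Work on bytes so that line endings are carried over unchanged.
    text = Path(src).read_bytes().decode(src_encoding)
    # Drop a leading byte-order mark, which some editors add to UTF-8 files.
    text = text.lstrip("\ufeff")
    # Raises UnicodeEncodeError if a character has no GB2312 code.
    Path(dst).write_bytes(text.encode("gb2312"))


def read_tokens(seg_path, encoding="gb2312"):
    """Return the whitespace-separated tokens from a segmented file."""
    return Path(seg_path).read_bytes().decode(encoding).split()
```

For example, convert_to_gb("message_utf8.txt", "input.txt") would write the file that Segment.bat reads, and read_tokens("input.seg") would give the segmented words as a list. If your text contains characters outside GB2312, the "gbk" codec is a larger superset you could try instead, though whether the tool accepts it is untested here.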
 
laohong, I tried the Perl version and it works well, but the Java version gives me the same problem that jiji had: the DOS window flashes for a moment and then produces nothing. Could you kindly help me solve the problem?
 
It's all about how you use it under DOS.

1. Make sure your computer has Java installed. To check, see whether you can find a folder named Java on the C drive or in C:\Program Files\. If not, click the link here to install it: http://www.java.com/en/

2. Put the segmenter.jar file in a folder, say, C:\Temp. Then save your Chinese text file in the same folder. Make sure the file is saved in GB, Big5 or UTF-8 encoding. Here I'm using an example text file named input.txt in UTF-8 encoding.

3. Click Start, then Run. In the Open box, key in cmd and click OK. A DOS command window will pop up.

4. In the DOS window, key in cd\, then Enter, you'll see C:\>_ there. Next, key in cd Temp, then Enter to get C:\Temp>_.

5. Now immediately after C:\Temp> key in

java -jar segmenter.jar -8 input.txt

(note the spaces), then press Enter. You'll find the result in C:\Temp\ under the name input.txt.seg.

If your text is in GB encoding, replace -8 with -g in the above command. If the text is in Big5 encoding, replace -8 with -b. Good luck!
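The same command can also be built and run from a script, which has the side benefit of keeping any error message on screen instead of letting the window flash and vanish. The following Python sketch is only an illustration: the -8, -g and -b flags and the .seg output name come from the steps above, while the function names, the PATH check, and the assumption that the jar exits with a non-zero code on failure are mine.

```python
import shutil
import subprocess

# Encoding flags as described in the steps above.
ENCODING_FLAGS = {"utf-8": "-8", "gb": "-g", "big5": "-b"}


def build_command(input_name, encoding="utf-8", jar="segmenter.jar"):
    """Return the argument list for the segmenter command line."""
    try:
        flag = ENCODING_FLAGS[encoding.lower()]
    except KeyError:
        raise ValueError(f"unsupported encoding: {encoding!r}") from None
    return ["java", "-jar", jar, flag, input_name]


def run_segmenter(input_name, encoding="utf-8", folder=r"C:\Temp"):
    """Run the segmenter inside `folder`; return the expected output name."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on PATH; install Java first")
    result = subprocess.run(
        build_command(input_name, encoding),
        cwd=folder,            # same effect as "cd Temp" in the DOS window
        capture_output=True,
        text=True,
        errors="replace",      # do not fail on undecodable console output
    )
    if result.returncode != 0:
        # Assumption: a failing run reports a non-zero exit code.
        raise RuntimeError(result.stderr.strip() or "segmenter failed")
    return input_name + ".seg"
```

Calling run_segmenter("input.txt") corresponds to step 5. Pass "gb" or "big5" as the second argument for the other two encodings.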
 
Re: Is there any software that only segments Chinese text, without tagging it?

The problems stated here are among the things that motivate ACWT.
 
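You could take a look here: the course homepage of Professor Zhan Weidong of Peking University offers a word segmentation tool for download. I'm not sure whether it is what you need.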
 