BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

xujiajin

Administrator
Staff member
BFSU Sentence Segmenter 1.0

BFSU Sentence Segmenter 1.0 was programmed by Mr. Yunlong Jia and designed by Dr. Jiajin Xu. The tool converts one or more loaded English plain-text files into one-sentence-per-line format. We have done our best to keep abbreviations such as Dr., Mr., Mrs. and B.C. from being treated as sentence-final marks, but it is almost impossible to list every exceptional or ad hoc abbreviation that might be mistaken for a sentence-final position. We therefore let users customize the list of abbreviations in “Abbrev.ini”. The more abbreviations you add to the list, the fewer improper segmentations will occur. Please note that alphanumeric strings in the list are case-sensitive.
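
To make the idea concrete, here is a minimal Python sketch of abbreviation-aware splitting. It is not the program's actual code, which has not been published in this thread. The `ABBREVIATIONS` set, the `BOUNDARY` pattern and the `split_sentences` function are all hypothetical names, and the set only stands in for whatever a user keeps in Abbrev.ini (whose file format is not described here).

```python
import re

# Hypothetical stand-in for the entries a user might keep in Abbrev.ini.
# Membership is tested with exact string equality, so it is case-sensitive.
ABBREVIATIONS = {"Dr.", "Mr.", "Mrs.", "B.C.", "St.", "No."}

# Candidate boundary: ., ! or ?, optional closing quotes or brackets,
# then at least one whitespace character.
BOUNDARY = re.compile(r"""[.!?]["')\]]*\s+""")


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences, ignoring boundaries that follow a listed abbreviation."""
    sentences = []
    start = 0
    for match in BOUNDARY.finditer(text):
        # Last whitespace-delimited token before the candidate boundary,
        # e.g. "Dr." or "tool."
        words = text[start:match.end()].split()
        last_token = words[-1] if words else ""
        if last_token in abbreviations:
            continue  # treated as an abbreviation, not a sentence end
        sentences.append(text[start:match.end()].strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


if __name__ == "__main__":
    sample = "Dr. Xu designed the tool. Mr. Jia programmed it! Does it work?"
    print("\n".join(split_sentences(sample)))
```

A sketch like this shares the limitation described above: an abbreviation missing from the list causes a spurious split, and a listed abbreviation that really does end a sentence suppresses a genuine one, so post-editing is still needed.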

After execution, the segmentation result is saved in the same directory as the source text(s), under the same filename(s) with the extension .seg.
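
As an illustration of this naming scheme, the following Python sketch derives the output path for a source file. The function name `segmented_path` is hypothetical, and the sketch assumes that .seg replaces the original extension (text01.txt becomes text01.seg) rather than being appended to it.

```python
from pathlib import Path


def segmented_path(source):
    """Return a path in the same directory as `source`, with the extension replaced by .seg.

    Assumption: the original extension is replaced, not kept.
    """
    return Path(source).with_suffix(".seg")


if __name__ == "__main__":
    print(segmented_path("corpus/text01.txt"))
```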

Post-editing is always necessary for serious language workers.

Please cite the program as:
Xu, Jiajin & Yunlong Jia. (2010). BFSU Sentence Segmenter 1.0. Beijing: National Research Center for Foreign Language Education, Beijing Foreign Studies University.

BFSU Sentence Segmenter 1.0 is freeware. The software comes on an “as is” basis, and the authors will accept no liability for any damage that results from using the software.

Bug reports will be highly appreciated and should be sent to WilliamJia@OpenCorpus.org.



Related discussion
Automatic Sentence Segmentation

Minor update: Change of words in About
 

Attachments

  • BFSU Sentence Segmenter 1.0.rar
    357.5 KB · Views: 518
Re: BFSU English Sentence Segmenter 1.0 Freeware

Thank you, Dr. Xu. You're always so kind and helpful.
 
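Re: BFSU English Sentence Segmenter 1.0 Freeware

Downloading it right away!
The interface is clean and the program runs very fast; it should greatly speed up sentence segmentation of English corpus texts.
Thanks to Dr. Xu and Dr. Jia for sharing!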
 
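Re: BFSU English Sentence Segmenter 1.0 Freeware

I tried it out. Compared with Lextutor's sentence extractor, it has clear strengths and works very well, especially for batch processing.
Abbrev.ini is a clever idea.
One small bug: the handling of sentence-final punctuation after a closing parenthesis, such as ")."
One small suggestion: keep the .txt extension for the processed files, with "seg" placed at the start or end of the file name instead.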
 
Re: BFSU English Sentence Segmenter 1.0 Freeware

You can add ")." to the list yourself.

Using a different extension is common practice when converting texts.
 
Re: BFSU English Sentence Segmenter 1.0 Freeware

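I have learned something new again. Still, a "common practice" .seg file needs its extension changed before it can be opened, which really is inconvenient. (Just my two cents; take it or leave it.)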
 
Re: BFSU English Sentence Segmenter 1.0 Freeware

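You can right-click a .seg file and, under Properties, change the default program to Notepad; after that, double-clicking the file will open it automatically.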
 
Re: BFSU English Sentence Segmenter 1.0 Freeware

Thanks to Dr. Xu and Dr. Jia! Really efficient!
 
Re: BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

The West has Mark Davies; China has Dr. Xu and Dr. Jia!
 
Re: BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

:) Here is a "Full of Sound and Fury, Signifying Nothing" test text:

"Look this room, St. Paul!", said Tom (Tim's litter brother). "'That's great, No. 110.911 is OK', William(威廉) Shakespeare(v.s. Dr. Greene) said." OK! What does you mean(i.e. I'm confused), I want to know?"

(A newer version is posted below; the attachment here has been deleted.)
 
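Re: BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

Coming soon in the BFSU sentence-processing toolkit:
BFSU NewWords Marker 1.0 (marks the unfamiliar words in each sentence according to a specified word list)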
 
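Re: BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

One pleasant surprise after another! Thank you, Dr. Jia! Looking forward to it.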
 
Re: BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

:) Here is a "Full of Sound and Fury, Signifying Nothing" test text:

"Look this room, St. Paul!", said Tom (Tim's litter brother). "'That's great, No. 110.911 is OK', William(威廉) Shakespeare(v.s. Dr. Greene) said." OK! What does you mean(i.e. I'm confused), I want to know?"

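The attachment is the latest corrected version:

Thank you for your hard work!
I tested a short passage from Project Gutenberg.
The text was the same before and after processing (except that blank lines were removed).
I wanted to post a screenshot but could not get it to work.
I am uploading the original file so you can test it yourself.
I think the main cause is that Gutenberg texts are wrapped with hard returns, and your program seems to treat a hard return as a sentence boundary. Could the program remove hard returns first and then segment? Of course, I know I could strip the hard returns with another tool first, and that would probably solve the problem, but it would be even better if your software could do it all in one pass. Thanks again for your hard work!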
 

Attachments

  • Untitled.txt
    1.1 KB · Views: 12
Re: BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

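We considered this problem earlier, but we feel the machine should not be expected to solve every problem for the user.

That is why the documentation specifically stresses: This tool is only happy with good English texts. Please keep your texts as clean as possible. Dirty texts will upset the software.

Text taken from web pages and PDF files often contains line breaks.

You can use Word or another tool to remove the unwanted manual line breaks (^l) or paragraph marks (^p) before segmenting.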
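
For readers who would rather script this cleanup than do it in Word, here is a minimal Python sketch. It is not part of the Segmenter; the function name `unwrap_hard_breaks` is hypothetical, and it assumes that blank lines mark real paragraph boundaries while single line breaks are only wrapping, as in typical Project Gutenberg plain text.

```python
import re


def unwrap_hard_breaks(text):
    """Join hard-wrapped lines into paragraphs; blank lines still separate paragraphs."""
    # Normalize Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: one or more blank lines separate paragraphs.
    paragraphs = re.split(r"\n\s*\n", text)
    # Collapse every run of whitespace inside a paragraph, including the
    # hard line breaks, into a single space.
    unwrapped = [" ".join(p.split()) for p in paragraphs]
    return "\n\n".join(p for p in unwrapped if p)


if __name__ == "__main__":
    wrapped = "It was a dark\nand stormy night.\n\nThe rain fell\r\nin torrents."
    print(unwrap_hard_breaks(wrapped))
```

The sketch does not rejoin words hyphenated across line ends, so texts with that kind of wrapping would need an extra pass or manual checking.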

By the way, we do all of this in our spare time, so we cannot promise to have time for everyone's problems. Thank you for understanding.
 
Re: BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

I understand completely!
Thank you for your selfless contribution!
 
Re: BFSU English Sentence Segmenter 1.0 Freeware (Automatic English Sentence Segmentation Tool)

Wow! Your understanding is highly significant!
 