BFSU English Sentence Segmenter 1.0 Freeware (automatic English sentence segmentation tool)

Re: BFSU English Sentence Segmenter 1.0 Freeware (automatic English sentence segmentation tool)

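When cleaning up line breaks, avoid Word if you can; it uses too many resources. UltraEdit or another plain-text editor is a better choice.
The clean-up usually takes four steps (using UltraEdit as the example):
1. Replace ^p^p with a marker such as _-_ to record the original paragraph breaks. Make sure the marker does not occur anywhere in the source text, otherwise the result will be wrong.
2. Replace ^p with a space.
3. Delete redundant spaces.
4. Replace the marker with ^p.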

Before replacement:
This is
a good
test.

You can
try
it, if
you
like.

Step 1:
This is
a good
test._-_You can
try
it, if
you
like.

Step 2:
This is a good test._-_You can try it, if you like.

Step 3: (skipped; there are no redundant spaces)

Step 4:
This is a good test.
You can try it, if you like.
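
For anyone who would rather script the four steps than run them by hand in an editor, here is a minimal Python sketch. The function name remove_line_breaks and the MARKER constant are made up for this illustration and have nothing to do with the BFSU tool. One deliberate difference from the manual steps: it treats any run of two or more line breaks as a single paragraph break, not only exactly two.

```python
import re

MARKER = "_-_"  # assumed placeholder; it must not occur in the source text


def remove_line_breaks(text: str, marker: str = MARKER) -> str:
    """Join hard-wrapped lines while keeping paragraph boundaries."""
    if marker in text:
        raise ValueError(f"marker {marker!r} already occurs in the text")
    text = text.replace("\r\n", "\n")            # normalise Windows line endings
    text = re.sub(r"\n{2,}", marker, text)       # step 1: remember paragraph breaks
    text = text.replace("\n", " ")               # step 2: line breaks become spaces
    text = re.sub(r" {2,}", " ", text).strip()   # step 3: drop redundant spaces
    return text.replace(marker, "\n")            # step 4: restore paragraph breaks


sample = "This is\na good\ntest.\n\nYou can\ntry\nit, if\nyou\nlike.\n"
print(remove_line_breaks(sample))
```

Run on the sample above, this prints the same two lines as step 4. The check at the top of the function enforces the warning in step 1: if the marker already appears in the text, it stops instead of silently corrupting the paragraphs.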
 
Re: BFSU English Sentence Segmenter 1.0 Freeware (automatic English sentence segmentation tool)

The attachment is the revised version of 2010-3-25. The main changes are:

1) Rewrote the algorithm that handles "()<>[]{}", so the text can now be segmented without altering the original.
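
The post does not say how the revised algorithm works. Purely as an illustration of segmenting around brackets without editing the text, the sketch below tracks bracket depth and only ends a sentence at ., ! or ? when it is outside every bracket pair. The name split_outside_brackets is hypothetical, and this is an assumption about one possible approach, not the tool's actual method.

```python
OPEN = "(<[{"
CLOSE = ")>]}"


def split_outside_brackets(text: str) -> list[str]:
    """Split at . ! ? followed by whitespace, but never inside brackets."""
    sentences = []
    depth = 0   # how many brackets are currently open
    start = 0   # index where the current sentence begins
    for i, ch in enumerate(text):
        if ch in OPEN:
            depth += 1
        elif ch in CLOSE:
            depth = max(depth - 1, 0)   # tolerate a stray closing bracket
        elif ch in ".!?" and depth == 0:
            at_end = i + 1 == len(text)
            if at_end or text[i + 1].isspace():
                # Slice the original string; the input itself is never modified.
                sentences.append(text[start:i + 1].strip())
                start = i + 1
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_outside_brackets("He left (see p. 3. for details) early. She stayed."))
```

A known weakness of this sketch: an unmatched opening bracket, or a bare < used as a less-than sign, keeps the depth above zero and suppresses every later split.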
 

Attachment

  • BFSU Sentence Segmenter 1.0.rar
    360.3 KB · Views: 108
Re: BFSU English Sentence Segmenter 1.0 Freeware (automatic English sentence segmentation tool)

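Thank you for your hard work!
I tested a short passage of text from "Gutenberg".
The text was the same before and after the test, except that the blank lines had been removed.
I wanted to post a screenshot but could not get it to work.
I am uploading the original file so that you can test it yourself.
I think the main cause is that the texts on Gutenberg are all wrapped with hard returns, and your program seems to treat a hard return as a sentence boundary. Could the program remove the hard returns first and then segment? Of course, I know that stripping the hard returns with another tool beforehand would probably solve the problem, but it would be even better if your program could do it all in one pass. Thanks again for your hard work!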
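
To make the "remove hard returns first, then segment" request concrete, here is a rough Python sketch of such a one-pass pipeline. The function names are invented for this example, and the splitting rule (a sentence ends at ., ! or ? followed by whitespace) is a deliberately naive stand-in, not the rule the BFSU program uses.

```python
import re


def unwrap(text: str) -> str:
    """Replace single hard returns with spaces; blank lines stay as paragraph breaks."""
    text = text.replace("\r\n", "\n")
    return re.sub(r"(?<!\n)\n(?!\n)", " ", text)


def split_sentences(paragraph: str) -> list[str]:
    """Naive rule: a sentence ends at . ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def segment(text: str) -> list[str]:
    """Unwrap hard returns first, then segment each paragraph."""
    sentences = []
    for paragraph in re.split(r"\n{2,}", unwrap(text)):
        sentences.extend(split_sentences(paragraph))
    return sentences


text = "This is\na good\ntest. You can\ntry it.\n\nIf you\nlike.\n"
for sentence in segment(text):
    print(sentence)
```

Because the hard returns are gone before splitting starts, a line that ends in the middle of a sentence no longer counts as a boundary. Abbreviations such as "Mr." would still trip the naive rule.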


I tried it on lextutor.ca, and it does not seem to have the problem you mentioned above. The result is as follows:

[The Man Who Was Thursday by G K Chesterton 1908] To Edmund Clerihew Bentley A cloud was on the mind of men, and wailing went the weather, Yea, a sick cloud upon the soul when we were boys together.
Science announced nonentity and art admired decay; The world was old and ended: but you and I were gay; Round us in antic order their crippled vices came-- Lust that had lost its laughter, fear that had lost its shame.

Like the white lock of Whistler, that lit our aimless gloom, Men showed their own white feather as proudly as a plume.

Life was a fly that faded, and death a drone that stung; The world was very old indeed when you and I were young.

They twisted even decent sin to shapes not to be named: Men were ashamed of honour; but we were not ashamed.

Weak if we were and foolish, not thus we failed, not thus; When that black Baal blocked the heavens he had no hymns from us Children we were--our forts of sand were even as weak as eve, High as they went we piled them up to break that bitter sea.

Fools as we were in motley, all jangling and absurd, When all church bells were silent our cap and beds were heard.



--------------------------------------------------------------------------------
7 sentences, 31.4285714285714 words average
 
Re: BFSU English Sentence Segmenter 1.0 Freeware (automatic English sentence segmentation tool)

Minor upgrade. See the top post.
 