Automatic Sentence Segmentation

This program converts a plain running text into one-sentence-per-line format by simply adding a return code after the standard sentence terminal marks. The current version (ver. 2) covers most common and frequently-used abbreviations such as Mr., Dr., Prof., a.m., p.m. as well as sentence-initial list numbers, but you still have to post-edit the results manually for other less common abbreviations. Note that when initials are followed by the "period + space" combination as in G. W. Bush the strings containing them will be divided immediately after the period mark, resulting in improper segmentation in most cases. Also note that all strings must be terminated to be considered an independent sentential unit.
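
The behaviour described above can be sketched roughly as follows. This is only an illustration, not the program's actual code: the function name segment, the contents of ABBREVIATIONS, and the choice to emit an unterminated trailing string on its own line are all assumptions made for the example.

```python
import re

# Hypothetical abbreviation list; the real program's list is not given in the post.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "prof.", "a.m.", "p.m.", "ver."}

def segment(text):
    """Return text with a line break after each sentence-terminal mark."""
    lines, current = [], []
    for token in text.split():
        current.append(token)
        if not token.endswith((".", "?", "!")):
            continue  # no terminal mark, keep collecting
        core = token.lstrip("(\"'").lower()
        if core in ABBREVIATIONS:
            continue  # known abbreviation, not a sentence end
        if len(current) == 1 and re.fullmatch(r"\d+\.", token):
            continue  # sentence-initial list number such as "1."
        lines.append(" ".join(current))
        current = []
    if current:
        # Trailing string without a terminal mark (assumption: emit it as-is).
        lines.append(" ".join(current))
    return "\n".join(lines)

print(segment("Dr. Smith met G. W. Bush at 9 a.m. today. He left."))
```

Run on that sample, the sketch keeps "Dr." and "a.m." inside their sentence but still breaks after "G." and "W.", which is the same limitation with initials noted above.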
Re: Automatic Sentence Segmentation

1 This program converts a plain running text into one-sentence-per-line format by simply adding a return code after the standard sentence terminal marks.
2 The current version (ver. 2) covers most common and frequently-used abbreviations such as Mr., Dr., Prof., a.m., p.m. as well as sentence-initial list numbers, but you still have to post-edit the results manually for other less common abbreviations.
3 Note that when initials are followed by the "period + space" combination as in G.
4 W.
5 Bush the strings containing them will be divided immediately after the period mark, resulting in improper segmentation in most cases.
6 Also note that all strings must be terminated to be considered an independent sentential unit.
Re: Automatic Sentence Segmentation

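No program can do sentence segmentation with 100% accuracy at present, but it is not that complicated either. Asking PhDs to write software like this is really "killing a chicken with an ox cleaver". It is no different from sending master architects to a construction site to work as laborers, tying rebar, mixing cement and plastering walls. What a waste. Leave this job to us migrant workers; it is our kind of work, and we will surely do it better than the PhDs :p
Sentence segmentation takes no technical sophistication. We are nobodies who only know how to use the "electric fan", i.e. Word, to segment sentences, and we have processed well over ten million words of text that way, so clearly there is nothing fancy or mysterious about it. But the actual procedure has to be tailored to the text at hand; do not expect to write one "macro" or one program and have it handle every text well. For example, ".?!" are sentence-boundary markers, but what do you do when they come with quotation marks? Single quotes, double quotes, single and double quotes used together, whether or not there is a space before the quotation mark, decimal points, abbreviations... and so on and so on: these are all things sentence segmentation has to take into account.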
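
To make a couple of the cases listed above concrete, here is a minimal regex sketch. It is not the Word procedure described in this post; the names CLOSERS, OPENERS, BOUNDARY and split_sentences are made up for illustration, and the rule that a new sentence starts with a capital letter or a digit is an assumption that will not suit every text.

```python
import re

# Quote characters that may follow a terminal mark or open the next sentence
# (straight and curly, single and double).
CLOSERS = "\"'\u201d\u2019"
OPENERS = "\"'\u201c\u2018"

# Boundary: one or more of . ? !, then any closing quotes, then whitespace,
# but only if what follows looks like a sentence start (assumption:
# an optional opening quote, then a capital letter or a digit).
BOUNDARY = re.compile(
    "([.?!]+[" + CLOSERS + "]*)" + r"\s+" + "(?=[" + OPENERS + "]?[A-Z0-9])"
)

def split_sentences(text):
    """Split text after terminal marks, keeping trailing quotes attached."""
    return BOUNDARY.sub(r"\1\n", text).split("\n")

for sentence in split_sentences(
    'He asked, "Is pi 3.14?" She said, "Yes!" Then they left.'
):
    print(sentence)
```

Because a boundary needs whitespace after the mark, the decimal point in 3.14 is left alone, and closing quotes stay with the sentence they end. Abbreviations, initials, and a space placed before a closing quote are not handled here; with straight quotes the last case is ambiguous between a closing and an opening quote, which is the kind of text-specific decision the post is talking about.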