Question: How can I use EditPlus as a Chinese "sentence extractor"?

oscar3

Senior Member
#2
Re: Question: How can I use EditPlus as a Chinese "sentence extractor"?

As the title says. Alternatively, is there a sentence extractor similar to http://www.lextutor.ca/tools/ex_sentences/? Thanks!
EditPlus supports the following regular expressions in the Find, Replace, and Find in Files commands.


Expression  Description
\t          Tab character.
\n          New line.
.           Matches any character.
|           Either the expression on its left or the one on its right matches the target string. For example, "a|b" matches "a" and "b".
[]          Any of the enclosed characters may match the target character. For example, "[ab]" matches "a" and "b"; "[0-9]" matches any digit.
[^]         None of the enclosed characters may match the target character. For example, "[^ab]" matches any character EXCEPT "a" and "b"; "[^0-9]" matches any non-digit character.
*           The character to the left of the asterisk should match 0 or more times. For example, "be*" matches "b", "be", and "bee".
+           The character to the left of the plus sign should match 1 or more times. For example, "be+" matches "be" and "bee" but not "b".
?           The character to the left of the question mark should match 0 or 1 time. For example, "be?" matches "b" and "be" but not "bee".
^           The expression to the right of ^ matches only at the beginning of a line. For example, "^A" matches an "A" only when it is at the beginning of a line.
$           The expression to the left of $ matches only at the end of a line. For example, "e$" matches an "e" only when it is at the end of a line.
()          Affects the evaluation order of the expression and is also used for tagged expressions.
\           Escape character. To match the character "\" itself, use "\\".
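If you want to try these operators outside EditPlus, Python's `re` module uses the same notation for the basic ones listed above. This is a rough sketch for experimenting; the two engines are not identical, so edge cases should be checked in EditPlus itself. The helper `full_matches` is a hypothetical name chosen for illustration.

```python
import re

def full_matches(pattern: str, candidates: list[str]) -> list[str]:
    """Return the candidates that the pattern matches in full (re.fullmatch)."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

if __name__ == "__main__":
    # Quantifier examples from the table: *, + and ?
    print(full_matches(r"be*", ["b", "be", "bee"]))
    print(full_matches(r"be+", ["b", "be", "bee"]))
    print(full_matches(r"be?", ["b", "be", "bee"]))
    # Alternation and a negated character class
    print(full_matches(r"a|b", ["a", "b", "c"]))
    print(full_matches(r"[^ab]", ["a", "b", "c"]))
    # In Python, ^ and $ apply per line only when re.MULTILINE is given
    print(re.findall(r"^A", "A1\nbA\nA2", re.MULTILINE))
```

Note the design difference: EditPlus applies `^` and `$` per line by default, whereas Python needs the `re.MULTILINE` flag.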

A tagged expression is enclosed in (). Tagged expressions can be referenced as \0, \1, \2, \3, etc. \0 represents the entire matched substring, \1 the first tagged expression, \2 the second, and so on. See the following examples.

Original  Search   Replace   Result
abc       (ab)(c)  \0-\1-\2  abc-ab-c
abc       a(b)(c)  \0-\1-\2  abc-b-c
abc       (a)b(c)  \0-\1-\2  abc-a-c
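The same three replacements can be reproduced with Python's `re.sub`, as a sketch for comparison. One difference to watch: Python has no `\0` group reference in the replacement string; the whole match is written `\g<0>`, while `\1` and `\2` work as in EditPlus. The function name `tagged_replace` is hypothetical.

```python
import re

# EditPlus "\0-\1-\2" becomes r"\g<0>-\1-\2" in Python:
# \g<0> is the whole match, \1 and \2 are the first and second tagged groups.
def tagged_replace(original: str, search: str, replace: str = r"\g<0>-\1-\2") -> str:
    """Apply an EditPlus-style search/replace with Python's re.sub."""
    return re.sub(search, replace, original)

if __name__ == "__main__":
    for search in (r"(ab)(c)", r"a(b)(c)", r"(a)b(c)"):
        print("abc", search, tagged_replace("abc", search))
    # abc (ab)(c) abc-ab-c
    # abc a(b)(c) abc-b-c
    # abc (a)b(c) abc-a-c
```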


I could not find any regular expression syntax in EditPlus that supports searching for full-width characters. You may want to try other tools.
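As one example of "other tools": in a scripting language whose regular expressions operate on Unicode text, full-width characters can be matched by code point. A minimal Python sketch, assuming the text is already decoded into a `str` (e.g. read as UTF-8). The chosen ranges, U+FF01 to U+FF5E (full-width forms of ASCII) plus U+3002 (the ideographic full stop 。), are an illustrative assumption, not a complete list of Chinese punctuation; `find_full_width` is a hypothetical helper.

```python
import re

# Full-width forms of ASCII punctuation, digits and letters: U+FF01..U+FF5E.
# U+3002 is the ideographic full stop 。 (outside that block, so listed separately).
FULL_WIDTH = re.compile(r"[\uFF01-\uFF5E\u3002]")

def find_full_width(text: str) -> list[str]:
    """Return every character of text that falls in the ranges above, in order."""
    return FULL_WIDTH.findall(text)

if __name__ == "__main__":
    sample = "你好\uFF01Hello! 再见\u3002"  # one full-width exclamation mark, one ASCII
    print(find_full_width(sample))
```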
 
#4
Re: Question: How can I use EditPlus as a Chinese "sentence extractor"?

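This raises a question worth thinking about: what counts as a sentence?
If everything ending with a full stop, question mark, exclamation mark, or ellipsis is treated as a sentence, the problem becomes much simpler.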
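Under that simple definition, a sentence is just a run of text followed by one or more terminators. A minimal Python sketch, assuming the terminators are 。 (U+3002), ? (U+FF1F), ! (U+FF01) and … (U+2026); the Chinese ellipsis …… is two U+2026 characters, so terminators are matched as a run. `split_sentences` is a hypothetical name, and text after the last terminator is simply dropped.

```python
import re

# Sentence-final marks: 。 (U+3002), ? (U+FF1F), ! (U+FF01), … (U+2026).
TERMINATORS = "\u3002\uFF1F\uFF01\u2026"
# Non-terminator text, then one or more terminators (covers …… and ?!).
SENTENCE = re.compile(f"[^{TERMINATORS}]+[{TERMINATORS}]+")

def split_sentences(text: str) -> list[str]:
    """Split text into sentences; trailing text without a terminator is dropped."""
    return [s.strip() for s in SENTENCE.findall(text)]

if __name__ == "__main__":
    sample = "今天下雨了\u3002你去吗\uFF1F我不去\uFF01算了\u2026\u2026"
    for sentence in split_sentences(sample):
        print(sentence)
```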

xusun575

Senior Member
#6
Re: Question: How can I use EditPlus as a Chinese "sentence extractor"?

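This raises a question worth thinking about: what counts as a sentence? If everything ending with a full stop, question mark, exclamation mark, or ellipsis is treated as a sentence, the problem becomes much simpler.
Thanks for your interest, superyangt. Besides the cases you mentioned, we also need to handle full stops, question marks, exclamation marks, and ellipses that appear inside a closing quotation mark.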
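One way to handle that case is to let any closing quotation marks that directly follow the terminators stay attached to the sentence. The Python sketch below does this and then filters sentences by a keyword, roughly in the spirit of the lextutor tool mentioned in the question. Assumptions: the closing quotes are ” ’ 」 』, a terminator followed by a closing quote always ends the sentence, and `split_sentences` / `extract_sentences` are hypothetical names.

```python
import re

# Sentence-final marks: 。 ? ! and … (the Chinese ellipsis …… is two U+2026).
TERMINATORS = "\u3002\uFF1F\uFF01\u2026"
# Closing quotation marks that may follow the terminator: ” ’ 」 』 (an assumed set).
CLOSING_QUOTES = "\u201D\u2019\u300D\u300F"
# Body, then one or more terminators, then any closing quotes, kept with the sentence.
SENTENCE = re.compile(f"[^{TERMINATORS}]+[{TERMINATORS}]+[{CLOSING_QUOTES}]*")

def split_sentences(text: str) -> list[str]:
    """Split text into sentences, keeping a closing quote after the terminator."""
    return [s.strip() for s in SENTENCE.findall(text)]

def extract_sentences(text: str, keyword: str) -> list[str]:
    """Return only the sentences that contain keyword."""
    return [s for s in split_sentences(text) if keyword in s]

if __name__ == "__main__":
    text = "他说\uFF1A\u201C下雨了\u3002\u201D我们回家吧\uFF01"
    print(split_sentences(text))
    print(extract_sentences(text, "回家"))
```

Whether a quoted 。” should always end the sentence is itself a judgment call (the question raised in #4); the pattern above simply makes that choice.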