BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

What's the flavor of the regular expressions? Could it be PCRE? If that's the case, the information at might be helpful. Thanks for any tips.
Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

Would it be possible to add one more feature to your wonderful program: letting the user adjust the font size in the GUI?

I showed this tool to my colleagues, and they all spoke highly of it. But they also complained that it cannot be used in the classroom, because the font is too small for students to read on the projector screen.

I guess the font size is hard-coded into the program. Although it can be changed in the output file, since that is plain text (or an HTML browser can do the job), we would be very grateful if we could change it in the GUI too.
Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

Ah, got it: edit the template.ini file!
It's so flexible. Thanks for your wonderful design.
Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

Yes. Modify the template as you would CSS for HTML.
Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

What's the flavor of the regular expressions? Could it be PCRE? If that's the case, the information at might be helpful. Thanks for any tips.

Different regex flavors share most of their matching patterns. I am familiar with Perl-compatible regex, which works well in Sentence Collector. If you want to test the tool's compatibility, try out other flavors as well.
Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

Thanks for your reply. As far as I know, most programs written in Delphi use a PCRE flavor of regex, but I am not sure whether Sentence Collector (SC) follows suit. You are right, though: any user can simply try it out.

I am also curious about the regex search function in SC, which is fast and smooth. I am facing a dilemma: for large data sets, some sort of index seems necessary for speed, yet regular expressions are normally expected to work directly on line-based files. If the latter holds, I wonder how the system could scale up (when the data sets grow HUGE) while keeping the regex search function, considering that the default data set in SC is less than 6M. Could you share any design decisions you made for this situation, or have I observed wrongly? Thanks for any pointers.
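Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

Thanks for your reply......

The regular expressions used by SC 1.0 are not fully PCRE (Perl Compatible Regular Expressions). SC supports the common regular expression syntax: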
1. \ Quote the next metacharacter
2. ^ Match the beginning of the line
3. . Match any character (except newline)
4. $ Match the end of the line (or before newline at the end)
5. | Alternation
6. () Grouping
7. [] Character class

1. * Match 0 or more times
2. + Match 1 or more times
3. ? Match 1 or 0 times
4. {n} Match exactly n times
5. {n,} Match at least n times
6. {n,m} Match at least n but not more than m times

1. \w Match a "word" character (alphanumeric plus "_")
2. \W Match a non-"word" character
3. \s Match a whitespace character
4. \S Match a non-whitespace character
5. \d Match a digit character
6. \D Match a non-digit character
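
To make the constructs listed above concrete, here is a minimal sketch in Python. Python's `re` module is not the engine SC uses; it is only assumed here because it accepts all of the syntax in the list. The sample sentences and the helper `find_matches` are made up for illustration.

```python
import re

# Hypothetical sample sentences; any line-based text would do.
sentences = [
    "The committee will meet in 2019.",
    "Colour and color are both accepted spellings.",
    "She said goodbye.",
    "It costs 15 dollars, not 1500.",
]

def find_matches(pattern, lines):
    """Return the lines in which `pattern` matches somewhere."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# ^ anchors at the start of the line; \s matches one whitespace character.
starts_with_the = find_matches(r"^The\s", sentences)

# [] is a character class; ? makes the preceding "u" optional.
colour_words = re.findall(r"[Cc]olou?r", sentences[1])

# \d{4} matches exactly four digits in a row.
four_digits = find_matches(r"\d{4}", sentences)

# () groups, | alternates, \. quotes the dot, $ anchors at the end of the line.
chosen_endings = find_matches(r"(goodbye|spellings)\.$", sentences)

# \w+ matches a run of "word" characters (letters, digits, underscore).
words = re.findall(r"\w+", sentences[2])
```

Patterns that work here are likely, though not guaranteed, to behave the same way in SC, since only the shared syntax is used.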

Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

Informative and insightful, thanks a lot.
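Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

Yes. Modify the template as you would CSS for HTML.

Change one of the lines in it to this: div{margin-left:6px; font-size:60px; font-family:Georgia; margin-top:10pt;margin-bottom:10pt}. That enlarges the font and makes the tool easier to use on a classroom screen.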
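
For anyone who wants to script this change rather than edit by hand, here is a rough sketch in Python. It assumes the template contains a CSS-like `div{...}` rule like the one quoted above; the function name `set_div_font_size` and the starting value of 12px are hypothetical, and the real contents of template.ini may differ.

```python
import re

def set_div_font_size(css_text, new_size):
    """Replace the font-size value inside every `div{...}` rule in css_text."""
    def patch_rule(match):
        # Rewrite only the font-size declaration within the rule body.
        body = re.sub(
            r"font-size\s*:\s*[^;}]+",
            lambda m: "font-size:" + new_size,
            match.group(2),
        )
        return match.group(1) + body + match.group(3)

    # Capture "div{", the rule body, and the closing "}" separately.
    return re.sub(r"(\bdiv\s*\{)([^}]*)(\})", patch_rule, css_text)

# Hypothetical original rule; only the font-size value is assumed to differ.
original = ("div{margin-left:6px; font-size:12px; font-family:Georgia; "
            "margin-top:10pt;margin-bottom:10pt}")
enlarged = set_div_font_size(original, "60px")
```

The function works on a string, so reading and writing the file is left to the user. Keeping a backup copy of template.ini before overwriting it would be prudent.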
Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

How can I use my own corpus? It cannot be changed under Settings.
Re: BFSU Sentence Collector 1.0: a corpus-based tool for extracting English example sentences

How can I use my own corpus? It cannot be changed under Settings.

It is possible to use our own corpus in Sentence Collector 1.0, with a little tweaking of sentence segmentation, sorting, new-word marking, file format conversion, index configuration, etc. But I am not sure whether that is what the authors intended us to do with it. And if you have gone that far, you might be tempted to write a new tool instead.