
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Please be specific: are you using an online concordancer, or are you using the corpora on your own machine, where you have control over the text?
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

I am using WordSmith 4.0, but after running a concordance all the line numbers, such as L040, are retained. How can I remove these nuisances? Is there any other software for concordancing LOB and BROWN?
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

They are not nuisances. If you don't want them, simply use Find & Replace to remove them before running concordances in WordSmith. Here is how:

1. You will need EditPlus for this. You can get an evaluation version at: http://www.editplus.com.

2. Open the files (e.g. all 15 LOB files) in EditPlus (back up your files first);

3. From the menu, choose Search > Replace, type ^[a-z0-9]+[ ]+[0-9]+[ ] in Find what, and leave Replace with empty;

4. Check the Regular expression and All open files options, then click Replace all to get all the files ready for your "nuisance-free" concordances.

The same applies to the BROWN corpus.
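If you would rather script this than use EditPlus, here is a minimal Python sketch of the same Find & Replace. Assumptions: the files are plain text in latin-1 (change the encoding to match your copy), matching ignores case the way EditPlus does unless "Case sensitive" is selected, and strip_files / strip_line_prefixes are hypothetical helper names. The sample lines are made up, not real LOB text.

```python
import re
from pathlib import Path

# The pattern from step 3: an ID such as "a01", one or more spaces,
# a line number, then one space, anchored to the start of each line.
# re.MULTILINE makes ^ match after every newline, not only at the start
# of the file; re.IGNORECASE assumes EditPlus-style case-insensitive search.
LINE_PREFIX = re.compile(r"^[a-z0-9]+[ ]+[0-9]+[ ]", re.MULTILINE | re.IGNORECASE)


def strip_line_prefixes(text: str) -> str:
    """Delete the ID/line-number prefix from the start of every line."""
    return LINE_PREFIX.sub("", text)


def strip_files(paths, encoding="latin-1"):
    """Rewrite each file in place, like "Replace all" on all open files.

    Back up the originals first. The latin-1 default is an assumption.
    """
    for path in map(Path, paths):
        text = path.read_text(encoding=encoding)
        path.write_text(strip_line_prefixes(text), encoding=encoding)


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = "A01 1 The first line.\nA01 2 The second line.\n"
    print(strip_line_prefixes(sample), end="")
```

Lines that do not start with an ID, a space and a number are left untouched, so running it twice does no harm.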

Good luck!
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Thanks a lot, laohong and Dr Xiao! You are always the first to come to the rescue of every c-pal with detailed explanations and professional expertise. Happy Labor Day holiday to you both and to the other administrators!
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Maybe LOB and BROWN are not "clean" material: many tags have already been inserted. You can search online and download software developed specifically for LOB, which can search these texts effectively.
The same is true of BROWN.
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Thank you so much. Is there software specifically designed for LOB and BROWN?
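Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

[*], <*>: doesn't * stand for any characters? And isn't * also a regular-expression operator?
[*], <*>: how do I write the expression??? I wrote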
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Please read the EditPlus help file.
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

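laohong, you're here! I'll go and read it right away. But could you show me how to write the expression to remove everything like [] and <>? I spent ages on it yesterday without success. Thanks!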

Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

I think that if your corpus contains uppercase letters, the beginning of the expression should be written like this, right?


Not necessary, unless you select "Case Sensitive".
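The point about case can be checked with a small Python sketch. It assumes Python's re behaves like EditPlus here: without a case-insensitive flag, [a-z] does not match "A"; with re.IGNORECASE it does. The explicit [A-Za-z0-9] pattern and the sample line "A01 2 Some text" are my own illustrations, not taken from the thread or from LOB.

```python
import re

PATTERN = r"^[a-z0-9]+[ ]+[0-9]+[ ]"
# Illustrative alternative that lists uppercase letters explicitly.
EXPLICIT_PATTERN = r"^[A-Za-z0-9]+[ ]+[0-9]+[ ]"


def leading_prefix(line, pattern=PATTERN, flags=0):
    """Return the ID/line-number prefix the pattern matches, or None."""
    match = re.match(pattern, line, flags)
    return match.group() if match else None


line = "A01 2 Some text"  # hypothetical line with an uppercase ID
print(repr(leading_prefix(line)))                            # None
print(repr(leading_prefix(line, flags=re.IGNORECASE)))       # 'A01 2 '
print(repr(leading_prefix(line, pattern=EXPLICIT_PATTERN)))  # 'A01 2 '
```

So either turn case sensitivity off (the EditPlus default) or list the uppercase range yourself; you do not need both.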
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

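The ^ in front of [a-z0-9]+[ ]+[0-9]+[ ] makes the expression match only text at the beginning of a line; if you drop the ^, it will match text that fits the pattern anywhere in the file.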
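Here is a short Python sketch of why the ^ matters. It assumes Python's re.MULTILINE mode corresponds to EditPlus's per-line ^; the sample line is made up so that the ordinary text also contains the same shape ("page 12 ").

```python
import re

BODY = r"[a-z0-9]+[ ]+[0-9]+[ ]"
ANCHORED = re.compile("^" + BODY, re.MULTILINE)  # only at line starts
UNANCHORED = re.compile(BODY)                    # anywhere in the text

# Made-up line: an ID prefix followed by text that happens to contain
# a word, a space, a number and a space.
text = "a01 1 See page 12 here\n"

print(ANCHORED.findall(text))          # ['a01 1 ']
print(UNANCHORED.findall(text))        # ['a01 1 ', 'page 12 ']
print(repr(ANCHORED.sub("", text)))    # 'See page 12 here\n'
print(repr(UNANCHORED.sub("", text)))  # 'See here\n'
```

Without the anchor, Replace all would also delete "page 12 " from the running text, which is usually not what you want.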

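To remove all [] tags together with their contents, you could try:
If the [] contains only letters and no spaces: \[[a-z]+\]
If the [] contains both letters and spaces: \[[a-z ]+\]
If the [] contains both letters and digits: \[[a-z0-9]+\]
Other symbols can be handled the same way; for details, please read the section on regular expressions in the EditPlus help.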
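The three patterns above can be tried out in Python. This is a sketch under a few assumptions: Python's re treats these patterns the way EditPlus does, matching ignores case, the <...> pattern is my own extension of the "same approach" for other symbols, and the sample text is invented. It also shows why \[.*\] (the "*" asked about earlier) is risky: .* is greedy. The negated-class alternative \[[^\]]*\] works in Python; I have not checked it in EditPlus.

```python
import re

# The three patterns from the post.
LETTERS_ONLY = r"\[[a-z]+\]"
LETTERS_AND_SPACES = r"\[[a-z ]+\]"
LETTERS_AND_DIGITS = r"\[[a-z0-9]+\]"
# Same idea for <...> tags (my own extension: letters, digits, spaces).
ANGLE_TAGS = r"<[a-z0-9 ]+>"


def remove_tags(text, pattern):
    """Delete every tag the pattern matches, ignoring case."""
    return re.sub(pattern, "", text, flags=re.IGNORECASE)


sample = "one [ab] two [a b] three [a1] four <x 2> five"  # made-up text
for pattern in (LETTERS_ONLY, LETTERS_AND_SPACES, LETTERS_AND_DIGITS, ANGLE_TAGS):
    print(remove_tags(sample, pattern))

# ".*" is greedy: it runs from the first "[" to the LAST "]" on the line,
# so the ordinary word "two" in between is deleted too.
print(remove_tags("one [ab] two [a1] three", r"\[.*\]"))       # one  three
# A negated class stops at the first "]", so each tag is removed separately.
print(remove_tags("one [ab] two [a1] three", r"\[[^\]]*\]"))   # one  two  three
```

Note that removing a tag leaves the spaces on either side, so you may want a second pass that collapses double spaces.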
