How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Please be more specific: are you using an online concordancer, or are the corpora on your own machine, where you have control over the text?
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

I am using WordSmith 4.0, but after concordancing, all the line numbers (such as L040) are retained. How can I remove these nuisances? Is there any other software for concordancing LOB and BROWN?
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

They are not nuisances. If you don't want them there, simply use Find & Replace to remove them before running concordances in WordSmith. Here is how:

1. You need EditPlus for this. You can get an evaluation version at http://www.editplus.com.

2. Open the files (e.g. all 15 LOB files) in EditPlus (back up your files first);

3. In the menu, click Search > Replace, type ^[a-z0-9]+[ ]+[0-9]+[ ] in Find what, and leave Replace with empty;

4. Check the Regular expression and All open files options, then click Replace all. All the files will then be ready for your "nuisance-free" concordances.

The same applies to the BROWN corpus.
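For anyone who prefers a script to EditPlus, the same Find & Replace can be sketched in Python. This is a minimal sketch under stated assumptions: the folder names and the *.txt pattern used by the hypothetical clean_folder helper are placeholders, files are read as latin-1 (an assumption; it decodes any byte without error), and cleaned copies go to a separate folder so the originals stay intact. re.MULTILINE makes ^ match at the start of every line, as in EditPlus, and re.IGNORECASE lets [a-z] match uppercase IDs such as A01.

```python
import re
from pathlib import Path

# Same pattern as in step 3: an ID, one or more spaces, a line number, one space.
# MULTILINE: ^ matches at the start of every line, not just the start of the file.
# IGNORECASE: [a-z] also matches the uppercase letters in IDs such as "A01".
LINE_ID = re.compile(r"^[a-z0-9]+[ ]+[0-9]+[ ]", re.MULTILINE | re.IGNORECASE)


def strip_line_ids(text: str) -> str:
    """Remove the leading ID and line number from every line."""
    return LINE_ID.sub("", text)


def clean_folder(src: Path, dst: Path, pattern: str = "*.txt") -> None:
    """Write cleaned copies of every file in src matching pattern into dst.

    Hypothetical helper: folder names and the file pattern are placeholders.
    The original files are never overwritten.
    """
    dst.mkdir(parents=True, exist_ok=True)
    for path in sorted(src.glob(pattern)):
        # latin-1 is an assumed encoding; it never raises on unknown bytes.
        text = path.read_text(encoding="latin-1")
        (dst / path.name).write_text(strip_line_ids(text), encoding="latin-1")


if __name__ == "__main__":
    # Made-up two-line sample in a BROWN-style "ID line-number text" layout.
    sample = "A01 0010 The jury said\nA01 0020 it was fair."
    print(strip_line_ids(sample))
```

For example, clean_folder(Path("lob"), Path("lob_clean")) would write cleaned copies of every .txt file in a (hypothetical) lob folder, which can then be loaded into WordSmith.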

Good luck!
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Thanks a lot, laohong and Dr. Xiao! You two are always the first to come to the rescue of every c-pal, with detailed explanations and professional expertise. Happy Labor Day holiday to you both and to the other administrators!
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Maybe LOB and BROWN are not "clean" material: many tags have already been inserted. You can search online and download software developed for LOB, which can search these texts effectively. The same goes for BROWN.
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Thank you so much. Is there software specifically designed for LOB and BROWN?
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

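A question: why doesn't EditPlus support
[*] and <*>? Doesn't * stand for any characters? And isn't * part of regular expressions too?
What I mean is that I want to remove everything of the form
[*] and <*>. How do I write the expression? I tried
[*],<*> and got no results.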
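The question reads * as a wildcard ("any characters"), as in file names. With Regular expression checked, EditPlus treats the pattern as a regular expression instead, where [...] is a character class and * means "zero or more of the preceding item". A minimal Python sketch of what the two patterns actually match (Python's re module stands in for EditPlus here, which is an assumption; the engines differ in details but agree on these two points; the sample string is made up):

```python
import re

# A made-up line with a square-bracket tag, an angle-bracket tag and an asterisk.
text = "He[NN] ran<tag> fast*"

# [*] is a character class holding only "*": it finds a literal asterisk,
# not "any text inside square brackets".
stars = re.findall(r"[*]", text)
print(stars)  # ['*']

# <* means "zero or more '<'", and it must be followed by a literal ">",
# so the only hit is the closing ">" of "<tag>", not the whole tag.
closers = re.findall(r"<*>", text)
print(closers)  # ['>']
```

To match a whole tag, the brackets have to be escaped and the allowed content spelled out, which is what the patterns in the later reply in this thread do.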
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

Please read the EditPlus help file.
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

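laohong, you're here! I'll go and read it right away. But could you show me how to write an expression that removes everything like [] and <>? I spent ages on it yesterday without success. Thanks.

I've also found BROWN relatively easy to handle, because its codes all appear at the beginning of paragraphs. But what about a corpus like CLEC, where codes appear in the middle of paragraphs?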
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

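Actually, many concordancers have a filtering function, and the filtered results contain no codes. Can those results be saved back to a txt file?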
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

I think that if your corpus contains uppercase letters, the start of the expression should be written like this, right?

^[a-zA-Z0-9]...........

That's unnecessary, unless you select "Case Sensitive".
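This point can be checked with a short Python sketch. Here re.IGNORECASE stands in for EditPlus with "Case Sensitive" unticked; that correspondence is an assumption based on the reply above, not something verified against EditPlus itself. The sample line is made up.

```python
import re

line = "A01 0010 The jury said"  # made-up BROWN-style line
lower_only = r"^[a-z0-9]+[ ]+[0-9]+[ ]"
both_cases = r"^[a-zA-Z0-9]+[ ]+[0-9]+[ ]"

# Case-sensitive matching: [a-z] cannot match the "A" in "A01".
sensitive_lower = re.match(lower_only, line)
print(sensitive_lower)  # None

# Spelling out A-Z works even when matching is case-sensitive.
sensitive_both = re.match(both_cases, line)
print(repr(sensitive_both.group()))  # 'A01 0010 '

# Case-insensitive matching: the shorter pattern is enough.
insensitive_lower = re.match(lower_only, line, re.IGNORECASE)
print(repr(insensitive_lower.group()))  # 'A01 0010 '
```

So [a-zA-Z0-9] only makes a difference when matching is case-sensitive.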
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

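laohong, you're here! I'll go and read it right away. But could you show me how to write an expression that removes everything like [] and <>? I spent ages on it yesterday without success. Thanks.

I've also found BROWN relatively easy to handle, because its codes all appear at the beginning of paragraphs. But what about a corpus like CLEC, where codes appear in the middle of paragraphs?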

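The ^ in front of [a-z0-9]+[ ]+[0-9]+[ ] restricts the search to text matching that expression at the beginning of a paragraph (line); if you remove the ^, matching text is found anywhere in the file.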
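The effect of ^ can be seen in a small Python sketch, with Python's re standing in for EditPlus (an assumption; re.MULTILINE is what makes ^ match at the start of every line rather than only at the start of the text). The two-line sample is made up. Note that without ^ this particular pattern also catches ordinary words followed by numbers, so an unanchored pattern for mid-paragraph codes needs to be more specific than this one.

```python
import re

# Made-up sample: two lines with IDs, and a normal phrase containing a number.
text = "A01 0010 The jury said\nA01 0020 there were 12 votes"
flags = re.MULTILINE | re.IGNORECASE

anchored = r"^[a-z0-9]+[ ]+[0-9]+[ ]"   # only at the start of a line
unanchored = r"[a-z0-9]+[ ]+[0-9]+[ ]"  # anywhere in a line

anchored_hits = re.findall(anchored, text, flags)
unanchored_hits = re.findall(unanchored, text, flags)
print(anchored_hits)    # ['A01 0010 ', 'A01 0020 ']
print(unanchored_hits)  # ['A01 0010 ', 'A01 0020 ', 'were 12 ']

# Without ^, Replace all would also delete "were 12 " from ordinary text.
cleaned = re.sub(unanchored, "", text, flags=flags)
print(cleaned)
```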

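To remove all [] tags and everything inside them, you can try:
If the [] contain only letters and no spaces: \[[a-z]+\]
If the [] contain both letters and spaces: \[[a-z ]+\]
If the [] contain both letters and digits: \[[a-z0-9]+\]
...
Other symbols can be handled the same way. For details, please read the section on regular expressions in the EditPlus help.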
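The same idea can be sketched in Python for anyone who wants to script it; this is an illustration, not EditPlus's own behaviour. The first three patterns are the ones suggested above. The other two, \[[^\]]*\] and <[^>]*> ("an opening bracket, then anything except the closing bracket, then the closing bracket"), are a common general-purpose alternative that does not require knowing in advance what the tags contain; whether EditPlus accepts exactly the same syntax should be checked in its help. Because no ^ is used, tags are removed wherever they occur in a line, which also covers codes in the middle of paragraphs. The sample sentence and its tags are invented.

```python
import re

# Patterns from the reply above, one per kind of tag content (case-insensitive).
LETTERS = re.compile(r"\[[a-z]+\]", re.IGNORECASE)            # e.g. [NN]
LETTERS_SPACES = re.compile(r"\[[a-z ]+\]", re.IGNORECASE)    # e.g. [a b]
LETTERS_DIGITS = re.compile(r"\[[a-z0-9]+\]", re.IGNORECASE)  # e.g. [x1]

# General alternatives: everything up to the first closing bracket.
ANY_SQUARE = re.compile(r"\[[^\]]*\]")
ANY_ANGLE = re.compile(r"<[^>]*>")


def strip_tags(text: str) -> str:
    """Remove [...] and <...> tags anywhere in the text, then tidy spaces."""
    text = ANY_SQUARE.sub("", text)
    text = ANY_ANGLE.sub("", text)
    text = re.sub(r"[ ]{2,}", " ", text)  # collapse gaps left by removed tags
    return re.sub(r"[ ]+$", "", text, flags=re.MULTILINE)  # drop trailing spaces


sample = "He [NN] went <p> home [a b] now [x1]"
print(LETTERS.findall(sample))         # ['[NN]']
print(LETTERS_SPACES.findall(sample))  # ['[NN]', '[a b]']
print(LETTERS_DIGITS.findall(sample))  # ['[NN]', '[x1]']
print(strip_tags(sample))              # He went home now
```

Collapsing runs of spaces also affects any double spaces that were in the original text; drop that line if the original spacing matters.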
 
Re: How do I remove line numbers and tags when searching the LOB and BROWN corpora?

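Actually, many concordancers have a filtering function, and the filtered results contain no codes. Can those results be saved back to a txt file?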

Expecting some piece of software to solve all your problems for you is unrealistic. Learning to do some basic text processing yourself is essential.
 