How to: inserting a symbol into a corpus

oscar3

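Senior Member
I need to insert a symbol into the first line of every text in my corpus. The first line is always a title, so let's say the symbol is <T> for now. How can I do this? The corpus has more than 400 documents, so doing it by hand is not economical and, more importantly, I'm afraid of making mistakes. Thanks!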
 
Re: How to: inserting a symbol into a corpus

It's pretty easy to do if you use PowerGrep.

The following assumes that your files are clean, i.e. there are no blank lines or whitespace in front of the actual first line. (If a file does start with blank lines, the <T> tag will be inserted in front of the first blank line rather than in front of the title. In that case, read the next post first.)

Using PowerGrep to insert the <T> tag:

Select the files you want to add the tag to in the File Selector.
Start with a fresh action.
Set the action type to "search-and-replace". Set the search type to "list of regular expressions".
In the Search box, enter the regular expression \A. This regular expression matches the position at the very start of the file.
In the Replacement box, enter <T> or any other tags that you want to insert.
Set the target and backup file options as you like them.
Click the Replace button to actually add the tag.
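
For anyone without PowerGREP, the same idea can be sketched in a few lines of Python. This is only a rough equivalent, not what PowerGREP does internally. The function names, the *.txt pattern, the UTF-8 encoding and the .bak backup suffix are all my assumptions, so adjust them to your corpus (a Chinese corpus may well be GBK rather than UTF-8).

```python
import re
import shutil
from pathlib import Path


def insert_tag(text: str, tag: str = "<T>") -> str:
    """Insert `tag` at the very start of `text`."""
    # \A matches only at the start of the string, like in the steps above.
    # A function replacement avoids backslash processing inside `tag`.
    return re.sub(r"\A", lambda _match: tag, text, count=1)


def tag_files(folder: str, pattern: str = "*.txt", tag: str = "<T>",
              encoding: str = "utf-8") -> int:
    """Tag every matching file in `folder`; return the number of files written."""
    written = 0
    for path in sorted(Path(folder).glob(pattern)):
        # Assumed backup scheme: keep a copy named <file>.bak next to the original.
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        # newline="" keeps the original line endings untouched.
        with open(path, "r", encoding=encoding, newline="") as handle:
            text = handle.read()
        with open(path, "w", encoding=encoding, newline="") as handle:
            handle.write(insert_tag(text, tag))
        written += 1
    return written
```

Calling tag_files("corpus") would tag every .txt file in a (hypothetical) folder named corpus. As with the PowerGREP steps, a file that starts with a blank line gets the tag in front of that blank line, not in front of the title.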
 
Re: How to: inserting a symbol into a corpus

You can also use PowerGrep to get rid of the blank lines and clean up your files first. So if your files are not clean, do the following first, then follow the steps above.

<Have you made a backup copy of your files before you do anything?>

To clean up the blank lines in your files using PowerGrep:

Select the files you want to clean up in the File Selector.
Start with a fresh action.
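Set the action type to "collect data". Set the search type to "regular expression" (with "literal text", ^$ would be searched for as two literal characters).
In the Search box, enter the regular expression ^$. This regular expression matches every empty line in your file(s), i.e. lines with no characters at all.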
In File Sectioning, select Line by Line.
Check Invert Search Results (this option is to get rid of the blank lines in the result files.)
Set the target and backup file options as you like them. (Now is another chance to save a copy of your original file if you haven't done so.)
Click the Collect Data button on top of the window to execute the command. The result files should be free of empty lines.

With the clean files as input, you can now use the \A method outlined above to insert the tag to the beginning of the first line (i.e. the title line).
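
The clean-up step can also be sketched in Python, for those who prefer a script. Again this is just an illustration under my own assumptions (the function names are made up, and the text is assumed to be already read into a string). It mirrors ^$ by dropping only lines that are completely empty. Lines that contain nothing but spaces are kept.

```python
def drop_empty_lines(text: str) -> str:
    """Remove lines that contain no characters other than the line break."""
    kept = [
        line
        for line in text.splitlines(keepends=True)
        if line.strip("\r\n")  # empty string means the line was empty
    ]
    return "".join(kept)


def clean_and_tag(text: str, tag: str = "<T>") -> str:
    """Drop empty lines, then put `tag` in front of the first remaining line."""
    return tag + drop_empty_lines(text)
```

For example, clean_and_tag("\n\nTitle\n\nBody\n") gives "<T>Title\nBody\n". If your files also have lines holding only spaces or tabs, changing the test to line.strip() would drop those too.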
 
Re: How to: inserting a symbol into a corpus

Thank you, 动态语法, for your two replies. I tried what you suggested, but I'm afraid something is wrong: PowerGrep (V2.23) fails to insert the tag into the files, although the result box shows every title with the tag at its beginning. Should I switch to another version of PowerGREP?
 

Attachment: result.txt (51.7 KB)

Re: How to: inserting a symbol into a corpus

Fortunately, I succeeded in extracting all the titles from the files.
 
Re: How to: inserting a symbol into a corpus

I saw the <TITLE> tag in your files. The result box is just a preview of what's actually there. Click the Replace button and you should be able to insert the tag.

Also, why just 184 matches in 184 files? There should be about 400 matches in 400 files according to your first post.
 
Re: How to: inserting a symbol into a corpus

Hi, 动态语法, I've got it. The failure may have been caused by the old, cracked version of PowerGrep. V3.41 works perfectly. I did not process all the files, only 184 as a test of the method. Thank you again, 动态语法! :D
 
Re: How to: inserting a symbol into a corpus

In PowerGREP, search for \A and replace it with <TITLE>.
 