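Using ParaConc

Using ParaConc

Yes, searching is much easier once the texts have been aligned in advance. Here is an example for reference:

Chinese source text (10 sentences; best saved as plain text in Notepad, in GB encoding):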

<seg>那時又將天晚,趙姨娘的聲音只管喑啞起來了,居然鬼嚎一般。</seg>
<seg>無人敢在她跟前,只得叫了幾個有膽量的男人進來坐著。</seg>
<seg>趙姨娘一時死去,隔了些時,又回過來,整整的鬧了一夜。</seg>
<seg>到了第二天,也不言語,只裝鬼臉,自己拿手撕開衣服,露出胸膛,好像有人剝她的樣子。</seg>
<seg>可憐趙姨娘雖說不出來,其痛苦之狀,實在難堪。正在危急,大夫來了,也不敢診脈,只囑咐辦後事罷。說了,起身就走。</seg>
<seg>那送大夫的家人再三央告,說「請老爺看看脈,小的好回稟家主。」</seg>
<seg>那大夫用手一摸,已無脈息。</seg>
<seg>賈環聽了,然後大哭起來。眾人只顧賈環,誰料理趙姨娘。</seg>
<seg>只有周姨娘心裏苦楚。</seg>
<seg>想到做偏房側室的下場頭,不過如此。</seg>


English text (the corresponding 10 sentences, saved as a plain text file in Notepad):

<seg>She was a terrifying sight , and no one now dared go near her . </seg>
<seg>By evening her voice began to grow hoarse and she sounded more and more like a croaking harpy . </seg>
<seg>None of the women could bear to be in her presence , and they deputed some of the more courageous menfolk to come in and keep watch on her . </seg>
<seg>One minute she seemed to be gone , then she came round again , and so it went on all night . </seg>
<seg>By the next morning she was incapable of speech , her face was horribly contorted and she began rending her clothes and baring her bosom , as if someone else was stripping her naked . </seg>
<seg>She was now totally inarticulate , and the torment she was undergoing was terrible to behold . She seemed to have reached a final crisis , when the doctor arrived . He would not take her pulse , but gave orders at once for her last things to be made ready and himself prepared to leave without further ado . </seg>
<seg>The servant who had brought him entreated him to stay and take her pulse , so that he could at least return with a satisfactory report to his master , and in the end the doctor relented . </seg>
<seg>He felt her pulse once , and pronounced that there was no sign of life . </seg>
<seg>Hearing this , Jia Huan burst out wailing , and immediately everyone 's attention was turned to him and no one spared another thought for Aunt Zhao , lying dead on the kang , her feet bare , her hair in disarray . </seg>
<seg>Only Aunt Zhou seemed affected . </seg>
<seg>She thought morbidly to herself that such is the end of a concubine ! </seg>

Note: the start and end of each sentence in these texts are already marked with <seg></seg>, so when you use Load Corpus in ParaConc, choose Start/End Tags under Align Format. Good luck!
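Before loading the two files, it can help to confirm that both contain the same number of <seg> units. The following is only a rough sketch and not part of ParaConc: the function names are made up for illustration, and it assumes each segment is wrapped in a plain <seg>...</seg> pair as in the sample above.

```python
import re

# Non-greedy match so that each <seg>...</seg> pair is captured separately.
SEG_RE = re.compile(r"<seg>(.*?)</seg>", re.DOTALL)


def extract_segments(text):
    """Return the stripped contents of every <seg>...</seg> pair in text."""
    return [m.strip() for m in SEG_RE.findall(text)]


def check_alignment(source_text, target_text):
    """Return (source_count, target_count, counts_match) for two tagged texts."""
    src = extract_segments(source_text)
    tgt = extract_segments(target_text)
    return len(src), len(tgt), len(src) == len(tgt)


if __name__ == "__main__":
    chn = "<seg>那大夫用手一摸,已無脈息。</seg>\n<seg>只有周姨娘心裏苦楚。</seg>"
    eng = ("<seg>He felt her pulse once . </seg>\n"
           "<seg>Only Aunt Zhou seemed affected . </seg>")
    print(check_alignment(chn, eng))
```

If the two counts differ, the segments will drift out of step from the first mismatch onward, so it is worth fixing the files before searching.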
 
Thank you very much, laohong!
How do you add the <seg> tags to the original, manually or automatically? If automatically, is there a free tool for this?

Thanks again.
 
Using ParaConc

Yes, they were added automatically, but you have to split the text into sentences first.
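As an illustration of the sentence-splitting step, here is a naive sketch. It is an assumption about how one might do it, not the method used for the sample above: it simply cuts after 。!? and keeps a closing quote mark with the sentence it ends. Real texts usually need manual checking afterwards.

```python
import re

# One sentence = a run of non-terminal characters, then any sentence-final
# punctuation, then any closing quotation marks that follow it.
SENTENCE_RE = re.compile(r"[^。!?]+[。!?]*[」』”]*")


def split_sentences(text):
    """Split Chinese text into sentences, one list item per sentence."""
    return [s.strip() for s in SENTENCE_RE.findall(text) if s.strip()]


if __name__ == "__main__":
    sample = "那大夫用手一摸,已無脈息。只有周姨娘心裏苦楚。"
    for sentence in split_sentences(sample):
        print(sentence)
```

Writing each returned sentence on its own line gives the one-sentence-per-line layout that the tagging step expects.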
 
Re: Using ParaConc

The sample texts in post #2 can be saved as two plain text files, Chn Seg.txt and Eng Seg.txt. Then load the two files into ParaConc, and you get the screen below:

http://www.corpus4u.org/attachment.php?attachmentid=67&stc=1&d=1165312146

Click the Options button next to Align Format, and you get the dialog below:

http://www.corpus4u.org/attachment.php?attachmentid=68&stc=1&d=1165312225

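This gives you an aligned corpus that is ready for searching.

If you replace <seg> and </seg> in the texts with <p> and </p>, you only need to change the tags under the Align Format Options to <p> and </p> as well.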
 

Attachments

  • 1.jpg (25.5 KB)
  • 2.jpg (2.8 KB)
Re: Using ParaConc

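Here is a simple method that requires no programming:

1. First prepare the Chinese and English texts with one sentence per line. Note that the two texts should have the same number of lines. If you want to be able to search for Chinese words, it is best to segment the text into words first, or to insert spaces between the characters;

2. Open the text in EditPlus (available from http://www.editplus.com);

3. Open the Search menu, choose Replace, and tick Regular Expression;

4. In Find what, enter the characters inside the quotation marks (do not copy the quotation marks) "\n", and in Replace with enter the characters inside the quotation marks (again without the quotation marks) "</seg>\n<seg>". Place the cursor at the very beginning of the text, then click Replace All;

5. Finally, move the <seg> left on the last line of the text to the start of the first line.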
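For anyone who does prefer a script, the same result can be sketched in a few lines of Python. This is only an illustrative alternative to the EditPlus steps: the function names are hypothetical, and the character range used for spacing covers only the basic CJK Unified Ideographs block.

```python
import re

# Zero-width position between two adjacent CJK ideographs.
CJK_BOUNDARY = re.compile(r"(?<=[\u4e00-\u9fff])(?=[\u4e00-\u9fff])")


def space_out_cjk(line):
    """Insert a space between adjacent Chinese characters."""
    return CJK_BOUNDARY.sub(" ", line)


def wrap_lines_in_seg(text, tag="seg"):
    """Wrap every non-empty line of text in <tag>...</tag>."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return "\n".join(f"<{tag}>{ln}</{tag}>" for ln in lines)


if __name__ == "__main__":
    raw = "那大夫用手一摸,已無脈息。\n只有周姨娘心裏苦楚。\n"
    print(wrap_lines_in_seg(raw))
    print(space_out_cjk("只有周姨娘"))
```

Passing tag="p" produces <p>...</p> instead, in which case the tags in the Align Format Options have to be changed to match.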
 
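Re: Using ParaConc

May I ask: if the texts are aligned manually, do they still need "the same number of lines in both texts" and "word segmentation or spaces between the characters"? When I search in ParaConc for an English word that does occur in the text, I get "no matches", and I don't know why. I tried searching for "the", and that does return results.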
 
Re: Using ParaConc

Could you post your texts here so that we can take a look?
 
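Re: Using ParaConc

Thanks, laohong.
I have already aligned the texts manually, but when I enter an English word the search returns nothing. Also, why is nothing highlighted in the search results?
Many thanks for your guidance. I am writing my master's thesis on news translation and have only just started using ParaConc, so I am a bit lost and don't know how to use the software for the analysis.
The image below shows my aligned text. Where there is no corresponding English, I simply added a marker.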
 

Attachments

  • paraconc.doc (137.8 KB)
Re: Using ParaConc

Your doc cannot be opened after downloading.
 
Re: Using ParaConc

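As far as I know, ParaConc cannot search MS Word documents. Besides, your .doc file shows only garbled characters when opened, and I can't tell what encoding it uses. Could you upload the two text files you mentioned?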
 
Re: Using ParaConc

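The Word document he uploaded is not the text itself; it only contains a ParaConc screenshot. That leaves us no way to see what the problem is.
 
Re: Using ParaConc

Sorry, I'll upload them right away. Will this do?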
 

Attachments

  • D_chinese 4.txt (1.9 KB)
  • D_english 4.txt (3.6 KB)
Re: Using ParaConc

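I had a look, and the alignment is off. The texts have not been preprocessed (cleaned of noise), so how could they align?
Besides, the material was not well chosen, so there is little I can do to help. Oh well, never mind. I'm off! :p


Another PS:
Did you align the texts with software? Which program? Thanks!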
 
Re: Using ParaConc

Your problem is caused by incorrect encoding. You will need to save the Chinese file as UTF8, and check the box UTF8 when you load the corpus into ParaConc.

As you have marked up the segments, you can also specify the tag <seg>.
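If re-saving the file by hand is inconvenient, the conversion can also be scripted. The sketch below makes two assumptions: that the Chinese file is currently in a GB-family encoding (it decodes with gb18030, which also covers GB2312 and GBK), and that the file names are placeholders for your own. It does nothing inside ParaConc itself; the UTF8 box still has to be ticked when loading.

```python
from pathlib import Path


def convert_to_utf8(src_path, dst_path, src_encoding="gb18030"):
    """Re-save a text file as UTF-8 and return the number of characters read.

    src_encoding is an assumption about the input file; decoding raises
    UnicodeDecodeError if the file is not actually in that encoding.
    """
    text = Path(src_path).read_text(encoding=src_encoding)
    Path(dst_path).write_text(text, encoding="utf-8")
    return len(text)


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own.
    src = Path("chinese_gb.txt")
    if src.exists():
        count = convert_to_utf8(src, "chinese_utf8.txt")
        print(f"Converted {count} characters")
```

Keeping the original file and writing to a new name makes it easy to go back if the guessed source encoding turns out to be wrong.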

 