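(Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)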

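Hello Laohong! It took me a while to find Trados; sorry to keep you waiting!

Once I found Trados, I was filled with happiness :)

The file formats that WinAlign accepts for sentence alignment do not include TXT, so I copied the whole content of my TXT files into Word documents, and after that everything went smoothly.

Before using WinAlign, I took the corpus text that already had paragraph-start and paragraph-end tags and tidied it up with some search-and-replace in EditPlus, to make it compact and neat. For example:

Before tidying (there is a space after <P> and before </P>, but the forum stops displaying the spaces once the post is published):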
<P> 'Is John getting the carriage ready?' </P>
<P> 'Yes, sir.' </P>
<P> 'Is the luggage brought down?' </P>
<P> 'They are bringing it down, sir.' </P>
After tidying (no space after <P> or before </P>):
<P>'Is John getting the carriage ready?'</P>
<P>'Yes, sir.'</P>
<P>'Is the luggage brought down?'</P>
<P>'They are bringing it down, sir.'</P>
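For anyone who would rather script this tidy-up than do it in EditPlus, here is a minimal Python sketch. The function name tidy_paragraph_tags is made up for illustration, and the sketch assumes the only thing to remove is spaces or tabs directly after <P> and directly before </P>, as in the example above.

```python
import re


def tidy_paragraph_tags(text: str) -> str:
    """Remove spaces/tabs right after <P> and right before </P>."""
    text = re.sub(r"<P>[ \t]+", "<P>", text)
    text = re.sub(r"[ \t]+</P>", "</P>", text)
    return text


before = "<P> 'Yes, sir.' </P>"
print(tidy_paragraph_tags(before))  # <P>'Yes, sir.'</P>
```

Text that is already tidy passes through unchanged, so running it twice does no harm.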

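I just don't know whether text tidied up this way conforms to the standard.

Following the WinAlign User Guide, I used Chapter 26 of Jane Eyre as a test text and, step by step, finally got the sentences aligned (see the picture; please view it at "actual size", which avoids the blurred, water-stained look and is easy on the eyes).

When I finished, I needed to save the text, but in the saved file the English and Chinese are all mixed together, and ParaConc cannot search that kind of text. What should I do, Laohong?

I seem to have lost my way.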
 
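Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

Laohong, the picture I wanted to post is in BMP format, so that it would be comfortable for you to view, but the site does not allow uploading images in that format, so I have sent it to your mailbox for now. Please check your mail. Thank you! :)

Also, don't forget to view the picture at "actual size"; it is very clear. :)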
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

The bi-text resulting from WinAlign should look like the sample given below. At this point, a little bit of programming will separate the English and Chinese texts and save them into separate files for use with ParaConc, using the algorithm below:

if the line contains "<Seg L=EN-GB>"
then remove the first 13 characters and write the rest of the line into the English filename;

if the line contains "<Seg L=ZH-CN>"
then remove the first 13 characters and write the rest of the line into the Chinese filename;


---SAMPLE BEGIN---
<TrU>
<Quality>68
<CrU>XIAOZ
<CrD>29102006, 23:40
<Seg L=EN-GB>The Future of Africa
<Seg L=ZH-CN>非洲的未来
</TrU>
<TrU>
<Quality>80
<CrU>XIAOZ
<CrD>29102006, 23:40
<Seg L=EN-GB>We are in a period of decisive historical significance for Africa and its place in the world.
<Seg L=ZH-CN>我们非洲及其在世界的地位正处在决定性并具历史意义的时期。
</TrU>
<TrU>
<Quality>78
<CrU>XIAOZ
<CrD>29102006, 23:40
<Seg L=EN-GB>We are determined that this 21st century shall be the African century.
<Seg L=ZH-CN>我们决心要使21世纪成为非洲的世纪。
</TrU>
<TrU>
<Quality>77
<CrU>XIAOZ
<CrD>29102006, 23:40
<Seg L=EN-GB>The idea of an African renaissance has taken hold in our continent with all the resonance of an idea whose time has come.
<Seg L=ZH-CN>非洲复兴的理想已在我们大陆扎下根,这理想实现的时刻已经到来,这想法引起大家的共鸣。
</TrU>
---SAMPLE END---
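
For readers who want to try the algorithm above right away, here is a minimal Python sketch. It is not the program referred to in this thread: the names split_bitext and dealign_file are made up for illustration, the UTF-8 default encoding is an assumption (a WinAlign export may well use a different one), the E_ and C_ prefixes for the output files are a naming choice, and the sketch assumes each <Seg ...> tag sits at the very start of its line, as in the sample.

```python
from pathlib import Path

EN_TAG = "<Seg L=EN-GB>"  # 13 characters, as in the algorithm above
ZH_TAG = "<Seg L=ZH-CN>"  # 13 characters


def split_bitext(lines):
    """Return (english, chinese) segment lists from WinAlign-style lines."""
    english, chinese = [], []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(EN_TAG):
            english.append(line[len(EN_TAG):])
        elif line.startswith(ZH_TAG):
            chinese.append(line[len(ZH_TAG):])
    return english, chinese


def dealign_file(bitext_path, encoding="utf-8"):
    """Write E_<name> and C_<name> next to the bitext file (hypothetical helper)."""
    path = Path(bitext_path)
    with path.open(encoding=encoding) as handle:
        english, chinese = split_bitext(handle)
    for prefix, segments in (("E_", english), ("C_", chinese)):
        out_path = path.with_name(prefix + path.name)
        with out_path.open("w", encoding=encoding) as out:
            out.writelines(segment + "\n" for segment in segments)
    return english, chinese


sample = [
    "<TrU>",
    "<Quality>68",
    "<Seg L=EN-GB>The Future of Africa",
    "<Seg L=ZH-CN>非洲的未来",
    "</TrU>",
]
english, chinese = split_bitext(sample)
print(english)  # ['The Future of Africa']
print(chinese)  # ['非洲的未来']
```

Because the two lists are filled in file order, line N of the English output corresponds to line N of the Chinese output, provided every translation unit contains exactly one segment of each language.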
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

if the line contains "<Seg L=EN-GB>"
then remove the first 13 characters and write the rest of the line into the English filename;

if the line contains "<Seg L=ZH-CN>"
then remove the first 13 characters and write the rest of the line into the Chinese filename;

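Thank you, Professor Xiao! I can't program, so I could only follow the method you taught me and fix the few misaligned sentences in the original files by hand, one at a time.

Could you teach me how to write a program that does this automatically? Thank you!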
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

Please paste a sample paragraph from your aligned text.
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

English text: Jane_E26
<P><S>CHAPTER XXVI</S></P>
<P><S>SOPHIE came at seven to dress me: </S>
<S>she was very long indeed in accomplishing her task; so long that Mr. Rochester, grown, I suppose, impatient of my delay, sent up to ask why I did not come. </S>
<S>She was just fastening my veil (the plain square of blond after all) to my hair with a brooch; I hurried from under her hands as soon as I could.</S></P>
<P><S>'Stop!' she cried in French. </S>
<S>'Look at yourself in the mirror: you have not taken one peep.'</S></P>
<P><S>So I turned at the door: </S>
<S>I saw a robed and veiled figure, so unlike my usual self that it seemed almost the image of a stranger. </S>
<S>'Jane!' called a voice, and I hastened down. </S>
<S>I was received at the foot of the stairs by Mr. Rochester.</S></P>
<P><S>'Lingerer!' he said, 'my brain is on fire with impatience, and you tarry so long!'</S></P>
<P><S>He took me into the dining-room, surveyed me keenly all over, pronounced me 'fair as a lily, and not only the pride of his life, but the desire of his eyes,' and then telling me he would give me but ten minutes to eat some breakfast, he rang the bell. </S>
<S>One of his lately hired servants, a footman, answered it.</S></P>
<P><S>'Is John getting the carriage ready?'</S></P>
<P><S>'Yes, sir.'</S></P>
<P><S>'Is the luggage brought down?'</S></P>
<P><S>'They are bringing it down, sir.'</S></P>
<P><S>'Go you to the church: </S>
<S>see if Mr. Wood (the clergyman) and the clerk are there: </S>
<S>return and tell me.'</S></P>
<P><S>The church, as the reader knows, was but just beyond the gates; the footman soon returned.</S></P>
<P><S>'Mr. Wood is in the vestry, sir, putting on his surplice.'</S></P>
<P><S>'And the carriage?'</S></P>
<P><S>'The horses are harnessing.'</S></P>
<P><S>'We shall not want it to go to church; but it must be ready the moment we return: </S>
<S>all the boxes and luggage arranged and strapped on, and the coachman in his seat.'</S></P>
<P><S>'Yes, sir.'</S></P>
<P><S>'Jane, are you ready?'</S></P>
<P><S>I rose. </S>
<S>There were no groomsmen, no bridesmaids, no relatives to wait for or marshal: </S>
<S>none but Mr. Rochester and I. </S>
<S>Mrs. Fairfax stood in the hall as we passed. </S>
<S>I would fain have spoken to her, but my hand was held by a grasp of iron: </S>
<S>I was hurried along by a stride I could hardly follow; and to look at Mr. Rochester's face was to feel that not a second of delay would be tolerated for any purpose. </S>
<S>I wonder what other bridegroom ever looked as he did- so bent up to a purpose, so grimly resolute: </S>
<S>or who, under such steadfast brows, ever revealed such flaming and flashing eyes.</S></P>

Chinese text: Jane_C26
<P><S>第二十六章</S></P>
<P><S>索菲娅七点钟来替我打扮,确实费了好久才大功告成。</S>
<S>那么久,我想罗切斯特先生对我的拖延有些不耐烦了,派人来问,我为什么还没有到。</S>
<S>索菲娅正用一枚饰针把面纱(毕竟只是一块淡色的普通方巾)系到我头发上,一待完毕,我便急急忙忙从她手下钻了出去。 </S></P>
<P><S>“慢着!”她用法语叫道。</S>
<S>“往镜子里瞧一瞧你自己,你连一眼都还没看呢。” </S></P>
<P><S>于是我在门边转过身来,</S>
<S>看到了一个穿了袍子,戴了面纱的人,一点都不像我往常的样子,就仿佛是一位陌生人的影像。</S>
<S>“简!”一个声音嚷道,我赶紧走下楼去。</S>
<S>罗切斯特先生在楼梯脚下迎着我。 </S></P>
<P><S>“磨磨蹭蹭的家伙,”他说,“我的脑袋急得直冒火星、你太拖拉了!” </S></P>
<P><S>他带我进了餐室,急切地把我从头到脚打量了一遍,声称我“像百合花那么美丽,不仅是他生活中的骄傲,而且也让他大饱眼福。”随后他告诉我只给我十分钟吃早饭,并按了按铃。</S>
<S>他新近雇用的一个仆人,一位管家应召而来。 </S></P>
<P><S>“约翰把马车准备好了吗?”</S></P>
<P><S>“好了,先生。”</S></P>
<P><S>“行李拿下去了吗?”</S></P>
<P><S>“他们现在正往下拿呢,先生。”</S></P>
<P><S>“上教堂去一下,</S>
<S>看看沃德先生(牧师)和执事在不在那里。</S>
<S>回来告诉我。”</S></P>
<P><S>读者知道,大门那边就是教堂,所以管家很快就回来了。</S></P>
<P><S>“沃德先生在法衣室里,先生,正忙着穿法衣呢。”</S></P>
<P><S>“马车呢?”</S></P>
<P><S>“马匹正在上挽具。”</S></P>
<P><S>“我们上教堂不用马车,但回来时得准备停当。</S>
<S>所有的箱子和行李都要装好捆好,车夫要在自己位置上坐好。”</S></P>
<P><S>“是,先生。”</S></P>
<P><S>“简,你准备好了吗?”</S></P>
<P><S>我站了起来,</S>
<S>没有男傧相和女傧相,也没有亲戚等候或引领。</S>
<S>除了罗切斯特先生和我,没有别人。</S>
<S>我们经过大厅时,费尔法克斯太太站在那里。</S>
<S>我本想同她说话,但我的手被铁钳似地捏住了,</S>
<S>让我几乎跟不住的脚步把我匆匆推向前去。一看罗切斯特先生的脸我就觉得,不管什么原因,再拖一秒钟他都不能忍耐了。</S>
<S>我不知道其他新郎看上去是不是像他这付样子——那么专注于一个目的,那么毅然决然;</S>
<S>或者有谁在那对稳重的眉毛下,露出过那么火辣辣,光闪闪的眼睛。</S></P>

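All of the sentence correspondences above were completed under the guidance of my mentor Laohong. Thank you, Laohong! :)

Thank you, Professor Xiao! :)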
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

No, I want to see a sample of your WinAlign output.
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

OK. I have made a WinAlign-ed copy with your data. Is the following what you want?

After de-alignment: English text file:

<P><S>CHAPTER XXVI</S></P>
<P><S>SOPHIE came at seven to dress me:
</S>
<S>she was very long indeed in accomplishing her task; so long that Mr. Rochester, grown, I suppose, impatient of my delay, sent up to ask why I did not come.
</S>
<S>She was just fastening my veil (the plain square of blond after all) to my hair with a brooch; I hurried from under her hands as soon as I could.</S></P>
<P><S>'Stop!' she cried in French.
</S>
<S>'Look at yourself in the mirror:
you have not taken one peep.'</S></P>
<S>I saw a robed and veiled figure, so unlike my usual self that it seemed almost the image of a stranger.
</S>
<S>'Jane!' called a voice, and I hastened down.
</S>
<S>I was received at the foot of the stairs by Mr. Rochester.</S></P>
<P><S>'Lingerer!' he said, 'my brain is on fire with impatience, and you tarry so long!'</S></P>


After de-alignment: Chinese text file:

<P><S>第二十六章</S></P>
<P><S>索菲娅七点钟来替我打扮,确实费了好久才大功告成。
</S>
<S>那么久,我想罗切斯特先生对我的拖延有些不耐烦了,派人来问,我为什么还没有到。
</S>
<S>索菲娅正用一枚饰针把面纱(毕竟只是一块淡色的普通方巾)系到我头发上,一待完毕,我便急急忙忙从她手下钻了出去。 </S></P>
<P><S>“慢着!”她用法语叫道。
</S>
<S>“往镜子里瞧一瞧你自己,你连一眼都还没看呢。
” </S></P>
<S>看到了一个穿了袍子,戴了面纱的人,一点都不像我往常的样子,就仿佛是一位陌生人的影像。
</S>
<S>“简!”一个声音嚷道,我赶紧走下楼去。
</S>
<S>罗切斯特先生在楼梯脚下迎着我。 </S></P>
<P><S>“磨磨蹭蹭的家伙,”他说,“我的脑袋急得直冒火星、你太拖拉了!” </S></P>
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

The program for de-alignment is attached below. To use the program, do the following:

1. Make a new folder on your machine;
2. Download dealigner.zip and unzip the program file named dealigner.pl into that folder;
3. Copy the file of translation units exported from Trados WinAlign into the same folder;
4. Go to http://www.activestate.com/store/productdetail.aspx?prdGuid=81fbce82-6bd5-49bc-a915-08d58c2648ca and download the FREE Perl and install it on your machine (and then forget it);
5. Double-click my program file.

You will receive a pair of files whose names start with E_ and C_, followed by your original bitext filename, containing the English and the Chinese sentences respectively.
 

Attachment

  • dealigner.zip
    440 bytes · Views: 92
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

I would advise against including sentence tags in texts to be aligned, as they are a burden for WinAlign. These can be added readily after alignment.

Also, the Chinese texts need to be tokenised before they can be used with ParaConc.
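
To give an idea of what tokenising means here (inserting spaces between Chinese words), below is a minimal Python sketch of forward maximum matching, one simple classic segmentation method. It is only an illustration: the function name forward_max_match and the two-word toy lexicon are made up, and real corpus work would rely on a proper segmenter with a full lexicon.

```python
def forward_max_match(sentence, lexicon, max_len=4):
    """Greedy left-to-right segmentation: take the longest lexicon word at each position."""
    tokens = []
    i = 0
    while i < len(sentence):
        longest = min(max_len, len(sentence) - i)
        for size in range(longest, 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


toy_lexicon = {"非洲", "未来"}  # hypothetical two-word lexicon
print(" ".join(forward_max_match("非洲的未来", toy_lexicon)))  # 非洲 的 未来
```

Any word missing from the lexicon falls apart into single characters, which is why the quality of the lexicon matters so much with this method.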
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

I would advise against including sentence tags in texts to be aligned, as they are a burden for WinAlign. These can be added readily after alignment.

Also, the Chinese texts need to be tokenised before they can be used with ParaConc.

Understood and noted, Professor Xiao :)
 
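Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

Thank you, Professor Xiao!
The whole job was done with a single light click of the mouse. Wonderful!

The extracted English text is very clean.

The Chinese text contains marks such as {\f4 }, for example: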
<P>{\f4 “马车呢?”}</P>
<P>{\f4 “马匹正在上挽具。} {\f4 ”}</P>
I used Replace to swap them for nothing and got rid of them all.
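
The same clean-up can be sketched in Python with two regular expressions. This is only a sketch under assumptions: the function name strip_font_groups is made up, the marks are assumed to be simple, non-nested {\fN ...} groups like the two examples above, and the space between two adjacent groups is assumed to be unwanted (reasonable for Chinese text, but it would glue words together in English).

```python
import re

# One {\fN text} group; the captured part is the text to keep.
FONT_GROUP = re.compile(r"\{\\f\d+ ([^{}]*)\}")
# Spaces/tabs between a closing brace and the next {\fN group.
GROUP_GAP = re.compile(r"\}[ \t]+(?=\{\\f\d+ )")


def strip_font_groups(text: str) -> str:
    """Unwrap {\\fN ...} groups and drop the gap between adjacent groups."""
    text = GROUP_GAP.sub("}", text)
    return FONT_GROUP.sub(r"\1", text)


line = r"<P>{\f4 “马匹正在上挽具。} {\f4 ”}</P>"
print(strip_font_groups(line))  # <P>“马匹正在上挽具。”</P>
```

Note the raw strings (the r prefix): without them Python would read \f as a form-feed character.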

Being able to program is really great! It looks like I will have to gnaw on that tough bone, the C language, too :)
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

I do not recommend compiled languages like C/C++ for corpus linguists. I recommend scripting languages such as Perl. It is much easier to learn and understand for "arts" students in China (where there is a clear-cut line between arts and sciences). Also, Perl is the most powerful programming language for text processing: Haiyang has noted that 90% of Perl deals with text.

There are a couple of Perl tutorials and books on this site. Just search.
 
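Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

I found SP (an SGML parser) online:
http://www.jclark.com/sp/index.htm
SP can be downloaded via this link on the page:
How to get SP
I downloaded it, but I don't understand how to add SGML markup to a TXT file.
Please advise. Thank you!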
 
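Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

I spent three whole days lurking over this thread, following along step by step as I read, and it worked!
Thanks to the original poster, Dr. Hong and Dr. Xiao.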
 
[Need Help] Search problems in ParaConc after alignment

After aligning bi-texts in ParaConc, we can immediately start searching to see the parallel text. But it seems that only the query word is highlighted, while its translation remains unmarked in the TT. How can I have the translation highlighted along with the query word? Does it mean I have to align the bi-texts at the word level? If so, how?
 
Re: (Help wanted) Looking for software to assist manual annotation with Standard Generalized Markup Language (SGML)

Aligning bilingual texts at the sentence level is not easy, let alone at the word level. If you take a closer look, you will find that a word may have dozens of translations in another language, so it is almost impossible to do this perfectly. You could certainly turn to a dictionary that lists the most common translations of each word, say, 中国 for China and 中华人民共和国 for the People's Republic of China. But what about "take"? Can you imagine how many translations it can have besides "取" and "拿"? It can also be a noun in certain contexts. So, as far as I am concerned, this is by no means an easy job.
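
The dictionary idea, and its limits, can be sketched in a few lines of Python. Everything here is for illustration only: the function name highlight_translation, the [[ ]] markers and the tiny dictionary are made up, and this is not something ParaConc itself is claimed to do.

```python
def highlight_translation(query, target, dictionary):
    """Mark the longest listed translation of `query` that occurs in `target`."""
    candidates = sorted(dictionary.get(query.lower(), []), key=len, reverse=True)
    for candidate in candidates:
        if candidate in target:
            return target.replace(candidate, "[[" + candidate + "]]")
    return target  # no listed translation found: nothing to highlight


toy_dictionary = {  # hypothetical entries
    "africa": ["非洲"],
    "take": ["取", "拿"],
}

print(highlight_translation("Africa", "非洲的未来", toy_dictionary))
# [[非洲]]的未来
print(highlight_translation("take", "非洲复兴的理想已在我们大陆扎下根", toy_dictionary))
# 非洲复兴的理想已在我们大陆扎下根  (unchanged: "taken hold" is rendered as 扎下根)
```

The second call shows the problem: as soon as the translator picks a rendering that the dictionary does not list, the lookup finds nothing to highlight.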
 