xusun575
Senior Member
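Contemporary Linguistics (《当代语言学》), Vol. 10, No. 4, 2008, pp. 329–339: "Creating a Chinese-English Parallel Corpus of Hongloumeng (《红楼梦》, A Dream of Red Mansions)"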
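Liu Zequan et al. (2008: 334) state: "In the MMAX interface, sentence-level alignment of the Chinese and English texts is carried out at the same time as the sentence attributes are annotated. Each Chinese sentence is linked to its corresponding English sentence, forming a set of corresponding sentences."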
"Annotation and alignment markup for the Chinese sentence:
<markable id="markable_249" span="word_3266..word_3284" idiom_saying="without" sentence_mode="declarative" voice="active" chinese_sentence="complex" rhetoric="without" linktype="parallel" language="chinese" linkclass="set_102" complex_sentence="coordinate" register="written"/>
Annotation and alignment markup for the English sentence:
<markable id="markable_267" span="word_3481..word_3499" idiom_saying="without" sentence_mode="declarative" voice="active" rhetoric="without" linktype="parallel" language="english" english_sentence="complex" linkclass="set_102" register="written"/>
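Both the Chinese and the English sentence record the IDs of their first and last characters or words (e.g. “word_3266..word_3284” and “word_3481..word_3499” in the example), so the characters or words making up the two sentences can be retrieved from the XML document that stores the raw text data, as shown below: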
<word id="word_3266">宝</word>
<word id="word_3267">钗</word>
<word id="word_3268">看</word>
<word id="word_3269">毕</word>
<word id="word_3270">,</word>
<word id="word_3271">又</word>
<word id="word_3272">从</word>
<word id="word_3273">新</word>
<word id="word_3274">翻</word>
<word id="word_3275">过</word>
<word id="word_3276">来</word>
<word id="word_3277">,</word>
<word id="word_3278">正</word>
<word id="word_3279">细</word>
<word id="word_3280">看</word>
<word id="word_3281">口</word>
<word id="word_3282">内</word>
<word id="word_3283">念</word>
<word id="word_3284">道</word>
<word id="word_3481">When</word>
<word id="word_3482">Bao</word>
<word id="word_3483">-</word>
<word id="word_3484">chai</word>
<word id="word_3485">had</word>
<word id="word_3486">looked</word>
<word id="word_3487">at</word>
<word id="word_3488">the</word>
<word id="word_3489">stone</word>
<word id="word_3490">all</word>
<word id="word_3491">over</word>
<word id="word_3492">,</word>
<word id="word_3493">she</word>
<word id="word_3494">turned</word>
<word id="word_3495">back</word>
<word id="word_3496">to</word>
<word id="word_3497">the</word>
<word id="word_3498">inscription</word>
<word id="word_3499">on</word>"
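To make the span mechanism in the quoted passage concrete, here is a minimal Python sketch that resolves each sentence markable's `span` against the word-level base data and groups the Chinese and English markables sharing a `linkclass` (here `set_102`) into one aligned set. It is an illustration under assumptions, not MMAX2's actual file layout or API: the two XML documents are inlined as strings (a real MMAX2 project keeps them in separate files, possibly with XML namespaces), the helper names `to_word_elements`, `load_words`, `expand_span` and `aligned_sets` are hypothetical, and treating comma-separated fragments as a discontinuous span is my own assumption.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Tokens copied from the word_3266..word_3284 and word_3481..word_3499 excerpt.
ZH_TOKENS = "宝 钗 看 毕 , 又 从 新 翻 过 来 , 正 细 看 口 内 念 道".split()
EN_TOKENS = ("When Bao - chai had looked at the stone all over , "
             "she turned back to the inscription on").split()


def to_word_elements(tokens, first_id):
    """Hypothetical helper: wrap tokens as <word id="word_N"> elements."""
    return "".join(f'<word id="word_{first_id + i}">{tok}</word>'
                   for i, tok in enumerate(tokens))


# Hypothetical base-data document holding both texts' words.
WORDS_XML = ("<words>" + to_word_elements(ZH_TOKENS, 3266)
             + to_word_elements(EN_TOKENS, 3481) + "</words>")

# The two sentence markables from the quoted passage, inlined for the demo.
MARKABLES_XML = """<markables>
<markable id="markable_249" span="word_3266..word_3284" idiom_saying="without"
  sentence_mode="declarative" voice="active" chinese_sentence="complex" rhetoric="without"
  linktype="parallel" language="chinese" linkclass="set_102"
  complex_sentence="coordinate" register="written"/>
<markable id="markable_267" span="word_3481..word_3499" idiom_saying="without"
  sentence_mode="declarative" voice="active" rhetoric="without" linktype="parallel"
  language="english" english_sentence="complex" linkclass="set_102" register="written"/>
</markables>"""


def load_words(xml_text):
    """Return (word ids in document order, mapping id -> token text)."""
    root = ET.fromstring(xml_text)
    order = [w.get("id") for w in root.iter("word")]
    text = {w.get("id"): w.text for w in root.iter("word")}
    return order, text


def expand_span(span, order):
    """Expand a span like 'word_3266..word_3284' into the word ids it covers.
    Assumption: a discontinuous span is written as comma-separated fragments."""
    ids = []
    for fragment in span.split(","):
        if ".." in fragment:
            start, end = fragment.split("..")
            i, j = order.index(start), order.index(end)
            ids.extend(order[i:j + 1])  # inclusive range in document order
        else:
            ids.append(fragment)
    return ids


def aligned_sets(markables_xml, words_xml):
    """Group sentence markables by linkclass: {set_id: {language: (text, attrs)}}."""
    order, text = load_words(words_xml)
    sets = defaultdict(dict)
    for m in ET.fromstring(markables_xml).iter("markable"):
        lang = m.get("language")
        sep = "" if lang == "chinese" else " "  # Chinese tokens joined without spaces
        sentence = sep.join(text[i] for i in expand_span(m.get("span"), order))
        sets[m.get("linkclass")][lang] = (sentence, dict(m.attrib))
    return dict(sets)


if __name__ == "__main__":
    for set_id, pair in aligned_sets(MARKABLES_XML, WORDS_XML).items():
        print(set_id)
        for lang, (sentence, attrs) in pair.items():
            print(f"  {lang}: {sentence}  [sentence_mode={attrs['sentence_mode']}]")
```

The sketch expands a range by the document order of the `<word>` elements rather than by arithmetic on the numeric suffix, so it does not assume that IDs inside a span are contiguous.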
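My question is: since the alignment is only at the sentence level and word-level alignment cannot be achieved, what is the point of such detailed annotation? Thanks in advance!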