Using ParaConc (2)

Posted by xudekuan on 2006-08-25 in the "Multilingual Corpora" forum

  1. xudekuan

    xudekuan Moderator

    How do I use its "heading search" function? I have tried several times without success.
    I would appreciate advice from anyone experienced.
     
  2. xudekuan

    xudekuan Moderator

    I would appreciate advice from anyone experienced.
    Thank you!
     
  3. xujiajin

    xujiajin Administrator Staff Member

    Re: Using ParaConc (2)

    10.3 Heading search --- NOT IMPLEMENTED
    Some corpora (such as the British National Corpus) are structured in such a
    way that they are divided into a series of headings and texts. The heading,
    which contains information about a text, is followed by the text itself. The
    HEADING SEARCH enables the user to select which texts are to be searched,
    based on the presence of particular specifications in the heading. Thus if the
    headings contain an annotation about the sex of the authors, then it should
    be possible to search only those texts written by female authors. For
    example, the pattern entered in the heading search might be author=fem.
    To perform a heading search, you first check the HEADINGS/CONTEXTS box
    (towards the bottom of the ADVANCED SEARCH dialogue box) and then
    choose EDIT. In the HEADINGS/CONTEXTS search dialogue box that arises,
    choose EDIT again. A HEADINGS CONFIGURATION dialogue box appears in which
    you can ADD (i) a heading name, (ii) the begin and end tags that delimit the
    heading, and (iii) the tags that indicate the beginning and ending of the text
    itself.
    Once the HEADINGS CONFIGURATION has been specified, the PATTERN to be
    searched for in the heading (e.g., author=fem) can be entered in the
    HEADINGS/CONTEXTS search dialogue box. All the heading information is now
    in place and so it just remains to enter the search query itself.
    This is another kind of context search. It allows us to search for a word (say
    victory) used in texts written by female authors. Once the headings have
    been specified as above, a search can be initiated. The program locates the
    heading and looks for the pattern specified (e.g., author=fem). If the pattern
    is found, then the text following the heading is searched for the string
    specified in the search text box (e.g., victory). If the pattern author=fem is
    not found in the header, then the program ignores the corresponding body
    of text and checks the next header for the presence of author=fem, and so
    on until the entire corpus has been searched.
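    The behaviour described in the manual excerpt above can be sketched in a few lines of Python. This is only an illustration of the logic, not ParaConc's implementation: the function name heading_search, the <head>/<text> tag names, and the sample corpus are all assumptions made up for the example.

```python
import re

def heading_search(corpus, heading_pattern, query,
                   head_tags=("<head>", "</head>"),
                   text_tags=("<text>", "</text>")):
    """Search `query` only in texts whose heading contains `heading_pattern`.

    The tag names are hypothetical defaults, not ParaConc's own.
    """
    h_open, h_close = (re.escape(t) for t in head_tags)
    t_open, t_close = (re.escape(t) for t in text_tags)
    # One unit = a heading immediately followed by its text.
    unit = re.compile(
        h_open + r"(.*?)" + h_close + r"\s*" + t_open + r"(.*?)" + t_close,
        re.DOTALL,
    )
    word = re.compile(r"\b" + re.escape(query) + r"\b")
    hits = []
    for heading, body in unit.findall(corpus):
        if heading_pattern not in heading:
            continue  # skip this text and move on to the next heading
        for m in word.finditer(body):
            start = max(0, m.start() - 30)
            hits.append(body[start:m.end() + 30].strip())
    return hits


# Hypothetical two-text corpus: only the first heading has author=fem.
corpus = (
    "<head>author=fem year=1990</head><text>It was a narrow victory.</text>\n"
    "<head>author=male year=1991</head><text>The victory came late.</text>\n"
)
hits = heading_search(corpus, "author=fem", "victory")
print(hits)
```

    With the sample corpus, only the first text is searched, because the second heading does not contain author=fem.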
     
  4. xudekuan

    xudekuan Moderator

    Re: Using ParaConc (2)

    I followed the instructions several times, but could not get any search results.
    Have you managed to get results?
     
  5. xujiajin

    xujiajin Administrator Staff Member

    Re: Using ParaConc (2)

    I haven't tried it. But did you notice the words NOT IMPLEMENTED? I guess this function is not enabled in the current version.
     
  6. AlexanderZ

    AlexanderZ Junior Member

    Re: Using ParaConc (2)

    More questions about using ParaConc: I was puzzled that ParaConc could not recognize my own tags, such as <S> and </S>, <P> and </P>. I had changed the Tag settings in the File menu, but it still did not work. Do the texts processed by ParaConc need to be clean texts aligned at the sentence level? Alignment at the sentence level requires a lot of hand-editing. Any suggestions? According to Xie Jiacheng's article, he achieved alignment at the paragraph level, using the tags <P> and </P>. But I can only use delimited segments. I have tried adding spaces between the characters and using <seg> and </seg> markers.
     
  7. xujiajin

    xujiajin Administrator Staff Member

    Re: Using ParaConc (2)

    Please paste a sample of your texts here for diagnosis, and please also check your character encoding.
    Which version of ParaConc are you using, Build 269 or the demo version?
     
  8. laohong

    laohong Administrator Staff Member

    Re: Using ParaConc (2)

    You may need to read my earlier post on this question:

    http://forum.corpus4u.org/showthread.php?t=2228&highlight=paraconc
     
  9. AlexanderZ

    AlexanderZ Junior Member

    Re: Using ParaConc (2)

    Thanks to Dr. Xu and Dr. Hong. Unfortunately, my version is Paraconc-BETA (Version 1.0 Build 233), 2003.5.28 22:30:30. Is it the demo version?

    The sample texts are as follows:

    <CHAPTER21>第 二 十 一 章   财 主 小 姐 引 起 的 争 吵 </CHAPTER21>
    <P> <S>一 个 女 孩 子 有 了 施 瓦 滋 小 姐 一 般 的 能 耐 ,谁 能 够 不 爱 呢 ?奥 斯 本 老 先 生 心 里 有 个 贪 高 好
    胜 的 梦 想 ,全 得 靠 她 才 能 实 现 。</S> <S>他 拿 出 十 二 分 的 热 忱 ,和 颜 悦 色 的 鼓 励 女 儿 们 和 年 轻 女 财 主
    交 朋 友 。</S> <S>他 说 做 父 亲 的 看 见 女 儿 交 了 那 么 合 适 的 朋 友 ,真 从 心 里 喜 欢 出 来 。</S> </P>
    <P> <S>他 对 萝 达 小 姐 说 :“亲 爱 的 小 姐 ,你 一 向 看 惯 伦 敦 西 城 贵 族 人 家 的 势 派 ,他 们 排 场 大 ,
    品 级 高 ,我 们 住 在 勒 塞 尔 广 场 的 人 家 寒 薄 得 很 ,不 能 跟 他 们 比 。</S> <S>我 的 两 个 女 儿 是 粗 人 ,不 过
    不 贪 小 便 宜 ,心 倒 是 好 的 。</S> <S>她 们 对 你 的 交 情 很 深 ,这 是 她 们 的 光 彩 ——嗳 ,她 们 的 光 彩 。</S> <S>我
    自 己 呢 ,也 是 个 直 心 直 肠 子 ,本 本 分 分 的 买 卖 人 。</S> <S>我 人 是 老 实 的 ,令 尊 生 前 商 业 上 的 朋 友 ,
    赫 尔 格 和 白 洛 克 ,也 是 我 的 朋 友 ,我 一 向 很 尊 敬 他 们 ;对 于 我 的 为 人 ,这 两 位 可 以 保 证 的 。</S> <S>
    我 们 家 里 全 是 实 心 眼 儿 ,倒 也 能 够 相 亲 相 爱 ,和 气 过 日 子 ,算 得 上 有 体 统 的 人 家 。</S> <S>你 来 看 看
    就 知 道 了 。</S> <S>我 们 都 是 粗 人 ,吃 的 也 是 粗 茶 淡 饭 ,不 过 倒 是 真 心 的 欢 迎 你 来 ,亲 爱 的 萝 达 小
    姐 ,——请 让 我 叫 你 萝 达 ,因 为 我 满 心 里 真 喜 欢 你 ,真 的 !我 是 直 爽 人 ,老 实 告 诉 你 ,我 喜
    欢 你 。</S> <S>拿 杯 香 槟 来 !赫 格 斯 ,跟 施 瓦 滋 小 姐 斟 杯 香 槟 。”</S> </P>

    <P> <S>Love may be felt for any young lady endowed with such qualities as Miss Swartz possessed; and a great dream of ambition entered into old Mr. Osborne's soul, which she was to realize.</S> <S>He encouraged, with the utmost enthusiasm and friendliness, his daughters' amiable attachment to the young heiress, and protested that it gave him the sincerest pleasure as a father to see the love of his girls so well disposed.</S> </P>
    <P> <S>"You won't find," he would say to Miss Rhoda, "that splendour and rank to which you are accustomed at the West End, my dear Miss, at our humble mansion in Russell Square.</S> <S>My daughters are plain, disinterested girls, but their hearts are in the right place, and they've conceived an attachment for you which does them honour--I say, which does them honour.</S> <S>I'm a plain, simple, humble British merchant--an honest one, as my respected friends Hulker and Bullock will vouch, who were the correspondents of your late lamented father.</S> <S>You'll find us a united, simple, happy, and I think I may say respected, family--a plain table, a plain people, but a warm welcome, my dear Miss Rhoda--Rhoda, let me say, for my heart warms to you, it does really.</S> <S>I'm a frank man, and I like you.</S> <S>A glass of Champagne!</S> <S>Hicks, Champagne to Miss Swartz."</S> </P>
     
  10. AlexanderZ

    AlexanderZ Junior Member

    Re: Using ParaConc (2)

    I forgot to paste the chapter title of the English version. Sorry.

    <CHAPTER21>CHAPTER XXI A Quarrel About an Heiress</CHAPTER21>
     
  11. laohong

    laohong Administrator Staff Member

    Re: Using ParaConc (2)

    Several problems in your files:

    1. Do not use two or more kinds of tags, as this confuses the tool; use a single tag as the sentence marker. So please remove <P> and </P> from your texts. If you do want to keep that information, put them inside the sentence tags, that is, after <S> and before </S>.

    2. Each Chinese sentence and each English sentence should be on a separate line. That is, one sentence per line, starting with <S> and ending with </S>.

    3. The total numbers of Chinese and English sentences should be the same (otherwise the alignment will be mismatched, or some sentences will be left unaligned). However, counting the chapter title, you have 13 sentences in the Chinese text and only 10 in the English text.

    I reformatted your two texts, and they can now be searched with ParaConc without problems (whichever version you use). You may want to look at the attached reformatted texts.
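    For anyone who wants to do this kind of reformatting with a script, here is a minimal Python sketch of points 1 to 3. It is not the procedure used for the attached files: the function names one_sentence_per_line and check_counts and the sample strings are assumptions for illustration, and chapter-title tags such as <CHAPTER21> would still have to be converted to <S> by hand or with an extra rule.

```python
import re

# Matches one sentence unit, even if it spans several lines.
SENT = re.compile(r"<S>(.*?)</S>", re.DOTALL)

def one_sentence_per_line(raw):
    """Drop <P> tags and return each <S>...</S> unit as one line."""
    no_paragraphs = re.sub(r"</?P>", "", raw)
    sentences = []
    for s in SENT.findall(no_paragraphs):
        s = " ".join(s.split())  # collapse line breaks inside a sentence
        sentences.append("<S>" + s + "</S>")
    return sentences

def check_counts(source, target):
    """Return both sentence counts and whether they match."""
    return len(source), len(target), len(source) == len(target)


# Hypothetical miniature texts, not the real sample files.
zh = "<P> <S>甲 乙\n丙 。</S> <S>丁 。</S> </P>"
en = "<P> <S>One.</S> </P>"

zh_lines = one_sentence_per_line(zh)
en_lines = one_sentence_per_line(en)
print("\n".join(zh_lines))
print(check_counts(zh_lines, en_lines))
```

    A mismatch in the counts does not say which sentences need merging or splitting; that part still has to be decided by reading both texts.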
     

    Attachments:

    Last edited: 2006-12-08
  12. AlexanderZ

    AlexanderZ Junior Member

    Re: Using ParaConc (2)

    Thanks, Dr. Hong. I have tried the two files, but after loading them I can only use Delimited Segments. If I set the Align format to Start/Stop Tags and configure the start and stop tags as S and /S, it reports "Access violation at address 00476 in module 'Conc.exe'. Read of address FFFFFFFF8" twice, and the two files cannot be loaded. Sorry to trouble you.
     
  13. laohong

    laohong Administrator Staff Member

    Re: Using ParaConc (2)

    You have to type the FULL start and stop tags in the Start/Stop Tags options area of the Align Format, that is, Start Tag: <S>, Stop Tag: </S>. Typing only S and /S is wrong. Also, leave the Attribute text box blank.

    Try again with the files I provided in my last reply.

    Good Luck!
     
  14. AlexanderZ

    AlexanderZ Junior Member

    Re: Using ParaConc (2)

    Thanks, Dr. Hong. I have tried other versions. It works well with my own tagged files.
     
  15. laohong

    laohong Administrator Staff Member

    Re: Using ParaConc (2)

    Glad that you've managed to solve your own problems!
     
  16. wangdw

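    wangdw Junior Member

    Re: Using ParaConc (2)

    Hello! I am very interested in your reply, because I have run into a similar problem.

    Could you tell me which version of ParaConc you used to solve the problem, how you solved it, and what the relevant settings were?

    I have described my problem in detail here, and would like to ask both you and laohong for advice. Thank you!
    http://forum.corpus4u.org/showthread.php?p=15989#post15989

    I look forward to your reply.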
     
  17. AlexanderZ

    AlexanderZ Junior Member

    Re: Using ParaConc (2)

    My version is Build 629. The original Build 233 doesn't work well. Build 233 could only process delimited segments, because the default in the search options includes "<", ">", "/" and "\". These make the programme report errors. Just delete them. If you want them back the next time you launch the programme, you can restore the original state by choosing the "Return to default" option.

    I have tried your two texts. It only shows the first two English sentences.

    The procedures are as follows:

    Load two texts

    Align format: Start/stop Tags
    Option: Settings
    Start Tag: <seg>
    Stop Tag: </seg>
    Press OK twice
    Text Search Window:
    Enter pattern to search for: <mzy>*</mzy>
    Press OK.

    Click the Display menu in the toolbar and choose to "suppress" the "normal tags"; you can then see the two sentences without tags.

    Good luck!

    I think you need to make sure the default under Options in the text search pop-up window is changed: delete "<", ">" and "/" from it.
     
  18. wangdw

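    wangdw Junior Member

    Re: Using ParaConc (2)

    Thank you for your kind reply, AlexanderZ. I really envy you for having Build 629 of ParaConc. I dream of having that version too. I have saved your reply, and once I have it I will put your advice into practice.

    By the way, please take a look at laohong's approach: it works even with the demo version, which is great! Good methods are worth sharing with everyone!
    http://forum.corpus4u.org/showthread.php?t=2534&page=2

    I have added to your reputation as a small token of thanks! Hug and kiss you!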
     
  19. AlexanderZ

    AlexanderZ Junior Member

    Re: Using ParaConc (2)

    Thanks.
    Of course, the demo version can do it well. Today I downloaded the demo version of ParaConc Version 1.0 (Build 270). The two programmes I have are almost the same: the demo version's folder is 1.31 M and Build 269's is 1.33 M, and the two executable files are very similar in size, 1350KB for Build 269 and 1368KB for Build 270. I think the display capacity may differ, but the rest is the same. Anyway, I haven't had time to test them thoroughly. I am also new to this tool.
    I have seen your post about your ambition to study A Dream of Red Mansions. I do appreciate your determination. But alignment is always the most difficult and painstaking task. Although some programmes can do this job, hand-editing is a must.
    There is a downloadable copy of the ParaConc manual on this website, but I cannot find it. You can try to find it. If you cannot find the manual but need it, I can send it to the Gmail account of corpus4u or directly to your email address.
     
  20. wangdw

    wangdw Junior Member

    Re: Using ParaConc (2)

    Hello AlexanderZ! I already have the ParaConc user manual; its thoroughness is impressive.
    Thank you!