Stuck on Xaira N times, endless headache!

Hi_Jane

Member
I have read every thread on this forum about Xaira, along with the links in those threads. But in practice I have gotten stuck on Xaira N times, and it is giving me an endless headache!

To avoid trouble, I first used NJStar to convert the encoding of my test text, trying.txt, to Unicode UTF-8.
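For anyone without NJStar, the same conversion can probably be done with a few lines of Python. This is only a rough sketch: the source encoding (`gbk` here) is a guess about how the file was originally saved, and `convert_to_utf8` is a made-up helper name, so check the real encoding of your own file first.

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="gbk"):
    """Re-encode a text file as UTF-8 (written without a byte order mark)."""
    # Strict decoding raises UnicodeDecodeError on bytes that are invalid
    # in the assumed source encoding. "gbk" is only an assumption here.
    text = Path(src).read_bytes().decode(src_encoding)
    # Normalise Windows line endings so each sentence stays on one line.
    text = text.replace("\r\n", "\n")
    Path(dst).write_bytes(text.encode("utf-8"))
    return text
```

Calling `convert_to_utf8("trying.txt", "trying_utf8.txt")` would write a UTF-8 copy next to the original and leave the original untouched.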

1. Open Xaira tools. Under File, choose New.

2. Under File, choose Index wizard. Follow the prompts step by step.

3. Corpus name: corpusname
Title statement: titlestatement
Naming and annotating it this way makes it easy to trace where it came from, haha! Next.

4. Use Browse to give the corpus a home: E:\corpus. Next.

5. Locate the text to be used as the corpus: the trying.txt that I converted to Unicode UTF-8 at the start. The text has one sentence per line. The full text is as follows:
The Old Man and the Sea
By Ernest Hemingway
He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish.
In the first forty days a boy had been with him. But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky and the boy had gone at their orders in another boat which caught three good fish the first week.
It made the boy sad to see the old man come in each day with his skiff empty and he always went down to help him carry either the coiled lines or the gaff and harpoon and the sail that was furled around the mast.
The sail was patched with flour sacks and, furled; it looked like the flag of permanent defeat.
The old man was thin and gaunt with deep wrinkles in the back of his neck.
The brown blotches of the benevolent skin cancer the sun brings from its reflection on the tropic sea were on his cheeks.
The blotches ran well down the sides of his face and his hands had the deep-creased scars from handling heavy fish on the cords.
But none of these scars were fresh.
They were as old as erosions in a fishless desert.
Everything about him was old except his eyes and they were the same color as the sea and were cheerful and undefeated.
Next.

6. trying.txt is plain text, so of course I chose Plain text! Next.

7. Confirm once more that trying.txt is the file I want.

8. Time to choose the encoding! The default is GBK; I chose UTF-8. Next.

9. Press Go to parse them. Go!

10. All texts are well-formed. Press Next to start building the corpus header. So smooth! Every time I get this far I can hardly contain my excitement! Sadly, the good times don't last, sob! Next.

11. For language I chose en, English. Next.

12. Index! This is where the problem shows up! It happens every single time, sob.
(See the first attached image.)


13. It asks me to choose OK or Cancel. I chose OK. I have tried Cancel too, and that gets me even less far.

14. It tells me this:
(See the second attached image.)


15. Fine, Finish.

16. Not willing to give up, I clicked Run Indexer, and the same application error as in the first attached image appeared.

My question: which step did I get wrong?

Thanks!!!
 

Attachments

  • 0.jpg
  • 1.JPG
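Re: Stuck on Xaira N times, endless headache!

Xaira has this problem when it indexes plain text files directly; I described a workaround in an earlier post. Even when you only want to index plain text, I recommend adding a simple XML heading to the text, saving it as XML, and then building the index. I'll use your text as the example: copy the text between the two rows of equals signs, save it as jane.xml, and choose UTF-8 as the encoding.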

=======================================

<?xml version="1.0" encoding="utf-8"?>
<Doc>
<Header>
<Language>English</Language>
<Author>Ernest Hemingway</Author>
<Sex>Male</Sex>
<FileName>The Old Man and the Sea</FileName>
</Header>
<Text>
<p n="0">The Old Man and the Sea </p>
<p n="1">By Ernest Hemingway </p>
<p n="2">He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. </p>
<p n="3">In the first forty days a boy had been with him. But after forty days without a fish the boy's parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky and the boy had gone at their orders in another boat which caught three good fish the first week. </p>
<p n="4">It made the boy sad to see the old man come in each day with his skiff empty and he always went down to help him carry either the coiled lines or the gaff and harpoon and the sail that was furled around the mast. </p>
<p n="5">The sail was patched with flour sacks and, furled; it looked like the flag of permanent defeat. </p>
<p n="6">The old man was thin and gaunt with deep wrinkles in the back of his neck. </p>
<p n="7">The brown blotches of the benevolent skin cancer the sun brings from its reflection on the tropic sea were on his cheeks. </p>
<p n="8">The blotches ran well down the sides of his face and his hands had the deep-creased scars from handling heavy fish on the cords. </p>
<p n="9">But none of these scars were fresh. </p>
<p n="10">They were as old as erosions in a fishless desert. </p>
<p n="11">Everything about him was old except his eyes and they were the same color as the sea and were cheerful and undefeated. </p>
</Text>

</Doc>
=======================================
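For longer texts, typing the tags by hand gets tedious, so the wrapping above could be scripted. Here is a rough Python sketch, not anything that ships with Xaira: `wrap_as_xml` and `convert_file` are names I made up, and the element names simply copy my hand-written example. It assumes the input is UTF-8 with one sentence per line.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def wrap_as_xml(lines, language, author, sex, file_name):
    """Build a <Doc> document like the example from one-sentence-per-line text."""
    out = ['<?xml version="1.0" encoding="utf-8"?>', "<Doc>", "<Header>"]
    # Header fields mirror the hand-written example; escape() protects &, < and >.
    for tag, value in (("Language", language), ("Author", author),
                       ("Sex", sex), ("FileName", file_name)):
        out.append(f"<{tag}>{escape(value)}</{tag}>")
    out += ["</Header>", "<Text>"]
    # Drop empty lines, then number the remaining ones from 0 as in the example.
    kept = [ln.strip() for ln in lines if ln.strip()]
    for n, line in enumerate(kept):
        out.append(f'<p n="{n}">{escape(line)}</p>')
    out += ["</Text>", "</Doc>"]
    return "\n".join(out) + "\n"


def convert_file(src, dst, **header):
    """Read a UTF-8 text file (BOM tolerated) and write the wrapped XML as UTF-8."""
    lines = Path(src).read_text(encoding="utf-8-sig").splitlines()
    Path(dst).write_text(wrap_as_xml(lines, **header), encoding="utf-8")
```

Something like `convert_file("trying.txt", "jane.xml", language="English", author="Ernest Hemingway", sex="Male", file_name="The Old Man and the Sea")` should then produce a file of the same shape as the one between the equals signs. The escaping matters as soon as a sentence contains `&` or `<`, which would otherwise make the XML ill-formed.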

In the Header section you can add more metadata describing the text. Once the file is saved, you can build the index.
Index Wizard
Corpus Name: Jane
Title statement: Jane
Corpus root: choose a folder;
Location of texts: choose the folder that contains the text;
Markup: choose XML;
Next, Next, Next, Go;
Text labelling: choose "just use file names";
Unit labelling: choose p, n;
Tokenization: choose "using unicode rules to tokenize text";
Next (don't click Build Now);
Index;
See whether it works this time.
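If the indexer still complains, it may be worth checking the file outside Xaira first. Below is a rough Python sketch that only tests two things: that the file is well-formed XML, and that every `<p>` carries the `n` attribute chosen under Unit labelling. `check_corpus_file` is a made-up helper name, and passing this check is no guarantee that Xaira will accept the file.

```python
import xml.etree.ElementTree as ET


def check_corpus_file(path):
    """Return the n attribute of every <p>, or raise if the file looks unusable."""
    # ET.parse raises ET.ParseError when the file is not well-formed XML.
    root = ET.parse(path).getroot()
    numbers = [p.get("n") for p in root.iter("p")]
    if not numbers:
        raise ValueError("no <p> elements found")
    if None in numbers:
        raise ValueError("a <p> element has no n attribute")
    return numbers
```

For jane.xml as given above, `check_corpus_file("jane.xml")` should return the strings "0" through "11".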

Good Luck!
 
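Re: Stuck on Xaira N times, endless headache!

It really worked! Woohoo! :) :) :)
Thank you, master, for the kind help!
I'll make a plan and work through your posts one by one.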
 
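Re: Stuck on Xaira N times, endless headache!

Where did everyone get the corpus files? I downloaded this software too, and I have the CD version of the BNC, but I can't get it to connect to this latest version of the software. Does anyone know what to do? I'm really anxious!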
 