Re: Question: how to search only part of a corpus
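Leaving aside the question of whether XML is really that scary: if you want real freedom in processing your own corpus, it is worth spending some time learning XML. That said, in most cases you do not need to convert the text fully into XML. Lightly tagging a plain-text file is enough for Xaira to search it. Here is an example:
Original text: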
A: How do you do?
B: Fine, thank you. How do you do?
A: Oh, what's your name?
B: Oh, my name is Wangyan. I think you are a freshman.
A: Yes, I'm glad to meet you. My name is Shengtianyun.
B: Me, too.
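With the Find-Replace function of any general-purpose text editor, such as EditPlus, you can quickly mark up the text above as shown below. It is best to save the file as UTF-8: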
<?xml version="1.0" encoding="utf-8"?>
<TEXT>
<S SPK="A"> How do you do?</S>
<S SPK="B"> Fine, thank you. How do you do?</S>
<S SPK="A"> Oh, what's your name?</S>
<S SPK="B"> Oh, my name is Wangyan. I think you are a freshman.</S>
<S SPK="A"> Yes, I'm glad to meet you. My name is Shengtianyun.</S>
<S SPK="B"> Me, too.</S>
</TEXT>
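If the corpus is too large to tag by hand in an editor, the same Find-Replace step could be scripted. The following is only a minimal sketch in Python, assuming every line has the form "SPEAKER: utterance"; the function name mark_up is made up for illustration and is not part of Xaira. Unlike the hand-made sample above, it drops the space after the opening tag and escapes characters such as & and < so that the result stays well-formed.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def mark_up(dialogue: str) -> str:
    """Wrap each 'SPEAKER: utterance' line in an <S SPK="..."> element."""
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<TEXT>"]
    for raw in dialogue.splitlines():
        match = re.match(r"\s*(\w+):\s*(.*)", raw)
        if not match:
            continue  # skip blank lines and lines without a speaker label
        speaker, utterance = match.groups()
        # escape() protects &, < and > inside the utterance;
        # quoteattr() adds the quotes around the attribute value.
        lines.append(f"<S SPK={quoteattr(speaker)}>{escape(utterance)}</S>")
    lines.append("</TEXT>")
    return "\n".join(lines) + "\n"
```

To get a file for indexing, write the returned string out as UTF-8, for instance with pathlib.Path("dialogue.xml").write_text(mark_up(text), encoding="utf-8"), where dialogue.xml is just a placeholder name.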
Then index the file with Xaira and it is ready to search. For example, a search for "you" returns 5 hits in total: 2 from A and 3 from B.
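These counts can be cross-checked outside Xaira. The sketch below is not how Xaira works internally; it simply parses the tagged file with Python's standard library and counts whole-word, case-insensitive matches per SPK value. The function name count_by_speaker is hypothetical, and whole-word matching is an assumption (so "your" is not counted as "you").

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# The tagged dialogue from the example, as UTF-8 bytes.
XML = b"""<?xml version="1.0" encoding="utf-8"?>
<TEXT>
<S SPK="A"> How do you do?</S>
<S SPK="B"> Fine, thank you. How do you do?</S>
<S SPK="A"> Oh, what's your name?</S>
<S SPK="B"> Oh, my name is Wangyan. I think you are a freshman.</S>
<S SPK="A"> Yes, I'm glad to meet you. My name is Shengtianyun.</S>
<S SPK="B"> Me, too.</S>
</TEXT>"""


def count_by_speaker(xml_bytes: bytes, word: str) -> Counter:
    """Count whole-word occurrences of `word` in each speaker's <S> elements."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    counts = Counter()
    for s in ET.fromstring(xml_bytes).iter("S"):
        counts[s.get("SPK")] += len(pattern.findall(s.text or ""))
    return counts


counts = count_by_speaker(XML, "you")
print(counts["A"], counts["B"], sum(counts.values()))  # prints: 2 3 5
```

Restricting a search to one speaker then amounts to looking only at the <S> elements whose SPK attribute has the value you want, which is what the tagging makes possible.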