Re: Question: how can a plain-text corpus be made searchable online?
Here is one written in Perl. It is still a prototype and can only process one file at a time. Just run it as a CGI script; I tested it on Apache 2 + Linux.
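#!/usr/bin/perl
use strict;
use warnings;

# Corpus file to search; the script handles one file at a time.
open(my $fh, '<', 'browncorpus.txt') or die "Cannot open browncorpus.txt: $!";
$/ = "";                      # paragraph mode: read one blank-line-separated block per iteration
my $target = '\b(the)\b';     # search pattern; capture group 1 is the keyword to highlight
my $radius = 50;              # characters of context shown on each side of the keyword
my $num    = 0;               # running hit number printed before each line

# Escape characters that are special in HTML so corpus text is displayed literally.
sub esc { my $s = shift; $s =~ s/&/&amp;/g; $s =~ s/</&lt;/g; $s =~ s/>/&gt;/g; return $s; }

print "Content-Type: text/html\n\n";
print "<pre>";
while (my $para = <$fh>) {
    chomp $para;
    $para =~ s/\n/ /g;        # join the lines of the paragraph into a single line
    while ($para =~ /$target/gi) {
        my $match = $1;
        my $pos   = pos($para);                 # offset just past the match
        my $begin = $pos - length($match);      # offset where the keyword starts
        my $start = $begin - $radius;           # where the left context would start
        my $pad   = $start < 0 ? -$start : 0;   # pad with spaces so keywords stay aligned
        $start = 0 if $start < 0;
        my $left  = substr($para, $start, $begin - $start);   # never overlaps the keyword
        my $right = substr($para, $pos, $radius);
        print $num++, "\t", " " x $pad, esc($left),
              "<font color='#cc0000'>", esc($match), "</font>", esc($right), "\n";
    }
}
print "</pre>";
close($fh);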
# Improved by iCasino (HNUC), based on Roger Bilisoly, 2008. Practical Text Mining with Perl. Wiley.
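
The script above hard-codes the pattern and only produces output through the web server, which makes the context-window arithmetic hard to check. Below is a minimal sketch of the same keyword-in-context logic as a subroutine that can be run from the command line. The name `kwic_lines`, the sample sentence, and the `[...]` markers (used here in place of the `<font>` tags) are my own assumptions for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): returns one
# keyword-in-context line for every match of $pattern in $text.
sub kwic_lines {
    my ($text, $pattern, $radius) = @_;
    my @lines;
    while ($text =~ /($pattern)/gi) {
        my $match = $1;
        my $end   = pos($text);              # offset just past the match
        my $begin = $end - length($match);   # offset where the match starts
        my $from  = $begin - $radius;        # where the left context would start
        my $pad   = $from < 0 ? -$from : 0;  # spaces needed when the match is near the start
        $from = 0 if $from < 0;
        my $left  = substr($text, $from, $begin - $from);
        my $right = substr($text, $end, $radius);
        push @lines, (" " x $pad) . $left . "[" . $match . "]" . $right;
    }
    return @lines;
}

# Small sample with a radius of 5 so the padding is easy to see.
my @out = kwic_lines("The cat sat on the mat", '\bthe\b', 5);
print "$_\n" for @out;
```

With this sample, the first hit sits at the very start of the text, so its left context is replaced entirely by padding and the keyword still lines up with the second hit.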