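Re: I'm building a small English corpus and don't know which corpus-building software is best
It just occurred to me that WordSmith v5.0 has a built-in tool that uses a search engine to collect corpus texts: WebGetter.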
Mike Scott really is impressive.
Below is part of the WebGetter section of the WordSmith manual:
................................................................
WebGetter visits the search engine you specify and downloads the first 1000 sources or so. Basically it uses the search engine just as you do yourself, getting a list of useful references. Then it sends out a robot to visit each web address and download the web page in each case (not from the search engine's cache but from the original web-site). Quite a few robots may be out there searching for you at once -- the advantage of this is that one slow download doesn't hold all the others up.
After downloading a web page, that WebGetter robot checks it meets your requirements (in Settings) and cleans up the resulting text. If the page is big enough, a file with a name very similar to the web address will be saved to your hard disk.
When it runs out of references, WebGetter re-visits the search engine and gets some more.
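
For anyone who wants to see what that workflow amounts to, here is a rough Python sketch of the steps the manual describes: several downloads running at once, cleaning each page down to plain text, skipping pages that are too small, and saving each file under a name based on its web address. This is not WordSmith's code, only an illustration. The names `clean_html`, `url_to_filename`, `fetch` and `harvest`, the 200-character minimum, and the worker count are all my own assumptions. The search-engine step is left out, because the manual excerpt does not say how WebGetter queries the engine; the sketch assumes you already have a list of URLs.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def clean_html(html):
    """Strip markup and collapse whitespace (a crude stand-in for 'cleaning')."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def url_to_filename(url):
    """Build a file name that resembles the web address (hypothetical scheme)."""
    parts = urlsplit(url)
    name = re.sub(r"[^A-Za-z0-9.-]+", "_", parts.netloc + parts.path).strip("_")
    return (name or "page") + ".txt"


def fetch(url):
    """Download a page from the original site; assumes UTF-8 for simplicity."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def harvest(urls, out_dir, min_chars=200, workers=8, fetch=fetch):
    """Download URLs in parallel and save the pages that are big enough."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    def work(url):
        try:
            text = clean_html(fetch(url))
        except Exception:
            return None  # a failed download is simply skipped
        if len(text) < min_chars:
            return None  # page too small to be worth keeping
        path = out / url_to_filename(url)
        path.write_text(text, encoding="utf-8")
        return path

    # Several workers run at once, so one slow download does not block the rest.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for p in pool.map(work, urls) if p is not None]
```

To imitate the "gets some more" step, you would call `harvest` again with the next batch of addresses. Passing your own `fetch` function makes it easy to try the sketch offline.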