ICE-GB:Int"l Corpus of Eng:Great Britain component

xujiajin

Administrator
Staff member
International Corpus of English: Great Britain component
ftp://*************************
It was put online by someone else; we are not spreading it around. We hope everyone uses it for study purposes only.


**********
To promote the protection of intellectual property, we have hidden the relevant links mentioned on this forum. We hope everyone will support this. We sincerely apologize for the earlier lapses in our work.


[This post was edited by the author on 2005-09-16 at 23:46:06]
 
Strange: the files are visible but cannot be downloaded. Have they set the server to read-only? Still, we did find quite a few related corpora and software tools on it, such as MicroConc, WordPilot, and JDEST.
[attached screenshot: 2005081220294889.jpg]
 
Re: ICE-GB

I know some of them are very useful; I found this FTP a year ago. Some tools, such as CastExe, were developed by Shanghai Jiaotong University. The FTP also includes a lot of resources concerning foreign language teaching and learning.
 
Some must-downloads:
MonoConc Pro v2.0
WordPilot (I think this one is a nice and simple DDL tool)
WordSmith v3 (if you don't have it)
And the best part is the ICE-GB data: first go to the official ICE-GB site to download a sampler, then refill its data folder with what you have got from the aforementioned FTP.
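A minimal sketch of that "refill" step in Python, assuming the sampler was installed under C:\ice\ice-gb and the full 500-text data from the FTP was unpacked to C:\downloads\icedata\data (both paths are hypothetical; adjust them to your own setup):

# Replace the sampler's data folder with the full 500-text set.
# Both paths below are assumptions; point them at your own copies.
import shutil
from pathlib import Path

SAMPLER_DATA = Path(r"C:\ice\ice-gb\data")      # sampler's data folder (assumption)
FULL_DATA = Path(r"C:\downloads\icedata\data")  # 500 texts from the FTP (assumption)

if SAMPLER_DATA.exists():
    backup = SAMPLER_DATA.with_name("data.sampler-backup")
    if not backup.exists():
        SAMPLER_DATA.rename(backup)  # keep the sampler texts around
    else:
        shutil.rmtree(SAMPLER_DATA)  # already backed up on an earlier run
shutil.copytree(FULL_DATA, SAMPLER_DATA)
print(f"Copied {sum(1 for _ in SAMPLER_DATA.iterdir())} files into {SAMPLER_DATA}")

As later posts in this thread show, copying the data alone is not enough; the index and other support folders must match as well.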

I found the address of this FTP a long time ago, but I preferred not to broadcast it. Who knows whether the admin of the FTP is just careless, since he or she is not supposed to leave the resources totally accessible.
 
Re: ICE-GB

Quoting dzhigner's post of 2005-8-14 10:35:21:
Some must-downloads:
And the best part is the ICE-GB data: first go to the official ICE-GB site to download a sampler, then refill its data folder with what you have got from the aforementioned FTP.

Could you please tell us the name of the sampler and the exact website of ICE-GB? I do need your help. Many thanks in advance.
 
Re: ICE-GB

Quoting xujiajin's post of 2005-8-12 17:42:13:
ftp://*************************
It was put online by someone else; we are not spreading it around. We hope everyone uses it for study purposes only.


**********
To promote the protection of intellectual property, we have hidden the relevant links mentioned on this forum. We hope everyone will support this. We sincerely apologize for the earlier lapses in our work.

[This post was edited by the author on 2005-08-15 at 17:58:42]
Thanks! These materials are indeed very valuable, but they cannot be downloaded; perhaps the owner has protected them well!
 
Some users say they can download the files, while others cannot. I don't know what is going on; I cannot download them here either.
 
Re: ICE-GB

Quoting xujiajin's post of 2005-8-15 20:31:10:
Some users say they can download the files, while others cannot. I don't know what is going on; I cannot download them here either.
I have downloaded everything I should; what I shouldn't download, I left alone, haha!
 
Re: ICE-GB

Quoting dzhigner's post of 2005-8-14 10:35:21:
Some must-downloads:
MonoConc Pro v2.0
WordPilot (I think this one is a nice and simple DDL tool)
WordSmith v3 (if you don't have it)
And the best part is the ICE-GB data: first go to the official ICE-GB site to download a sampler, then refill its data folder with what you have got from the aforementioned FTP.
I have done what you suggested for ICE-GB and got a full set of 500 texts, but when it comes to fragment searching, say for "play", only three hits are produced instead of the more than 300 indicated in "Getting Started". How can I get the full functionality of ICE-GB? Many thanks.
 
Re: ICE-GB

Quoting xiaoz's post of 2005-8-16 22:14:39:
The additional files are not indexed....
But I have full subfolders of data and index, each containing 500 files. I guess I should revise the INI file, but I don't know how. Please have a look at the one below. Many thanks.

ICECUPS3.ini
[def]
def: corpus.def

[paths]
corp: data
indx: index
ldb: lexicon
node: nodal
mark: markup
var: vars
head: text
snd: sound
sndv: d:\

data: ..\icedata
filt: ..\icedata
save: c:\output

help: ..\icecup3.hlp
gets: ..\iceget.hlp
 
I defined the ini file as follows:
[paths]
corp: c:\ice\ice-gb\data
indx: c:\ice\ice-gb\index
lex: c:\ice\ice-gb\lexical
node: c:\ice\ice-gb\nodal   [note: there must be no space before "nodal"]
mark: c:\ice\ice-gb\markup
vars: c:\ice\ice-gb\vars
err: c:\ice\ice-gb\corpus\correct
head: c:\ice\ice-gb\text

data: c:\ice\icecup3\icedata
filt: c:\ice\icecup3\icedata
save: c:\ice\output

help: c:\ice\icecup3\icecup3.hlp
gets: c:\ice\icecup3\iceget.hlp

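Since most of the trouble reported in this thread comes down to [paths] entries pointing at folders that do not exist, a mechanical check can save time. The sketch below is illustrative only (it is not an ICECUP tool); the ini location is an assumption, and ICECUP's "key: value" syntax happens to be readable with Python's configparser:

# Read the [paths] section of an ICECUP3.ini and report which entries
# point at missing folders. The ini location below is an assumption.
import configparser
from pathlib import Path

INI_PATH = r"c:\ice\icecup3\ICECUP3.ini"  # hypothetical install location

# strict=False tolerates a repeated key such as a second "save:" line.
cp = configparser.ConfigParser(strict=False)
cp.read(INI_PATH)

for key, value in cp["paths"].items():
    # Relative entries (e.g. "data" or "..\icedata") resolve against the
    # ICECUP working directory, which is a common source of error here.
    status = "OK" if Path(value).exists() else "MISSING"
    print(f"{key:6s} -> {value}  [{status}]")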
 
Also check the files in the following folders:
DATA: 500 corpus files
INDEX: 500 index files
LEXICAL: 3 files (ICE-GB.IDX 135 KB; ICE-GB.SID 948 KB; ICE-GB.SSI 3658 KB)
MARKUP: 3 files (ICE-GB.IDX 12 KB; ICE-GB.SID 10 KB; ICE-GB.SSI 639 KB)
NODAL: 7 files (ice-gb2.idx 42 KB; ice-gb.sid 3668 KB; ice-gb.ssi 3890 KB; ice-gb.sso 13160 KB; ice-gb.idx 30 KB; ice-gb.sid 277 KB; ice-gb.ssi 3790 KB)
VARS: 8 files
TEXT: the most important parameter files are in this folder:
ICE-GB.txt, 391 KB (description of the 500 files)
TEXT.txt, 13 KB, listing the 200 written files(?)
STEXT.txt, 17 KB, listing the 300 spoken files
...

Just downloading the 500 data files will not help. These data files must be indexed properly for use with ICECUP.
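A quick way to confirm a download matches the inventory above is to count the files in each folder. This sketch follows the listing in this post; the install root is an assumption:

# Compare each ICE-GB folder against the expected file counts listed above.
# The install root is an assumption; adjust it to your own layout.
from pathlib import Path

ROOT = Path(r"c:\ice\ice-gb")  # hypothetical install root
EXPECTED = {
    "data": 500,    # corpus files
    "index": 500,   # index files
    "lexical": 3,   # ICE-GB.IDX / .SID / .SSI
    "markup": 3,
    "nodal": 7,
    "vars": 8,
}

for folder, want in EXPECTED.items():
    d = ROOT / folder
    if not d.is_dir():
        print(f"{folder:8s} MISSING")
        continue
    have = sum(1 for f in d.iterdir() if f.is_file())
    flag = "" if have == want else "  <-- check this folder"
    print(f"{folder:8s} {have:4d} files (expected {want}){flag}")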
 
Re: ICE-GB

Quoting xiaoz's post of 2005-8-17 8:41:02:
Also check the files in the following folders:
DATA: 500 corpus files
INDEX: 500 index files
LEXICAL: 3 files (ICE-GB.IDX 135 KB; ICE-GB.SID 948 KB; ICE-GB.SSI 3658 KB)
MARKUP: 3 files (ICE-GB.IDX 12 KB; ICE-GB.SID 10 KB; ICE-GB.SSI 639 KB)
NODAL: 7 files (ice-gb2.idx 42 KB; ice-gb.sid 3668 KB; ice-gb.ssi 3890 KB; ice-gb.sso 13160 KB; ice-gb.idx 30 KB; ice-gb.sid 277 KB; ice-gb.ssi 3790 KB)
VARS: 8 files
TEXT: the most important parameter files are in this folder:
ICE-GB.txt, 391 KB (description of the 500 files)
TEXT.txt, 13 KB, listing the 200 written files(?)
STEXT.txt, 17 KB, listing the 300 spoken files
...

Just downloading the 500 data files will not help. These data files must be indexed properly for use with ICECUP.

I got all the above-mentioned files and redefined the *.ini too. You are right. The only remaining problem is with the entry indicated in the fourth line of the *.ini, "ldb: lexicon". It is not included in your definition. What is it?


[def]
def: corpus.def

[paths]
corp: c:\beta\ice-gb-s\data
indx: c:\beta\ice-gb-s\index
ldb: lexicon
lex: c:\beta\ice-gb-s\lexical
node: c:\beta\ice-gb-s\nodal
mark: c:\beta\ice-gb-s\markup
vars: c:\beta\ice-gb-s\vars
err: c:\beta\ice-gb-s\corpus\correct
head: c:\beta\ice-gb-s\text
snd: c:\beta\ice-gb-s\sound
sndv: c:\
data: c:\beta\icecup3\icedata
filt: c:\beta\icecup3\icedata
help: c:\beta\icecup3.hlp
gets: c:\beta\iceget.hlp
save: c:\temp
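One way to track down entries like "ldb" that appear in one configuration but not another is to diff the [paths] sections mechanically. A minimal sketch, assuming two ini files at the hypothetical locations below:

# List the [paths] keys of two ICECUP ini files side by side, flagging
# entries (such as "ldb") present in one but not the other.
import configparser

def paths_section(ini_file):
    cp = configparser.ConfigParser(strict=False)  # tolerate repeated keys
    cp.read(ini_file)
    return dict(cp["paths"])

mine = paths_section(r"c:\beta\ICECUP3.ini")           # hypothetical location
theirs = paths_section(r"c:\ice\icecup3\ICECUP3.ini")  # hypothetical location

for key in sorted(set(mine) | set(theirs)):
    a = mine.get(key, "(absent)")
    b = theirs.get(key, "(absent)")
    marker = "  <-- differs" if a != b else ""
    print(f"{key:6s} mine={a}  theirs={b}{marker}")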
 
Re: ICE-GB

This should be the solution, provided you have downloaded the whole ICE directory and defined the paths in the ini file correctly.


Quoting dzhigner's post of 2005-8-17 10:47:15:
Not just the "Data" folder, but replace everything in the folder initially named "ICE-GB-S"
 