Westbury Lab: a USENET Corpus

Re: Westbury Lab: a USENET Corpus

Thanks, that's huge! It would take weeks to download it all.
 
Re: Westbury Lab: a USENET Corpus

I unzipped one file and found that it is too big to open. Any ideas?
 
Re: Westbury Lab: a USENET Corpus

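Some people's hard disks, or at least the partition they store the files on, may be formatted as FAT32, and that apparently doesn't work. NTFS is fine.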
 
Re: Westbury Lab: a USENET Corpus

I left Xunlei (Thunder) running all night (13 hours) and got only 10 MB :mad: ArthurW and Joe, how did you manage to download so much? How did you do it?
 
Re: Westbury Lab: a USENET Corpus

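I downloaded it with BT, and only the 2009 part. The rest can wait :(

I haven't downloaded all of it either; I'm short on disk space. :p

The latest Macs already ship with TB-class hard disks. Looks like I'll have to spend some serious money.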
 
Re: Westbury Lab: a USENET Corpus

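Thanks for the tip, I managed to download a bit more.
But can any of you make sense of its format? The only delimiter seems to be ---END.OF.DOCUMENT---; everything else looks like a mess, and I have no idea how to store it in a structured way.
Code: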
Steve Urbach < <EMAILADDRESS> >, the qualified charlatan
who assists women in child birth, purred:  
> On 13 Jan 2007 21:00:13 -0800, "Dr  <EMAILADDRESS> $"
> < <EMAILADDRESS> > wrote:
>
>>
>> Virtualmatter wrote:
>>> Hi, I know this isn't WinXp issue. However, Somedays (Not
>>> everyday). I got to hit the power button on my computer and it
>>> won't turn on. In order to fix the issue, I have to  physically
>>> unplug it, flip the switch off and then on (I hear a "Zzz" sound,
>>> probably the last of the capasitors losing their charge). Then I
>>> plug it back in, and everything is fine. I'm not sure what the
>>> issue is. So I am hoping some of you could shed some light on the
>>> issue.
>>> Thanks in Advanced.
>>>
>>> ~Ty
>>
>> It sounds like you may have a faulty ATX power supply unit. You could
>> always try replacing it, and if it doesn't resolve the issue then at
>> least you have a spare; or you could sell the old one on ebay if it's
>> in good condition.
>
> Do you have a laser printer that uses LPT1 (2 or 3)?
> Disconnect the LPT cable and see if startup problems go away.
> (had this one happen with *multiple* NCR  S20 server boards. Flawed
> buffering design allows HP and other printer ready signals to leak
> onto the board power controls).

You need Occam's Razor shoved fair up your fucking arse, you duff cunt. 


-- 
alt.usenet.kooks - Pierre Salinger Memorial Hook, Line & Sinker:
September 2005 and April 2006

"K-Man's particular genius, however, lies not merely in his humour,
but his ability to make posters who had previously seemed reasonably
well-balanced turn into foaming, frothing, death threat-uttering
maniacs" - Snarky, Demon Lord of Confusion

"remember that th [sic] head toecutter has quarter million confirmed
kadaitcha man kills ok" - the land surfer

Fuckwitted alt.atheism atheist to Kadaitcha Man:
"Imagine if I were to suggest "I have a prehensile tail". You
would, naturally, ask for evidence."

Kadaitcha Man in reply to fuckwitted alt.atheism atheist:
"Not at all. I would unquestionably accept your admission to being
a monkey."

Thou lackbeard. You talk greasily, your lips grow foul.

---END.OF.DOCUMENT---


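On Linux I extracted the individual files with tar and opened one (a 1 GB file) in Vim without problems, but it takes a while. Also, each archive is heavily compressed: 113 MB became 907 MB after extraction! Without patience and plenty of disk space, don't even think about processing this thing.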
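
On the format question, here is a minimal Python sketch of one way to read the file without loading it all into memory. It assumes, as the sample above suggests, that a line containing only ---END.OF.DOCUMENT--- closes each post and that quoted text is marked with leading ">" characters. The names `iter_documents` and `quote_depth` are made up for illustration and are not part of the corpus or of any tool shipped with it.

```python
import io
from typing import Iterator, TextIO

DELIMITER = "---END.OF.DOCUMENT---"


def iter_documents(stream: TextIO, delimiter: str = DELIMITER) -> Iterator[str]:
    """Yield one post at a time, reading line by line.

    Only the current post is held in memory, so a 1 GB file is no problem.
    """
    lines = []
    for line in stream:
        if line.strip() == delimiter:
            doc = "".join(lines).strip()
            if doc:
                yield doc
            lines = []
        else:
            lines.append(line)
    # Text after the last delimiter (assumption: treat it as a final post).
    doc = "".join(lines).strip()
    if doc:
        yield doc


def quote_depth(line: str) -> int:
    """Count the leading '>' markers of a line, ignoring spaces between them."""
    depth = 0
    for ch in line:
        if ch == ">":
            depth += 1
        elif ch != " ":
            break
    return depth


# Usage (hypothetical file name; the encoding is an assumption):
# with open("usenet_part.txt", encoding="utf-8", errors="replace") as f:
#     for post in iter_documents(f):
#         new_text = [l for l in post.splitlines() if quote_depth(l) == 0]
```

Separating the unquoted lines (depth 0) from the quoted ones would be one starting point for structured storage. Attribution lines such as "X wrote:" would still need their own heuristics.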
 
Re: Westbury Lab: a USENET Corpus

Yes, it really is a great resource; it's just a bit demanding on hardware.
 
Re: Westbury Lab: a USENET Corpus

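Quote:
Thanks for the tip, I managed to download a bit more.
But can any of you make sense of its format? The only delimiter seems to be ---END.OF.DOCUMENT---; everything else looks like a mess, and I have no idea how to store it in a structured way.
[snip]
On Linux I extracted the individual files with tar and opened one (a 1 GB file) in Vim without problems, but it takes a while. Also, each archive is heavily compressed: 113 MB became 907 MB after extraction! Without patience and plenty of disk space, don't even think about processing this thing.

Since you're on Linux anyway, you could try the split command to cut the file into smaller pieces. That makes processing much easier.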
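
One caveat: splitting by a fixed number of lines or bytes can cut a post in half. Here is a rough Python sketch that starts a new piece only after a ---END.OF.DOCUMENT--- line. The function name `split_corpus`, the `part_0000.txt` naming scheme and the UTF-8 encoding are all assumptions for illustration. Because it reads with errors="replace", undecodable bytes are replaced rather than copied verbatim.

```python
from pathlib import Path

DELIMITER = "---END.OF.DOCUMENT---"


def split_corpus(src, out_dir, docs_per_part=10000, delimiter=DELIMITER):
    """Copy src into numbered part files holding at most docs_per_part posts each.

    A new part is started only after a delimiter line, so no post is cut in half.
    Returns the list of part file paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    docs_in_part = 0
    try:
        # Assumption: UTF-8 with replacement for any bytes that do not decode.
        with open(src, encoding="utf-8", errors="replace") as f:
            for line in f:
                if out is None:
                    path = out_dir / f"part_{len(parts):04d}.txt"
                    out = open(path, "w", encoding="utf-8")
                    parts.append(path)
                out.write(line)
                if line.strip() == delimiter:
                    docs_in_part += 1
                    if docs_in_part >= docs_per_part:
                        out.close()
                        out = None
                        docs_in_part = 0
    finally:
        # Close the last part if it was left open (partial part or an error).
        if out is not None:
            out.close()
    return parts
```

Calling something like `split_corpus("usenet_part.txt", "parts", docs_per_part=50000)` (hypothetical file name) would give pieces small enough to open in an editor, each ending on a post boundary.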
 