Tan Songbo: Chinese Text Classification Corpus

Re: Tan Songbo: Chinese Text Classification Corpus

Used Notepad and Access to open the files, but still didn't see anything other than a bunch of text names and numbers.
 
You will not be able to open those very large files with the .MAT extension in Access (shouldn't such extensions be reserved?). They are poorly named: .MAT is the extension for an Access Table shortcut, but these files are nothing of the sort. Because of that name, you can't even open them directly in Notepad or WordPad.
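
If you want to check what such a file really contains regardless of its extension, one option is to look at its first bytes. This is a minimal sketch, assuming Python 3 is installed; the function names and the NUL-byte heuristic are my own illustration, not anything that ships with the download:

```python
import sys


def peek(path, n=64):
    """Return the first n bytes of a file, ignoring its extension."""
    with open(path, "rb") as f:
        return f.read(n)


def looks_like_text(head):
    """Rough heuristic (an assumption): no NUL bytes suggests plain text."""
    return b"\x00" not in head


if __name__ == "__main__":
    # Pass the path of the downloaded file on the command line.
    head = peek(sys.argv[1])
    print(head)
    print("probably text" if looks_like_text(head) else "probably binary")
```

If the output looks like readable words and numbers, the file can be renamed to .txt and opened in any text editor.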

Anyway, I have opened all of the files downloaded from that site. They are not the corpus itself, but a wordlist, files indicating text categories, and files of matrix data. The only useful thing is the wordlist, which you can get here:

http://www.corpus4u.org/upload/forum/2005082607495230.rar

Once you have this wordlist, you can delete the rest of that rubbish taking up your disk space.
 