English and American Literature Corpus Materials for Download

laohong

Administrator
Staff member
Good stuff for you to build your own corpus of English literature! Here are a number of things you can download to make your exploration into great literature more fruitful.

Compressed texts
You can download the complete texts of all the items in the Catalogue:

all American texts (20.9 MB compressed)
http://homepage.mac.com/ericleasemorgan/american.tar.gz

all English texts (20.6 MB compressed)
http://homepage.mac.com/ericleasemorgan/english.tar.gz

all philosophy texts (10.6 MB compressed)
http://homepage.mac.com/ericleasemorgan/philosophy.tar.gz

These archived texts come from Eric Lease Morgan's Alex Catalogue of Electronic Texts.
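For anyone who would rather unpack the archives from a script than with a desktop tool, here is a minimal Python sketch. It assumes the archive (for example `english.tar.gz`) has already been downloaded to the working folder. The function name `extract_archive` and the output folder name are just illustrative and are not part of the Alex Catalogue.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Extract a .tar.gz archive into dest_dir and return the extracted file paths."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive.getmembers():
            # Only handle regular files and directories (skip links, devices).
            if not (member.isfile() or member.isdir()):
                continue
            target = (dest / member.name).resolve()
            # Skip members whose path would land outside the destination folder.
            if target != dest and dest not in target.parents:
                continue
            archive.extract(member, path=dest)
            if member.isfile():
                extracted.append(target)
    return extracted
```

Calling `extract_archive("english.tar.gz", "corpus/english")` should leave the texts under `corpus/english`, and the returned list gives a quick count of how many files came out.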
 
While extracting these files, some of the HTML files failed to extract. Has anyone else run into this? Still, thanks to laohong for the recommendation!
 
Re: English and American Literature Corpus Materials for Download

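Quoting 清风出袖, posted 2006-4-4 16:49:33:
While extracting these files, some of the HTML files failed to extract. Has anyone else run into this? Still, thanks to laohong for the recommendation!

Some of the HTML files failed to extract.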
 
Re: English and American Literature Corpus Materials for Download

Quoting armstrong, posted 2006-4-4 17:33:33:
Some of the HTML files failed to extract.

Hi, I found no problem here. There are actually only two HTML files in the three compressed archives. Both are in the English folder:

1. "shakespeare-second-52.html" in the folder "1500-1599";
2. "paine-agrarian-262.html" in the folder "1700-1799".

If you still have trouble extracting them, here are the two problem files mentioned above: http://forum.corpus4u.org/upload/forum/2006040423423674.zip
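If you want to check for yourself which files in an archive are HTML before extracting anything, a short Python sketch like the one below can list them. The function name `list_html_members` is hypothetical, and it assumes the archive is a gzip-compressed tar file sitting in the working folder.

```python
import tarfile


def list_html_members(archive_path):
    """Return the names of .html/.htm members in a .tar.gz archive, without extracting."""
    with tarfile.open(archive_path, "r:gz") as archive:
        return sorted(
            name
            for name in archive.getnames()
            if name.lower().endswith((".html", ".htm"))
        )
```

Running it on each of the three archives gives a quick way to see which members might need special handling.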
 
Hi, I just found that the plain-text version of "shakespeare-second-52.html" is already in the folder, so you don't need the HTML file any more. To make processing easier with your corpus tools (WordSmith, Concordance, AntConc, etc.), it's a good idea to resave "paine-agrarian-262.html" as a plain-text file.
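Resaving as plain text can be done from a browser or a word processor, but for anyone who prefers a script, here is a rough Python sketch using only the standard library. The names `html_to_text` and `_TextExtractor` are made up for this example, and the list of tags treated as line breaks is an assumption, so the output may need a manual check before it goes into a corpus.

```python
from html.parser import HTMLParser

# Tags that are assumed to start or end a line of text.
BLOCK_TAGS = {"p", "br", "div", "li", "tr", "title", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def html_to_text(html):
    """Strip tags from an HTML string and return its text, one block per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace inside each line and drop empty lines.
    lines = (" ".join(line.split()) for line in parser.text().splitlines())
    return "\n".join(line for line in lines if line)
```

To convert the Paine file, you would read it into a string, pass it to `html_to_text`, and write the result to a `.txt` file. The file's encoding is not stated in this thread, so check it before reading.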
 