[Chit-chat] Download of LiteCorpus

Before downloading, could you tell us the sampling strategy of the corpus? In other words, what is included in the corpus? Is it a general-purpose corpus at all?

Thank you so much for sharing it, though.
Re: [Chit-chat] Download of LiteCorpus

Thank you very much for sharing it with us. However, I am as anxious as xujiajin to know more about your corpus.
Re: [Chit-chat] Download of LiteCorpus

Thank you very much for your great contribution! Could you provide us with a detailed description of it?
LiteCorpus is just a
random, stupid collection of texts,
mostly works of great writers
from the past 200 years
(which sounds like the sampling strategy).
I have done some preliminary searches, and the results are incredibly large. But is it a corpus in the sense of modern corpus linguistics? What else can we do with it besides English teaching?

[This post was edited by the author on 2005-12-11 at 21:42:53]
It has no header information at all,
and the tagging accuracy is not as high
as that of the commercially published ones,
I think. (Well, actually there are many
tagging mistakes in the BNC, sometimes
at a very high rate, like what I said about
"zips": 5 out of 14.)
You can use it as you would the BNC, for
IE, etc.
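
For anyone who wants to check a word like "zips" themselves, here is a rough sketch. It assumes the text is in a word_TAG format (one underscore between word and tag); that format, the sample sentence, the tags in it, and the function names are all made up for illustration and are not taken from LiteCorpus or the BNC. It lists which tags a word form received, and the last line just works out the 5-out-of-14 figure as a rate.

```python
from collections import Counter

def tag_counts(tagged_text, word):
    """Count the tags attached to `word` in word_TAG formatted text (assumed format)."""
    counts = Counter()
    for token in tagged_text.split():
        # Split on the last underscore so that word forms containing "_" still work.
        form, sep, tag = token.rpartition("_")
        if sep and form.lower() == word.lower():
            counts[tag] += 1
    return counts

def error_rate(wrong, total):
    """Share of occurrences judged to be wrongly tagged."""
    return wrong / total

# Hypothetical sample with illustrative tags, not real corpus data.
sample = "He_PPHS1 zips_VVZ the_AT bag_NN1 ._. Two_MC zips_NN2 broke_VVD ._."
print(tag_counts(sample, "zips"))

# The "zips" figure quoted above: 5 wrong out of 14 hits.
print(f"{error_rate(5, 14):.1%}")  # 35.7%
```

You would still have to read the hits by hand to decide which tags are actually wrong; the script only groups them.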
Thank you, ineedgerf. You have done an amazing job by altruistically sharing your tagged corpora with us: Colen and LiteCorpus, both slim and tagged. BTW, I am curious about the way you tagged the data. Probably I should have searched corpus4u for information about it, but I would still like to ask for your help: how can we get the TOSCA/LOB tagger you strongly recommended? MTIA
Dear ineedgerf, we would like to ask two favors of you:
1. Could you kindly share your favorite tagger (TOSCA/LOB) with us, as we could not access the suggested website to download it; and
2. Could you please provide us with a list of the textbooks that you used for Colen? MTIA
Maybe the server where TOSCA/LOB lived can no longer be accessed.
This might mean that the tagger will no longer be free. If so,
offering it for free would not be allowed.

COLEN includes:
College English (an old version published by WJ; books 1-4)
College English (a completely different version by WJ; part of the series)
New College English (by FLTRP; books 1-4)
21st Century College English (books 1-4)

All these can be identified by their names.