I used WECCL 1.0 in a project and found the same problem. Here is a paragraph from my manuscript for your reference:
With a script written to verify the integrity of the files in the corpus, we found 124 of the 3,678 files unusable. These include 1 file with no header, 1 with two non-identical headers, 4 with only one sentence, 17 empty files, and 101 that duplicate other files. This leaves us with 3,554 files to work with. The corpus has a total of 1,119,510 words, and the length of the individual essays ranges from 89 to 892 words (mean = 315, standard deviation = 87).
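
For anyone who wants to run a similar check, here is a minimal sketch of that kind of integrity script. It is not the script from the manuscript, and it rests on assumptions that may not match the real WECCL files: a header is taken to be any line that starts with "<" and ends with ">", a duplicate is a file whose non-header text is identical to that of an earlier file, and sentences are split crudely on ".", "!" and "?". The names `classify` and `audit` are made up for illustration.

```python
import hashlib
import re
from collections import Counter

# Assumption: a header line starts with "<" and ends with ">".
# Adjust this pattern to the real header format of your corpus.
HEADER_RE = re.compile(r"^<.*>$")
# Crude sentence boundary: terminal punctuation followed by whitespace or end of text.
SENTENCE_END_RE = re.compile(r"[.!?]+(?:\s+|$)")


def classify(text, seen_hashes):
    """Label one file's text; seen_hashes collects body digests to spot duplicates."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "empty"
    headers = [ln for ln in lines if HEADER_RE.match(ln)]
    body = " ".join(ln for ln in lines if not HEADER_RE.match(ln))
    if not headers:
        return "no_header"
    if len(set(headers)) > 1:
        return "conflicting_headers"
    if not body:
        return "empty"
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    if digest in seen_hashes:
        return "duplicate"
    seen_hashes.add(digest)
    sentences = [s for s in SENTENCE_END_RE.split(body) if s.strip()]
    if len(sentences) <= 1:
        return "one_sentence"
    return "ok"


def audit(texts):
    """Count labels over a mapping of file name -> file text."""
    seen = set()
    return Counter(classify(text, seen) for text in texts.values())
```

To use it, read each corpus file into a dict keyed by file name and pass that dict to `audit`; every label other than "ok" marks a file to inspect or exclude. Hashing only the non-header text means two files with different headers but the same essay still count as duplicates, which may or may not be the behaviour you want.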