Representativeness is a defining feature of a corpus. It is this feature that distinguishes a corpus from a text archive. This paper reviews the literature on corpus representativeness.
The first part of the paper, "Sampling situated discourse", gives an overview of the sampling strategies used in spoken corpora such as the SEC, the LLC and the spoken component of the BNC.
Grönqvist, L. 2004. ‘Literature review of representativeness of linguistic resources’. Paper presented at the 7th Annual CLUK Research Colloquium. University of Birmingham. 6-7 January 2004.