Indeed, much extra-linguistic information is lost in transcription. That's why the transcripts of some existing audio/video corpora are time-stamped and linked to the corresponding digital audio/video chunks (e.g. the LDC Santa Barbara corpus and the SCOTS corpus, http://www.scottishcorpus.ac.uk/).
However, a spoken corpus composed of transcripts that are properly marked up with the most common extra-linguistic features can still be used for many purposes in linguistic inquiry.