The BNC is commercially available, so for copyright reasons it is not advisable to share it on this forum. The corpus is also large: because it is annotated in XML with extensive metadata and POS information, it comes to about 4.35 gigabytes in total, which makes uploading and downloading it a further obstacle. What's more, even if you already know the names of the files that contain spoken material, it would still be difficult to extract them and gather them into a single folder without a purpose-built tool.
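For anyone who already holds a licensed copy, a short script can stand in for such a tool. The following is only a rough sketch in Python, not an official BNC utility. It assumes the XML edition marks each spoken text with an `<stext>` element (and each written text with `<wtext>`), that these elements carry no XML namespace, and that demographic conversations carry `type="CONVRSN"`. Please check these assumptions against your own files. The function names `is_spoken` and `collect_spoken` are made up for this illustration.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def is_spoken(xml_path, text_type=None):
    """Return True if the file's text body is an <stext> element.

    Assumption: spoken texts use <stext>, written texts use <wtext>.
    If text_type is given, the <stext> "type" attribute must match it.
    """
    with open(xml_path, "rb") as fh:
        # Stream the file and stop at the first body element, so that
        # large corpus files are never loaded into memory in full.
        for _event, elem in ET.iterparse(fh, events=("start",)):
            if elem.tag == "stext":
                return text_type is None or elem.get("type") == text_type
            if elem.tag == "wtext":
                return False
    return False


def collect_spoken(corpus_dir, out_dir, text_type=None):
    """Copy every spoken-text file under corpus_dir into out_dir.

    Returns the sorted list of copied file names.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    copied = []
    # sorted() builds the full file list before any copying starts.
    for path in sorted(Path(corpus_dir).rglob("*.xml")):
        if is_spoken(path, text_type):
            shutil.copy2(path, out / path.name)
            copied.append(path.name)
    return copied
```

With a hypothetical layout, `collect_spoken("BNC/Texts", "bnc_spoken")` would gather all spoken files into one folder, and `collect_spoken("BNC/Texts", "bnc_demog", text_type="CONVRSN")` would keep only the conversations, if the type code assumed above is right. A fixed-size sample, such as 30 texts, could then be drawn from the returned list.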
Thank you for your reply! But I'm still wondering how to purchase the XML texts so that I can process a small part of the XML edition. I have read some papers that selected a number of texts (e.g. 30 demographic texts) from the spoken part of the BNC, and I would like to know how that is done. Looking forward to any reply! Thanks.