Use regular expressions to strip the tags from the XML files and save them as plain text, or keep the POS tags only if you need them. BTW, WST does have a text converter that works with BNC XML files and converts them to text.
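Here is a minimal regex sketch of both options. It assumes the BNC XML Edition markup, where words look like `<w c5="NN1" hw="..." pos="...">word </w>` and punctuation like `<c c5="PUN">.</c>`. The function names `detag` and `keep_pos`, and the `word_TAG` output format, are my own choices for illustration.

```python
import html
import re

# A <w> or <c> element with its attributes and its text content.
TOKEN = re.compile(r"<(w|c)\b([^>]*)>([^<]*)</\1>")
# The CLAWS C5 tag inside the attribute string.
C5 = re.compile(r'\bc5="([^"]+)"')
# Any remaining tag (<s>, <p>, <mw>, ...).
ANY_TAG = re.compile(r"<[^>]+>")


def _collapse(text):
    """Unescape entities such as &amp; and normalise whitespace."""
    return re.sub(r"\s+", " ", html.unescape(text)).strip()


def detag(xml_text):
    """Drop every tag and keep only the running text."""
    return _collapse(ANY_TAG.sub("", xml_text))


def keep_pos(xml_text):
    """Turn each <w>/<c> element into word_C5, then drop the other tags."""
    def repl(m):
        attrs, word = m.group(2), m.group(3).strip()
        tag = C5.search(attrs)
        return f"{word}_{tag.group(1)} " if tag else f"{word} "
    return _collapse(ANY_TAG.sub("", TOKEN.sub(repl, xml_text)))


SAMPLE = ('<s n="1"><w c5="PNP" hw="i" pos="PRON">I </w>'
          '<w c5="VVD" hw="like" pos="VERB">liked </w>'
          '<w c5="PNP" hw="it" pos="PRON">it</w><c c5="PUN">.</c></s>')

print(detag(SAMPLE))     # I liked it.
print(keep_pos(SAMPLE))  # I_PNP liked_VVD it_PNP ._PUN
```

On a whole file you would probably cut out the `<teiHeader>` first, since otherwise its text ends up in the output too.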
It's not designed to be human-readable, and you need to use it with its Java API, which you can find here (https://verbs.colorado.edu/verb-index/inspector/). After compiling it, you can run a command by following the examples. I used 'java vn.Inspector ../new_vn -i -Va -Oown-100.xml ' to...
If you have downloaded the XML version of the BNC, you can try to extract the genre or text category information from the header with an XML parser such as BeautifulSoup or lxml in Python. Or save yourself some trouble and go to http://bncweb.lancs.ac.uk/bncwebSignup/user/login.php, they have the...
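A minimal sketch of the header route, using Python's standard-library ElementTree (lxml's `etree` offers the same `iter`/`find`/`get` calls). It assumes the BNC XML Edition layout: a `<classCode scheme="DLEE">` element in the header holding David Lee's genre label, and a `<wtext>` or `<stext>` body element with a `type` attribute. The sample below is invented and heavily trimmed, not a real BNC file, and `genre_info` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Invented, trimmed-down sample; real BNC XML files are much longer.
SAMPLE = """<bncDoc xml:id="X00">
  <teiHeader>
    <profileDesc>
      <textClass>
        <classCode scheme="DLEE">W_non_ac_medicine</classCode>
      </textClass>
    </profileDesc>
  </teiHeader>
  <wtext type="NONAC">
    <p><s n="1"><w c5="NN1" hw="aid" pos="SUBST">AIDS </w></s></p>
  </wtext>
</bncDoc>"""


def genre_info(root):
    """Return the DLEE genre label and the wtext/stext type of one document."""
    info = {"dlee": None, "text_type": None}
    for code in root.iter("classCode"):
        if code.get("scheme") == "DLEE":
            info["dlee"] = (code.text or "").strip()
            break
    # Written texts use <wtext>, spoken texts use <stext>.
    for body_tag in ("wtext", "stext"):
        body = root.find(body_tag)
        if body is not None:
            info["text_type"] = body.get("type")
            break
    return info


print(genre_info(ET.fromstring(SAMPLE)))
# {'dlee': 'W_non_ac_medicine', 'text_type': 'NONAC'}
```

For a real file, call `genre_info(ET.parse(path).getroot())` and loop over the corpus directory. If your copy declares an XML namespace, the tag names in `iter`/`find` would need to be qualified.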
The easiest way is to use a POS tagger or segmenter that lets you supply your own user dictionary (usually one term per line), such as Jieba or ZPar, but you need to have access to a Linux machine.
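To show why a user dictionary helps, here is a toy forward maximum matching segmenter: at each position it takes the longest term found in the dictionary and falls back to a single character otherwise. This is a swapped-in textbook technique, not how Jieba or ZPar actually segment (Jieba, for instance, combines its dictionary with statistical models). The helper names are hypothetical; the loader reads one term per line and ignores any extra columns.

```python
def load_user_dict(lines):
    """Read one term per line; skip blank lines and ignore extra columns."""
    terms = set()
    for line in lines:
        parts = line.split()
        if parts:
            terms.add(parts[0])
    return terms


def fmm_segment(text, terms, max_len=None):
    """Forward maximum matching: prefer the longest dictionary term at each position."""
    if max_len is None:
        max_len = max((len(t) for t in terms), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Single characters are always accepted as a fallback.
            if size == 1 or piece in terms:
                tokens.append(piece)
                i += size
                break
    return tokens


user_terms = load_user_dict(["语料库", "语言学", "研究"])
print(fmm_segment("语料库语言学研究", user_terms))
# ['语料库', '语言学', '研究']
```

With Jieba itself, you would instead point it at your file with `jieba.load_userdict("userdict.txt")` (the file name here is hypothetical) before calling `jieba.lcut`.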
If you don't know how to write an XML file with a scripting or programming language, you can save your annotated file as a CSV file and convert it with either an online or an offline converter.
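For anyone who does want to script the conversion, a minimal offline converter needs only the Python standard library. The element names (`annotations` for the root, `row` for each record) are hypothetical defaults, and the sketch assumes your CSV column headers are valid XML element names (no spaces, not starting with a digit).

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text, root_tag="annotations", row_tag="row"):
    """Turn each CSV record into <row_tag> with one child element per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    root = ET.Element(root_tag)
    for record in reader:
        row = ET.SubElement(root, row_tag)
        for column, value in record.items():
            # ElementTree escapes &, < and > in the text automatically.
            ET.SubElement(row, column).text = value
    return ET.tostring(root, encoding="unicode")


print(csv_to_xml("token,pos\nlike,VVD\n"))
# <annotations><row><token>like</token><pos>VVD</pos></row></annotations>
```

To convert a real file, read it with `open(path, newline="", encoding="utf-8")` and pass its contents in.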
The University Research Priority Program (URPP) Language and Space at the University of Zurich focuses on communication as linked to physical, social and perceptual space, through research on dialect/language contact as well as on interactional situations. The URPP approaches relations between...
Re: China Scholarship Council - University of Leeds PhD in corpus linguistics
Eric is a very nice fellow and a key member of the NLP group at the School of Computing. Anyone interested should give it a try!