Re: Urgent question: Dr. Xu Jiajin
Yes. Part-of-speech tags were stripped off before making the cluster lists.
ARTDEF is the tag for "the", I guess.
ICE-GB is richly annotated, so it took us a lot of effort to get the clean texts out of it.
Apart from that, we replaced the different forms of filled pauses with FP, to keep transcription inconsistency in non-lexical items to a minimum. I can't remember whether we made any other tidy-up changes to the texts.
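
For anyone who wants to try something similar, here is a rough Python sketch of the two clean-up steps mentioned above (stripping tags, then collapsing filled pauses into FP) followed by a simple cluster list. It is not the script we actually used. It assumes a simplified word_TAG token format, which is not ICE-GB's native markup, and the filled-pause inventory and the function names clean_tokens and clusters are made up for illustration.

```python
import re

# Hypothetical inventory of filled-pause spellings; the real list is not given here.
FILLED_PAUSES = {"uh", "uhm", "um", "er", "erm", "mm"}

# Assumed token format: word_TAG (e.g. the_ARTDEF). ICE-GB's own markup differs.
TAG_SUFFIX = re.compile(r"_[A-Z0-9]+$")


def clean_tokens(line):
    """Strip the tag suffix from each token and replace filled pauses with FP."""
    cleaned = []
    for token in line.split():
        word = TAG_SUFFIX.sub("", token)
        if word.lower() in FILLED_PAUSES:
            word = "FP"
        cleaned.append(word)
    return cleaned


def clusters(tokens, n):
    """Return every contiguous n-word cluster in the token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

Calling clean_tokens on a tagged line and passing the result to clusters with n=2 gives the two-word clusters, with every filled pause counted as the same item FP.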