Questions about LOCNESS

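I recently got hold of the LOCNESS corpus, but there are a few things I don't understand, and I would be grateful for your guidance:
1. Has LOCNESS been annotated? My text contains some symbols I don't quite understand. For example, what does <quote> mean?
2. How were the word counts for LOCNESS calculated? Counting with Word seems to come closest to the figures on the official website, but some of them still differ.
3. Were the British Pupils' A level essays written by university students or by secondary school students?
I am eagerly awaiting your answers. My thesis involves research in this area, so it is rather urgent.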
 
Re: Questions about LOCNESS

1. <quote> marks direct quotations, i.e. the parts that were not written by the student.
2. Your Word count may have included markup codes such as <quote>.
3. British A-level pupils are school leavers in their final years of secondary school, not university students.
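To illustrate point 2, here is a minimal Python sketch of how markup codes can inflate a word count. It is not how Word or WordSmith actually counts. The pattern names `TAG` and `WORD`, the function `count_words`, and the sample sentence are all made up for the illustration, and it assumes that every markup code is written between angle brackets.

```python
import re

# Assumption: a markup code is anything between "<" and ">".
TAG = re.compile(r"<[^<>]*>")
# Assumption: a word is a run of letters, optionally with one internal apostrophe.
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def count_words(text, strip_markup=True):
    """Count words, optionally removing angle-bracket markup first."""
    if strip_markup:
        text = TAG.sub(" ", text)
    return len(WORD.findall(text))

# Invented sample line in the style of the corpus files.
sample = "One man says, <quote>. The student then explains the quotation."

raw_count = count_words(sample, strip_markup=False)   # "quote" is counted as a word
clean_count = count_words(sample, strip_markup=True)  # markup removed before counting

print("with markup:", raw_count)
print("without markup:", clean_count)
```

The gap between the two figures is one word per tag in this sample, so across a whole corpus the markup alone could account for part of a mismatch with the published totals.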
 
Re: Questions about LOCNESS

I am so excited to see your reply, Dr. Xiao. I have many questions to ask you about LOCNESS. It contains many tags I don't understand, such as <*>, <?>, etc. There is even a tag that reads something like "...bad photocopy".

Was LOCNESS built by scanning, or did I just get a contaminated version of it?
I have checked my WS4 wordlist against yours, but they don't seem to match.
 
By default, WordSmith ignores everything in angle brackets when searching. In other words, <*> and <?> won't affect your search results; they are simply not counted in any search.
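If you want to see which markup codes your copy actually contains before deciding how to treat them, a short script can list them. This is only a rough sketch in Python, unrelated to WordSmith's own tag handling. The function name `tag_inventory` and the sample text are invented for the example, and it assumes every code sits between angle brackets.

```python
import re
from collections import Counter

# Assumption: a markup code is anything between "<" and ">".
TAG = re.compile(r"<[^<>]*>")

def tag_inventory(text):
    """Return a Counter mapping each distinct markup code to its frequency."""
    return Counter(TAG.findall(text))

# Invented sample in the style of the corpus files.
sample = (
    "He says, <quote>. Surrogate Motherhood is when a<*>. "
    "She replied <quote> and then left<?>"
)

inventory = tag_inventory(sample)
for tag, freq in inventory.most_common():
    print(tag, freq)
```

Run on a real file (read with `open(...).read()`), the resulting list shows at a glance which codes occur and how often.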
 
Re: Questions about LOCNESS

Thank you for your help, Dr. Xu. It's a great pleasure to talk with you.
I am worried about whether the results of my research are reliable, because I just cannot get exactly the same word counts as those stated on Granger's homepage. It also bothers me to find sentences like these in the file:
They have brought him up in <quote> and in the awareness of his guilt. The language of the Catholic church is mocked when one man says, <quote>.
Surrogate Motherhood is when a<*>.
I am really not sure about the validity of this copy of LOCNESS. Should I switch to another corpus, such as the BNC's University Essays, for my study?
 
Yes. The word count IS different from the one on the website. This might be caused by the different concordancers used and the different token definitions applied. I am almost sure that your version of LOCNESS is the genuine one.

I would suggest that you stick to your own word count, the one you got from WordSmith WordList.
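To show how much the token definition alone can matter, here is a small Python sketch that counts the same sentence under three different definitions. None of these is claimed to be the definition used by WordSmith or by the LOCNESS website. The three rules, the names in the `counts` dictionary, and the sample sentence are chosen only for illustration.

```python
import re

# Invented sample sentence with apostrophes, hyphens and a free-standing dash.
sample = "It's a well-known A-level essay - isn't it?"

# Definition 1: a token is anything separated by whitespace.
whitespace_tokens = sample.split()

# Definition 2: a token is an unbroken run of letters,
# so apostrophes and hyphens split words apart.
letter_tokens = re.findall(r"[A-Za-z]+", sample)

# Definition 3: letters, with apostrophes and hyphens allowed inside a word.
joined_tokens = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", sample)

counts = {
    "whitespace": len(whitespace_tokens),
    "letters only": len(letter_tokens),
    "hyphen/apostrophe inside words": len(joined_tokens),
}

for name, n in counts.items():
    print(name, n)
```

One short sentence already gives three different totals, so small discrepancies between tools are to be expected. What matters for a study is applying one definition consistently to every corpus being compared.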
 
Re: Questions about LOCNESS

Thank you for all your help, Dr. Xiao and Dr. Xu. You are so patient in helping a beginner. There is much for me to learn in corpus linguistics; even the definition of a word count will take a lot of time to understand. I'll follow your suggestions, Dr. Xu, and keep working on it.
 