X
Reaction score
11

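  • You can copy the search results into Excel, use the Rand() function to generate a random number for each row, sort by those numbers, and then keep as many rows as you need.
    Hello Dr. Xiao! I have a question for you: the concordance lines returned by the Peking University Modern Chinese Corpus online search system are not in random order. Is there a way to randomly extract a given number of concordance lines? Thank you.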
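The same shuffle-and-cut idea can also be sketched in Python for anyone who prefers a script to a spreadsheet. This is only a minimal illustration: the function name `sample_lines` and the stand-in data are assumptions, and the real input would be the concordance lines saved from the search system.

```python
import random

def sample_lines(lines, n, seed=None):
    """Return n lines chosen at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the sample repeatable
    # Same idea as the Excel approach: give every line a random key,
    # sort by that key, then keep the first n rows.
    keyed = [(rng.random(), line) for line in lines]
    keyed.sort(key=lambda pair: pair[0])
    return [line for _, line in keyed[:n]]

# Stand-in data (hypothetical); replace with the saved concordance lines.
hits = [f"concordance line {i}" for i in range(1, 101)]
sample = sample_lines(hits, 10, seed=42)
```

Recording the seed alongside the sample makes it possible to reproduce exactly the same selection later.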
    There are many existing Chinese corpora (see the relevant section in this forum) which can be used directly. If you build your own corpus, you will need to segment the Chinese text into words using tools such as ICTCLAS. Some concordancers also require the Chinese data to be converted into Unicode.
    WordSmith and AntConc can both be used to extract word clusters (also called n-grams or lexical bundles).
    Dr. Xiao, I would like to write my doctoral dissertation on lexical chunks. I already have the electronic texts. Which system would be best for annotating them? Many thanks!
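For readers unfamiliar with what a word cluster is, here is a rough Python sketch of n-gram counting. It is not how WordSmith or AntConc work internally; the function names and the `min_freq` cut-off are assumptions made for illustration, and Chinese text would first have to be segmented into words.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_clusters(tokens, n, min_freq=2):
    """Count n-grams and keep those occurring at least min_freq times."""
    counts = Counter(ngrams(tokens, n))
    return {gram: c for gram, c in counts.items() if c >= min_freq}

# Toy input, already split into words.
tokens = "on the other hand it is on the other side of the road".split()
clusters = count_clusters(tokens, 3)
# clusters == {("on", "the", "other"): 2}
```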
    I think you will find my "Corpus-based Language Studies" of help. The 10 chapters of the first part of the book are available at this site: http://www.corpus4u.org/showthread.php?t=449.

    If you are interested in collocation study, you will find unit A10.1 particularly useful. But I think you will benefit from a quick read through the whole of unit 10 to see how corpora can help in language studies, so as to establish a link between your research interests and what can actually be done with a corpus.
    Thank you for your explanation and all! Now I know the next thing to do is to try to retain these terms ... There are some computer guys in our team and hopefully they will figure out how to do it. Well, maybe I should start learning regular expressions now. Really a big challenge for us... but big fun too!
    For this specific part I've read Mr. Yang's textbook and the relevant chapter in Susan Hunston's work. Are there any other works you could recommend to us?
    Thank you for your patience!
    I think technical terms like CO2, CH3CHO and NaCl should be retained in your corpus, as abbreviations like these are a defining feature that distinguishes this type of text from many other genres. A one-million-word corpus can of course be used for collocation studies. I think you should read more about what other people have done in this kind of research to see the range of research questions that can be addressed.
    Thank you very much, Dr. Xiao! But what can we do with empirical formulas of chemical substances such as CO2, CH3CHO and NaCl? They keep appearing in my corpus and WS5 seems to have failed to recognize them...
    And my little raw corpus is only 1.03 million words. Is it large enough for word frequency counts? I've tried to do some collocation research, but I really don't know where to start. Do you have any suggestions? Thx!
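One way to see what "retaining" such terms means is a tokenizer that lets digits sit inside a word. The sketch below is plain Python and says nothing about WS5's own settings; the token pattern and the sample sentence are assumptions chosen for illustration.

```python
import re
from collections import Counter

# A token starts with a letter and may continue with letters or digits,
# so "CO2" and "CH3CHO" survive as single tokens instead of being split.
TOKEN = re.compile(r"[A-Za-z][A-Za-z0-9]*")

def tokenize(text):
    """Split text into tokens, keeping formulas such as CO2 whole."""
    return TOKEN.findall(text)

# Made-up sample sentence (hypothetical data).
text = "CO2 dissolves in water. NaCl and CH3CHO were added, then more CO2."
freq = Counter(tokenize(text))
# freq["CO2"] == 2, freq["NaCl"] == 1, freq["CH3CHO"] == 1
```

Case is deliberately left untouched here, since lowercasing would merge formulas that differ only in capitalisation, such as "CO" and "Co".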
    If, by the wording "nonverbal expressions", you mean tables and graphics (rather than gestures and facial expressions as in multimodal corpora) in your EST corpus, then you might find the following discussion of use: http://www.corpus4u.org/showthread.php?t=5026

    A raw corpus, if it is large enough, can be useful in lexical studies: collocations, semantic prosodies, lexical bundles, etc. It can also be used for other kinds of research, such as grammatical studies if you know how to extract patterns with the help of regular expressions, or discourse studies with the help of keyword analysis.
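As a small illustration of extracting a grammatical pattern from untagged text with a regular expression, the Python sketch below looks for the frame "it is/was ... that". The pattern and the sample text are assumptions; because a raw corpus has no part-of-speech tags, the slot matches any single word, so the hits still need manual checking.

```python
import re

# Frame: "it" + "is" or "was" + one word + "that", ignoring case.
PATTERN = re.compile(r"\bit\s+(?:is|was)\s+(\w+)\s+that\b", re.IGNORECASE)

# Made-up sample text (hypothetical data).
text = (
    "It is clear that the results differ. "
    "It was obvious that more data were needed. "
    "It is raining."
)
matches = PATTERN.findall(text)
# matches == ["clear", "obvious"]
```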
    Dr Xiao,
    I will be doing my postgraduate studies in China, and I am now preparing my résumé as a prospective exam-exempt student. I have tried to build an EST corpus and to learn how to run concordances and analyse the results (making wordlists, some collocation study). The exploration has been fascinating. I'd like to know what more I can do with a raw corpus, and how we should deal with the nonverbal expressions (which appear more frequently in EST) in a corpus. Looking forward to your reply!
    Brookie: It is not clear in which country you plan to pursue your postgraduate studies.
    Dr. Xiao,
    I'm a young soul in the world of corpus linguistics, and I really benefited a lot from this lovely forum when I was doing my USRP. It also reveals more of the colorful world to me :) Thank you for your work in maintaining this forum and helping it thrive. And, Dr. Xiao, do you have any suggestions for a BA determined to devote her graduate years to corpus linguistics? Looking forward to your reply :)