
  1. Haiyang Ai

    Read their articles; their contact info is often listed there.
  2. Haiyang Ai


    Re: [Lecture] "Yunshan Scholar" Lu Xiaofei analyzes changes in education policy. Lu, Xiaofei, Gamson, David A., & Eckert, Sarah Anne (2014). Lexical difficulty and diversity in American elementary school reading textbooks: Changes over the past century. International Journal of Corpus Linguistics, 19(1):94-117. Gamson, David A., Lu, Xiaofei, &...
  3. Haiyang Ai

    Urgent: how to select keywords from a word list

    Probably not. You'll need a more general reference list from large corpora. For example, the word "the" in your corpus would be significantly more frequent than its occurrences in the Academic Word List, but it won't be very interesting...
  4. Haiyang Ai


    You could look at the translation universals hypothesis. See Dr. Xiao's work for details.
  5. Haiyang Ai


    The Stanford NLP tools include Chinese word segmentation, POS tagging, and even syntactic parsing utilities.
  6. Haiyang Ai

    Urgent: how to select keywords from a word list

    Keywords are obtained by comparing a word list from your text files with a more general reference word list.
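
The comparison described above can be sketched in Python. This is a minimal illustration, not the method of any particular tool: it assumes the log-likelihood statistic as the keyness measure (one common choice; the post does not name one), and the function names `log_likelihood` and `keywords` are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score (assumed measure, see lead-in).

    a: frequency of the word in the study corpus
    b: frequency of the word in the reference corpus
    c: total tokens in the study corpus
    d: total tokens in the reference corpus
    """
    expected_study = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    score = 0.0
    if a > 0:  # treat 0 * log(0) as 0
        score += a * math.log(a / expected_study)
    if b > 0:
        score += b * math.log(b / expected_ref)
    return 2 * score


def keywords(study, reference, top=10):
    """Rank words that are relatively more frequent in the study word list."""
    c = sum(study.values())
    d = sum(reference.values())
    scored = []
    for word, a in study.items():
        b = reference.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top]


if __name__ == "__main__":
    # Toy word lists, invented purely for illustration.
    study = Counter({"the": 50, "corpus": 30, "tagger": 20})
    reference = Counter({"the": 500, "corpus": 5, "tagger": 1, "house": 494})
    for word, score in keywords(study, reference):
        print(word, round(score, 2))
```

In the toy data, "the" has the same relative frequency in both lists, so it drops out, while "corpus" and "tagger" rise to the top. That is why a general reference list is needed rather than a specialized one.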
  7. Haiyang Ai


    Re: Python Chinese word segmentation and other components, free and open source. This one deserves a bump!
  8. Haiyang Ai


    That depends on which tagset was used when the corpus was POS-tagged. If it was CLAWS7, the code is VVN. See http://ucrel.lancs.ac.uk/claws7tags.html
  9. Haiyang Ai

    Help needed: text sources for a sports news corpus

    Re: Help needed: text sources for a sports news corpus. You might want to check whether the COCA corpus contains sports reportage texts. It's probably a good idea to focus your study on an interesting linguistic issue in sports reportage, rather than building such a corpus yourself.
  10. Haiyang Ai

    Copyright issues in building corpora for unpublished thesis

    Re: Copyright issues in building corpora for unpublished thesis. My understanding is that if you're collecting such data for research purposes (as opposed to releasing the corpus for commercial use), you're probably fine.
  11. Haiyang Ai


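    Re: [CNKI-based corpus search] 纬度 (latitude) or 维度 (dimension)? A wrong-character error in the titles of core foreign-language journals. This is getting a bit off topic; let's leave it here.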
  12. Haiyang Ai


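    Re: Searching the BLCU corpus. If you choose to search by word, you should get cases where "就是" is preceded by a comma. If you search for "<<就是", all the results are sentences that begin with 就是. Consider downloading the results to Excel, finding a way to pull "就是" out into its own column, and then sorting by the words to its left; that should let you remove everything you don't want.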
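
The spreadsheet step suggested above (put the keyword in its own column, then sort by what comes to its left) can also be sketched in Python. This is only an illustration under assumptions: the concordance lines are plain strings, each line is split at the first occurrence of the keyword, and the helper names `split_kwic` and `sort_by_left_context` are hypothetical.

```python
def split_kwic(line, keyword):
    """Split a concordance line into (left context, keyword, right context)."""
    left, found, right = line.partition(keyword)
    if not found:
        return None  # keyword not in this line
    return left, keyword, right


def sort_by_left_context(lines, keyword):
    """Sort concordance lines by the characters immediately left of the keyword.

    The left context is compared from the keyword outward (reversed), so
    lines sharing the same preceding character end up next to each other.
    """
    rows = [row for row in (split_kwic(line, keyword) for line in lines) if row]
    return sorted(rows, key=lambda row: row[0][::-1])


if __name__ == "__main__":
    # Invented example lines, for illustration only.
    lines = ["他说，就是这样", "就是这个问题", "我想，就是他", "这就是生活"]
    for left, kw, right in sort_by_left_context(lines, "就是"):
        print(f"{left:>6} | {kw} | {right}")
```

After sorting, sentence-initial hits (empty left context) and hits preceded by a comma each form a contiguous group, which makes them easy to drop in one pass.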
  13. Haiyang Ai


    Re: PSPP: free software that perfectly replaces SPSS. Wow, GNU actually puts out something this good!
  14. Haiyang Ai


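    Re: Searching the BLCU corpus. Their help documentation says that regular-expression search is supported. For example, [兴奋的,兴奋地]+10,30+说 finds strings in which "说" appears within 10 to 30 Chinese characters after "兴奋的" or "兴奋地". What kind of pattern do you want to find?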
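
For readers working on downloaded text rather than in the BLCU interface, a roughly equivalent pattern can be written with Python's standard `re` module. This is a sketch under an assumption about the query's meaning: it reads "10 to 30 characters" as 10 to 30 intervening characters of any kind, which may not match the BLCU engine's exact semantics. The helper name `find_excited_said` is hypothetical.

```python
import re

# 兴奋的 or 兴奋地, then 10 to 30 characters (as few as possible), then 说.
PATTERN = re.compile(r"兴奋[的地].{10,30}?说")


def find_excited_said(text):
    """Return every non-overlapping match of the pattern in the text."""
    return PATTERN.findall(text)


if __name__ == "__main__":
    # Invented example sentence, for illustration only.
    sample = "他兴奋地拿着刚刚收到的录取通知书对大家说:我考上了"
    print(find_excited_said(sample))
```

A sentence such as "他兴奋地说" is not matched, because fewer than 10 characters separate the two parts.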
  15. Haiyang Ai


    Re: Is there a learner corpus of Chinese, or published research data on one? BLCU has a Dynamic Composition Corpus (动态作文语料库).
  16. Haiyang Ai

    A question about Stanford POS Tagger and Morpha

    Re: A question about Stanford POS Tagger and Morpha. It is possible. If you can program, you can simply loop over your text files and call the tagger on each one.
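
The loop described above might look like the following in Python. This is a minimal sketch, not a wrapper for any specific tagger: `tag_text` is a hypothetical callable (string in, tagged string out) standing in for however you invoke your tagger, and the `.txt` and `.tagged.txt` file-naming scheme is an assumption.

```python
from pathlib import Path


def tag_directory(input_dir, output_dir, tag_text):
    """Run tag_text on every .txt file in input_dir and save the results.

    tag_text is a placeholder for your own tagger call (hypothetical);
    it must accept a string and return the tagged string.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(input_dir).glob("*.txt")):
        tagged = tag_text(path.read_text(encoding="utf-8"))
        target = out / f"{path.stem}.tagged.txt"
        target.write_text(tagged, encoding="utf-8")
        written.append(target)
    return written
```

Passing the tagger in as a function keeps the loop independent of the tool, so the same code works whether the tagger is called through a library or a command-line process.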
  17. Haiyang Ai

    A question about Stanford POS Tagger and Morpha

    Re: A question about Stanford POS Tagger and Morpha. Try the online interface: http://aihaiyang.com/synlex/lexical/
  18. Haiyang Ai

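    What do raw and Standardized (n/100,000) mean in the tables of SLA papers?

    Re: What do raw and Standardized (n/100,000) mean in the tables of SLA papers? Raw is the number of times a word occurs in the corpus. The standardized (or normalized) frequency divides that count by the total number of words in the corpus and then multiplies it by a base of 100,000, giving the word's approximate frequency per 100,000 words. This makes it possible to compare corpora of different sizes.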
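
The calculation above is a single line of arithmetic; a short Python sketch makes the comparison concrete. The function name `normalized_frequency` and the counts in the example are invented for illustration.

```python
def normalized_frequency(raw_count, corpus_size, base=100_000):
    """Convert a raw count into a frequency per `base` words."""
    if corpus_size <= 0:
        raise ValueError("corpus_size must be positive")
    return raw_count * base / corpus_size


if __name__ == "__main__":
    # Invented counts: the same word in two corpora of different sizes.
    large = normalized_frequency(150, 2_000_000)  # 7.5 per 100,000 words
    small = normalized_frequency(30, 250_000)     # 12.0 per 100,000 words
    print(large, small)
```

Although the raw count is five times higher in the larger corpus, the word is relatively more frequent in the smaller one, which is exactly what the normalized figure is meant to reveal.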
  19. Haiyang Ai


    Re: How to draw a k-means cluster plot with the ggplot2 package in R. http://www.r-bloggers.com/bot-botany-k-means-and-ggplot2/
  20. Haiyang Ai

    ask for a paper by Barlow M.

    Re: ask for a paper by Barlow M. You could try asking Professor Barlow directly: http://www.michaelbarlow.com/