Search results

Q
How do I match two or more nouns with NLTK?

For <NN.*>{2,}, try pattern = r"""NP: {<NN.*><NN.*>+}""". For <NN.*>{2,5}, try pattern = r"""NP: {<NN.*><NN.*>} {<NN.*><NN.*><NN.*>} {<NN.*><NN.*><NN.*><NN.*>} {<NN.*><NN.*><NN.*><NN.*><NN.*>} """. Ugly, but it works. :) Run the following...
- qhdjason
- Post #2
- 2016-06-29
- Forum: Programming and Tool Development
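
As a library-free illustration of what a bounded pattern such as <NN.*>{2,5} is meant to capture, here is a minimal sketch that groups consecutive noun tags in an already tagged sentence. The function name noun_runs, its parameters, and the sample sentence are assumptions made for this example. It does not use NLTK's RegexpParser, and it skips runs longer than max_len instead of splitting them.

```python
from itertools import groupby


def noun_runs(tagged, min_len=2, max_len=None):
    """Return runs of consecutive NN* tokens whose length is within bounds.

    `tagged` is a list of (word, tag) pairs, e.g. the output of a POS tagger.
    Runs longer than `max_len` are skipped entirely (an assumption of this sketch).
    """
    chunks = []
    for is_noun, group in groupby(tagged, key=lambda pair: pair[1].startswith("NN")):
        words = [word for word, _tag in group]
        if not is_noun:
            continue
        if len(words) >= min_len and (max_len is None or len(words) <= max_len):
            chunks.append(words)
    return chunks


# Hypothetical pre-tagged sentence (Penn Treebank style tags).
tagged = [("the", "DT"), ("stock", "NN"), ("market", "NN"), ("crash", "NN"),
          ("hurt", "VBD"), ("investors", "NNS")]

print(noun_runs(tagged))             # equivalent of {2,}
print(noun_runs(tagged, 2, 5))       # equivalent of {2,5}
```

The single noun "investors" is left out because its run has length 1, which is the behaviour the {2,} quantifier asks for.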
Q
Urgent! A question about collostructional analysis software

There's a very detailed explanation here: http://www.linguistics.ucsb.edu/faculty/stgries/teaching/groningen/readme.txt Just follow the instructions and you will get the result. The key to making the R script work is to organize your data as required. Take collexeme_analysis as an example...
- qhdjason
- Post #2
- 2016-06-28
- Forum: Corpus Retrieval
Q
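Question: how do I run pairwise significance tests on frequency differences across several (7) sub-corpora on 10 observed dimensions? Thanks!

You could consult the following book (http://gen.lib.rus.ec/): An Introduction to Categorical Data Analysis (2nd Ed.), Section 2.2 on the odds ratio and Section 7.1 (p. 207) on how to interpret the results of a log-linear model.
- qhdjason
- Post #9
- 2016-02-21
- Forum: Programming and Tool Development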
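
For readers unfamiliar with the odds ratio mentioned in the reply above, the following is a minimal sketch for a single 2x2 table (feature present or absent, in two sub-corpora). The function names and the counts are hypothetical, and the interval is the standard Wald interval on the log scale, not something taken from the post.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d):
    """Odds ratio for the 2x2 table [[a, b], [c, d]].

    a, b: feature present / absent in sub-corpus A
    c, d: feature present / absent in sub-corpus B
    """
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate 95% Wald confidence interval, computed on the log scale."""
    log_or = log(odds_ratio(a, b, c, d))
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return exp(log_or - z * se), exp(log_or + z * se)


# Hypothetical counts: 30 of 100 units in A show the feature, 10 of 100 in B.
print(odds_ratio(30, 70, 10, 90))
print(odds_ratio_ci(30, 70, 10, 90))
```

An odds ratio of 1 means the feature is equally likely in both sub-corpora; an interval that excludes 1 suggests a difference. All four cells must be non-zero for this sketch to work.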
Q
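Question: how do I run pairwise significance tests on frequency differences across several (7) sub-corpora on 10 observed dimensions? Thanks!

Hello, attached are the results for some test data. The Freq column should contain the frequency counts.
- qhdjason
- Post #7
- 2016-02-20
- Forum: Programming and Tool Development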
Q
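About the MAT software

If two features have the same raw frequency, their z-scores should also be the same. My guess is that many features in the data have a raw frequency of 0 or 1. P.S. The third image doesn't show z-scores, does it?
- qhdjason
- Post #2
- 2016-02-11
- Forum: FAQ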
Q
The bilingual corpus tool AntPConc doesn't work; it doesn't respond

Thanks a lot! Changing the path did indeed fix it.
- qhdjason
- Post #5
- 2015-11-10
- Forum: Multilingual Corpora
Q
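InterText, a bilingual/multilingual corpus alignment tool

InterText's strength is that it provides a graphical interface for manual correction and lets several people annotate together. Its automatic alignment is not very good. If you have tried ABBYY and found the results unsatisfactory, there is no need to use InterText. These open-source tools all have poor support for CJK languages (Chinese, Japanese, and Korean), with many problems.
- qhdjason
- Post #18
- 2015-04-30
- Forum: Corpus Annotation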
Q
mwetoolkit - The Multiword Expressions toolkit

Thanks! Another interesting open-source NLP tool, but it was developed under Linux. See here for installation on Windows.
- qhdjason
- Post #2
- 2015-04-21
- Forum: Programming and Tool Development
Q
InterText, a bilingual/multilingual corpus alignment tool

Use a search engine! :)
- qhdjason
- Post #16
- 2015-04-21
- Forum: Corpus Annotation
Q
Question: how do I use the Stanford Parser?

http://www.bfsu-corpus.org/static/BFSUTools/BFSU_Stanford_Parser1.0.zip
- qhdjason
- Post #2
- 2015-04-04
- Forum: Corpus Annotation
Q
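Jieba Chinese word segmentation and POS tagging, R version

Jieba Chinese word segmentation has been ported to many languages; see https://github.com/fxsjy/jieba. I tried the R version of Jieba and it seems quite good. The steps are as follows:
1. Install Jieba: > install.packages("jiebaR")
2. Load the library: > library(jiebaR)
3. Call the worker() function: > tagger = worker("tag", symbol=T)
4. Segment and tag: > tagger <= "D:/ludingji.txt"
The tagged output is saved on the D: drive.
- qhdjason
- Thread
- 2015-03-29
- Replies: 1
- Forum: Corpus Annotation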
Q
InterText, a bilingual/multilingual corpus alignment tool

Thanks for the reminder! I use Strawberry Perl, which ships with the XML::LibXML module. With ActivePerl, you may need to install it yourself.
- qhdjason
- Post #14
- 2015-03-29
- Forum: Corpus Annotation
Q
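InterText, a bilingual/multilingual corpus alignment tool

That should be ABBYY Aligner. Thanks for the reminder. I tried it on the Chinese and English versions of The Deer and the Cauldron (《鹿鼎记》), and the results seem somewhat better than Hunalign's, though manual correction is still indispensable. The ABBYY Aligner I use cannot export .txt files, so I wrote a Perl script that converts ABBYY Aligner's TMX parallel corpus to TXT format. I hope it is useful to everyone.
- qhdjason
- Post #12
- 2015-03-28
- Forum: Corpus Annotation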
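
The Perl script mentioned in the post above is not shown in these results. As a rough idea of what a TMX-to-TXT conversion involves, here is a minimal Python sketch using only the standard library. The function name tmx_to_lines, the tab-separated output layout, and the sample document are assumptions for illustration; real ABBYY Aligner exports may carry extra attributes or inline markup.

```python
import xml.etree.ElementTree as ET

# ElementTree expands the predefined xml: prefix to this namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tmx_to_lines(tmx_text, src="en", tgt="zh"):
    """Return one 'source<TAB>target' line per translation unit (<tu>)."""
    root = ET.fromstring(tmx_text)
    lines = []
    for tu in root.iter("tu"):
        segs = {}
        for tuv in tu.iter("tuv"):
            # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
            lang = (tuv.get(XML_LANG) or tuv.get("lang") or "").lower()
            seg = tuv.find("seg")
            if seg is not None:
                # Keep only the primary language subtag, e.g. "en-US" -> "en".
                segs[lang.split("-")[0]] = "".join(seg.itertext()).strip()
        if src in segs and tgt in segs:
            lines.append(f"{segs[src]}\t{segs[tgt]}")
    return lines


# Hypothetical two-unit TMX document.
sample = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Hello.</seg></tuv><tuv xml:lang="zh-CN"><seg>你好。</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Thank you.</seg></tuv><tuv xml:lang="zh-CN"><seg>谢谢。</seg></tuv></tu>
</body></tmx>"""

for line in tmx_to_lines(sample):
    print(line)
```

For a file on disk, the same function can be fed the file's contents read as a string; writing the returned lines to a .txt file with UTF-8 encoding keeps the Chinese text intact.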
Q
InterText, a bilingual/multilingual corpus alignment tool

I wrote an article about using the InterText server; if you're interested, you can read it here.
- qhdjason
- Post #11
- 2015-02-24
- Forum: Corpus Annotation
Q
R Fisher's Exact Test script

R: Fisher's Exact Test script. 1. Why Fisher's Exact Test? Because the chi-squared test is not very accurate when an expected frequency is less than 5. 2. How to use the script compute_fisher.r? It's very easy. Just copy all the code into R and change the first line, setwd(), to the directory...
- qhdjason
- Thread
- 2014-06-02
- Replies: 0
- Forum: Programming and Tool Development
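
The script compute_fisher.r itself is not reproduced in these results. To show what the test computes, here is a minimal sketch of a two-sided Fisher's exact test for one 2x2 table, written with the standard library only. The function name and the example counts are assumptions; the two-sided p-value is taken as the sum of the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance so floating-point ties count as "as extreme".
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-9))


# Hypothetical small-sample table where expected counts fall below 5.
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Because every probability is computed exactly from binomial coefficients, the result does not rely on the large-sample approximation that makes the chi-squared test unreliable for small expected frequencies.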
Q
Asking for help with data transformation for normal distribution

If the random variable is not normally distributed, how can you transform it to be so? I think a transformation such as standardization only changes the scale of the variable, not the shape of its distribution. Yes, I think...
- qhdjason
- Post #2
- 2014-04-11
- Forum: Corpora and Foreign Language Teaching
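
The point about standardization can be checked numerically. The sketch below, on made-up right-skewed data, compares skewness before and after z-scoring, and after a log transform for contrast. The data values and the helper name skewness are assumptions for this example.

```python
from math import log
from statistics import mean, pstdev


def skewness(xs):
    """Population skewness: the mean of the cubed z-scores."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)


# Hypothetical right-skewed sample (e.g. raw frequencies).
data = [1, 2, 2, 3, 3, 3, 4, 5, 8, 13, 40]

m, s = mean(data), pstdev(data)
z_scores = [(x - m) / s for x in data]   # standardization: linear rescaling
logged = [log(x) for x in data]          # log transform: non-linear

print("raw:         ", skewness(data))
print("standardized:", skewness(z_scores))
print("log:         ", skewness(logged))
```

Standardization is a linear map, so the skewness of the z-scores equals that of the raw data up to rounding error, while a non-linear transform such as the logarithm does change the shape.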
Q
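Question: how to calculate POS-tagging accuracy

This is an interesting question. POS-tagging accuracy is now claimed to reach 97%, but that is measured per word; measured per sentence, accuracy drops sharply. One paper discusses this issue: http://nlp.stanford.edu/pubs/CICLing2011-manning-tagging.pdf One idea: could you tag the corpus with several different taggers (e.g., Stanford Tagger, OpenNLP Tagger, TreeTagger, CLAWS) and then look for the tokens where their tags differ, since those may be errors?
- qhdjason
- Post #4
- 2013-12-27
- Forum: Corpus Annotation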
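
The two ideas in the post above can be sketched in a few lines. The first function flags tokens on which several taggers disagree; the second shows why per-sentence accuracy is much lower than per-token accuracy. The function names and the sample tags are hypothetical, the tagger outputs are assumed to be tokenized identically and mapped to one tagset, and the sentence-level figure assumes independent errors, which is a simplification.

```python
def tag_disagreements(tokens, *tag_sequences):
    """Return (index, token, tags) for every token the taggers do not agree on."""
    flagged = []
    for i, token in enumerate(tokens):
        tags = tuple(seq[i] for seq in tag_sequences)
        if len(set(tags)) > 1:
            flagged.append((i, token, tags))
    return flagged


def sentence_accuracy(token_accuracy, sentence_length):
    """Chance that every token in a sentence is right, assuming independent errors."""
    return token_accuracy ** sentence_length


# Hypothetical outputs of two taggers for the same sentence.
tokens = ["Time", "flies", "like", "an", "arrow"]
tagger_a = ["NN", "VBZ", "IN", "DT", "NN"]
tagger_b = ["NN", "NNS", "VB", "DT", "NN"]

for index, token, tags in tag_disagreements(tokens, tagger_a, tagger_b):
    print(index, token, tags)

# With 97% per-token accuracy and 20-token sentences:
print(sentence_accuracy(0.97, 20))
```

Disagreements are only candidates for manual checking: taggers can also agree on a wrong tag, so this does not replace a hand-annotated gold sample for computing accuracy.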
Q
[Repost] Statistics for Linguistics with R: A Practical Introduction (2nd revised edition)

This is a good book. One slight pity is that its treatment of multivariate statistical analysis does not cover the concrete methods for estimating the parameter weights, such as gradient descent in logistic regression, and that step is the most interesting one. Reading the book alongside the course on coursera.org may be more rewarding: https://class.coursera.org/ml/lecture/58
- qhdjason
- Post #4
- 2013-12-01
- Forum: Books and Resources
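
For readers curious about the step the post says the book leaves out, here is a minimal sketch of batch gradient descent for logistic regression with one predictor. The data, learning rate, and number of steps are arbitrary choices for illustration, and this is not how R's glm() fits the model.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n   # d(loss)/dw
        grad_b = sum(errors) / n                              # d(loss)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical data: the outcome becomes more likely as x grows.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)
print("w =", w, "b =", b)
print("p(y=1 | x=1.0) =", sigmoid(w * 1.0 + b))
print("p(y=1 | x=4.0) =", sigmoid(w * 4.0 + b))
```

Each step moves the weights a little in the direction that reduces the average log-loss; the classes overlap in this sample, so the weights settle at finite values.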
Q
[Repost] Statistics for Linguistics with R: A Practical Introduction (2nd revised edition)

Thanks!
- qhdjason
- Post #2
- 2013-12-01
- Forum: Books and Resources
Q
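Question: how do I run pairwise significance tests on frequency differences across several (7) sub-corpora on 10 observed dimensions? Thanks!

ANOVA is not suitable, because your data are frequencies. The chi-squared test is manageable when each variable has 2 to 4 levels. You have too many levels: pairwise comparison gives 945 possibilities, which would also be hard to interpret statistically. Perhaps you could merge some variables to make the analysis easier. Alternatively, normalize the raw frequencies and draw 7 bar charts, each showing the frequencies of the 10 variables, to see the general trend. Then it should be enough to analyze the ones that differ most clearly.
- qhdjason
- Post #4
- 2013-09-17
- Forum: Programming and Tool Development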
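
The post does not say how the figure of 945 was obtained. One reading that reproduces it, offered here only as an assumption, is every pair of sub-corpora crossed with every pair of dimensions:

```python
from math import comb

corpus_pairs = comb(7, 2)       # ways to pick 2 of the 7 sub-corpora
dimension_pairs = comb(10, 2)   # ways to pick 2 of the 10 dimensions

print(corpus_pairs)                     # 21
print(dimension_pairs)                  # 45
print(corpus_pairs * dimension_pairs)   # 945
```

Comparing sub-corpus pairs within each single dimension would give 21 x 10 = 210 tests instead, which is still a large number to interpret.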
