Extracting N-grams/clusters/chunks in CQPweb

Re: Extracting N-grams/clusters/chunks in CQPweb

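In CQPweb, I want to look at how verbs are used. How do I retrieve the frequencies of all verbs? I read the simple query syntax guide; one of its examples uses _PNX to find reflexive pronouns, but when I tried it no reflexive pronouns came back. For verbs I tried \S+_V\w+\s, _VERB and _{VERB}, and none of them returned anything. How should I search? CQPweb queries seem to differ from the regular expressions I learned before, which do not work here!!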
 
Re: Extracting N-grams/clusters/chunks in CQPweb

http://www.bfsu-corpus.org/static/corpus_tools/CQPweb_guide.pdf
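The PDF linked above ends with examples of advanced queries. CQP syntax is not the same as ordinary regular expressions, but it can do nearly everything regular expressions can.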

To retrieve all verbs, set Query mode to [CQP syntax] and enter [pos="V.*"]. If anything, CQP syntax is easier to follow than ordinary regex.
For nouns, for example, [pos="N.*"] is all you need.
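For anyone coming from ordinary regex, here is a rough Python sketch of what a query like [pos="V.*"] does: CQP tests each token separately and requires the regular expression to match the whole attribute value, which is why patterns written against running text such as \S+_V\w+\s find nothing. The toy sentence, its CLAWS7-style tags and the helper name match_pos are my own assumptions for illustration, not anything taken from CQPweb itself.

```python
import re

# Hypothetical toy sentence as (word, pos) pairs; the tags are CLAWS7-style
# and chosen for illustration only.
tokens = [
    ("She", "PPHS1"), ("has", "VHZ"), ("written", "VVN"),
    ("two", "MC"), ("books", "NN2"), (".", "."),
]

def match_pos(tokens, pattern):
    """Return the words whose POS tag matches `pattern` in full.

    re.fullmatch mimics the way CQP anchors a regex to the whole
    attribute value of a single token.
    """
    regex = re.compile(pattern)
    return [word for word, pos in tokens if regex.fullmatch(pos)]

verbs = match_pos(tokens, r"V.*")   # analogue of [pos="V.*"]
nouns = match_pos(tokens, r"N.*")   # analogue of [pos="N.*"]
print(verbs, nouns)
```

The point is only that the pattern applies to the tag of one token at a time, so there is no need to describe the word, the underscore or the following whitespace.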
 
Re: Extracting N-grams/clusters/chunks in CQPweb

The example you saw was presumably written for a corpus tagged with CLAWS5 (C5), where the code for reflexive pronouns is PNX.

The corpora on our BFSU CQPweb are all tagged with C7, so to retrieve reflexive pronouns you should use [pos="PPX.*"]
http://ucrel.lancs.ac.uk/claws7tags.html

Once you have retrieved the reflexive pronouns, you can also try the Frequency breakdown function.
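To give an idea of what a frequency breakdown of the [pos="PPX.*"] hits amounts to, here is a small Python sketch that counts the word forms matched by the query. CQPweb does this on the server; the sample tokens, the case folding and the variable names below are assumptions of mine, so treat it as an illustration rather than a description of how CQPweb computes it.

```python
import re
from collections import Counter

# Hypothetical tagged tokens (CLAWS7-style tags); not real corpus data.
tokens = [
    ("He", "PPHS1"), ("hurt", "VVD"), ("himself", "PPX1"), ("and", "CC"),
    ("they", "PPHS2"), ("blamed", "VVD"), ("themselves", "PPX2"),
    ("but", "CCB"), ("he", "PPHS1"), ("forgave", "VVD"), ("Himself", "PPX1"),
]

# Keep tokens whose tag fully matches PPX.* and count their word forms.
# Folding to lower case is an assumption made here for readability.
breakdown = Counter(
    word.lower() for word, pos in tokens if re.fullmatch(r"PPX.*", pos)
)

for form, freq in breakdown.most_common():
    print(form, freq)
```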
 
Re: Extracting N-grams/clusters/chunks in CQPweb

To extract 3-word clusters excluding punctuation marks, try the following pattern:

[word="\w*"]{3}
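As a rough offline analogue of that pattern, the Python sketch below slides a 3-token window over a token list and keeps only windows in which every token consists entirely of word characters, so any window containing a punctuation token is dropped. The function name word_ngrams and the sample tokens are hypothetical, and the sketch assumes one candidate match per start position.

```python
import re
from collections import Counter

def word_ngrams(tokens, n=3):
    """Count n-token clusters in which every token fully matches \\w*."""
    is_word = re.compile(r"\w*").fullmatch
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        # A punctuation token such as "," fails the full match,
        # so the whole window is skipped.
        if all(is_word(tok) for tok in window):
            counts[" ".join(window)] += 1
    return counts

# Hypothetical pre-tokenised text, with punctuation as separate tokens.
tokens = "on the other hand , on the other side".split()
clusters = word_ngrams(tokens)

for cluster, freq in clusters.most_common():
    print(freq, cluster)
```

Note that tokens containing an apostrophe or a hyphen also fail \w*, so clusters running through such tokens are excluded along with the punctuation.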
 
Re: Extracting N-grams/clusters/chunks in CQPweb

Re: xiaoz
Hi Richard, in your book review on McEnery and Hardie's book, you pointed out that CQPweb is not helpful for cluster/chunk and MD analysis.

In my view, as we could use the regex-like CQP syntax, cluster extraction and MD analysis with CQPweb should not be a problem.
 
Re: Extracting N-grams/clusters/chunks in CQPweb

Yes, CQP makes it possible to analyse word clusters in supported corpora. What I said in my review referred to ordinary users' own corpora.

But even this is probably changing, as some online systems have started to allow users to upload their own corpora for analysis. For example, Leeds University's Intellitext Corpus Queries system allows users to build their own corpora. The system supports a number of languages including Chinese, and it even includes Biber-style MD analysis (called Multivariate analysis), though the results are not always easy to interpret.

http://smlc09.leeds.ac.uk/itweb/htdocs/Query.html


 
Re: Extracting N-grams/clusters/chunks in CQPweb

Thanks for the reply and for sharing the Leeds query system.
 