Cluster (word-bundle) search in WordSmith

Both AntConc and WordSmith can do this.

AntConc can run a "cluster" analysis, and if you consult the WordSmith Tools online help, searching for the word "cluster" will also turn up the relevant description.

http://www.lexically.net/downloads/version5/HTML/index.html

http://www.lexically.net/downloads/version5/HTML/single_words.htm

Clusters are words which are found repeatedly together in each others' company, in sequence. They represent a tighter relationship than collocates, more like multi-word units or groups or phrases. (I call them clusters because groups and phrases already have uses in grammar and because simply being found together in software doesn't guarantee they are true multi-word units.) Biber calls them "lexical bundles".
Language is phrasal and textual. It is not helpful to see it as a matter of selecting a word to fill a grammatical "slot" as implied by structural theories. Words keep company: the extreme example is idiom where they're bound tightly to each other, but all words have a tendency to cluster together with some others. These clustering relations may involve colligation (e.g. the relationship between depend and on), collocation, and semantic prosody (the tendency for cause to come with negative effects such as accident, trouble, etc.).
WordSmith Tools gives you two opportunities for identifying word clusters, in WordList and Concord. They use different methods. Concord only processes concordance lines, while WordList processes whole texts.
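
To make the idea of a cluster concrete, here is a minimal sketch in Python that counts contiguous word sequences over a whole text. It is not how WordSmith or AntConc compute clusters internally; the function name `find_clusters`, the simple tokenizer, and the sample text are all assumptions made for illustration.

```python
from collections import Counter
import re

def find_clusters(text, size=3, min_freq=2):
    """Count contiguous word sequences of length `size` (hypothetical helper)."""
    # Very rough tokenizer: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(
        tuple(tokens[i:i + size]) for i in range(len(tokens) - size + 1)
    )
    # Keep only sequences that recur at least `min_freq` times.
    return [(" ".join(gram), n) for gram, n in counts.most_common() if n >= min_freq]

sample = "It depends on the weather. It depends on the day. It depends on you."
print(find_clusters(sample, size=3, min_freq=2))
# → [('it depends on', 3), ('depends on the', 2)]
```

This sketch ignores sentence boundaries, so a sequence can span two sentences; a real tool may let you control that.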


The AntConc online help, by contrast, gives a very concise description of the cluster function:
http://www.antlab.sci.waseda.ac.jp/software/AntConc_Help/Word_Clusters/Overview_(Clusters).htm
 
Nouns, adjectives, personal names, occupation names

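Finding nouns or adjectives is hard to do automatically in an untagged natural-language corpus, because many words belong to more than one part of speech. The corpus must first be tagged automatically for parts of speech, and then the tags can be processed with AntConc or WordSmith. Identifying personal names or proper names looks simple on the surface, since it only means finding words with an initial capital letter, but once a name stretches over several words it is no longer easy to handle automatically, and even writing a regular expression for it is troublesome. "Occupation names" is a category that is hard to delimit in the first place: some occupation names are official, others are informal everyday terms. In a corpus that mixes spoken and written language, and without a backing database, ordinary corpus query tools are not up to extracting occupation names automatically. By the same reasoning, automatic extraction or retrieval of personal names also works best when a backing database is built in advance and loaded into a corpus analysis tool such as AntConc; only then can the query you want be carried out instantly.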
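
A small Python sketch may illustrate why a capital-letter pattern alone is not enough and why a prepared name list helps. The name list `KNOWN_NAMES`, the pattern `CAP_SEQ`, and the sample sentence are hypothetical; a real name database would be far larger and loaded from a file.

```python
import re

# Hypothetical mini name list; a real one would be built in advance.
KNOWN_NAMES = {"John Smith", "Mary Ann Evans"}

# One or more capitalised words in a row, separated by single spaces.
CAP_SEQ = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*\b")

def candidate_names(text):
    """Return every run of capitalised words (over-generates)."""
    return CAP_SEQ.findall(text)

def known_names(text, names=KNOWN_NAMES):
    """Return only the names from the prepared list that occur in the text."""
    return [name for name in sorted(names) if name in text]

sample = "Yesterday John Smith met Mary Ann Evans in London. The meeting was short."
print(candidate_names(sample))
# → ['Yesterday John Smith', 'Mary Ann Evans', 'London', 'The']
print(known_names(sample))
# → ['John Smith', 'Mary Ann Evans']
```

The pattern picks up sentence-initial words and place names along with the real names, while the list lookup returns only what was prepared in advance.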
 
Re: Cluster (word-bundle) search in WordSmith

A question for the teachers here: after I ran a cluster search in WordSmith, a # appeared in the results. What does this symbol mean here?
 
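The # most likely comes from the original corpus text rather than being generated by the search tool; you can open the original file and read it to check. Also, you have not said where the # signs appear, how many there are, whether they follow a pattern, or given any other relevant information.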
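
If the corpus is too large to read by eye, a short Python sketch like the following could list the lines that contain a #. The helper name `find_hash_lines` and the sample lines are assumptions for illustration only.

```python
def find_hash_lines(lines):
    """Return (line_number, line) pairs for lines containing '#'."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if "#" in line
    ]

# Hypothetical sample standing in for the lines of a corpus file.
sample = ["Chapter 1\n", "Price: # 20\n", "No symbol here\n"]
print(find_hash_lines(sample))
# → [(2, 'Price: # 20')]
```

With a real corpus, the same function can be given an open file object instead of the sample list. If it returns nothing, the # is not in that source text.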
 