Basic Steps for Using AntConc: A Chinese Translation of the User Manual

woody2314 · 2008-08-09

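Some of you may not be English majors and may need to process Chinese corpora, so I have translated parts of Anthony's AntConc user manual. It is not the whole manual, only the steps for using certain functions. The version is AntConc 3.2.1w (Windows) 2007.
1. Steps for using the Concordance tool
1) Choose one or more files to process using 'Open File(s)...' or 'Open Dir...' in the File menu. The selected files are listed in order in the left frame of the main window.
2) Enter a search term in the entry box under "Search Term" on the left.
3) Use the increase and decrease buttons under "Search Window Size" on the right to choose how many characters are shown on either side of the search term.
4) Click "Start" to generate the concordance lines. You can click "Stop" at any time to halt the search.
5) Use the buttons under "Kwic Sort" to choose a target word on which to re-sort the concordance lines. 0 is the search term; 1L and 2L are the first and second words to its left; 1R and 2R are the first and second words to its right. Note that up to three sort levels are available; the second and third are not activated when the software first starts.
6) Click "Sort" to start sorting.
7) Move the cursor over the highlighted search term in one of the concordance lines. (The highlighted term is blue by default; it is the original search term, not the target word chosen for sorting in the previous step.) The cursor changes to a hand icon. Click the highlighted search term to see it as it appears in the original text. See the "File View" tool.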
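The concordance steps above can be approximated outside AntConc. The following is a minimal Python sketch, not AntConc's own code: the function names `kwic` and `sort_by_1r` and the sample sentence are made up for illustration, and it assumes whole-word, case-insensitive matching with a context window measured in characters.

```python
import re

def kwic(text, term, window=50):
    """Return (left, hit, right) tuples with `window` characters of context."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - window):m.start()]
        right = text[m.end():m.end() + window]
        lines.append((left, m.group(0), right))
    return lines

def sort_by_1r(lines):
    """Sort concordance lines by the first word to the right of the hit (1R)."""
    def first_right_word(line):
        words = line[2].split()
        return words[0].lower() if words else ""
    return sorted(lines, key=first_right_word)

sample = "The student met another student. A student reads."
for left, hit, right in sort_by_1r(kwic(sample, "student", window=10)):
    # Right-align the left context so the hits line up in one column.
    print(f"{left:>10} [{hit}] {right}")
```

Sorting on 1L or 2R would follow the same pattern, taking the key from the left context or from a later word in the right context.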

xudekuan · 2008-08-09

Re: Basic Steps for Using AntConc

Thanks.

woody2314 · 2008-08-09

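Re: Basic Steps for Using AntConc

Note that the total number of concordance lines is shown in the box under "Concordance Hits" in the middle of the button bar. When processing ends, this number flashes together with the word "FINISHED"; if a search term produces no concordance lines, "NO HITS" is shown instead. In that case the concordance window is not updated, and the previously displayed concordance lines remain.
Search terms can be set to "words" (default) or "word fragments" with the "Word" option next to Search Term. The "Case" option determines whether the search is case sensitive (case insensitive is the default), and the "Regex" option enables full regular expressions. There are many articles on regular expressions, which you can find with a web search engine. Information is also available at http://www.regular-expressions.info/quickstart.html. (I found that this site does not open easily, though, so you may prefer to search Baidu or Google yourself; http://www.regexlab.com/zh/regref.htm works.)
Click the Advanced button to the right of the entry box under Search Term to run more complex searches. The first advanced option defines a set of search terms, either typed one per line or loaded from a file containing a list of search terms. Note that each line is treated as a separate search term. This lets you use a large set of search terms without retyping them each time. The second advanced option defines context words and a context window; a hit counts only if a context word appears within that window around the search term. For example, to find "student" only where "university" appears within three words to its left or right, set the search term to "student", the context word to "university", and the context window to 'From' 3L 'To' 3R.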
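To make the context-word option concrete, here is a small Python sketch of the "student" / "university" example. It is only an illustration under simple assumptions (tokens are runs of word characters, matching is case-insensitive); the function name `hits_with_context` and the sample text are invented here and are not part of AntConc.

```python
import re

def hits_with_context(text, term, context_word, left=3, right=3):
    """Return token indexes of `term` with `context_word` inside the window."""
    tokens = re.findall(r"\w+", text.lower())
    term, context_word = term.lower(), context_word.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        # Window 'From' 3L 'To' 3R: up to `left` tokens before, `right` after.
        window = tokens[max(0, i - left):i] + tokens[i + 1:i + 1 + right]
        if context_word in window:
            hits.append(i)
    return hits

text = ("Every university student studies. "
        "The student left early for the old university.")
print(hits_with_context(text, "student", "university"))
```

Only the first "student" is kept, because "university" is one word to its left; for the second "student" the nearest "university" is six words away.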

woody2314 · 2008-08-10

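Re: Basic Steps for Using AntConc

"Concordance Plot"
Generating a concordance plot takes the same steps as using the Concordance tool, but the plot gives a different view of the concordance lines. All hits for each file are plotted on a "barcode" that shows where in the file each hit occurs. This makes it easy to see which files contain the search term and where the hits cluster together. For example, you can see where a particular content word appears in a technical paper, or when a character appears over the course of a novel or play.
The number of hits and the length of each file are shown to the right of the barcode, and the plot itself can be enlarged or reduced with the Plot Zoom buttons in the lower left corner.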
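The barcode idea can be sketched in a few lines of Python. This is an illustration only, not how AntConc draws its plot: `hit_offsets` and `barcode` are hypothetical names, file length is assumed to be measured in characters, and the `width` parameter plays the role of the zoom level.

```python
import re

def hit_offsets(text, term):
    """Character offsets of every whole-word, case-insensitive hit."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [m.start() for m in pattern.finditer(text)]

def barcode(text, term, width=40):
    """Draw a text 'barcode': '|' where a hit falls, '.' elsewhere."""
    cells = ["."] * width
    for offset in hit_offsets(text, term):
        # Scale the offset to a cell; integer maths avoids rounding surprises.
        cells[offset * width // len(text)] = "|"
    return "".join(cells)

text = "a pen and a pen"
# Plot, number of hits, and text length, as shown beside each barcode.
print(barcode(text, "pen", width=15), len(hit_offsets(text, "pen")), len(text))
```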

woody2314 · 2008-08-10

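Re: Basic Steps for Using AntConc

"File View"
A target file can be viewed in its original form at any time with the File View tool.
To view the original text:
1) Select a file to view in the left frame of the main window.
2) If a search term has been specified, its hits are highlighted (blue by default). Select the File View button in the main window and click Start to see the search term as it appears in the original text.
3) Use the buttons under "Hit Location" in the lower left corner to jump to the previous or next hit.
4) Change the search term and click Start to view other hits.
5) Click a highlighted search term in the text to generate a set of KWIC (key word in context) lines, i.e. concordance lines, in the main window.
Below is a shortcut for File View:
Ctrl-Click (left click) = jump to the nearest hit in the window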

woody2314 · 2008-08-10

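Re: Basic Steps for Using AntConc

"Clusters"
The Clusters tool generates an ordered list of the clusters that appear around a search term in the target files.
Clusters can be ordered by frequency (Freq), by the start or end of the word, or by the probability (Prob) of the first word in the cluster preceding the remaining words. All orderings can be inverted. You can also set the minimum and maximum length of each cluster (Cluster Size, in words) and the minimum frequency of clusters listed (Min. Cluster Frequency), and choose whether the search term always appears on the left or the right of the cluster (Search Term Position). (Note: in the current version, if more than one word is specified as the search term and the search term is set to appear on the right, only the first word appears on the right.)
To generate a cluster list:
1) Select the Clusters button in the main window and choose the appropriate ordering options.
2) Click Start. You can halt processing at any time with Stop.
3) Click a cluster to generate a set of KWIC lines (concordance lines) for it.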
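As a rough illustration of what a cluster list contains, the Python sketch below counts fixed-size word sequences that begin or end with the search term. The function `clusters` and its parameters are assumptions made for this example (a single-word search term, one cluster size rather than a range), not AntConc's implementation.

```python
import re
from collections import Counter

def clusters(text, term, size=2, term_on_left=True, min_freq=1):
    """Count `size`-word clusters with `term` as the first or last word."""
    tokens = re.findall(r"\w+", text.lower())
    term = term.lower()
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != term:
            continue
        start = i if term_on_left else i - size + 1
        if start < 0 or start + size > len(tokens):
            continue  # the cluster would run past the edge of the text
        counts[" ".join(tokens[start:start + size])] += 1
    # most_common() gives the frequency ordering (highest first).
    return [(c, n) for c, n in counts.most_common() if n >= min_freq]

text = "the red pen and the blue pen and the red pen"
print(clusters(text, "the"))
print(clusters(text, "pen", term_on_left=False, min_freq=2))
```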

woody2314 · 2008-08-11

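Re: Basic Steps for Using AntConc

N-Grams (part of Clusters)
The N-Grams tool generates an ordered list of the n-grams in the target files. The n-grams are word n-grams, so large files produce huge numbers of them. For example, the n-grams of size 2 (bigrams) in the sentence "This is a pen." are 'this is', 'is a' and 'a pen'.
As with the Clusters tool, n-grams can be ordered by frequency, by the start or end of the word, or by the probability of the first word preceding the remaining words. All orderings can be inverted. You can also set the minimum and maximum size of each n-gram (in words) and the minimum frequency of n-grams displayed.
To produce an n-gram list:
1) In the Clusters window, select the N-Grams option to the right of Search Term.
2) Choose the appropriate ordering options.
3) Click Start. Processing can be halted at any time.
4) Click an n-gram (lexical bundle) to generate a set of KWIC lines.
I translated the N-Gram part with the help of explanations found online; please point out any mistakes.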
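The "This is a pen." example can be reproduced with a short Python sketch. The function name `ngrams` and the simple tokenizer (lowercased runs of word characters) are assumptions for illustration, not AntConc's own token definition.

```python
import re
from collections import Counter

def ngrams(text, n=2):
    """Count word n-grams of size `n` in `text`."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

counts = ngrams("This is a pen.", n=2)
print(list(counts))
```

A text of k words yields k - n + 1 n-grams of size n, which is why large files produce very long lists.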

Haiyang Ai · 2008-08-12

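Re: Basic Steps for Using AntConc

Thanks to woody2314 for taking the time to translate this for everyone. Once it is all done, we can tidy it up and publish it on the wiki as introductory material for AntConc.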

woody2314 · 2008-08-12

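Re: Basic Steps for Using AntConc

Sure, that works. There is still a lot left, though, and it is hard work. I am wondering whether to put the operating steps together and split off the statistical methods later, or keep everything together. Also, even though these are the basic operations, other problems come up when you actually use the tool. The lemma list, for example: I don't know how to set it up, which is really frustrating...
Here is what Laurence replied to me: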
You have to reformat the file in the form of the list on the AntConc
homepage (which is the same as Professor Someya's list...which is also
on the AntConc homepage). Then,

1) Go to the wordlist preferences menu option in AntConc
2) Load the lemma list
3) Check the option to use the lemma list
4) Generate a wordlist as usual (results will appear grouped).

I hope that helps.
But I can't find the list format on his homepage...

xujiajin · 2008-08-12

Re: Basic Steps for Using AntConc

http://www.antlab.sci.waseda.ac.jp/software/e_lemma_no_hypen.zip

woody2314 · 2008-08-12

Re: Basic Steps for Using AntConc

Thank you, Dr. Xu.

woody2314 · 2008-08-12

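Re: Basic Steps for Using AntConc

"Collocates"
The Collocates tool generates an ordered list of the collocates of a search term in the target files. Collocates can be ordered by frequency, by frequency to the left or right of the search term, or by the start or end of the word. They can also be ordered by the value of a statistical measure between the search term and the collocate, which measures how strongly the two are related. The statistical measures currently available in AntConc are listed below. All orderings can be inverted. You can also set the span of words to the left and right of the search term and the minimum frequency of collocates displayed. For example, if only a one-word span is needed, to see which words appear directly to the right of the search term, check the Same box under Window Span on the right so that the minimum and maximum span are the same.
Statistical Measures
Mutual Information (MI): uses the equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995). Mutual information measures how much the occurrence of one word tells you about the probability that the other word occurs (this sentence is my own addition).
T-Score: uses the equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995).
To produce a collocate list:
1) Choose the appropriate ordering options.
2) Click Start. Processing can be halted at any time.
3) Click a collocate to generate a set of KWIC lines.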
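For readers who want to see what the two measures compute, here is a Python sketch using the commonly cited forms of the formulas: MI = log2(O × N / (f(node) × f(collocate))) and T = (O − E) / √O, where O is the observed co-occurrence count, N the corpus size in tokens, and E = f(node) × f(collocate) / N. This is an assumption about the general shape of the measures; AntConc's exact calculation (for example, how it accounts for the span size) may differ, and the counts below are invented.

```python
import math

def mutual_information(observed, f_node, f_coll, corpus_size):
    """MI score: log2 of observed over expected co-occurrence."""
    return math.log2(observed * corpus_size / (f_node * f_coll))

def t_score(observed, f_node, f_coll, corpus_size):
    """T-score: (observed - expected) / sqrt(observed)."""
    expected = f_node * f_coll / corpus_size
    return (observed - expected) / math.sqrt(observed)

# Invented counts: node and collocate each occur 100 times in a
# 10,000-token corpus and co-occur 8 times within the span.
mi = mutual_information(8, 100, 100, 10_000)
t = t_score(8, 100, 100, 10_000)
print(round(mi, 3), round(t, 3))
```

MI tends to reward rare but exclusive pairings, while the T-score favours frequent pairings, so the two orderings of the same collocate list can look quite different.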

woody2314 · 2008-08-12

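Re: Basic Steps for Using AntConc

"Word List"
The Word List tool generates an ordered list of the words in the target files.
Words can be ordered by frequency or by the start or end of the word, and the order can be inverted. The word list can be generated in case-insensitive mode, where upper and lower case are treated the same, or in case-sensitive mode, where upper and lower case are treated separately.
To produce a word list:
1) Choose the appropriate ordering options.
2) Click Start. Processing can be halted at any time.
3) Click a word to generate a set of KWIC lines.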
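The difference between the two case modes is easy to see in a small Python sketch. The function `word_list` and its tokenizer (runs of word characters) are illustrative assumptions, not AntConc's token definition.

```python
import re
from collections import Counter

def word_list(text, case_sensitive=False):
    """Return (word, frequency) pairs ordered by frequency, highest first."""
    if not case_sensitive:
        text = text.lower()  # treat 'Pen' and 'pen' as the same word
    return Counter(re.findall(r"\w+", text)).most_common()

text = "The pen the Pen"
print(word_list(text))
print(word_list(text, case_sensitive=True))
```

In case-insensitive mode the sample yields two word types; in case-sensitive mode it yields four.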

woody2314 · 2008-08-12

Re: Basic Steps for Using AntConc

Oh no, AntConc has stopped working...

What should I do?
The word list doesn't work anymore...

woody2314 · 2008-08-12

Re: Basic Steps for Using AntConc

Oh... it works again...

laohong · 2008-08-12

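Re: Basic Steps for Using AntConc

Would you consider posting the English original alongside the translation? That would make it easier for everyone to study and polish it together.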

woody2314 · 2008-08-13

Re: Basic Steps for Using AntConc

OK. Here is the original text for the parts translated above.
**Concordance**
The **Concordance** tool generates concordance lines (or KWIC: key word in context lines) from one or more target texts chosen by the user.
To produce a set of concordance lines of text, a user needs to perform the following actions:
1) Select one or more files for processing using the 'Open File(s)...' or 'Open Dir...' options in the 'File' menu. The list of selected files is shown in the left frame of the main window.
2) Enter a search term on which to build concordance lines in the entry box on the left of the button bar.
3) Choose the number of text characters to be outputted on either side of the search term, using the increase and decrease buttons on the right of the button bar under the "Search Window Size" title. (default value is 50 characters)
4) Click on the 'Start' button to start the concordance lines results generation. The concordance generation can be halted at any time by clicking on the 'Stop' button.
5) Select a target word on which to rearrange the concordance lines, using the buttons to the right of the button bar. 0 is the search word, 1L, 2L... are words to the left of the search word, 1R, 2R... are words to the right of the search word. Note that three levels of sort are possible, with the second and third levels not activated when the software is first launched.
6) Click on the 'Sort' button to start the sorting process.
7) Move the cursor over the highlighted search term in one of the concordance lines. The cursor will change to a small hand icon. Clicking on the highlighted search term will allow the user to view the search term hit as it appears in the original file via the **File View** tool (see below).
Note that the total number of concordance lines generated (hits) is shown in the middle of the AntConc button window. This number will flash with the word "FINISHED" when processing has been completed, and will flash with the word "NO HITS" if no hits are generated for a particular search term. In this case, the concordance lines view will not be updated, and the previous set of concordance lines will remain visible.
Search terms can be specified as being "words"(default) or "word fragments" by choosing the "Word" search term option. Also, searches can be either case sensitive, or case insensitive (default) by choosing the "Case" search term option. Searches can also be made using full regular expressions by choosing the "Regex" option. For details on how to use regular expressions, consult one of the many texts on the subject. E.g., Mastering Regular Expressions (O'Reilly & Associates Press) or type "regular expressions" in a web search engine to find many sites on the subject.
Information about regular expressions can be found at: http://www.regular-expressions.info/quickstart.html
By clicking on the "Advanced Search" button, more complex searches become possible. The first advanced search option is to define a set of search terms, either by typing them one per line, or by loading in a list of search terms from a file. Note that each line will be treated as a separate search term. The feature allows the user to use a large set of search terms without having to re-type them each time. The second advanced search option is to define context words and a context window within which the context word(s) must appear around the search term. For example, to search for "student" where it appears within three words to the left or right of the word "university," set the search term as "student," the context word as "university," and the context window as 'From' 3L 'To' 3R.
A number of menu preferences are available with this tool. (See below).

**Concordance Plot**
Generating concordance plots can be achieved using the same actions as when using the **Concordance** tool. However, the **Concordance Plot** tool offers an alternative view of concordance lines. Here, all the hits for each file are plotted in the form of a 'barcode' indicating the position in the file where the hit occurred. The plot provides an easy way to see which files include the target search term, and can also be used to identify where the search term hits cluster together. An example of the use of the plot is in determining where specific content words appear in a technical paper, or when a character appears during the course of a novel or play.
The number of hits and length of each text is shown to the right of the barcode plot, and the plot itself can be enlarged or reduced in size using the zoom buttons.
If you move the cursor over one of the hits in the plot, the cursor will change to a small hand icon. Clicking on the hit will allow the user to view the search term hit as it appears in the original file via the **File View** tool (see below).

**File View**
At any time a target file can be viewed in its original form using the **File View** tool.
To produce a view of the original file, a user needs to perform the following actions:
1) Select a file to view in the file list frame to the left of the main window.
2) If a search term has been specified, the search term hits will be highlighted throughout the text. Search options are the same as for the **Concordance** and **Concordance Plot** tools.
3) Use the "Hit Location" buttons to jump to the appropriate hit in the file.
4) Change the search term and click on the 'Start' button to view other hits in the file.
5) Clicking on the highlighted text will generate a set of KWIC lines using the highlighted text as the search term.

Below is a list of Shortcuts unique to the **File View** tool.
CTRL-Click = Jumps to the nearest hit in the window

**Clusters**
The **Clusters** tool is used to generate an ordered list of clusters that appear around a search term in the target files listed in the left frame of the main window.
The clusters can be ordered either by frequency or the start or end of the word. They can also be ordered by the probability of the first word in the cluster preceding the remaining words. All list orderings can also be inverted. Also, a user can select the minimum and maximum length (number of words) in each cluster, and the minimum frequency of clusters displayed. It is also possible to select if the search term always appears on the left or right of the cluster. (Note: In the current version, if more than one word is specified as the search term, only the first word will appear on the right, if the "Search Term on Right" option is selected.)
To produce a cluster list, a user needs to perform the following actions:
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the clusters list can be halted using the 'Stop' button.
3) Clicking on the cluster will generate a set of KWIC lines using the text as the search term.
A number of menu preferences are available with this tool. (See below).

woody2314 · 2008-08-13

Re: Basic Steps for Using AntConc

**N-Grams** (part of Word Clusters)
The **N-Grams** tool is used to generate an ordered list of n-grams that appear in the target files listed in the left frame of the main window. N-grams are word n-grams, and therefore, large files will create huge numbers of n-grams. For example, n-grams of size 2 for the sentence "this is a pen", are 'this is', 'is a' and 'a pen'.
As with the **Clusters** tool, the n-grams can be ordered either by frequency or the start or end of the word. They can also be ordered by the probability of the first word in the cluster preceding the remaining words. All list orderings can also be inverted. Also, a user can select the minimum and maximum size (number of words) in each n-gram, and the minimum frequency of n-grams displayed.
To produce an N-gram list, a user needs to perform the following actions:
1) Click on the "N-Grams" option above the search entry box.
2) Choose the appropriate ordering options.
3) Press the 'Start' button. At any time, the generation of the n-grams list can be halted using the 'Stop' button.
4) Clicking on the lexical bundle will generate a set of KWIC lines using the text as the search term.
A number of menu preferences are available with this tool. (See below).

**Collocates**
The **Collocates** tool is used to generate an ordered list of collocates that appear near a search term in the target files listed in the left frame of the main window.
The collocates can be ordered either by frequency, frequency on the left or right of the search term, or the start or end of the word. They can also be ordered by the value of a statistical measure between the search term and the collocate. The value measures how 'related' the search term and the collocate are. Current possible statistical measures are listed below. All list orderings can also be inverted. Also, a user can select the span of words to the left and right of the search term in which to find collocates, and the minimum frequency of collocates displayed. If only a one-word span is required, for example, to see which words appear directly on the right of the search term, check the "Same" box, to keep the minimum and maximum span size the same.
Statistical Measures:
(MI) Mutual Information: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
(T-Score) T-Score: Using equations described in M. Stubbs, Collocations and Semantic Profiles, Functions of Language 2, 1 (1995)
To produce a collocate list, a user needs to perform the following actions:
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the collocates list can be halted using the 'Stop' button.
3) Clicking on the collocates will generate a set of KWIC lines using the text as the search term.
A number of menu preferences are available with this tool. (See below).

**Word List**
The Word List feature is used to generate a list of ordered words that appear in the target files listed in the left frame of the main window.
The words can be ordered either by frequency or the start or end of the word, and the list can be inverted. The word list can also be generated in case-insensitive mode, where words in upper and lower case are treated the same (default) or case-sensitive, where words in upper and lower case are treated separately.
To produce a word list, a user needs to perform the following actions:
1) Choose the appropriate ordering options.
2) Press the 'Start' button. At any time, the generation of the word list can be halted using the 'Stop' button.
3) Clicking on the word will generate a set of KWIC lines using the text as the search term.
A number of menu preferences are available with this tool. (See below).

**Keyword List**
In addition to generating word lists, AntConc can compare the words that appear in the target files with the words that appear in a 'reference corpus' to generate a list of "Keywords", that are unusually frequent (or infrequent) in the target files.
To produce a keyword list, a user needs to perform the following actions:
1) Select a set of target files.
2) Go to the 'Preferences' menu and choose the 'Keyword Preferences' option.
3) Choose a statistical measure to assess the 'keyness' of the target file words. The default setting of Log Likelihood is recommended.
4) Choose a threshold for the number of keywords to be displayed.
5) Choose whether or not to view 'Negative Keywords' (target file words with an unusually low frequency compared with the frequency in the reference corpus)
6) Choose a reference corpus of text (.txt) files, in the same manner that the target files are chosen.
7) The reference corpus directory will be shown (if appropriate), and the list of reference corpus files will appear at the bottom of the Keyword Preferences option menu.
8) Click 'OK' in the Keyword Preferences menu, and return to the main Keywords window.
9) Choose suitable options for displaying the list of generated Keywords (in a similar manner to the options for generating a Word List).
10) Press the 'Start' button. At any time, the generation of the keyword list can be halted using the 'Stop' button.
11) Clicking on the keyword will generate a set of KWIC lines using the text as the search term.
A number of menu preferences are available with this tool. (See below).
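To give a feel for what a log-likelihood keyness score measures, here is a Python sketch of the widely used two-corpus log-likelihood formula: LL = 2 × (a × ln(a / E1) + b × ln(b / E2)), where a and b are the word's frequencies in the target and reference corpora and E1, E2 are the frequencies expected if the word were equally common in both. Whether AntConc computes keyness in exactly this way is an assumption, and the counts below are invented.

```python
import math

def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood keyness of a word across target and reference corpora."""
    total = freq_target + freq_ref
    expected_target = size_target * total / (size_target + size_ref)
    expected_ref = size_ref * total / (size_target + size_ref)
    ll = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented counts: 20 hits in a 1,000-word target corpus versus
# 10 hits in a 1,000-word reference corpus.
score = log_likelihood(20, 1000, 10, 1000)
print(round(score, 3))
```

The score itself is never negative; a 'negative keyword' is a word with a high score whose relative frequency is lower in the target files than in the reference corpus.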

woody2314 · 2008-08-13

Re: Basic Steps for Using AntConc

**MENU OPTIONS**
Menu options are divided into three groups, "File", "Global Settings" and "Tool Preferences". The options available in each group will be described below.
<FILE>
Options here relate to reading files into AntConc and writing files to the hard disk containing data of various types. There are also options to export all current settings to a file, and import user settings from a file. If a user settings file becomes corrupted for any reason, simply restart the program or use the "Restore Default Settings" option to return the program to its original state.
<GLOBAL SETTINGS>
Categories here will have an effect on multiple tools in AntConc:
<File Settings>
In the File Settings category, the user can choose to display the full path of a file or just the name. The user can also choose to show or hide any tags in the file. The tag boundaries can be specified.
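"Menu Options"
The menu options are divided into three groups: "File", "Global Settings" and "Tool Preferences". The options available in each group are described below.
"File"
The options here relate to reading files into AntConc and writing files containing various types of data to the hard disk. There are also options to export all current settings to a file (Export Settings to File) and to import user settings from a file (Import Settings from File). If a user settings file becomes corrupted for any reason, simply restart the program or use the "Restore Default Settings" option to return the program to its original state.
"Global Settings"
The categories here affect multiple tools in AntConc.
"File Settings"
In the File Settings category, you can choose to display the full path of a file (Show full pathname) or just the file name. As for "The user can also choose to show or hide any tags in the file. The tag boundaries can be specified.": I could not find this in File Settings; it is under Tag Settings, so I have not translated it here. One more point: File Settings also has an option for the file type, 'Default file type to use with "OpenDir"', which shows .txt with a small button to its right. The file type chosen with this button is the type that is loaded when you choose Open Dir from the File menu. Open Dir selects only folders, so you must first put the files in a folder and then select that folder.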

woody2314 · 2008-08-13

Re: Basic Steps for Using AntConc

<Tag Settings>
In the Tag Settings category, the user can choose to display or hide any tags that are
contained in the corpus files. If tags are to be hidden, the opening and closing
tag markers must be specified. The default is <>.
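Hiding tags amounts to deleting everything between an opening and a closing marker. The Python sketch below shows the idea with a regular expression; the function `hide_tags` and the sample tagged text are invented for illustration and say nothing about how AntConc implements the option.

```python
import re

def hide_tags(text, start_tag="<", end_tag=">"):
    """Remove every span from `start_tag` to the next `end_tag`."""
    # re.escape lets markers such as '[' or '(' be used safely.
    pattern = re.escape(start_tag) + r".*?" + re.escape(end_tag)
    return re.sub(pattern, "", text)

tagged = "<w NN>pen</w> <w VB>is</w>"
print(hide_tags(tagged))
print(hide_tags("pen[NN] is[VB]", start_tag="[", end_tag="]"))
```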
<Wildcard Settings>
In the Wildcard Settings category, users can edit the default wildcard characters so that they do not clash with a search entry. For example, the "or" wildcard default character (a 'pipe' character | ) can be changed to a forward slash / here.
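"Tag Settings" (suggestions for a better Chinese rendering of "Tag" are welcome)
In the Tag Settings category, you can choose to display or hide any tags in the corpus files. If tags are to be hidden, the opening marker (Start Tag) and closing marker (End Tag) must be specified. The default is <>.
"Wildcard Settings"
In the Wildcard Settings category, you can edit the default wildcard characters so that they do not clash with a search entry. For example, the default "Or" wildcard character (a pipe character, |) can be changed to a forward slash (/).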
