Using AntConc for Chinese concordance, wordlist, and N-gram processing

Thread starter: xujiajin
Date: 2006-03-29

iCasino

Regular member

2006-03-30

#41

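Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Indeed. From 1.0 to 3.0, AntConc has gone from ugly duckling to swan. Well done, and thanks to everyone who contributed.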

xusun575

Senior member

2006-03-30

#42

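Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting seanxpq, 2006-3-30 13:52:42:
Ha, I really can't hold back any longer: everyone has finally taken an interest in AntConc, the software I have been recommending so strongly, right?

When C-friends confer, true fellowship endures; only when the sand is washed away does the gold appear.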

xujiajin

Administrator

Staff member

2006-03-30

#43

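A couple of lines in reply.

The humble ant proves fit for the wordsmith's trade; tufts of fur, once gathered, grow into a mighty force.

Note: 蚍蜉 means ant, alluding to the Ant in AntConc (the head word of the name). The wordsmith is WordSmith. This line is meant to thank Laurence Anthony for his generosity.

I also hope that, through Corpus4u, we can all refine our work "as if cutting and filing, carving and polishing," gathering fur into a robe and sand into a tower, and so advance corpus research in China.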

xusun575

Senior member

2006-03-30

#44

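Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting xujiajin, 2006-3-30 19:22:07:
A couple of lines in reply.

The humble ant proves fit for the wordsmith's trade; tufts of fur, once gathered, grow into a mighty force.

Note: 蚍蜉 means ant, alluding to the Ant in AntConc (the head word of the name). The wordsmith is WordSmith. This line is meant to thank Laurence Anthony for his generosity.

I also hope that, through Corpus4u, we can all refine our work "as if cutting and filing, carving and polishing," gathering fur into a robe and sand into a tower, and so advance corpus research in China.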

[emb2][emb2][emb2][emb6]

Haiyang Ai

Administrator

2006-03-31

#45

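Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting xujiajin, 2006-3-30 19:22:07:
A couple of lines in reply.

The humble ant proves fit for the wordsmith's trade; tufts of fur, once gathered, grow into a mighty force.

Note: 蚍蜉 means ant, alluding to the Ant in AntConc (the head word of the name). The wordsmith is WordSmith. This line is meant to thank Laurence Anthony for his generosity.

I also hope that, through Corpus4u, we can all refine our work "as if cutting and filing, carving and polishing," gathering fur into a robe and sand into a tower, and so advance corpus research in China.

Well said!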

动态语法

Administrator

Staff member

2006-03-31

#46

Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting laohong, 2006-3-30 11:55:18:

Quoting 动态语法, 2006-3-29 15:28:58:

... I have had numerous discussions with him about code names; apparently this is the best that can be done at this point...

Basically, my test showed that this tiny program works very well with Chinese texts, though it is a pity that the KWIC concordances are not nicely presented. Can you also ask him to add an option for saving the concordance result? Something similar to Wconcord's "Save with delimiters":

With the delimiters saved, the concordance result looks as follows:

Then we can use a regular expression to replace every "|" with a Tab, and every "[" with a Tab followed by "[". The result can then be opened in Excel as three columns. Re-sorting in Excel is of course quite easy.

So my understanding is that you want some delimiter characters in the result file so that you can
work on it with a GREP program and eventually export the result to Excel. I asked him to make it
possible to center the search term in the line, which he said could be done easily. If this
happens, I think it would meet your need. That is, if the search term is centered,
there is usually a tab character before and after the search term, so you don't need the
| -> TAB replacement process. You could still use a GREP program to replace the sequence
'TAB SEARCH_TERM TAB' with whatever you want and export
the data to whatever program you like. As far as I can tell, once the result is in
a fixed format (e.g. TAB SEARCH_TERM TAB), a lot of things become possible.
(As for the [ ] characters, those are even easier to replace with any 'search and replace'
mechanism.)
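
To illustrate what a fixed 'TAB SEARCH_TERM TAB' layout makes possible, here is a minimal Python sketch. It assumes each saved concordance line has exactly the form left context, TAB, search term, TAB, right context; that layout, the function names and the sample lines are assumptions for illustration only, not AntConc's documented output format.

```python
import csv
import io


def kwic_to_rows(lines):
    """Split 'left TAB term TAB right' lines into (left, term, right) rows."""
    rows = []
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 3:
            continue  # skip lines that do not follow the assumed layout
        left, term, right = (p.strip() for p in parts)
        rows.append((left, term, right))
    return rows


def rows_to_csv(rows):
    """Render the rows as CSV text that a spreadsheet can open in three columns."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["left", "term", "right"])
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample lines in the assumed layout.
sample = [
    "this is \tthe\t best that can be done\n",
    "我 喜欢 \t语料库\t 研究\n",
]
rows = kwic_to_rows(sample)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Once the lines are split into three fields, re-sorting by right context is a one-liner such as sorted(rows, key=lambda r: r[2]), which is the kind of step otherwise done in Excel.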

A little bit of history: the multilingual/Unicode capability was added in v. 3.0. Version 3.1 is
now vastly better than 3.0, but the encoding names are still a bit confusing.

xujiajin

Administrator

Staff member

2006-03-31

#47

I found a problem when concordancing with multiple search words.
When I used "the|a|an" as the search term:

"idea" and "and" also showed up in the search results.

动态语法

Administrator

Staff member

2006-03-31

#48

Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Your best bet is to take the WordList -> Concordance route:

Go to
Tool preferences -> WordList Pref. -> select
"Use Specific Words Listed Below", and enter the words through either Add Words or From File.
You'll get the word list information, and when you click on any word in the list you will see the
corresponding concordance lines.
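
The idea behind this route, a word list restricted to specific words in which each entry points to its concordance lines, can be sketched in Python as follows. This is only an illustration of the concept under simple assumptions (whitespace tokenization, case-insensitive matching); it is not how AntConc works internally, and the function name and sample sentence are made up.

```python
from collections import Counter, defaultdict


def wordlist_with_concordance(tokens, targets, width=3):
    """Count only the target words and keep a KWIC line for every hit."""
    counts = Counter()
    lines = defaultdict(list)
    for i, tok in enumerate(tokens):
        key = tok.lower()
        if key not in targets:
            continue  # whole-token comparison, so "idea" never matches "a"
        counts[key] += 1
        left = " ".join(tokens[max(0, i - width):i])
        right = " ".join(tokens[i + 1:i + 1 + width])
        lines[key].append((left, tok, right))
    return counts, lines


text = "The idea is simple and an ant can carry a leaf to the nest"
counts, lines = wordlist_with_concordance(text.split(), {"the", "a", "an"})

for word, freq in counts.most_common():
    print(word, freq)
    for left, hit, right in lines[word]:
        print("   ", left, "[" + hit + "]", right)
```

Because whole tokens are compared rather than substrings, words such as "idea" and "and" are not picked up here.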

xusun575

Senior member

2006-03-31

#49

Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting xujiajin, 2006-3-31 14:22:57:
A problem found when doing a multiple search words concordancing:
When I used "the|a|an" as the search term:

"idea" and "and" also showed up in the search results.

A blank will make a difference. Try again, changing your search term from "the|a|an" to "the |a |an ", that is, put a blank after each of your search terms.

刘语料

Banned user

2006-03-31

#50

Yes. When you want a logical OR, you need to add a space before the "|" symbol.
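
As an analogy in plain Python regular expressions (AntConc's own search syntax may behave differently, so treat this purely as an illustration), the sketch below shows why an unanchored alternation such as the|a|an also hits "idea" and "and", and how word boundaries restrict it to whole words. The helper name and the sample sentence are made up.

```python
import re


def matching_words(pattern, words):
    """Return the words in which the pattern matches anywhere."""
    return [w for w in words if re.search(pattern, w)]


words = "the idea and a plan for an ant".split()

# Unanchored alternation: "a" matches inside "idea", "and", "plan", "ant".
loose = matching_words(r"the|a|an", words)

# \b marks a word boundary, so only whole-word hits remain.
strict = matching_words(r"\b(?:the|a|an)\b", words)

print(loose)
print(strict)
```

In a plain regex, a trailing blank on its own would still match the end of "idea " followed by a space, which is why this sketch uses \b instead of spaces.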

xujiajin

Administrator

Staff member

2006-04-01

#51

xusun575 and 刘语料 are right. Thanks.

xujiajin

Administrator

Staff member

2006-04-01

#52

Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting 动态语法, 2006-3-31 14:38:31:
Your best bet is to take the WordList -> Concordance route:
Go to
Tool preferences -> WordList Pref. -> select
"Use Specific Words Listed Below", and enter the words through either Add Words or From File.
You'll get the word list information, and when you click on any word in the list you will see the
corresponding concordance lines.

Yeah. This route, a detour though, leads to concordances of multiple search terms. In this case, WordList serves as a pointer to Concordance. In file-based concordancing results, however, search words of a category can be viewed in one window, not in separate ones.

hancunxin

Moderator

2006-04-05

#53

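Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting xujiajin, 2006-3-30 19:22:07:
A couple of lines in reply.

The humble ant proves fit for the wordsmith's trade; tufts of fur, once gathered, grow into a mighty force.

Note: 蚍蜉 means ant, alluding to the Ant in AntConc (the head word of the name). The wordsmith is WordSmith. This line is meant to thank Laurence Anthony for his generosity.

Thanks to Laurence Anthony, and thanks also to all the enthusiastic friends who introduced this excellent software to C-friends in China!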

yinghuang

Senior member

2006-04-06

#54

Thank you all. It's quite useful for my research!

laohong

Administrator

Staff member

2006-04-06

#55

Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting 动态语法, 2006-3-31 1:47:48:
So my understanding is that you want some delimiter characters in the result file so that you can work on it with a GREP program and eventually export the result to Excel. I asked him to make it possible to center the search term in the line, which he said could be done easily. If this happens, I think it would meet your need. That is, if the search term is centered, there is usually a tab character before and after the search term, so you don't need the | -> TAB replacement process. You could still use a GREP program to replace the sequence 'TAB SEARCH_TERM TAB' with whatever you want and export the data to whatever program you like. As far as I can tell, once the result is in a fixed format (e.g. TAB SEARCH_TERM TAB), a lot of things become possible.

Yes, you are right. I think it's better to have TABs inserted: 'TAB SEARCH_TERM TAB'. It's good to have a lot of things made possible with it.

laohong

Administrator

Staff member

2006-04-06

#56

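Re: Using AntConc for Chinese concordance, wordlist, and N-gram processing

Quoting xusun575, 2006-3-29 12:04:04:
Laohong, what treasure of a tool do you use for Chinese word segmentation?

The earlier examples were segmented with ICTCLAS. Please search this site for related automatic word segmentation and POS tagging tools: SegTag, ICTCLAS, NEUCSP, Hylanda, WinAT, and so on.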
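
For readers wondering why segmentation matters for word lists and N-grams, here is a minimal Python sketch. It assumes the segmenter has already written its output as plain text with a space between words; it does not call ICTCLAS or any other tool named above, and the sample sentence and function name are made up for illustration.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token sequences as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Made-up, pre-segmented sample: one space between words (an assumption).
segmented = "我们 使用 语料库 研究 语言 我们 使用 工具"
tokens = segmented.split()

wordlist = Counter(tokens)            # word frequency list
bigrams = Counter(ngrams(tokens, 2))  # 2-gram frequency list

for word, freq in wordlist.most_common(3):
    print(word, freq)
for gram, freq in bigrams.most_common(1):
    print(" ".join(gram), freq)
```

Without the spaces added by a segmenter, split() would treat the whole sentence as a single token, so neither the word list nor the N-gram counts would be meaningful.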

aillee111

Junior member

2006-04-11

#57

Thanks to Dr. Xu for the recommendation. It really is easy to use!

Haiyang Ai

Administrator

2006-04-11

#58

I tried it on Linux yesterday as well, and it works very well. It really is cross-platform software.

ivysweet

Junior member

2006-04-18

#59

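Strange, why won't AntConc run on my computer? When I double-click the .exe icon, nothing happens. Could some expert help me out? Also, my other computer can run version 3.1.2.0, but 3.1.302 shows the same problem there. What is going on?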

xusun575

高级会员

2006-04-18

#60

Restart your PC and run AntConc again. Good luck!
