Character-based collocation for Chinese

动态语法

Administrator
Staff member
#2
Re: Character-based collocation for Chinese

Interesting.

How about a comparison with a word-based study using the
same corpus? If the proposed system really is superior, as it
may well be, it would be interesting to see where the two
approaches disagree.
 

xiaoz

Permanent super administrator
Staff member
#4
If this approach is reliable, what we get should be statistically typical "words".
 

xujiajin

Administrator
Staff member
#5
Re: Character-based collocation for Chinese

Quoting xiaoz, 2005-8-15 20:13:40:
If this approach is reliable, what we get should be statistically typical "words".
4-grams, 5-grams, ... may work as collocations.
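
To make the n-gram idea concrete, here is a minimal sketch of how character n-grams could be extracted and how adjacent character pairs could be scored. It is only an illustration: the function names, the toy string, and the choice of pointwise mutual information (PMI) as the association measure are all assumptions, not anything taken from the paper under discussion.

```python
import math
from collections import Counter

def char_ngrams(text, n):
    """Return every contiguous n-character sequence in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def bigram_pmi(text):
    """Score each adjacent character pair with pointwise mutual information."""
    unigrams = Counter(text)
    bigrams = Counter(char_ngrams(text, 2))
    total_uni = sum(unigrams.values())
    total_bi = sum(bigrams.values())
    scores = {}
    for pair, count in bigrams.items():
        p_pair = count / total_bi
        p_first = unigrams[pair[0]] / total_uni
        p_second = unigrams[pair[1]] / total_uni
        scores[pair] = math.log2(p_pair / (p_first * p_second))
    return scores

# Toy string for illustration only, not real corpus data.
sample = "语料库语言学研究语料库"

four_grams = Counter(char_ngrams(sample, 4))
pmi = bigram_pmi(sample)
# Pairs with the highest PMI are candidates for "statistically typical" units.
ranked = sorted(pmi.items(), key=lambda item: item[1], reverse=True)
```

On a real corpus one would probably also want a minimum frequency cutoff, since PMI tends to overrate pairs that occur only once.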
 

xiaoz

Permanent super administrator
Staff member
#6
Source: http://portal.acm.org/citation.cfm?id=991980

Character-based collocation for Mandarin Chinese

Authors:
Chu-Ren Huang, Institute of History and Philology, Academia Sinica, Nankang, Taipei, Taiwan
Keh-jiann Chen, Institute of Information Science, Academia Sinica, Nankang, Taipei, Taiwan
Yun-yan Yang, Computing Center, National Taiwan University, Taipei, Taiwan

In: Proceedings of the 15th conference on Computational linguistics - Volume 1 (International Conference On Computational Linguistics), Kyoto, Japan
Session: Computational linguistics
Pages: 540 - 543
Year of Publication: 1994
Publisher: Association for Computational Linguistics, Morristown, NJ, USA
 

cncorpus

Regular member
#7
Re: Character-based collocation for Chinese

Quoting xiaoz, 2005-8-15 20:13:40:
If this approach is reliable, what we get should be statistically typical "words".
That is not always the case. See

http://lingua.mtsu.edu/chinese-computing/statistics/bigram/form.php

for some examples.

From a programming perspective, character collocation is not that different from multi-character collocation (where multi-character strings are more or less equivalent to words or phrases). See

http://lingua.mtsu.edu/chinese-computing/concord/concordancer.php

for an online concordancer that is capable of both single- and multi-character collocation.
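
As a rough illustration of why the two cases look alike in code, here is a minimal sketch in which the same function handles a single-character node and a multi-character node. This is not how the concordancer linked above is implemented; the function name `collocates`, the `window` parameter, and the toy text are all hypothetical.

```python
from collections import Counter

def collocates(text, node, window=2):
    """Count characters found within `window` positions of each match of `node`."""
    counts = Counter()
    start = text.find(node)
    while start != -1:
        end = start + len(node)
        left = text[max(0, start - window):start]   # context before the node
        right = text[end:end + window]               # context after the node
        counts.update(left + right)
        start = text.find(node, start + 1)
    return counts

# Toy text for illustration only, not real corpus data.
text = "我们研究语料库,他们研究语言学"

single = collocates(text, "研", window=1)    # single-character node
multi = collocates(text, "研究", window=1)   # multi-character node
```

The only difference between the two calls is the length of the node string; the search and the window logic are unchanged.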
 