A Data-driven Approach to the Mental Lexicon

xujiajin · 2005-08-12

http://www.ihp.sinica.edu.tw/~bihp/69/69.1/huang.html
A Data-driven Approach to the Mental Lexicon: Two Studies on Chinese Corpus Linguistics

Chu-Ren Huang, Kathleen Ahrens, and Keh-jiann Chen

In this paper, we attempt to show i) that corpora offer real instances of language use (production) in a non-controlled environment, ii) that corpora constitute a large sampling of the real input to linguistic perception, and iii) that corpora extracted from mass media represent the shared linguistic information of the language-speaking community.

Corpus-based studies are studies of linguistic theories based on linguistic objects (instead of on non-linguistic acts like naming, picture pointing, story-telling, or making decisions on yes-no questions). We use two corpus-based studies to show that such studies can complement the traditional psychology-oriented studies based on controlled experiments. The two studies shed important light on the psychological reality of the notion of a word in the mental lexicon.

Our first study examines the definition of compounds based on M.I. (mutual information) values extracted from a corpus. We show that this empirically based definition of compounds easily resolves the previous controversies involving intuitive judgements (e.g. Bates et al. 1992 and 1993, and Zhou et al. 1993).
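The abstract does not give the exact M.I. formula, corpus counts, or cut-off the authors used, so the following is only a minimal sketch of the standard pointwise mutual information for adjacent word pairs, log2(P(x,y) / (P(x)P(y))). The function name `bigram_mi` and the romanized toy tokens are invented for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def bigram_mi(tokens):
    """Return pointwise mutual information (log base 2) for each adjacent pair.

    Probabilities are plain relative frequencies; a real study would need
    a large corpus and some handling of low-frequency pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical toy data: "huo che" recurs as a unit, "le huo" is a chance neighbour.
tokens = ["huo", "che", "lai", "le", "huo", "che", "zou", "le"]
scores = bigram_mi(tokens)
for pair, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(value, 2))
```

A pair that recurs as a unit scores higher than a pair that merely happens to be adjacent, which is the kind of evidence an M.I.-based definition of compounds would rely on. With counts this small the values are not meaningful in themselves.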

The second study involves the complex cognitive process of suo1xie3 (abbreviation) and a simple statistical model. We show that while a rule-based model can only capture incomplete aspects of Chinese abbreviation, corpus-based statistical values nicely reflect their status in the mental lexicon.

In conclusion, we argue that corpora reflect shared uses of language and are efficient tools for establishing baseline facts in (psycho-/neuro-)linguistic research.

Keywords: Mental lexicon, Corpus, Word, Mutual information, Abbreviation

　

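Exploring the Mental Lexicon from Corpora: Two Studies in Chinese Corpus Linguistics

Chu-Ren Huang ¤ Kathleen Ahrens ¤ Keh-jiann Chen

Institute of Linguistics, Academia Sinica ¤ Department of Foreign Languages and Literatures, National Taiwan University ¤ Institute of Information Science, Academia Sinica

This paper attempts to explore the psychological reality of language starting from corpora. Traditional research relies mainly on experiments. Psycholinguistic and neurolinguistic studies of this kind have achieved a number of breakthroughs, but they still have limitations. First, the laboratory forces subjects to use language in a controlled, unnatural environment. Second, experimental designs are often restricted to a small number of sentences. Finally, because subjects' attention is limited, sentences are kept short and lack a natural surrounding context. We argue that large corpora can make up for these shortcomings and can also reveal the psychological reality of language.

Using corpora to explore psychological reality rests on three premises: first, corpora provide real instances of language use (production) in a natural environment; second, corpora also represent a large sampling of the objects of everyday language recognition; third, corpus data drawn from the media can show the grammatical knowledge shared by the users of a language.

The paper reports two studies, both based on the Academia Sinica corpus of Modern Chinese. The first study examines Chinese compounds; the second examines a word-formation phenomenon particular to Chinese, "abbreviation" (suo1xie3). Both studies support one basic hypothesis: that the notion of a "word" does exist in the Chinese mental lexicon and can be identified using corpus data. In other words, corpora reflect the psychological facts of language and offer another path for studying the reality of language from data.

Keywords: mental lexicon, word, corpus, mutual information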

majorlv511 · 2005-08-13

From Taiwan? Or Hong Kong?

xiaoz · 2005-08-14

Academia Sinica, Taipei.
