[Discussion] How much do we know about corpora of TCM English?

foliage · 2005-10-08

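This is my first time logging in, and I was pleased to find that a course paper I published as a graduate student is mentioned here. The criticism offered by senior colleagues is also very fair. I have only just started working and, at the request of Li Zhaoguo, director of the Foreign Languages Center at Shanghai University of Traditional Chinese Medicine, I am beginning to design a corpus of TCM (traditional Chinese medicine) English. With so little material at hand, I feel rather unsure of myself. All I know is that a project funded by the Ministry of Science and Technology's special fund for basic science and technology work is building the "Traditional Chinese Medicine Language System" (presumably in Chinese). There are a number of articles on the significance and feasibility of building a TCM English corpus, but I don't know whether a reasonably mature, representative one has already been built in China. Could any expert point me to resources on the state of research abroad?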
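Here is my preliminary plan in brief:
1. Written language only.
2. Three sub-corpora by the author's background: TCM English by Chinese authors 60% + TCM English by native English speakers 20% + TCM English by non-native English speakers other than Chinese 20%.
3. Genre composition modeled on the JDEST academic English corpus of Shanghai Jiao Tong University: journals 25% + textbooks 25% + monographs 15% + papers 10% + popular science readings 10% + abstracts 5% + handbooks, book reviews, news reports, etc. 10%.
4. Each sub-corpus covers texts from the 16 first-level subject categories of TCM, including basic theory of TCM, TCM doctrines and related disciplines, etiology, pathogenesis and diagnosis, diseases, therapeutic principles and methods, Chinese materia medica, and formulas.
5. Size initially set at 1 million words, open to expansion later.
6. Purchase WordSmith Tools as the concordancing software.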
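I would sincerely appreciate any guidance. Also, if a corpus like this were to be annotated, are there any simple, easy-to-use tools available? Many thanks!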
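
To make the proportions in the plan concrete, here is a minimal Python sketch that turns the planned percentages into word targets for each sub-corpus and genre. Only the percentages and the 1,000,000-word total come from the plan; the key names and the function `word_quotas` are made up for illustration, and the sketch assumes the same genre mix is applied inside every sub-corpus, which the plan does not actually state.

```python
# Planned total size of the corpus, in words (from the plan above).
TOTAL_WORDS = 1_000_000

# Share of each sub-corpus, in percent (hypothetical key names).
SUBCORPORA = {
    "chinese_authors": 60,
    "native_english_authors": 20,
    "other_non_native_authors": 20,
}

# Share of each genre, in percent (hypothetical key names).
GENRES = {
    "journals": 25,
    "textbooks": 25,
    "monographs": 15,
    "papers": 10,
    "popular_science": 10,
    "abstracts": 5,
    "handbooks_reviews_news": 10,
}


def word_quotas(total, subcorpora, genres):
    """Return a {(subcorpus, genre): target word count} table.

    Assumes every sub-corpus uses the same genre mix.
    Integer percentages are used to avoid floating-point rounding.
    """
    if sum(subcorpora.values()) != 100 or sum(genres.values()) != 100:
        raise ValueError("percentages must add up to 100")
    return {
        (sub, genre): total * sub_pct * genre_pct // 10_000
        for sub, sub_pct in subcorpora.items()
        for genre, genre_pct in genres.items()
    }


quotas = word_quotas(TOTAL_WORDS, SUBCORPORA, GENRES)
print(quotas[("chinese_authors", "journals")])  # 150000
```

A table like this makes it easier to see, before any collecting starts, which cells are small (for example, abstracts by native English speakers get only 10,000 words) and whether the sampling targets are realistic.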

xujiajin · 2005-10-08

Ask yourself before you start: what is this corpus for?

xujiajin · 2005-10-09

Is a preset linguistic motivation for collecting speech data justifiable?
The second problem is whether the sampling of the target speech data should be theoretically motivated. A special-purpose corpus is usually compiled with a certain research objective in mind, because it is neither economical nor practical to make a small corpus all-inclusive.

The speech data from fieldwork will ultimately be shaped not only by the language itself but also by the research goals we aim to achieve. For instance, in a corpus of situated adolescent speech, we may want to investigate discourse markers from a prosodic perspective. We therefore need to record more casual talk rather than formal speech or sociolinguistic interviews. If the focus is on the language of urban adolescent speakers, the sampling is confined to that particular population.
Some would argue that it is myopic to limit recording to data pertinent to issues of current theoretical interest, but we have to keep the amount of recording in check. We cannot hope to anticipate all future needs (Mithun 2001:53). Theory informs our methodological choices and helps us find finer things to look at. This problem again points to our discussion of the relationship between data and theory. It would be wrong to say that we set up a theoretical framework and make natural data fit it; rather, in actual field research it is economical to give data collection a general theoretical orientation.
Linguistics benefits when fieldworkers are doing more than merely gathering data for a theoretician to interpret (Everett forthcoming). We take Everett to mean that linguistic theory shapes our corpus planning and narrows our categories of samples.
By linguistic motivation we generally mean the question of which genre or register of discourse should be given priority, given the funding and energy we have. As in the Corpus of Situated Adolescent Speech, if our object of investigation is the phonetic and/or phonological aspects of discourse, we need to find less noisy settings so as to obtain higher-quality audio recordings.
In a sense, the identity of a corpus is shaped before it actually comes into being. A corpus is by its very nature a purpose-built linguistic databank.

tenyear · 2006-01-18

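There is little point in building a TCM English corpus, for a very simple reason: the language is not standardized. Concordance searches return results only for individual words, which is not much different from what a general-purpose corpus gives you. Wen Yongyi of Shaanxi College of Traditional Chinese Medicine worked on this for three years and built a model corpus. He has since moved on to another, more meaningful direction. If you really want to do some work in this area, you could consult him. He can be reached at yongyiwenson@gmail.com.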

oscar3 · 2006-01-20

Re: [Discussion] How much do we know about corpora of TCM English?

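Quoting tenyear's post of 2006-1-18 22:42:48:
There is little point in building a TCM English corpus, for a very simple reason: the language is not standardized. Concordance searches return results only for individual words, which is not much different from what a general-purpose corpus gives you. Wen Yongyi of Shaanxi College of Traditional Chinese Medicine worked on this for three years and built a model corpus. He has since moved on to another, more meaningful direction. If you really want to do some work in this area, you could consult him. He can be reached at yongyiwenson@gmail.com.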

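I don't agree with your view. Any specialized corpus has its value. TCM English belongs to a highly specialized domain whose linguistic features are covered neither by general-purpose corpora nor by general medical English corpora. And because its users come from different linguistic and cultural backgrounds, its features are not covered by ordinary learner corpora either. The differences between TCM English and general English go beyond vocabulary, and the purpose of building the corpus is not merely to replace a dictionary for looking up words but to support much broader linguistic research. So, given clear goals and rigorous compilation standards, a TCM English corpus should have a value that nothing else can replace. Of course, what goals an individual sets depends not only on theoretical grounds but also on one's practical circumstances, since building a corpus is costly, time-consuming and labor-intensive.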


csli · 2006-01-21

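From the perspective of Chinese-English translation, there is little point in building the corpus unless the ratio of "TCM English by Chinese authors 60% + TCM English by native English speakers 20%" is reset to "TCM English by Chinese authors 40% + TCM English by native English speakers 40%", i.e. go Dutch.
Also, have you thought through Dr. Xu's question yet: "Ask yourself before you start: what is this corpus for?"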
