Some corpus tags not documented in the WECCL manual

flycap

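Forum regular
While analysing the corpus recently, I found that the tag list given in the WECCL manual is incomplete: some multi-word structures are tagged with numeric sequence codes that the manual does not mention. Below are the ones I came across today; once the full analysis is finished, I will post all of the undocumented codes to the forum.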

<CS21> <CS22>
Two-word conjunction

<CS31> <CS32> <CS33>
Three-word conjunction

<DA2>
Determiner before a noun
For example:
MANY
SEVERAL

<II21> <II22>
Two-word preposition
For example:

<II31> <II32> <II33>
Three-word preposition
For example:
AS WELL AS
IN TOUCH WITH
IN FRONT OF

<II41> <II42> <II43> <II44>
Four-word preposition
OUT OF TOUCH WITH

<PPX221> <PPX222>
Two-word personal pronoun
EACH OTHER

<REX21> <REX22>
Two-word adverb introducing an appositional construction
FOR EXAMPLE
THAT IS

<RGQV31> <RGQV32> <RGQV33>
Three-word emphatic degree adverb
NO MATTER HOW

<RR21> <RR22>
Two-word general adverb
FOR EVER
AT ONCE
AT PRESENT
AT ALL
OF COURSE

<RR31> <RR32> <RR33>
Three-word general adverb
AND SO ON

<RRQV31> <RRQV32> <RRQV33>
Three-word structure among WH-EVER general adverbs
NO MATTER HOW

<RT41> <RT42> <RT43> <RT44>
Four-word quasi-nominal adverb of time
FOR THE TIME BEING
 
Re: Some corpus tags not documented in the WECCL manual

These are called ditto tags.
NOTE: "DITTO TAGS"
Any of the tags listed above may in theory be modified by the addition of a pair of numbers to it: eg. DD21, DD22 This signifies that the tag occurs as part of a sequence of similar tags, representing a sequence of words which for grammatical purposes are treated as a single unit. For example the expression in terms of is treated as a single preposition, receiving the tags:

in_II31 terms_II32 of_II33

The first of the two digits indicates the number of words/tags in the sequence, and the second digit the position of each word within that sequence.
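To make the two-digit rule concrete, here is a minimal Python sketch that decodes a ditto tag and collects complete multi-word units from a tagged line. It assumes tokens are written as `word_TAG` and separated by whitespace, as in the `in_II31 terms_II32 of_II33` example; the actual WECCL file layout may differ. The names `DittoTag`, `parse_ditto_tag` and `collect_units` are made up for illustration and are not part of any tagger.

```python
import re
from typing import List, NamedTuple, Optional, Tuple


class DittoTag(NamedTuple):
    base: str      # tag without the two ditto digits, e.g. "II"
    length: int    # number of words in the unit (first digit)
    position: int  # 1-based position of this word (second digit)


# Assumption: a base tag is letters plus at most one digit (PPX2 in PPX221),
# followed by the two ditto digits.
_DITTO_RE = re.compile(r"([A-Z]+\d?)([2-9])([1-9])")


def parse_ditto_tag(tag: str) -> Optional[DittoTag]:
    """Return the decoded ditto tag, or None for an ordinary tag."""
    m = _DITTO_RE.fullmatch(tag)
    if m is None:
        return None
    length, position = int(m.group(2)), int(m.group(3))
    if position > length:
        return None
    return DittoTag(m.group(1), length, position)


def collect_units(tagged_text: str) -> List[Tuple[str, str]]:
    """Return (phrase, base_tag) for every complete ditto sequence."""
    units: List[Tuple[str, str]] = []
    words: List[str] = []
    base, length = "", 0
    for token in tagged_text.split():
        word, _, tag = token.rpartition("_")
        ditto = parse_ditto_tag(tag)
        if ditto is not None and ditto.position == 1:
            # First word of a new unit.
            words, base, length = [word], ditto.base, ditto.length
        elif (ditto is not None and words and ditto.base == base
              and ditto.length == length
              and ditto.position == len(words) + 1):
            words.append(word)
        else:
            # Ordinary tag or a broken sequence: discard the partial unit.
            words = []
            continue
        if len(words) == length:
            units.append((" ".join(words), base))
            words = []
    return units


if __name__ == "__main__":
    print(collect_units("in_II31 terms_II32 of_II33 each_PPX221 other_PPX222"))
```

Incomplete or out-of-order sequences are simply dropped here; for a real frequency count you would probably want to log them instead.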

Such ditto tags are not included in the lexicon, but are assigned automatically by a program called IDIOMTAG which looks for a range of multi-word sequences included in the idiomlist. The following sample entries from the idiomlist show that syntactic ambiguity is taken into account, and also that, depending on the context, ditto tags may or may not be required for a particular word sequence:

at_RR21 length_RR22
a_DD21/RR21 lot_DD22/RR22
in_CS21/II that_CS22/DD1
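The slash in entries such as `in_CS21/II that_CS22/DD1` separates alternative tags for the same word. A small Python sketch for splitting such an entry is below. It assumes the entry format is exactly what the three sample lines show (`word_TAG` tokens, alternatives joined by `/`); the real idiomlist may contain other notation, and `parse_idiom_entry` is a hypothetical helper, not part of IDIOMTAG.

```python
from typing import List, Tuple


def parse_idiom_entry(entry: str) -> List[Tuple[str, List[str]]]:
    """Split an entry into (word, [alternative tags]) pairs."""
    parsed: List[Tuple[str, List[str]]] = []
    for token in entry.split():
        # Split on the last underscore so the tag part is always on the right.
        word, _, tags = token.rpartition("_")
        if not word or not tags:
            raise ValueError(f"not a word_TAG token: {token!r}")
        parsed.append((word, tags.split("/")))
    return parsed


if __name__ == "__main__":
    for sample in ("at_RR21 length_RR22",
                   "a_DD21/RR21 lot_DD22/RR22",
                   "in_CS21/II that_CS22/DD1"):
        print(parse_idiom_entry(sample))
```

For the last sample, `in` comes out with the alternatives `CS21` and `II`: the word either opens a two-word conjunction or stays an ordinary preposition, depending on context.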
 
Re: Some corpus tags not documented in the WECCL manual

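Thanks to the poster above for the pointer. Frankly, I don't know much about corpora; this master's thesis is my first attempt at reasonably systematic research in this area, so I hope the experts here will offer plenty of guidance.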
 
Re: Some corpus tags not documented in the WECCL manual

I also need to write about formulaic sequences, but I don't have concrete examples. Any advice would be much appreciated.
 