Mark-up (or markup) versus Annotation
Cited and adapted from http://lingo.lancs.ac.uk/devotedto/corpora/help.htm
David Lee's Bookmarks for Corpus-based Linguists
Mark-up = tags (added character strings) used to code the structural or surface format/renditional attributes of a text (e.g., headings, sections, page breaks, sentences, bold/italics, speaker ID, speaker turns, pauses), OR non-interpreted aspects of the situated context of the discourse (e.g., bibliographical or demographic details about the author or speaker, location of speech event, genre, etc., and also gestures, laughter, voice quality, and events such as "writes on blackboard"). In HTML/SGML/XML (mark-up languages), mark-up is always enclosed in angle brackets.
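In IT terminology, mark-up is often translated as “标记” (because translations of the term are inconsistent, I think we should simply use the English word). Judging from the definition above, mark-up mainly refers to tags that record basic information accompanying the corpus text itself: paragraphing and other features from the writing and editing of the text, social characteristics of the language user such as identity and gender, and the situational factors and paralinguistic features that accompany spoken data. This information is usually recorded in the corpus header, or marked at the relevant point in the text with angle brackets, as in the example below.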
CLEC example:
<ST 2> <SEX ?><Y ?> <SCH GDWYWMDXFSWYXX> <AGE ?> <WAY ?><DIC ?> <TYP 2>
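A header line like the one above can be read mechanically, since each tag has the shape <NAME VALUE>. The following is only a minimal sketch in Python, not part of CLEC itself: the function name parse_header and the regular expression are hypothetical, and treating "?" as "value not recorded" is an assumption based on how the example looks, not on CLEC documentation.

```python
import re
from typing import Dict, Optional

# Assumed tag shape, inferred from the example line only: <NAME VALUE>
TAG_PATTERN = re.compile(r"<\s*([A-Za-z]+)\s+([^<>]*?)\s*>")


def parse_header(line: str) -> Dict[str, Optional[str]]:
    """Collect <NAME VALUE> tags from one header line into a dict.

    A value of "?" is stored as None (assumption: "?" means the
    detail was not recorded for this text).
    """
    fields: Dict[str, Optional[str]] = {}
    for name, value in TAG_PATTERN.findall(line):
        fields[name] = None if value == "?" else value
    return fields


header = "<ST 2> <SEX ?><Y ?> <SCH GDWYWMDXFSWYXX> <AGE ?> <WAY ?><DIC ?> <TYP 2>"
fields = parse_header(header)

# Print each field name with its value (None where the header has "?")
for name, value in fields.items():
    print(name, value)
```

Keeping the header as a dictionary makes it easy to filter texts by these non-interpreted attributes before looking at the text itself.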