Enhancement of a Chinese Discourse Marker Tagger with C4.5
Benjamin K. T’sou1, Tom B. Y. Lai2, Samuel W. K. Chan3, Weijun Gao4, Xuegang Zhan5
1, 2, 3 Language Information Sciences Research Centre
City University of Hong Kong
Tat Chee Avenue, Kowloon
Hong Kong SAR, China
4, 5 Northeastern University, China
{1rlbtsou, 2cttomlai}@uxmail.cityu.edu.hk, 3swkchan@cs.cityu.edu.hk,
4wjgao@mail.neu.edu.cn, 5zxg@ics.cs.neu.edu.cn
http://www.rcl.cityu.edu.hk/publications/papers/ACL2000_Tsou.PDF
Abstract
Discourse markers are complex
discontinuous linguistic expressions which
are used to explicitly signal the discourse
structure of a text. This paper describes
efforts to improve an automatic tagging
system which identifies and classifies
discourse markers in Chinese texts by
applying machine learning (ML) to the
disambiguation of discourse markers, as an
integral part of automatic text summarization
via rhetorical structure. Encouraging results
are reported.
Keywords: discourse marker, Chinese
corpus, rhetorical relation, automatic tagging,
machine learning