Haiyang Ai
Administrator
Chinese Gigaword Second Edition is a comprehensive archive of Chinese
newswire text data that the LDC has acquired over several years. This
release includes all of the contents of the first release of the Chinese
Gigaword corpus (LDC2003T09), material from one new source, and new
material from the other two sources. The corpus therefore contains three
distinct international sources of Chinese newswire: Central News Agency
(Taiwan), Xinhua News Agency, and Zaobao. Some minor updates have been
made to the documents from the first release.
http://www.ldc.upenn.edu/Catalog/CatalogEntry.jsp?catalogId=LDC2005T14