Chinese Gigaword Fifth Edition

xujiajin

Chinese Gigaword Fifth Edition was produced by the Linguistic Data Consortium (LDC) at the University of Pennsylvania. It is a comprehensive archive of newswire text data that LDC has acquired from Chinese news sources. The fifth edition includes all of the content of the fourth edition of Chinese Gigaword (LDC2009T27) plus new data covering the period from January 2009 through December 2010.

Eight distinct sources of Chinese newswire are represented here:

· Agence France Presse (afp_cmn)
· Central News Agency, Taiwan (cna_cmn)
· Central News Service (cns_cmn)
· Guangming Daily (gmw_cmn)
· People's Daily (pda_cmn)
· People's Liberation Army Daily (pla_cmn)
· Xinhua News Agency (xin_cmn)
· Zaobao Newspaper (zbn_cmn)

The seven-character codes in parentheses above are used for the directory names and data files of each source. Articles covering the period from January 2009 through December 2010 have been added to the Agence France Presse, Central News Agency (CNA), Central News Service, Guangming Daily, People's Liberation Army Daily and Xinhua News Agency data sets. The data from People's Daily covers the period from late June 2009 through December 2010. No new data from Zaobao has been added.
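As a rough illustration of how the source codes can be used, here is a minimal Python sketch that maps a path back to its news source by looking at the first seven characters of each path component. The function name `source_of` and the example path are hypothetical. The post only says that the codes are used for directory names and data files, so the exact file naming shown here is an assumption and should be checked against the actual release.

```python
from pathlib import PurePosixPath

# Source codes and names as listed in the post above.
SOURCES = {
    "afp_cmn": "Agence France Presse",
    "cna_cmn": "Central News Agency, Taiwan",
    "cns_cmn": "Central News Service",
    "gmw_cmn": "Guangming Daily",
    "pda_cmn": "People's Daily",
    "pla_cmn": "People's Liberation Army Daily",
    "xin_cmn": "Xinhua News Agency",
    "zbn_cmn": "Zaobao Newspaper",
}


def source_of(path):
    """Return the source name for a path, or None if no code matches.

    Assumption: a directory or file name belonging to a source starts
    with that source's seven-character code.
    """
    # Check the file name first, then walk up through the directories.
    for part in reversed(PurePosixPath(path).parts):
        code = part[:7]
        if code in SOURCES:
            return SOURCES[code]
    return None


# Hypothetical path, used only to show the lookup.
example = source_of("data/xin_cmn/xin_cmn_200901.gz")
```

Keeping the codes in one dictionary makes it easy to filter a file listing by source without hard-coding names in several places.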
 