[Help] How can I obtain keywords when comparing Chinese corpora across time periods?

yinghuang

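A question: I want to compare language use across three periods. All the texts have been word-segmented, but I have no reference corpus. Could I obtain keywords by comparing the corpus for one period against the combined corpus for all three periods?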
 
Try LCMC as a reference.
http://bowland-files.lancs.ac.uk/corplang/cgi-bin/conc.pl
 
Many thanks, Dr. Xu. The thing is that my data span more than 50 years, from 1949 to the present, while LCMC seems to cover only the early 1990s. Is it suitable as a reference? I'm not sure.
 
It is not clear how large your target corpus for each period is. A reference is usually much larger than a target corpus.

Unless you have direct access to a balanced Chinese corpus covering a comparable period (e.g. the corpus developed by the National Language Committee of China), you do not have many choices.

If your target corpus for each of the three periods is less than one million tokens, you can compare your target corpora with LCMC. Because the same reference corpus is used to extract keywords for the three periods, the keywords extracted are comparable and your decision can be easily justified.
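For illustration, here is a rough Python sketch of how comparing one period's corpus against a fixed reference might look. It assumes log-likelihood as the keyness statistic, which is one common choice; the thread does not say which statistic to use. The function names (`log_likelihood`, `keywords`) and the toy token lists are made up for the example, and real input would be your segmented corpora and the LCMC word list.

```python
import math
from collections import Counter

def log_likelihood(a, b, c, d):
    """Log-likelihood keyness score for one word.

    a, b: frequency of the word in the target / reference corpus
    c, d: total token counts of the target / reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the target
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2 * ll

def keywords(target_tokens, reference_tokens, top_n=10):
    """Return the top_n words that are relatively more frequent in the target."""
    target = Counter(target_tokens)
    reference = Counter(reference_tokens)
    c = sum(target.values())
    d = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference[word]  # Counter returns 0 for unseen words
        if a / c > b / d:    # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]

# Toy data standing in for one period's segmented corpus and the reference.
period_tokens = ["改革"] * 5 + ["的"] * 5
reference_tokens = ["的"] * 50 + ["国家"] * 50
print(keywords(period_tokens, reference_tokens))
```

Running `keywords` once per period with the same `reference_tokens` gives three keyword lists scored against the same baseline, which is what makes them comparable.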
 