Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

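As in the title:
I'm in foreign linguistics and applied linguistics and not very familiar with Chinese corpora. Seeing several options here, I'm a bit at a loss. I'd appreciate help from the experts here!
I want to analyze the distribution of coordinate structures such as “里外” (inside-outside) and “上下” (up-down). Which corpus would be more suitable?

Many thanks!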
 
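Thanks! These words strike me as quite colloquial, so I'd like to draw mostly on colloquial material. For now I'm pulling examples from the PKU CCL corpus; it's fast and easy to use.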
 
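You could also try Richard Xiao's LCMC. If you haven't heard of it, search for it on this site. The key point is that the corpus is free, and the retrieval software is freely available too. Give it a try. Good luck!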
 
Re: Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

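If you're only studying distribution patterns and collocations, why not use the various existing Chinese corpora in combination? LCMC is standardized, well-formed and sensibly designed; its one shortcoming is that it isn't “big enough”: it is still only a million words.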
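For what it's worth, combining corpora of different sizes mostly comes down to normalizing counts before comparing them. Below is a minimal Python sketch of that idea, not a recipe tied to any particular corpus. It assumes each corpus has already been exported as plain, unsegmented text; the toy strings, the register names, and the helper names (`count_targets`, `per_million`, `neighbours`) are all made up for illustration. A word-segmented corpus would normally be queried by token rather than by raw string.

```python
from collections import Counter


def count_targets(text, targets):
    """Count non-overlapping occurrences of each target string in text."""
    return {t: text.count(t) for t in targets}


def per_million(count, size):
    """Normalize a raw count to occurrences per million characters."""
    return count * 1_000_000 / size if size else 0.0


def neighbours(text, target, window=1):
    """Tally the characters immediately left and right of each hit."""
    left, right = Counter(), Counter()
    start = text.find(target)
    while start != -1:
        end = start + len(target)
        if start >= window:
            left[text[start - window:start]] += 1
        if end + window <= len(text):
            right[text[end:end + window]] += 1
        start = text.find(target, end)
    return left, right


# Hypothetical toy "corpora"; replace with text read from real corpus files.
corpora = {
    "spoken": "他里外都找了,上下也看了。里外不是人。",
    "news": "公司上下一致同意。",
}
targets = ["里外", "上下"]

for name, text in corpora.items():
    counts = count_targets(text, targets)
    for t in targets:
        rate = per_million(counts[t], len(text))
        print(name, t, counts[t], round(rate, 1))
```

The per-million figure is what makes a one-million-word corpus comparable with a much larger one; the `neighbours` tallies give a rough first look at collocates on either side of each hit.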
 
Re: Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

LCMC is small because it was designed as a Chinese match for FLOB.

We have now completed a sizeable balanced corpus of spoken and written Mandarin amounting to 50 million words (over 70 million Chinese characters), which is used in the Routledge Frequency Dictionary of Mandarin Chinese. The corpus is not currently open to the public.
 
Re: Re: Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

When will it be available? By the way, Dr. Xiao, what is the name of the corpus you mentioned? Is it still a Chinese counterpart of FLOB? Thanks!
 
Re: Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

Obviously, according to Xiao, it is for the compilation of the Routledge Frequency Dictionary of Mandarin Chinese, not a counterpart of FLOB. I guess it won't be available to the public before the RFDMC is published.
 
Re: Re: Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

Is that reliable? :D ;) Thanks.
 
Re: Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

hittle2008 is right - it does not model FLOB. The corpus is a collection of texts from four registers: spoken, fiction, non-fiction, and news.
 
Re: Re: Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

Thanks a lot, Dr. Xiao and Would-be-Dr. Hittle:p!
 
Re: Help! I want to analyze the distribution of coordinate structures such as “里外”: which corpus is more suitable?

U R flattering me! Everyone knows the implication of "would-be" is "not yet".:eek:
 