How should I analyze an existing corpus in Access format?

martinlu

Junior Member
I have a ready-made corpus that was built with Access. What method should I use to analyze it? Could an expert give me some pointers?

[This post was edited by xujiajin on 2006-02-16 at 21:17:43]
 
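Could you give an example? Otherwise it is hard to tell what we are dealing with. Pasting a few lines or posting a screenshot would do.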
 
I have something like this too. Here is a screenshot; I would appreciate any advice from the experts.


2006021713063649.jpg



 
I think it is better to convert the Access database into a plain-text corpus.
 
Re: How should I analyze an existing corpus in Access format?

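This kind of corpus has not been processed in any depth: it is neither word-segmented nor POS-tagged, so it is still a raw corpus. It may be usable for looking up how often certain character strings occur, but it does not seem to be of much use otherwise. Of course, that depends on the researcher's goals. In general, a corpus in this format is hard to process with other corpus analysis software, so it is hard to put to wide use.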
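To make the string-frequency idea concrete, here is a minimal Python sketch. It assumes the paragraphs have already been pulled out of the database as plain strings; the function names `ngram_counts` and `string_frequency` and the sample sentence are only illustrative. Because the text is unsegmented, it counts character n-grams rather than words.

```python
from collections import Counter


def ngram_counts(paragraphs, n=2):
    """Count overlapping character n-grams across a list of paragraphs."""
    counts = Counter()
    for para in paragraphs:
        # Drop whitespace so spaces and line breaks do not form n-grams.
        chars = "".join(para.split())
        for i in range(len(chars) - n + 1):
            counts[chars[i:i + n]] += 1
    return counts


def string_frequency(paragraphs, target):
    """Count non-overlapping occurrences of a target string."""
    if not target:
        raise ValueError("target must be a non-empty string")
    return sum(para.count(target) for para in paragraphs)


if __name__ == "__main__":
    # Illustrative sample only, not taken from the poster's database.
    sample = ["话说天下大势,分久必合,合久必分。"]
    print(string_frequency(sample, "久必"))
    print(ngram_counts(sample, n=2).most_common(3))
```

Punctuation is counted like any other character here; filter it out first if that is not what you want.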

 
Xiao is right: it is better to convert the files into plain text format. Here is a simple method, in brief:

1. In Access, click File, Export, then enter a file name and pick a "Save as type", say, Microsoft Excel 97-2003. This gives you the table as an Excel file.

2. Open that file in Excel, click File, Save As, then enter a file name and pick a "Save as type", say, Text (Tab Delimited). This gives you a plain text file with several tab-separated columns.

3. Now you can run a POS tagger on it, or use a concordancer and other corpus tools for your corpus analysis.
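If the corpus tool you use wants running text rather than columns, the tab-delimited file from step 2 can be reshaped with a few lines of Python. This is only a sketch under assumptions: that the first column holds the paragraph ID, the second holds the text, and the first row holds field names. The `{p<ID>}` marker format and the function names are made up for illustration. The file's encoding depends on how it was saved (on a Chinese-language Windows system it may well be GBK rather than UTF-8), so it is left as a parameter.

```python
import csv


def parse_tab_delimited(lines, id_col=0, text_col=1, has_header=True):
    """Return (paragraph_id, text) pairs from tab-delimited lines.

    `lines` is any iterable of strings, e.g. an open text file.
    The column positions and the header row are assumptions about the export.
    """
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the field-name row, if there is one
    pairs = []
    for row in reader:
        if len(row) <= max(id_col, text_col):
            continue  # blank or incomplete row
        text = row[text_col].strip()
        if text:
            pairs.append((row[id_col].strip(), text))
    return pairs


def format_plain_text(pairs):
    """Render each paragraph as one line with a hypothetical {p<ID>} marker."""
    return ["{p%s} %s" % (para_id, text) for para_id, text in pairs]


def convert_file(in_path, out_path, encoding="utf-8"):
    """Read the exported file and write a UTF-8 plain-text corpus."""
    with open(in_path, newline="", encoding=encoding) as src:
        pairs = parse_tab_delimited(src)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(format_plain_text(pairs)) + "\n")
    return len(pairs)
```

A call might look like `convert_file("sanguo.txt", "sanguo_plain.txt", encoding="gbk")`, where both file names are placeholders.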
 
You may also want to try this handy tool for converting your Access files to plain text.

mdb2txt

Full Description:

mdb2txt version 1.292 can quickly export text data from Microsoft Access 2/95/97/2000 mdb files. You can select multiple MDBs and multiple tables/queries to be exported all at once, with common text file specifications. Export data in delimited or fixed formats. Control options for field, text, and line delimiters. Select date and Boolean formats for exporting. Convert text to upper or lower case. Take advantage of the defined Input Masks in each table definition. Convert null numerical values to 0, or any other character. Export tables, system tables, linked tables and queries. Save settings in Profiles so that they can be used over and over again. Run mdb2txt from the command line as well. Export data from user/group and password secured databases. Export field names as a header row in the text file. You will be able to export huge quantities of data from multiple MDBs with only a few mouse clicks. Download the free, fully functional 45-day trial to evaluate it for yourself!


http://www.digibuy.com/cgi-bin/product.html?98636937341

 
Thanks a lot, laohong! I have sent a copy of each Access-based SIDAMINGZHU file to our c-pal's Gmail account. Please feel free to go and pick up whatever you need. Have a nice day! Cheers!
 
Re:

Hi, I got it, and here is a test using the method mentioned above.

The plain text format of Sanguo (converted from Access file):

2006021811465815.jpg


A snapshot of the concordances of 并 with a simple Chinese Concordancer:

2006021811502411.jpg
 
BTW, the figures marked with {p} in the concordances are the paragraph numbers, i.e. the IDs in the first column of the original Access file.
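For anyone who wants to reproduce this kind of output without a dedicated tool, here is a rough keyword-in-context sketch in Python. It is not the concordancer used for the screenshot; it simply assumes the corpus is available as (paragraph ID, text) pairs, and the `{p<ID>}` layout, the function names, and the sample sentences are invented for illustration.

```python
def concordance(paragraphs, keyword, width=10):
    """Return (para_id, left, keyword, right) for every hit of keyword.

    `paragraphs` is an iterable of (para_id, text) pairs; `width` is the
    number of context characters kept on each side.
    """
    if not keyword:
        raise ValueError("keyword must be a non-empty string")
    hits = []
    for para_id, text in paragraphs:
        start = text.find(keyword)
        while start != -1:
            end = start + len(keyword)
            left = text[max(0, start - width):start]
            right = text[end:end + width]
            hits.append((para_id, left, keyword, right))
            # Continue one character on, so overlapping hits are found too.
            start = text.find(keyword, start + 1)
    return hits


def format_hit(hit):
    """Format one hit as a line tagged with a hypothetical {p<ID>} marker."""
    para_id, left, keyword, right = hit
    return "{p%s} %s [%s] %s" % (para_id, left, keyword, right)


if __name__ == "__main__":
    # Made-up sample paragraphs, only to show the call.
    sample = [("1", "二人并马而行"), ("2", "并无此事")]
    for hit in concordance(sample, "并", width=4):
        print(format_hit(hit))
```

Context is measured in characters rather than words, which suits unsegmented Chinese text.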
 