A question about searching Chinese text with PowerGREP

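I have just started learning PowerGREP and ran into a problem with a simple search:
The software searches English fine in both .txt and .doc files, and it searches Chinese fine in .txt files, but it simply cannot find Chinese in .doc files. I don't know whether I haven't learned how to use it properly or whether there is some other cause.
Could those of you who have used it please help? Thanks.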
 
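Re: A question about searching Chinese text with PowerGREP

A .doc file contains (formatting) characters that are "invisible to the naked eye". It is best to convert it to a plain-text (.txt) document.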
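
One possible contributing factor, offered here as an assumption rather than something confirmed in this thread: the binary .doc format can store non-Latin text as 16-bit Unicode (UTF-16LE), so the bytes of a Chinese word inside a .doc differ from the bytes of the same word in a GBK or UTF-8 .txt file. A minimal Python sketch of that difference (the `blob` below is a made-up stand-in, not a real .doc file):

```python
# Illustration only: the same two characters produce different byte
# sequences under different encodings, so a byte-level search that
# assumes one encoding cannot match text stored in another.
word = "汉字"

encodings = ["utf-8", "gbk", "utf-16-le"]
encoded = {name: word.encode(name) for name in encodings}

for name, raw in encoded.items():
    print(f"{name:10s} {raw.hex(' ')}")

# Hypothetical file content holding the word as UTF-16LE (an assumption
# about how a .doc might store it), surrounded by other bytes.
blob = b"\x00\x01" + encoded["utf-16-le"] + b"\x00\x00"

print(encoded["gbk"] in blob)        # search using GBK bytes
print(encoded["utf-16-le"] in blob)  # search using UTF-16LE bytes
```

Whether this is what actually trips up PowerGREP here is not established; the sketch only shows why the encoding of the stored text matters for any search tool.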
 
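Re: A question about searching Chinese text with PowerGREP

Thanks for the reply.
Of course, if I were searching a proper corpus I would certainly convert it to .txt. Right now I just want to see whether Chinese characters in a .doc file can be found at all.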
 
Re: A question about searching Chinese text with PowerGREP

This fellow has too much time on his hands: first he asks everyone iamwho, and now he insists on searching for Chinese characters in .doc files the "improper" way...
 
Re: A question about searching Chinese text with PowerGREP

That really is the case. PDF works fine too. I can't explain it.
 
Re: A question about searching Chinese text with PowerGREP

The boss is the boss; you see right through things. One only stirs up trouble over nothing when one has too much free time.
 
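Re: A question about searching Chinese text with PowerGREP

Just kidding, don't be upset. If you read the PowerGREP manual, you will see that although it can search Word, Excel and similar documents, that support is limited, and it works by converting them to plain text behind the scenes before searching (the user never sees this step). Searching Chinese text is harder still. Given that, just search plain-text files directly; there is no need to make life difficult for yourself. See the excerpt below: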

PowerGREP is primarily designed to work with plain text files. Plain text files include plain text documents, HTML and XML files, software source code, comma-delimited and other data files, configuration files, etc. When searching through tagged formats such as HTML, PowerGREP does not make any difference between the HTML tags and the text that you would see when viewing the HTML file in a browser. It's all plain text.

Many applications store their data in binary files. These files often use file formats that are proprietary to the developer of the application. Such files generally cannot be processed by software other than the software that created them.

PowerGREP can decode a small number of proprietary file formats. Prior to searching, PowerGREP will create a textual representation of the file's contents. That textual representation is then searched through.

PowerGREP can currently decode Microsoft Word documents (*.doc and *.dot files), Excel spreadsheets (*.xls files), and PDF files (*.pdf). PowerGREP also has partial support for Quattro Pro spreadsheets and Lotus 1-2-3 spreadsheets. "Partial support" means that not all versions of these file formats can be decoded by PowerGREP. PowerGREP cannot encode any of these formats. This means you can run "find files", "display search matches" and "collect data" actions on these files, but not search-and-replace actions.
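
The "textual representation" step described above can be imitated outside PowerGREP: once a document has been saved as plain text, any Unicode-aware regex engine can match Chinese. Below is a small Python sketch, assuming the text was exported as UTF-8. The sample string and the function name `find_chinese` are made up for illustration, and the range `\u4e00-\u9fff` covers only the basic CJK Unified Ideographs block.

```python
import re

# Basic CJK Unified Ideographs block only; extension blocks are not covered.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")

def find_chinese(text):
    """Return every run of consecutive CJK ideographs found in text."""
    return CJK_RUN.findall(text)

# Stand-in for text exported from a .doc file and read back as UTF-8.
sample = "PowerGREP 可以检索 plain text, 包括汉字。"
raw = sample.encode("utf-8")
decoded = raw.decode("utf-8")

print(find_chinese(decoded))
```

The key point is that the search runs on decoded text rather than on the raw bytes of the original file, which is the same idea the manual describes.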
 
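Re: A question about searching Chinese text with PowerGREP

Haha, I'm not upset. At my age I don't get worked up easily.
Thanks. I read that passage too, and it looks like Chinese characters in a .doc file cannot be searched directly.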
 