Re: A question about searching Chinese text with PowerGREP
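Just kidding, don't take it the wrong way. If you had read the PowerGREP manual, you would know that although it can search Word, Excel, and similar documents, that support is limited, and behind the scenes it converts them to plain text before searching (the user never sees this step). Searching Chinese text in those formats is harder still. Given that, just search plain text files directly; there's no need to make things difficult for yourself. See the excerpt below: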
PowerGREP is primarily designed to work with plain text files. Plain text files include plain text documents, HTML and XML files, software source code, comma-delimited and other data files, configuration files, etc. When searching through tagged formats such as HTML, PowerGREP does not make any difference between the HTML tags and the text that you would see when viewing the HTML file in a browser. It's all plain text.
Many applications store their data in binary files. These files often use file formats that are proprietary to the developer of the application. Such files generally cannot be processed by software other than the software that created them.
PowerGREP can decode a small number of proprietary file formats. Prior to searching, PowerGREP will create a textual representation of the file's contents. That textual representation is then searched through.
PowerGREP can currently decode Microsoft Word documents (*.doc and *.dot files), Excel spreadsheets (*.xls files), and PDF files (*.pdf). PowerGREP also has partial support for Quattro Pro spreadsheets and Lotus 1-2-3 spreadsheets. "Partial support" means that not all versions of these file formats can be decoded by PowerGREP. PowerGREP cannot encode any of these formats. This means you can run "find files", "display search matches" and "collect data" actions on these files, but not search-and-replace actions.
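To illustrate the "just search plain text" advice outside of PowerGREP, here is a rough Python sketch that scans .txt files for a Chinese pattern. It is not how PowerGREP works internally. The function name search_text_files, the *.txt filter, and the encoding list (UTF-8 first, then GB18030) are my own assumptions, so adjust them to however your files were actually saved.

```python
import re
from pathlib import Path


def search_text_files(root, pattern, encodings=("utf-8", "gb18030")):
    """Yield (path, line_number, line) for every line that matches pattern.

    Hypothetical helper: the encodings tuple is an assumption about how
    the Chinese text files were saved, not something PowerGREP prescribes.
    """
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*.txt")):
        text = None
        # Try each assumed encoding in order; UTF-8 goes first because
        # it rejects bytes that are not valid UTF-8.
        for enc in encodings:
            try:
                text = path.read_text(encoding=enc)
                break
            except UnicodeDecodeError:
                continue
        if text is None:
            continue  # none of the assumed encodings could decode this file
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line
```

Usage would be something like `for path, number, line in search_text_files("corpus", "检索"): print(path, number, line)`, where "corpus" is a placeholder folder name. If the wrong encoding is guessed, Chinese characters will not match even though the file opens without error, which is one more reason to know what encoding your plain text files use.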