xiaoz Permanent Super Administrator Staff member 2005-12-11 #3 That's because the plain-text output is in ASCII, which has no codes for the accented characters used in such languages. Saving the webpages as text files encoded in Unicode will solve the problem.
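To illustrate the point about ASCII versus Unicode, here is a minimal Python sketch using only the standard library. It is not the tool discussed in this thread: the `TextExtractor` class, the `html_to_text` helper, the sample sentence, and the output file name are all hypothetical, and UTF-8 is assumed as the Unicode encoding. It also does nothing special for script or style blocks, so treat it as a sketch rather than a full converter.

```python
import tempfile
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect the text between tags (hypothetical helper, not a full converter)."""

    def __init__(self):
        # convert_charrefs=True (the default) turns &eacute; into the character é.
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(markup):
    """Return the text content of an HTML string with the tags removed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts).strip()


# Hypothetical sample page fragment with accented French characters.
sample_html = "<p>Le caf&eacute; est déjà fermé.</p>"
text = html_to_text(sample_html)

# Forcing the text into ASCII loses every accented character.
lossy = text.encode("ascii", errors="replace").decode("ascii")

# Saving as UTF-8 (one Unicode encoding) keeps them intact.
out_path = Path(tempfile.mkdtemp()) / "sample_utf8.txt"
out_path.write_text(text, encoding="utf-8")
round_trip = out_path.read_text(encoding="utf-8")

print(lossy)
print(round_trip)
```

Note that converting an existing ASCII file afterwards cannot bring back characters that were already replaced, so the conversion has to start from the original HTML.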
patricx Senior Member 2005-12-12 #4 I did the conversion with HTML2TXT4.0, and I have just checked: the output is indeed ASCII. Dr. xiaoz, do you mean using a tool to convert the HTML directly into Unicode-encoded TXT? If so, which tool should I use? Or should I convert the ASCII TXT files afterwards? Thank you both for the prompt advice.
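ineedgerf Regular Member 2005-12-12 #5 You may not like text editors, but they can actually do a lot of the work. For corpus purposes, a text editor beats things like Word. I don't recommend using html2txt and similar tools. For characters that are not on the keyboard, such as some French letters, it is safer to use "entities" wherever possible.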
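The "entities" suggestion can be sketched in Python with the standard `html` module. The `to_entities` function and the sample string are hypothetical, written only for illustration. The function replaces each non-ASCII character with a named HTML entity where one exists and a numeric one otherwise, and it leaves ASCII characters, including `&` and `<`, untouched.

```python
import html
from html.entities import codepoint2name


def to_entities(text):
    """Replace non-ASCII characters with HTML entities (hypothetical helper)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                         # plain ASCII stays as it is
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{cp};")                 # numeric fallback
    return "".join(out)


original = "garçon, œuvre, déjà"
encoded = to_entities(original)   # pure ASCII, safe in any ASCII-only file
decoded = html.unescape(encoded)  # back to the accented characters

print(encoded)
print(decoded)
```

Because the encoded form contains only ASCII, it survives ASCII-only tools, and `html.unescape` restores the original characters when needed.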