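I have recently been collecting corpus material, mainly engineering papers, but they contain a lot of formulas, figures, and tables. How should these best be handled? Any expert advice would be appreciated. Thanks!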
The usual practice in corpus creation is to retain the textual data while omitting such graphics and tables, replacing each with a placeholder (e.g. an XML element indicating what has been omitted).
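As a rough illustration of the placeholder idea, here is a minimal Python sketch. It assumes each paper has already been split into (kind, content) blocks by some earlier step. The element name `omitted`, its `type` attribute, and the function `build_corpus_xml` are made up for this example and do not come from any particular corpus standard. Treating formulas the same way as figures and tables is also an assumption.

```python
import xml.etree.ElementTree as ET

# Block kinds treated as non-textual and left out of the corpus.
# Which kinds belong here is an assumption of this sketch.
OMITTED_KINDS = {"formula", "figure", "table"}


def build_corpus_xml(blocks):
    """Build a <text> element from (kind, content) pairs.

    Textual blocks are kept as <p> elements. Blocks whose kind is in
    OMITTED_KINDS become empty <omitted type="..."/> placeholders, so
    the corpus records that something was removed and what it was.
    """
    root = ET.Element("text")
    for kind, content in blocks:
        if kind in OMITTED_KINDS:
            # Hypothetical placeholder element: content is dropped.
            ET.SubElement(root, "omitted", {"type": kind})
        else:
            ET.SubElement(root, "p").text = content
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [
        ("text", "The stress is computed as follows."),
        ("formula", "sigma = F / A"),
        ("text", "The results are listed below."),
        ("table", "(table contents)"),
    ]
    print(build_corpus_xml(sample))
```

Keeping a typed placeholder rather than deleting the material silently means the position of each formula or table can still be counted or searched later, and sentence boundaries around the gap stay visible.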