# On text cleaning for special-purpose corpora

#### ArthurW

Formulas are indeed annoying for corpus building. Since they lose their meaning in plain text, deleting them from the corpus is usually fine. However, that no longer holds if the theses are to be retrieved in more advanced or multimodal ways. The answer to your question depends on the nature of your investigation and on how significant those formulas are for it.

#### Byron B. Chang

I would replace them with a placeholder token, for instance "FML". This preserves the integrity of the sentence structure, as formulas are often part of a sentence.
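For what it's worth, the placeholder idea could be sketched roughly like this in Python. This is only an illustration under a big assumption: that the formulas in your files are marked with LaTeX-style delimiters (`$...$`, `$$...$$`, `\(...\)`, `\[...\]`). Formulas extracted from PDFs usually have no such markers and would need a different detection step. The names `FORMULA_PATTERN` and `replace_formulas` are made up for this example.

```python
import re

# Assumption: formulas are delimited LaTeX-style. Adjust the pattern
# to whatever markup your source texts actually use.
FORMULA_PATTERN = re.compile(
    r"\$\$.+?\$\$"      # display math: $$ ... $$
    r"|\$[^$\n]+?\$"    # inline math: $ ... $
    r"|\\\[.+?\\\]"     # display math: \[ ... \]
    r"|\\\(.+?\\\)",    # inline math: \( ... \)
    re.DOTALL,          # let display formulas span several lines
)


def replace_formulas(text, placeholder="FML"):
    """Replace every delimited formula in `text` with `placeholder`."""
    return FORMULA_PATTERN.sub(placeholder, text)


sentence = "The energy is $E = mc^2$ in this model."
print(replace_formulas(sentence))  # The energy is FML in this model.
```

Passing `placeholder=""` would delete the formulas instead, though that tends to leave stray double spaces and broken sentences, which is the reason for keeping a token in the first place.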