Formulas are indeed a nuisance when building a corpus. Since they lose their meaning once converted to plain text, simply deleting them from the corpus text is usually fine. This no longer holds, however, if the theses are to be searched or retrieved in more advanced ways, for example multimodally. So the answer to your question depends on the nature of your investigation and on how significant those formulas are for you.
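If you do decide to delete them, here is a minimal sketch, assuming (this is an assumption, not something stated above) that you have the theses as LaTeX source rather than extracted PDF text. The names `MATH_PATTERNS` and `strip_formulas` are hypothetical. It removes the common math delimiters (`$...$`, `$$...$$`, `\(...\)`, `\[...\]` and a few display environments). The optional `placeholder` argument lets you leave a token such as `<MATH>` instead of deleting outright, which keeps a trace of where formulas were in case you later need the more advanced retrieval mentioned above.

```python
import re

# Hypothetical patterns for common LaTeX math markup (assumes LaTeX source input).
# Order matters: $$...$$ and environments must be removed before single $...$.
MATH_PATTERNS = [
    re.compile(r"\$\$.+?\$\$", re.DOTALL),      # display math: $$ ... $$
    re.compile(r"\\\[.+?\\\]", re.DOTALL),      # display math: \[ ... \]
    re.compile(r"\\\(.+?\\\)", re.DOTALL),      # inline math: \( ... \)
    re.compile(                                  # equation-like environments, starred or not
        r"\\begin\{(equation|align|gather|multline)(\*?)\}.*?\\end\{\1\2\}",
        re.DOTALL,
    ),
    re.compile(r"(?<!\\)\$.+?(?<!\\)\$", re.DOTALL),  # inline math: $ ... $, ignoring escaped \$
]


def strip_formulas(text: str, placeholder: str = "") -> str:
    """Remove (or replace with `placeholder`) LaTeX math spans from `text`."""
    for pattern in MATH_PATTERNS:
        # A lambda avoids backslashes in `placeholder` being read as group references.
        text = pattern.sub(lambda _m: placeholder, text)
    # Collapse runs of spaces/tabs left behind by the removals.
    return re.sub(r"[ \t]{2,}", " ", text).strip()


if __name__ == "__main__":
    sample = r"The energy $E = mc^2$ is conserved. \[ F = ma \] Price is \$5."
    print(strip_formulas(sample))
    print(strip_formulas(r"Let $x$ be real.", "<MATH>"))
```

Formulas in text extracted from PDFs are usually already garbled and carry no such delimiters, so this approach would not catch them; in that case the extraction tool itself is the better place to filter.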