AntCorGen: A Newly Released Corpus Collection Tool

AntCorGen (Windows)
Build 1.0.1 (Released June 5, 2017)

AntCorGen is a freeware corpus generation tool. AntCorGen lets you search for documents in the PLOS ONE research database via search queries and/or subject category browsing and decide which sections (e.g. title, abstract, introduction) of these documents should be stored. AntCorGen then accesses the database, downloads the sections, and saves each one as a text file in an appropriate folder. AntCorGen can also analyze the different parts of speech (e.g. adjectives, verbs) of words in the files and cluster similar sentences into sub-groups. These sub-groups will show similar patterns of language use.
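The "save each section as a text file in an appropriate folder" step can be pictured with a minimal Python sketch. This is not AntCorGen's actual code: the function name save_sections, the record format (a dict with an "id" key plus one key per section), and the folder layout are all assumptions made for illustration. The download step is left out, so the records are assumed to be already fetched.

```python
from pathlib import Path


def save_sections(records, sections, out_dir):
    """Write each requested section of each record to its own text file.

    Assumed layout (hypothetical, not AntCorGen's documented layout):
        out_dir/<section>/<doc_id>_<section>.txt
    """
    out_dir = Path(out_dir)
    written = []
    for record in records:
        # Replace slashes so the document ID is safe to use in a file name.
        doc_id = record["id"].replace("/", "_")
        for section in sections:
            text = record.get(section)
            if not text:
                continue  # Skip sections this document does not have.
            folder = out_dir / section
            folder.mkdir(parents=True, exist_ok=True)
            path = folder / f"{doc_id}_{section}.txt"
            path.write_text(text, encoding="utf-8")
            written.append(path)
    return written
```

Calling save_sections(records, ["title", "abstract"], "corpus") with a list of such records would create one folder per section under "corpus", each holding one text file per document. Grouping files by section rather than by document is a design choice in this sketch; it keeps, for example, all abstracts together so they can be loaded as a single sub-corpus.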

AntCorGen runs on computers running Microsoft Windows (tested on Win 7), Macintosh OS X (tested on OS X 10.9 Mavericks), and Linux (tested on Linux Mint 17). It is developed in Python and Qt, and PyInstaller is used to package it into executables for the different operating systems.