The ACCURAT project (http://www.accurat-project.eu/) is pleased to announce the release of the ACCURAT Toolkit, a collection of tools for collecting comparable corpora from the Web and for multi-level alignment and information extraction from comparable corpora.
By using the ACCURAT Toolkit, users may obtain:
- Comparable corpora from the Web (current news corpora, filtered Wikipedia corpora, and narrow-domain focused corpora);
- Comparable document alignments;
- Semi-parallel sentence/phrase mapping from comparable corpora (for SMT training purposes or other tasks);
- Translated terminology extracted and mapped from bilingual comparable corpora;
- Translated named entities extracted and mapped from bilingual comparable corpora.
The toolkit is open source and freely available under the terms of the Apache 2.0 licence. It can be downloaded from the ACCURAT website at http://www.accurat-project.eu/.
The ACCURAT project has received funding from the European Community’s Seventh Framework Programme (FP7/2007-2013) under Grant Agreement n° 248347.
=-=-=-= REFERENCES =-=-=-=
ACCURAT D2.6 2012. Toolkit for multi-level alignment and information extraction from comparable corpora. (http://www.accurat-project.eu/uploa...n extraction from comparable corpora v3.0.pdf).
ACCURAT D3.5 2012. Tools for building comparable corpus from the Web. (http://www.accurat-project.eu/uploa...lding comparable corpus from the Web v3.0.pdf).
Pinnis, M., Ion, R., Ştefănescu, D., Su, F., Skadiņa, I., Vasiļjevs, A., & Babych, B. (2012). ACCURAT Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora. Proceedings of the ACL 2012 System Demonstrations (pp. 91–96). Association for Computational Linguistics. Jeju, South Korea.