Spanish Corpus Resources

xujiajin

http://www.lllf.uam.es/ING/index.html Universidad Autónoma de Madrid

http://www.lllf.uam.es/ING/Recursos.html
RESOURCE | DESCRIPTION
CORLEC (Reference Spoken Corpus of Contemporary Spanish) | Text database (spoken language corpus): 1,100,000 words
Reference Corpus of Spanish in Argentina | Text database (written language corpus): more than 2,000,000 words
Reference Corpus of Spanish in Chile | Text database (written language corpus): 2,000,000 words
Spanish Treebank | Corpus of 1,500 sentences from newspapers, syntactically annotated
C-ORAL-ROM | Multilingual Spoken Corpus (Spanish, French, Portuguese and Italian) with 300,000 words in each language
CHIEDE (Spontaneous Child Language Corpus of Spanish) | Spoken child language corpus with 60,000 words
Spanish Learner Oral Corpus | Interlanguage oral corpus of speech of learners of Spanish (over 50,000 words)
French Learner Oral Corpus | Interlanguage oral corpus of speech of learners of French (over 61,000 words)
GRAMPAL | Morphosyntactic tagger
Arabic-Spanish Corpus | Arabic-Spanish Parallel Corpus with 1,179 sentences
Spanish-French Dictionary | Dictionary of French prepositions
JAPANESE | Japanese Spoken Corpus with 50,000 words and a dictionary of the 800 basic Japanese words with sound
MAVIR Corpus | Spoken Corpus made up of the lectures from the MAVIR Conferences
Acoustic database of questions | Collection of spoken questions compiled after the participation in CLEF
Arabic morphological parser | Demo

Among these, the GRAMPAL tagging tool deserves special mention: http://cartago.lllf.uam.es/grampal/grampal.cgi?m=xml
 