ParaConc: Concordancer for Multilingual Corpora

xujiajin

Administrator
Staff member
ParaConc: Concordance Software for Multilingual Parallel Corpora
http://forum.corpus4u.org/upload/forum/2006010819383527.pdf
1. Alignment
The successful searching and analysis of parallel texts depends on the presence of aligned text segments in each language corpus (and, of course, on the availability of parallel corpora). The alignment, an indication of equivalent text segments in the two languages, typically uses the sentence unit as the basic alignment segment, although naturally such an alignment is not one in which each sentence of Language A is always aligned with a sentence of Language B throughout the texts, since occasionally a sentence in Language A may, for example, be equivalent to two sentences in Language B, or perhaps absent from Language B altogether. (More difficult problems arise in cases where the translation of one sentence in Language A is distributed over several sentences in Language B.) The size of the aligned segments is not set by the software, however. It would be possible to work with paragraphs as the basic alignment unit, but then the results of a search will be more cumbersome because the translation of a word or phrase will be embedded within a large amount of text, which is especially difficult in cases in which the language is not well-known.

The alignment utility in ParaConc is semi-automatic. When files are loaded, the user enters information about the format of the files either through reference to SGML tags or via specifications of patterns. The user specifies the form of headings and the form of paragraphs. ParaConc uses the information to align the documents at this level and the user can make adjustments by merging/splitting units, as appropriate. Sentence level alignment, if it is not indicated by SGML tags, is performed using the Gale-Church algorithm (Gale and Church,
 
Who can tell me where to download the registered edition of ParaConc? The link here does not work.
Thanks.
 
The registered version is not available to download online. You need to pay the developer (US$99 for the software plus US$15 shipping), and they will then deliver it to you. However, if you don't want to pay, you can still use the trial version. Here is the link to download the demo version (in the demo, the number of hits is restricted to 150 and the results cannot be saved or printed):

http://www.athel.com/ParaConcDemo.zip
 
But it seems unable to deal with Chinese sentence markers such as !。? when aligning. Maybe it is a bug. It can only deal with the English sentence markers . and ?
 
One workaround is to replace all Chinese punctuation marks with the similar English punctuation marks that ParaConc recognizes.
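The replacement could be scripted rather than done by hand. Below is a minimal Python sketch, assuming the text is already loaded as a Unicode string; the mapping table and the function name `normalize_punctuation` are illustrative assumptions, not a list of markers taken from ParaConc itself.

```python
# Map full-width Chinese sentence punctuation to ASCII look-alikes.
# The mapping below is an illustrative assumption; extend it as needed.
PUNCT_MAP = str.maketrans({
    "。": ".",
    "!": "!",
    "?": "?",
    ";": ";",
})


def normalize_punctuation(text: str) -> str:
    """Return text with the mapped Chinese punctuation replaced by ASCII marks."""
    return text.translate(PUNCT_MAP)


if __name__ == "__main__":
    print(normalize_punctuation("你好。你好吗?太好了!"))
```

Whether the aligner also expects a space after the sentence marker is not stated in this thread, so it may be worth testing on a small sample file first.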
 
I don't have that Chinese punctuation problem. Maybe you need to check the settings and the language encoding of your files.
 
Convert the character encoding to UTF-8 and select UTF-8 when you load the corpus.
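For anyone who prefers to do the conversion with a script, here is a rough Python sketch. It assumes the source files are in a GB-family encoding (`gb18030` is only a guess for simplified Chinese text; change `src_encoding` to match your files), and the function name `convert_to_utf8` is hypothetical.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, src_encoding: str = "gb18030") -> None:
    """Re-encode a text file as UTF-8.

    src_encoding is an assumption about the input; a wrong value will
    raise UnicodeDecodeError or produce garbled text.
    """
    text = Path(src).read_text(encoding=src_encoding)
    Path(dst).write_text(text, encoding="utf-8")
```

A call might look like `convert_to_utf8("zh.txt", "zh_utf8.txt")`, where both file names are placeholders.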
 
Thank you all for the kind responses to my question.
Actually, the software is useful not only for parallel corpora in different languages, but also for same-language parallel corpora.
 
Thank you very much, laohong, but in my ParaConc (the official edition) the box for UTF-8 is greyed out. I don't know why.
 
Re: ParaConc: Concordancer for Multilingual Corpora

 
When you load files, select the files in the text box; the UTF-8 checkbox will then become available.
 