ParaConc: Concordancer for Multilingual Corpora

xujiajin · 2006-01-08

ParaConc: Concordance Software for Multilingual Parallel Corpora
http://forum.corpus4u.org/upload/forum/2006010819383527.pdf
1. Alignment
The successful searching and analysis of parallel texts
depends on the presence of aligned text segments in each
language corpus (and, of course, on the availability of
parallel corpora). The alignment, an indication of
equivalent text segments in the two languages, typically
uses the sentence unit as the basic alignment segment,
although naturally such an alignment is not one in which
each sentence of Language A is always aligned with a
sentence of Language B throughout the texts, since
occasionally a sentence in Language A may, for example,
be equivalent to two sentences in Language B, or perhaps
absent from Language B altogether. (More difficult
problems arise in cases where the translation of one
sentence in Language A is distributed over several
sentences in Language B.) The size of the aligned
segments is not set by the software, however. It would be
possible to work with paragraphs as the basic alignment
unit, but then the results of a search will be more
cumbersome because the translation of a word or phrase
will be embedded within a large amount of text, which is
especially difficult in cases in which the language is not
well-known.
The alignment utility in ParaConc is semi-automatic.
When files are loaded, the user enters information about
the format of the files either through reference to SGML
tags or via specifications of patterns. The user specifies
the form of headings and the form of paragraphs.
ParaConc uses the information to align the documents at
this level and the user can make adjustments by
merging/splitting units, as appropriate. Sentence level
alignment, if it is not indicated by SGML tags, is performed
using the Gale-Church algorithm (Gale and Church,

noel · 2006-01-08

patricx · 2006-01-08

Can anybody tell us something about the "aligned corpus" function in the registered edition?

zephyr · 2006-05-24

Can anyone tell me where to download the registered edition of ParaConc? The link here does not work.
Thanks.

laohong · 2006-05-24

The registered version is not available to download online. You need to pay the developer (US$99 for the software + US$15 shipping fee), and then they will deliver it to you. However, if you don't want to pay for it, you can still use the trial version. Here is the link to download the demo version (in the demo, the number of hits is restricted to 150 and the results cannot be saved or printed):

http://www.athel.com/ParaConcDemo.zip

xudekuan · 2006-05-24

But it seems unable to deal with Chinese sentence markers such as ！。？ when aligning. Maybe it is a bug. It can only deal with English sentence markers such as . and ?

xujiajin · 2006-05-24

One workaround is to replace all Chinese punctuation marks with the corresponding English punctuation marks, which ParaConc can recognize.
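
The replacement suggested above could be scripted rather than done by hand. The following is only a minimal sketch in Python: the mapping table and the function name `normalize_punctuation` are illustrative assumptions, not part of ParaConc, and the table covers only the three sentence markers mentioned in this thread.

```python
# Hypothetical mapping from full-width Chinese sentence markers to their
# ASCII counterparts. Extend it with any other marks found in your corpus.
CN_TO_EN = str.maketrans({
    "。": ".",
    "！": "!",
    "？": "?",
})


def normalize_punctuation(text: str) -> str:
    """Return text with the mapped Chinese sentence markers replaced."""
    return text.translate(CN_TO_EN)


if __name__ == "__main__":
    print(normalize_punctuation("你好！今天天气很好。你呢？"))
```

Whether ParaConc also expects a space after each sentence marker is not stated in this thread, so the sketch replaces the characters one for one and leaves spacing untouched.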

laohong · 2006-05-24

I don't have that problem with Chinese punctuation. Maybe you need to check the settings and the language encoding of your files.

xiaoz · 2006-05-25

Convert the character encoding to UTF-8 and select UTF-8 when you load the corpus.

xudekuan · 2006-05-25

Thank you all for your kind responses to my question.
Actually, the software is useful not only for parallel corpora in different languages, but also for parallel corpora in the same language.

xudekuan · 2006-05-25

Is there any simple way, or any software, to convert the character encoding of files to UTF-8?

xiaoz · 2006-05-25

Try the Multilingual Corpus Tool (MLCT), available at this site.
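
For readers who would rather script the conversion, here is a minimal sketch in Python. It assumes the source files are in a GB encoding (GB18030, which also covers GBK and GB2312 text); that encoding is an assumption, so adjust `source_encoding` if your files use something else such as Big5. The function names are illustrative only.

```python
from pathlib import Path


def transcode(data: bytes, source_encoding: str = "gb18030") -> bytes:
    """Decode bytes from the assumed source encoding and re-encode as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def convert_file(src: Path, dst: Path, source_encoding: str = "gb18030") -> None:
    """Read src in the assumed source encoding and write dst as UTF-8."""
    dst.write_bytes(transcode(src.read_bytes(), source_encoding))


if __name__ == "__main__":
    sample = "你好。".encode("gb18030")
    print(transcode(sample).decode("utf-8"))
```

If the assumed encoding is wrong, decoding will usually raise a `UnicodeDecodeError` instead of silently producing garbled text, which makes a wrong guess easy to spot.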

xudekuan · 2006-05-25

Thank you, dear Xiao. You are always so prompt in answering questions.

laohong · 2006-05-25

Re: ParaConc: Concordancer for Multilingual Corpora

Here is another one for you, small (only 41 KB) and useful. After you download it, please change the file name from ChineseEncodingConverter.pdf to ChineseEncodingConverter.jar, then double-click it to run.
http://forum.corpus4u.org/upload/forum/2006052521360891.pdf

xudekuan · 2006-05-26

Thanks very much, laohong, but in my ParaConc (the official edition) the box for UTF-8 is greyed out. I don't know why.

xudekuan · 2006-05-26

Re: ParaConc: Concordancer for Multilingual Corpora

xudekuan · 2006-05-26

Re: ParaConc: Concordancer for Multilingual Corpora

xiaoz · 2006-05-26

When you load files, select the files in the text box; the UTF checkbox will then become available.

laohong · 2006-05-26

Yes, xiaoz is right.

xujiajin · 2006-08-23

What else have you found about the use of ParaConc besides parallel concordancing?
