English compound finding tool Perl script

xujiajin · 2011-01-31

I wrote a Perl program, find-compounds.pl, that finds the longest compound words in a text.
It is part of the Text-NSP package. The following link gives its description.

http://search.cpan.org/~tpederse/Text-NSP-1.21/bin/utils/find-compounds.pl

Suppose the original text contains "This is the new york city", and the compound word list contains

new_york
new_york_city

find-compounds.pl finds the longest match. After the compound words are replaced, the text is "This is the new_york_city".
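
To make the longest-match idea concrete, here is a minimal Python sketch. It is not the find-compounds.pl source and does not reproduce its options. The function name replace_compounds, the whitespace tokenization, and the case-insensitive matching are all assumptions made for illustration.

```python
def replace_compounds(text, compounds):
    """Join the longest matching compound at each position with underscores.

    Hypothetical helper for illustration; not part of Text-NSP.
    """
    # Store each compound as a tuple of lowercase tokens,
    # e.g. "new_york_city" -> ("new", "york", "city").
    table = {tuple(c.lower().split("_")) for c in compounds}
    max_len = max((len(c) for c in table), default=1)

    tokens = text.split()  # assumption: plain whitespace tokenization
    out = []
    i = 0
    while i < len(tokens):
        # Try the longest window first so "new_york_city" beats "new_york".
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            window = tuple(t.lower() for t in tokens[i:i + n])
            if window in table:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No compound starts here; keep the token unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


result = replace_compounds("This is the new york city",
                           ["new_york", "new_york_city"])
print(result)  # This is the new_york_city
```

Checking the longest window first is what makes "new_york_city" win over "new_york" when both are in the list.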

The program takes as input a list, prepared in advance, of the compound words you are interested in.
The output is the text file with the compound words replaced. To pick out the sentences
that contain the compound words, you need to process the output text further. Hope this is helpful.
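
One possible way to do that further processing is sketched below in Python. It assumes the output text already has the compounds joined with underscores and that sentences end in ".", "!" or "?" followed by whitespace. The function name sentences_with_compounds is hypothetical, and the sentence splitting is deliberately naive.

```python
import re


def sentences_with_compounds(text, compounds):
    """Return the sentences that contain at least one listed compound.

    Hypothetical helper for illustration; not part of Text-NSP.
    """
    wanted = {c.lower() for c in compounds}
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    hits = []
    for sentence in sentences:
        # \w matches underscores, so "new_york_city" stays one token.
        words = {w.lower() for w in re.findall(r"\w+", sentence)}
        if words & wanted:
            hits.append(sentence)
    return hits


output_text = "This is the new_york_city. It rains today."
matches = sentences_with_compounds(output_text, ["new_york", "new_york_city"])
print(matches)  # ['This is the new_york_city.']
```

For real corpora a proper sentence splitter would be safer than this regular expression, since abbreviations such as "Dr." will be cut in the wrong place.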

Thanks,
Ying

Quote from Corpora List

volfer · 2011-01-31

Re: English compound finding tool Perl script

Thank you very much for sharing.
