NLTK, the Natural Language Toolkit, is a suite of Python libraries and programs for symbolic and statistical natural language processing.
Version 0.6.4 has been released, and can be downloaded from
http://nltk.sourceforge.net/
CONTENTS
Software Modules: corpus readers, tokenizers & stemmers, taggers (regexp, n-gram, backoff, Brill, HMM), parsers (recursive descent, shift-reduce, chart, probabilistic, ...), clusterers (EM, k-means, ...), probability distributions, chatbots, demonstrations, ...
Corpora and Corpus Samples: Brown Corpus, CMU Pronunciation Dictionary, CoNLL-2000, Genesis, Gutenberg, IEER, Presidential Addresses, Names, PP-Attachment, Senseval 2, TIMIT, Treebank, Words
Documentation: Tutorials and exercises (161 pp.), API documentation for all software modules, installation instructions for Windows, Mac and Unix.