[E-BOOK] SPEECH and LANGUAGE PROCESSING

laohong

SPEECH and LANGUAGE PROCESSING
An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition

Second Edition

Last Update on Tuesday, June 19, 2007



By Daniel Jurafsky and James H. Martin
with contributions from:

Andrew Kehler (Chapter 18)
Keith Vander Linden (Chapter 20)
Nigel Ward (Chapter 21)

Prentice-Hall, 2000
ISBN: 0-13-095069-6

Chapter 1: Introduction
This chapter is largely the same with updated history and pointers to newer applications.

Chapter 2: Regular Expressions and Automata
This chapter is largely the same with some bug fixes.

Chapter 3: Words and Transducers
This new version of the chapter still focuses on morphology and FSTs, but has been expanded in various ways. It gives more detail on the formal description of finite-state transducers, fixes many bugs, and adds two new sections relating to words and subwords. The first new section covers word and sentence tokenization, including algorithms for English as well as the maxmatch algorithm for Chinese word segmentation. The second covers spelling correction and minimum edit distance; it is an extended version of the edit-distance section from Chapter 5 of the first edition, with clearer figures, for example for explaining the minimum-edit-distance backtrace.
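As an aside for readers new to it, the minimum-edit-distance computation that section extends can be sketched in a few lines of dynamic programming. This is a toy version (names are mine) using the substitution cost of 2 from the book's Levenshtein variant; the backtrace is omitted:

```python
def min_edit_distance(source, target, sub_cost=2):
    # d[i][j] = minimum cost of editing source[:i] into target[:j]
    n, m = len(source), len(target)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i                          # i deletions
    for j in range(1, m + 1):
        d[0][j] = j                          # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if source[i - 1] == target[j - 1] else sub_cost
            d[i][j] = min(d[i - 1][j] + 1,          # delete
                          d[i][j - 1] + 1,          # insert
                          d[i - 1][j - 1] + sub)    # substitute or copy
    return d[n][m]
```

With these costs, the classic intention-to-execution example works out to a distance of 8 (or 5 with `sub_cost=1`).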

Chapter 4: N-grams
This updated language model chapter has had a complete overhaul. This draft includes more examples, a more complete description of Good-Turing smoothing, expanded sections on practical issues such as perplexity and evaluation, coverage of language modeling toolkits and the ARPA format, and an overview of modern methods such as interpolated Kneser-Ney.
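To give a concrete feel for perplexity, here is a minimal sketch of an unsmoothed MLE bigram model (all names are mine; a real model would need smoothing, since any unseen bigram here would crash the computation):

```python
import math
from collections import Counter

def bigram_perplexity(train_sents, test_sent):
    # MLE bigram model with <s>/</s> markers; no smoothing, so the
    # test sentence must contain only bigrams seen in training.
    unigrams, bigrams = Counter(), Counter()
    for sent in train_sents:
        toks = ["<s>"] + sent + ["</s>"]
        unigrams.update(toks[:-1])
        bigrams.update(zip(toks[:-1], toks[1:]))
    toks = ["<s>"] + test_sent + ["</s>"]
    log_prob = sum(math.log(bigrams[b] / unigrams[b[0]])
                   for b in zip(toks[:-1], toks[1:]))
    n = len(toks) - 1          # number of predicted tokens
    return math.exp(-log_prob / n)
```

A test sentence the model predicts perfectly gets perplexity 1; less predictable sentences get higher values.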

Chapter 5: Word Classes and Part-of-Speech Tagging
The main change to this revised chapter is a greatly expanded, and hence self-contained, description of bigram and trigram HMM part-of-speech tagging, including Viterbi decoding and deleted interpolation smoothing. Courses that don't include Chapter 7 (speech and HMMs) can now use this chapter to introduce HMM tagging in a self-contained way. Other changes include expanded descriptions of unknown word modeling and part-of-speech tagging in other languages, and many bug fixes. Finally, we've moved this chapter earlier in the book and renumbered it Chapter 5; it should be used after the FST chapter (Chapter 3) and the N-gram chapter (Chapter 4).

Chapter 6: Hidden Markov and Maximum Entropy Models
This new chapter introduces two sequence models: HMMs and MEMMs. It gives the details of Hidden Markov Models, including the Forward, Viterbi, and EM algorithms. It then introduces MaxEnt models, beginning with linear regression, followed by logistic regression, then the extension to MaxEnt, and finally the MEMM and the Viterbi intuition.
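The Viterbi decoding the chapter details can be sketched as a generic HMM decoder over toy probability dictionaries (variable names are my own; a real tagger would work in log space to avoid underflow):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    # V[t][s] = (best prob of any path ending in state s at time t,
    #            predecessor state on that path)
    V = [{s: (start_p[s] * emit_p[s][obs[0]], None) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = (prob, prev)
    # Backtrace from the best final state
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))
```

The same skeleton serves for part-of-speech tagging (states are tags, observations are words) and, with different parameters, for the speech decoding in Chapter 9.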

Chapter 7: Phonetics
This chapter is an introduction to articulatory and acoustic phonetics for speech processing, as well as foundational tools like the ARPAbet, wavefile formats, phonetic dictionaries, and PRAAT.

Chapter 8: Speech Synthesis
This is a new chapter on speech synthesis.

Chapter 9: Automatic Speech Recognition
This significantly expanded new speech recognition chapter gives a complete introduction to HMM-based speech recognition, including the extraction of MFCC features, Gaussian mixture model acoustic models, and embedded training.

Chapter 10: Speech Recognition: Advanced Topics
This new second chapter on speech recognition covers advanced topics like decision-tree clustering for context-dependent phones, advanced decoding (including n-best lists, lattices, confusion networks, and stack decoding), robustness (including MLLR adaptation), discriminative training, and human speech recognition.

Chapter 11: Computational Phonology
This chapter is a brief introduction to computational phonology, including phonological and morphological learning, finite-state models, OT, and Stochastic OT.

Chapter 12: Formal Grammars of English
This chapter still focuses on CFGs for English and includes a revamped and somewhat expanded grammar for the ATIS domain. New and expanded sections cover: treebanks with a focus on the Penn Treebank, searching treebanks with tgrep and tgrep2, heads and head-finding rules, dependency grammars, Categorial grammar, and grammars for spoken language processing.

Chapter 13: Parsing with Context-Free Grammars
The focus of this chapter is still on parsing with CFGs. It now includes sections on CKY, Earley and agenda-based (chart) parsing. In addition, there is a new section on partial parsing with a focus on machine learning based base-phrase chunking and the use of IOB tags.
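The CKY algorithm mentioned above can be sketched as a simple recognizer for grammars in Chomsky Normal Form (a toy version with my own names; a full parser would also store backpointers to recover trees):

```python
def cky_recognize(words, grammar, start="S"):
    # grammar: list of (lhs, rhs) rules in Chomsky Normal Form,
    # where rhs is either (terminal,) or (B, C) for nonterminals B, C.
    n = len(words)
    # table[i][j] = set of nonterminals spanning words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for j in range(1, n + 1):
        for lhs, rhs in grammar:                 # lexical rules
            if rhs == (words[j - 1],):
                table[j - 1][j].add(lhs)
        for i in range(j - 2, -1, -1):           # binary rules
            for k in range(i + 1, j):
                for lhs, rhs in grammar:
                    if (len(rhs) == 2 and rhs[0] in table[i][k]
                            and rhs[1] in table[k][j]):
                        table[i][j].add(lhs)
    return start in table[0][n]
```

The probabilistic CKY of Chapter 14 has the same shape, but the cells hold the best probability per nonterminal instead of a set.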

Chapter 14: Lexicalized and Probabilistic Parsing
This statistical parsing chapter has been extensively revised. It now covers PCFGs, probabilistic CKY parsing, parent annotations, the Collins parser, and touches on advanced topics such as discriminative reranking and parsing for language modeling.

Chapter 17: Semantics
This chapter still covers basic notions surrounding meaning representation languages. It now has better coverage of model-theoretic semantics for meaning representations, and a new section on Description Logics, their role as a basis for OWL, and OWL's role in the Semantic Web.

Chapter 19: Lexical Semantics
This chapter still covers the basics of lexical semantics, including sense relations, semantic roles, and primitive decomposition. The treatment of semantic roles has been updated, as has the coverage of WordNet, and new sections have been added on PropBank and FrameNet.

Chapter 20: Computational Lexical Semantics
The focus of this new chapter is on computing with word meanings. The three main topics are word sense disambiguation, computing relations between words (similarity, hyponymy, etc.), and semantic role labeling. It considerably expands the treatment of these topics.
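As a taste of the word sense disambiguation topic, here is a sketch of the simplified Lesk algorithm, which picks the sense whose dictionary gloss shares the most words with the context (the sense names and glosses below are toy examples of my own):

```python
def simplified_lesk(context, senses):
    # senses: {sense_name: gloss string}. Return the sense whose gloss
    # has the largest word overlap with the context sentence.
    context_words = set(context.lower().split())
    def overlap(gloss):
        return len(set(gloss.lower().split()) & context_words)
    return max(senses, key=lambda s: overlap(senses[s]))
```

Real systems would remove stopwords, use stemmed or weighted overlap, or replace the gloss-overlap score with a trained classifier, but the skeleton is the same.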

Chapter 21: Discourse
This rewritten chapter includes a number of updates to the first edition. The anaphora resolution section is updated to include modern log-linear methods, and a section on the more general problem of coreference is also included. The coherence section describes cue-based methods for rhetorical relation and coherence relation extraction. Finally, there is a significant new section on discourse segmentation (including TextTiling).

Chapter 23: Dialog and Conversational Agents
This is a completely rewritten version of the dialogue chapter. It includes much more information on modern dialogue systems, including VoiceXML, confirmation and clarification dialogues, the information-state model, Markov decision processes, and other current approaches to dialogue agents.

Chapter 24: Machine Translation
The MT chapter has been extensively rewritten and a significant new section added covering statistical MT, including IBM Model 1, Model 3, and HMM alignment. A new evaluation section covering human evaluation and Bleu has also been added, as well as sections on SYSTRAN and more details on cross-linguistic divergences.
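The Bleu metric mentioned in the evaluation section can be sketched as follows: a geometric mean of modified (clipped) n-gram precisions times a brevity penalty. This is a toy single-reference, unsmoothed version with names of my own:

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    # Modified n-gram precision: candidate n-gram counts are clipped
    # to the count of that n-gram in the reference.
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n])
                       for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n])
                      for i in range(len(reference) - n + 1))
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = sum(cand.values())
        if total == 0 or clipped == 0:
            return 0.0      # unsmoothed: any zero precision zeroes Bleu
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: punish candidates shorter than the reference
    bp = (1.0 if len(candidate) > len(reference)
          else math.exp(1 - len(reference) / len(candidate)))
    return bp * math.exp(sum(log_precisions) / max_n)
```

A candidate identical to the reference scores 1.0; one sharing no unigrams with it scores 0.0.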


Download the book:
http://www.cs.colorado.edu/~martin/slp2.html
 
Re: [E-BOOK] SPEECH and LANGUAGE PROCESSING

Picked up so many treasures in one go.

Thank you, master! You've worked hard :)
 
Re: [E-BOOK] SPEECH and LANGUAGE PROCESSING

"The MT chapter has been extensively rewritten and a significant new section added covering statistical MT, including IBM Model 1, Model 3, and HMM alignment. A new evaluation section covering human evaluation and Bleu has also been added, as well as sections on SYSTRAN and more details on cross-linguistic divergences. "
exactly.
Thanks, Lao Hong for the update information.
best regards
 
Re: [E-BOOK] SPEECH and LANGUAGE PROCESSING

merciiiii, laohong
 
Re: [E-BOOK] SPEECH and LANGUAGE PROCESSING

This one also covers the use of regular expressions.
 