Speech and Language Processing

Hardcover

Authors: Daniel Jurafsky, James H. Martin

ISBN-10: 0131873210

ISBN-13: 9780131873216

Category: Natural Language Processing & Speech Recognition / Synthesis

An explosion of Web-based language techniques, the merging of distinct fields, the availability of phone-based dialogue systems, and much more make this an exciting time in speech and language processing. The first of its kind to thoroughly cover language technology at all levels and with all modern technologies, this book takes an empirical approach to the subject, based on applying statistical and other machine-learning algorithms to large corpora. It builds each chapter around one or more worked examples demonstrating the main idea of the chapter, using the examples to illustrate the relative strengths and weaknesses of various approaches. It adds coverage of statistical sequence labeling, information extraction, question answering and summarization, advanced topics in speech recognition, and speech synthesis, and it revises coverage of language modeling, formal grammars, statistical parsing, machine translation, and dialogue processing. A useful reference for professionals in any of the areas of speech and language processing.

Foreword
Preface
About the Authors

1 Introduction
1.1 Knowledge in Speech and Language Processing
1.2 Ambiguity
1.3 Models and Algorithms
1.4 Language, Thought, and Understanding
1.5 The State of the Art
1.6 Some Brief History
1.6.1 Foundational Insights: 1940s and 1950s
1.6.2 The Two Camps: 1957-1970
1.6.3 Four Paradigms: 1970-1983
1.6.4 Empiricism and Finite State Models Redux: 1983-1993
1.6.5 The Field Comes Together: 1994-1999
1.6.6 The Rise of Machine Learning: 2000-2008
1.6.7 On Multiple Discoveries
1.6.8 A Final Brief Note on Psychology
1.7 Summary
Bibliographical and Historical Notes

Part I Words

2 Regular Expressions and Automata
2.1 Regular Expressions
2.1.1 Basic Regular Expression Patterns
2.1.2 Disjunction, Grouping, and Precedence
2.1.3 A Simple Example
2.1.4 A More Complex Example
2.1.5 Advanced Operators
2.1.6 Regular Expression Substitution, Memory, and ELIZA
2.2 Finite-State Automata
2.2.1 Using an FSA to Recognize Sheeptalk
2.2.2 Formal Languages
2.2.3 Another Example
2.2.4 Non-Deterministic FSAs
2.2.5 Using an NFSA to Accept Strings
2.2.6 Recognition as Search
2.2.7 Relating Deterministic and Non-Deterministic Automata
2.3 Regular Languages and FSAs
2.4 Summary
Bibliographical and Historical Notes
Exercises

3 Words and Transducers
3.1 Survey of (Mostly) English Morphology
3.1.1 Inflectional Morphology
3.1.2 Derivational Morphology
3.1.3 Cliticization
3.1.4 Non-Concatenative Morphology
3.1.5 Agreement
3.2 Finite-State Morphological Parsing
3.3 Construction of a Finite-State Lexicon
3.4 Finite-State Transducers
3.4.1 Sequential Transducers and Determinism
3.5 FSTs for Morphological Parsing
3.6 Transducers and Orthographic Rules
3.7 The Combination of an FST Lexicon and Rules
3.8 Lexicon-Free FSTs: The Porter Stemmer
3.9 Word and Sentence Tokenization
3.9.1 Segmentation in Chinese
3.10 Detection and Correction of Spelling Errors
3.11 Minimum Edit Distance
3.12 Human Morphological Processing
3.13 Summary
Bibliographical and Historical Notes
Exercises

4 N-grams
4.1 Word Counting in Corpora
4.2 Simple (Unsmoothed) N-grams
4.3 Training and Test Sets
4.3.1 N-gram Sensitivity to the Training Corpus
4.3.2 Unknown Words: Open Versus Closed Vocabulary Tasks
4.4 Evaluating N-grams: Perplexity
4.5 Smoothing
4.5.1 Laplace Smoothing
4.5.2 Good-Turing Discounting
4.5.3 Some Advanced Issues in Good-Turing Estimation
4.6 Interpolation
4.7 Backoff
4.7.1 Advanced: Details of Computing Katz Backoff α and P∗
4.8 Practical Issues: Toolkits and Data Formats
4.9 Advanced Issues in Language Modeling
4.9.1 Advanced Smoothing Methods: Kneser-Ney Smoothing
4.9.2 Class-Based N-grams
4.9.3 Language Model Adaptation and Web Use
4.9.4 Using Longer Distance Information: A Brief Summary
4.10 Advanced: Information Theory Background
4.10.1 Cross-Entropy for Comparing Models
4.11 Advanced: The Entropy of English and Entropy Rate Constancy
4.12 Summary
Bibliographical and Historical Notes
Exercises

5 Part-of-Speech Tagging
5.1 (Mostly) English Word Classes
5.2 Tagsets for English
5.3 Part-of-Speech Tagging
5.4 Rule-Based Part-of-Speech Tagging
5.5 HMM Part-of-Speech Tagging
5.5.1 Computing the Most-Likely Tag Sequence: An Example
5.5.2 Formalizing Hidden Markov Model Taggers
5.5.3 Using the Viterbi Algorithm for HMM Tagging
5.5.4 Extending the HMM Algorithm to Trigrams
5.6 Transformation-Based Tagging
5.6.1 How TBL Rules Are Applied
5.6.2 How TBL Rules Are Learned
5.7 Evaluation and Error Analysis
5.7.1 Error Analysis
5.8 Advanced Issues in Part-of-Speech Tagging
5.8.1 Practical Issues: Tag Indeterminacy and Tokenization
5.8.2 Unknown Words
5.8.3 Part-of-Speech Tagging for Other Languages
5.8.4 Tagger Combination
5.9 Advanced: The Noisy Channel Model for Spelling
5.9.1 Contextual Spelling Error Correction
5.10 Summary
Bibliographical and Historical Notes
Exercises

6 Hidden Markov and Maximum Entropy Models
6.1 Markov Chains
6.2 The Hidden Markov Model
6.3 Likelihood Computation: The Forward Algorithm
6.4 Decoding: The Viterbi Algorithm
6.5 HMM Training: The Forward-Backward Algorithm
6.6 Maximum Entropy Models: Background
6.6.1 Linear Regression
6.6.2 Logistic Regression
6.6.3 Logistic Regression: Classification
6.6.4 Advanced: Learning in Logistic Regression
6.7 Maximum Entropy Modeling
6.7.1 Why We Call It Maximum Entropy
6.8 Maximum Entropy Markov Models
6.8.1 Decoding and Learning in MEMMs
6.9 Summary
Bibliographical and Historical Notes
Exercises

Part II Speech

7 Phonetics
7.1 Speech Sounds and Phonetic Transcription
7.2 Articulatory Phonetics
7.2.1 The Vocal Organs
7.2.2 Consonants: Place of Articulation
7.2.3 Consonants: Manner of Articulation
7.2.4 Vowels
7.2.5 Syllables
7.3 Phonological Categories and Pronunciation Variation
7.3.1 Phonetic Features
7.3.2 Predicting Phonetic Variation
7.3.3 Factors Influencing Phonetic Variation
7.4 Acoustic Phonetics and Signals
7.4.1 Waves
7.4.2 Speech Sound Waves
7.4.3 Frequency and Amplitude; Pitch and Loudness
7.4.4 Interpretation of Phones from a Waveform
7.4.5 Spectra and the Frequency Domain
7.4.6 The Source-Filter Model
7.5 Phonetic Resources
7.6 Advanced: Articulatory and Gestural Phonology
7.7 Summary
Bibliographical and Historical Notes
Exercises

8 Speech Synthesis
8.1 Text Normalization
8.1.1 Sentence Tokenization
8.1.2 Non-Standard Words
8.1.3 Homograph Disambiguation
8.2 Phonetic Analysis
8.2.1 Dictionary Lookup
8.2.2 Names
8.2.3 Grapheme-to-Phoneme Conversion
8.3 Prosodic Analysis
8.3.1 Prosodic Structure
8.3.2 Prosodic Prominence
8.3.3 Tune
8.3.4 More Sophisticated Models: ToBI
8.3.5 Computing Duration from Prosodic Labels
8.3.6 Computing F0 from Prosodic Labels
8.3.7 Final Result of Text Analysis: Internal Representation
8.4 Diphone Waveform Synthesis
8.4.1 Steps for Building a Diphone Database
8.4.2 Diphone Concatenation and TD-PSOLA for Prosody
8.5 Unit Selection (Waveform) Synthesis
8.6 Evaluation
Bibliographical and Historical Notes
Exercises

9 Automatic Speech Recognition
9.1 Speech Recognition Architecture
9.2 Applying the Hidden Markov Model to Speech
9.3 Feature Extraction: MFCC Vectors
9.3.1 Preemphasis
9.3.2 Windowing
9.3.3 Discrete Fourier Transform
9.3.4 Mel Filter Bank and Log
9.3.5 The Cepstrum: Inverse Discrete Fourier Transform
9.3.6 Deltas and Energy
9.3.7 Summary: MFCC
9.4 Acoustic Likelihood Computation
9.4.1 Vector Quantization
9.4.2 Gaussian PDFs
9.4.3 Probabilities, Log Probabilities, and Distance Functions
9.5 The Lexicon and Language Model
9.6 Search and Decoding
9.7 Embedded Training
9.8 Evaluation: Word Error Rate
9.9 Summary
Bibliographical and Historical Notes
Exercises

10 Speech Recognition: Advanced Topics
10.1 Multipass Decoding: N-best Lists and Lattices
10.2 A∗ (‘Stack’) Decoding
10.3 Context-Dependent Acoustic Models: Triphones
10.4 Discriminative Training
10.4.1 Maximum Mutual Information Estimation
10.4.2 Acoustic Models Based on Posterior Classifiers
10.5 Modeling Variation
10.5.1 Environmental Variation and Noise
10.5.2 Speaker Variation and Speaker Adaptation
10.5.3 Pronunciation Modeling: Variation Due to Genre
10.6 Metadata: Boundaries, Punctuation, and Disfluencies
10.7 Speech Recognition by Humans
10.8 Summary
Bibliographical and Historical Notes
Exercises

11 Computational Phonology
11.1 Finite-State Phonology
11.2 Advanced Finite-State Phonology
11.2.1 Harmony
11.2.2 Templatic Morphology
11.3 Computational Optimality Theory
11.3.1 Finite-State Transducer Models of Optimality Theory
11.3.2 Stochastic Models of Optimality Theory
11.4 Syllabification
11.5 Learning Phonology and Morphology
11.5.1 Learning Phonological Rules
11.5.2 Learning Morphology
11.5.3 Learning in Optimality Theory
11.6 Summary
Bibliographical and Historical Notes
Exercises

Part III Syntax

12 Formal Grammars of English
12.1 Constituency
12.2 Context-Free Grammars
12.2.1 Formal Definition of Context-Free Grammar
12.3 Some Grammar Rules for English
12.3.1 Sentence-Level Constructions
12.3.2 Clauses and Sentences
12.3.3 The Noun Phrase
12.3.4 Agreement
12.3.5 The Verb Phrase and Subcategorization
12.3.6 Auxiliaries
12.3.7 Coordination
12.4 Treebanks
12.4.1 Example: The Penn Treebank Project
12.4.2 Treebanks as Grammars
12.4.3 Treebank Searching
12.4.4 Heads and Head Finding
12.5 Grammar Equivalence and Normal Form
12.6 Finite-State and Context-Free Grammars
12.7 Dependency Grammars
12.7.1 The Relationship Between Dependencies and Heads
12.7.2 Categorial Grammar
12.8 Spoken Language Syntax
12.8.1 Disfluencies and Repair
12.8.2 Treebanks for Spoken Language
12.9 Grammars and Human Processing
12.10 Summary
Bibliographical and Historical Notes
Exercises

13 Syntactic Parsing
13.1 Parsing as Search
13.1.1 Top-Down Parsing
13.1.2 Bottom-Up Parsing
13.1.3 Comparing Top-Down and Bottom-Up Parsing
13.2 Ambiguity
13.3 Search in the Face of Ambiguity
13.4 Dynamic Programming Parsing Methods
13.4.1 CKY Parsing
13.4.2 The Earley Algorithm
13.4.3 Chart Parsing
13.5 Partial Parsing
13.5.1 Finite-State Rule-Based Chunking
13.5.2 Machine Learning-Based Approaches to Chunking
13.5.3 Evaluating Chunking Systems
13.6 Summary
Bibliographical and Historical Notes
Exercises

14 Statistical Parsing
14.1 Probabilistic Context-Free Grammars
14.1.1 PCFGs for Disambiguation
14.1.2 PCFGs for Language Modeling
14.2 Probabilistic CKY Parsing of PCFGs
14.3 Learning PCFG Rule Probabilities
14.4 Problems with PCFGs
14.4.1 Independence Assumptions Miss Structural Dependencies Between Rules
14.4.2 Lack of Sensitivity to Lexical Dependencies
14.5 Improving PCFGs by Splitting Non-Terminals
14.6 Probabilistic Lexicalized CFGs
14.6.1 The Collins Parser
14.6.2 Advanced: Further Details of the Collins Parser
14.7 Evaluating Parsers
14.8 Advanced: Discriminative Reranking
14.9 Advanced: Parser-Based Language Modeling
14.10 Human Parsing
14.11 Summary
Bibliographical and Historical Notes
Exercises

15 Features and Unification
15.1 Feature Structures
15.2 Unification of Feature Structures
15.3 Feature Structures in the Grammar
15.3.1 Agreement
15.3.2 Head Features
15.3.3 Subcategorization
15.3.4 Long-Distance Dependencies
15.4 Implementation of Unification
15.4.1 Unification Data Structures
15.4.2 The Unification Algorithm
15.5 Parsing with Unification Constraints
15.5.1 Integration of Unification into an Earley Parser
15.5.2 Unification-Based Parsing
15.6 Types and Inheritance
15.6.1 Advanced: Extensions to Typing
15.6.2 Other Extensions to Unification
15.7 Summary
Bibliographical and Historical Notes
Exercises

16 Language and Complexity
16.1 The Chomsky Hierarchy
16.2 Ways to Tell if a Language Isn’t Regular
16.2.1 The Pumping Lemma
16.2.2 Proofs That Various Natural Languages Are Not Regular
16.3 Is Natural Language Context-Free?
16.4 Complexity and Human Processing
16.5 Summary
Bibliographical and Historical Notes
Exercises

Part IV Semantics and Pragmatics

17 The Representation of Meaning
17.1 Computational Desiderata for Representations
17.1.1 Verifiability
17.1.2 Unambiguous Representations
17.1.3 Canonical Form
17.1.4 Inference and Variables
17.1.5 Expressiveness
17.2 Model-Theoretic Semantics
17.3 First-Order Logic
17.3.1 Basic Elements of First-Order Logic
17.3.2 Variables and Quantifiers
17.3.3 Lambda Notation
17.3.4 The Semantics of First-Order Logic
17.3.5 Inference
17.4 Event and State Representations
17.4.1 Representing Time
17.4.2 Aspect
17.5 Description Logics
17.6 Embodied and Situated Approaches to Meaning
17.7 Summary
Bibliographical and Historical Notes
Exercises

18 Computational Semantics
18.1 Syntax-Driven Semantic Analysis
18.2 Semantic Augmentations to Syntactic Rules
18.3 Quantifier Scope Ambiguity and Underspecification
18.3.1 Store and Retrieve Approaches
18.3.2 Constraint-Based Approaches
18.4 Unification-Based Approaches to Semantic Analysis
18.5 Integration of Semantics into the Earley Parser
18.6 Idioms and Compositionality
18.7 Summary
Bibliographical and Historical Notes
Exercises

19 Lexical Semantics
19.1 Word Senses
19.2 Relations Between Senses
19.2.1 Synonymy and Antonymy
19.2.2 Hyponymy
19.2.3 Semantic Fields
19.3 WordNet: A Database of Lexical Relations
19.4 Event Participants
19.4.1 Thematic Roles
19.4.2 Diathesis Alternations
19.4.3 Problems with Thematic Roles
19.4.4 The Proposition Bank
19.4.5 FrameNet
19.4.6 Selectional Restrictions
19.5 Primitive Decomposition
19.6 Advanced: Metaphor
19.7 Summary
Bibliographical and Historical Notes
Exercises

20 Computational Lexical Semantics
20.1 Word Sense Disambiguation: Overview
20.2 Supervised Word Sense Disambiguation
20.2.1 Feature Extraction for Supervised Learning
20.2.2 Naive Bayes and Decision List Classifiers
20.3 WSD Evaluation, Baselines, and Ceilings
20.4 WSD: Dictionary and Thesaurus Methods
20.4.1 The Lesk Algorithm
20.4.2 Selectional Restrictions and Selectional Preferences
20.5 Minimally Supervised WSD: Bootstrapping
20.6 Word Similarity: Thesaurus Methods
20.7 Word Similarity: Distributional Methods
20.7.1 Defining a Word’s Co-Occurrence Vectors
20.7.2 Measuring Association with Context
20.7.3 Defining Similarity Between Two Vectors
20.7.4 Evaluating Distributional Word Similarity
20.8 Hyponymy and Other Word Relations
20.9 Semantic Role Labeling
20.10 Advanced: Unsupervised Sense Disambiguation
20.11 Summary
Bibliographical and Historical Notes
Exercises

21 Computational Discourse
21.1 Discourse Segmentation
21.1.1 Unsupervised Discourse Segmentation
21.1.2 Supervised Discourse Segmentation
21.1.3 Discourse Segmentation Evaluation
21.2 Text Coherence
21.2.1 Rhetorical Structure Theory
21.2.2 Automatic Coherence Assignment
21.3 Reference Resolution
21.4 Reference Phenomena
21.4.1 Five Types of Referring Expressions
21.4.2 Information Status
21.5 Features for Pronominal Anaphora Resolution
21.6 Three Algorithms for Pronominal Anaphora Resolution
21.6.1 Pronominal Anaphora Baseline: The Hobbs Algorithm
21.6.2 A Centering Algorithm for Anaphora Resolution
21.6.3 A Log-Linear Model for Pronominal Anaphora Resolution
21.6.4 Features for Pronominal Anaphora Resolution
21.7 Coreference Resolution
21.8 Evaluation of Coreference Resolution
21.9 Advanced: Inference-Based Coherence Resolution
21.10 Psycholinguistic Studies of Reference
21.11 Summary
Bibliographical and Historical Notes
Exercises

Part V Applications

22 Information Extraction
22.1 Named Entity Recognition
22.1.1 Ambiguity in Named Entity Recognition
22.1.2 NER as Sequence Labeling
22.1.3 Evaluation of Named Entity Recognition
22.1.4 Practical NER Architectures
22.2 Relation Detection and Classification
22.2.1 Supervised Learning Approaches to Relation Analysis
22.2.2 Lightly Supervised Approaches to Relation Analysis
22.2.3 Evaluation of Relation Analysis Systems
22.3 Temporal and Event Processing
22.3.1 Temporal Expression Recognition
22.3.2 Temporal Normalization
22.3.3 Event Detection and Analysis
22.3.4 TimeBank
22.4 Template-Filling
22.4.1 Statistical Approaches to Template-Filling
22.4.2 Finite-State Template-Filling Systems
22.5 Advanced: Biomedical Information Extraction
22.5.1 Biological Named Entity Recognition
22.5.2 Gene Normalization
22.5.3 Biological Roles and Relations
22.6 Summary
Bibliographical and Historical Notes
Exercises

23 Question Answering and Summarization
23.1 Information Retrieval
23.1.1 The Vector Space Model
23.1.2 Term Weighting
23.1.3 Term Selection and Creation
23.1.4 Evaluation of Information-Retrieval Systems
23.1.5 Homonymy, Polysemy, and Synonymy
23.1.6 Ways to Improve User Queries
23.2 Factoid Question Answering
23.2.1 Question Processing
23.2.2 Passage Retrieval
23.2.3 Answer Processing
23.2.4 Evaluation of Factoid Answers
23.3 Summarization
23.4 Single Document Summarization
23.4.1 Unsupervised Content Selection
23.4.2 Unsupervised Summarization Based on Rhetorical Parsing
23.4.3 Supervised Content Selection
23.4.4 Sentence Simplification
23.5 Multi-Document Summarization
23.5.1 Content Selection in Multi-Document Summarization
23.5.2 Information Ordering in Multi-Document Summarization
23.6 Focused Summarization and Question Answering
23.7 Summarization Evaluation
23.8 Summary
Bibliographical and Historical Notes
Exercises

24 Dialogue and Conversational Agents
24.1 Properties of Human Conversations
24.1.1 Turns and Turn-Taking
24.1.2 Language as Action: Speech Acts
24.1.3 Language as Joint Action: Grounding
24.1.4 Conversational Structure
24.1.5 Conversational Implicature
24.2 Basic Dialogue Systems
24.2.1 ASR Component
24.2.2 NLU Component
24.2.3 Generation and TTS Components
24.2.4 Dialogue Manager
24.2.5 Dealing with Errors: Confirmation and Rejection
24.3 VoiceXML
24.4 Dialogue System Design and Evaluation
24.4.1 Designing Dialogue Systems
24.4.2 Evaluating Dialogue Systems
24.5 Information-State and Dialogue Acts
24.5.1 Using Dialogue Acts
24.5.2 Interpreting Dialogue Acts
24.5.3 Detecting Correction Acts
24.5.4 Generating Dialogue Acts: Confirmation and Rejection
24.6 Markov Decision Process Architecture
24.7 Advanced: Plan-Based Dialogue Agents
24.7.1 Plan-Inferential Interpretation and Production
24.7.2 The Intentional Structure of Dialogue
24.8 Summary
Bibliographical and Historical Notes
Exercises

25 Machine Translation
25.1 Why Machine Translation Is Hard
25.1.1 Typology
25.1.2 Other Structural Divergences
25.1.3 Lexical Divergences
25.2 Classical MT and the Vauquois Triangle
25.2.1 Direct Translation
25.2.2 Transfer
25.2.3 Combined Direct and Transfer Approaches in Classic MT
25.2.4 The Interlingua Idea: Using Meaning
25.3 Statistical MT
25.4 P(F|E): The Phrase-Based Translation Model
25.5 Alignment in MT
25.5.1 IBM Model 1
25.5.2 HMM Alignment
25.6 Training Alignment Models
25.6.1 EM for Training Alignment Models
25.7 Symmetrizing Alignments for Phrase-Based MT
25.8 Decoding for Phrase-Based Statistical MT
25.9 MT Evaluation
25.9.1 Using Human Raters
25.9.2 Automatic Evaluation: BLEU
25.10 Advanced: Syntactic Models for MT
25.11 Advanced: IBM Model 3 and Fertility
25.11.1 Training for Model 3
25.12 Advanced: Log-Linear Models for MT
25.13 Summary
Bibliographical and Historical Notes
Exercises

Bibliography
Author Index
Subject Index