Semisupervised Learning for Computational Linguistics

Hardcover
from $0.00

Author: Steven Abney

ISBN-10: 1584885599

ISBN-13: 9781584885597

Category: Linguistics & Semiotics

The rapid advancement in the theoretical understanding of statistical and machine learning methods for semisupervised learning has made it difficult for nonspecialists to keep up to date in the field. Providing a broad, accessible treatment of the theory as well as linguistic applications, Semisupervised Learning for Computational Linguistics offers self-contained coverage of semisupervised methods that includes background material on supervised and unsupervised learning.\ The book presents a...

Search in google:

The rapid advancement in the theoretical understanding of statistical and machine learning methods for semisupervised learning has made it difficult for nonspecialists to keep up to date in the field. Providing a broad, accessible treatment of the theory as well as linguistic applications, Semisupervised Learning for Computational Linguistics offers self-contained coverage of semisupervised methods that includes background material on supervised and unsupervised learning.The book presents a brief history of semisupervised learning and its place in the spectrum of learning methods before moving on to discuss well-known natural language processing methods, such as self-training and co-training. It then centers on machine learning techniques, including the boundary-oriented methods of perceptrons, boosting, support vector machines (SVMs), and the null-category noise model. In addition, the book covers clustering, the expectation-maximization (EM) algorithm, related generative methods, and agreement methods. It concludes with the graph-based method of label propagation as well as a detailed discussion of spectral methods.Taking an intuitive approach to the material, this lucid book facilitates the application of semisupervised learning methods to natural language processing and provides the framework and motivation for a more systematic study of machine learning.

Introduction     1A brief history     1Probabilistic methods in computational linguistics     1Supervised and unsupervised training     2Semisupervised learning     3Semisupervised learning     4Major varieties of learning problem     4Motivation     6Evaluation     7Active learning     8Organization and assumptions     8Leading ideas     8Mathematical background     10Notation     11Self-training and Co-training     13Classification     13The standard setting     13Features and rules     14Decision lists     16Self-training     18The algorithm     19Parameters and variants     20Evaluation     23Symmetry of features and instances     25Related algorithms     27Co-Training     28Applications of Self-Training and Co-Training     31Part-of-speech tagging     31Information extraction     33Parsing     35Word senses     36WordNet     36Word-sense disambiguation     38Taxonomic inference     40Classification     43Two simple classifiers     43Naive Bayes     43k-nearest-neighbor classifier     45Abstract setting     48Function approximation     48Defining succes     50Fit and simplicity     52Evaluating detectors and classifiers that abstain     53Confidence-rated classifiers     53Measures for detection     54Idealized performance curves     57The multiclass case     59Binary classifiers and ECOC     62Mathematics for Boundary-Oriented Methods     67Linear separators     67Representing a hyperplane     67Eliminating the threshold     69The point-normal form     70Naive Bayes decision boundary     72The gradient     74Graphs and domains     74Convexity     76Differentiation of vector and matrix expressions     79An example: linear regression     81Constrained optimization      83Optimization     83Equality constraints     84Inequality constraints     87The Wolfe dual     91Boundary-Oriented Methods     95The perceptron     97The algorithm     97An example     99Convergence     100The perceptron algorithm as gradient descent     101Game self-teaching     103Boosting     105Abstention     110Semisupervised boosting     111Co-boosting     113Support Vector Machines (SVMs)     114The margin     114Maximizing the margin     116The nonseparable case     119Slack in the separable case     121Multiple slack points     123Transductive SVMs     125Training a transductive SVM     127Null-category noise model     129Clustering     131Cluster and label     131Clustering concepts     132Objective     132Distance and similarity     133Graphs     136Hierarchical clustering     137Self-training revisited     139k-means clustering     139Pseudo relevance feedback     140Graph mincut     143Label propagation     146Clustering by propagation     146Self-training as propagation     147Co-training as propagation     150Bibliographic notes     152Generative Models     153Gaussian mixtures     153Definition and geometric interpretation     153The linear discriminant decision boundary     156Decision-directed approximation     159McLachlan's algorithm     162The EM algorithm     163Maximizing likelihood     163Relative frequency estimation     164Divergence     166The EM algorithm     169Agreement Constraints     175Co-training     175The conditional independence assumption     176The power of conditional independence     178Agreement-based self-teaching     182Random fields     184Applied to self-training and co-training     184Gibbs sampling     186Markov chains and random walks      187Bibliographic notes     192Propagation Methods     193Label propagation     194Random walks     196Harmonic functions     198Fluids     203Flow     203Pressure     205Conservation of energy     209Thomson's principle     210Computing the solution     213Graph mincuts revisited     215Bibliographic notes     220Mathematics for Spectral Methods     221Some basic concepts     221The norm of a vector     221Matrices as linear operators     222The column space     222Eigenvalues and eigenvectors     224Definition of eigenvalues and eigenvectors     224Diagonalization     225Orthogonal diagonalization     226Eigenvalues and the scaling effects of a matrix     227Matrix norms     227The Rayleigh quotient     228The 2 x 2 case     230The general case     232The Courant-Fischer minimax theorem     234Bibliographic notes     236Spectral Methods      237Simple harmonic motion     237Harmonics     237Mixtures of harmonics     239An oscillating particle     241A vibrating string     243Spectra of matrices and graphs     251The spectrum of a matrix     252Relating matrices and graphs     253The Laplacian matrix and graph spectrum     256Spectral clustering     257The second smallest eigenvector of the Laplacian     257The cut size and the Laplacian     259Approximating cut size     260Minimizing cut size     262Ratiocut     263Spectral methods for semisupervised learning     265Harmonics and harmonic functions     265Eigenvalues and energy     267The Laplacian and random fields     268Harmonic functions and the Laplacian     270Using the Laplacian for regularization     272Transduction to induction     274Bibliographic notes     275Bibliography     277Index     301