Machine Learning: An Algorithmic Perspective

Hardcover

Author: Stephen Marsland

ISBN-10: 1420067184

ISBN-13: 9781420067187

Category: Algorithms

Traditional books on machine learning can be divided into two groups — those aimed at advanced undergraduates or early postgraduates with reasonable mathematical knowledge and those that are primers on how to code algorithms. The field is ready for a text that not only demonstrates how to use the algorithms that make up machine learning methods, but also provides the background needed to understand how and why these algorithms work. Machine Learning: An Algorithmic Perspective is that text.

Theory Backed up by Practical Examples

The book covers neural networks, graphical models, reinforcement learning, evolutionary algorithms, dimensionality reduction methods, and the important area of optimization. It treads the fine line between adequate academic rigor and overwhelming students with equations and mathematical concepts. The author addresses the topics in a practical way while providing complete information and references where other expositions can be found. He includes examples based on widely available datasets, along with practical and theoretical problems to test understanding and application of the material. The book describes algorithms with code examples backed up by a website that provides working implementations in Python. The author uses data from a variety of applications to demonstrate the methods and includes practical problems for students to solve.

Highlights a Range of Disciplines and Applications

Drawing from computer science, statistics, mathematics, and engineering, the book underscores the multidisciplinary nature of machine learning and its applicability to areas ranging from finance to biology and medicine to physics and chemistry. Written in an easily accessible style, this book bridges the gaps between disciplines, providing the ideal blend of theory and practical, applicable knowledge.

Table of Contents:

Prologue
1 Introduction
1.1 If Data Had Mass, the Earth Would Be a Black Hole
1.2 Learning
1.2.1 Machine Learning
1.3 Types of Machine Learning
1.4 Supervised Learning
1.4.1 Regression
1.4.2 Classification
1.5 The Brain and the Neuron
1.5.1 Hebb's Rule
1.5.2 McCulloch and Pitts Neurons
1.5.3 Limitations of the McCulloch and Pitts Neuronal Model
Further Reading
2 Linear Discriminants
2.1 Preliminaries
2.2 The Perceptron
2.2.1 The Learning Rate η
2.2.2 The Bias Input
2.2.3 The Perceptron Learning Algorithm
2.2.4 An Example of Perceptron Learning
2.2.5 Implementation
2.2.6 Testing the Network
2.3 Linear Separability
2.3.1 The Exclusive Or (XOR) Function
2.3.2 A Useful Insight
2.3.3 Another Example: The Pima Indian Dataset
2.4 Linear Regression
2.4.1 Linear Regression Examples
Further Reading
Practice Questions
3 The Multi-Layer Perceptron
3.1 Going Forwards
3.1.1 Biases
3.2 Going Backwards: Back-Propagation of Error
3.2.1 The Multi-Layer Perceptron Algorithm
3.2.2 Initialising the Weights
3.2.3 Different Output Activation Functions
3.2.4 Sequential and Batch Training
3.2.5 Local Minima
3.2.6 Picking Up Momentum
3.2.7 Other Improvements
3.3 The Multi-Layer Perceptron in Practice
3.3.1 Data Preparation
3.3.2 Amount of Training Data
3.3.3 Number of Hidden Layers
3.3.4 Generalisation and Overfitting
3.3.5 Training, Testing, and Validation
3.3.6 When to Stop Learning
3.3.7 Computing and Evaluating the Results
3.4 Examples of Using the MLP
3.4.1 A Regression Problem
3.4.2 Classification with the MLP
3.4.3 A Classification Example
3.4.4 Time-Series Prediction
3.4.5 Data Compression: The Auto-Associative Network
3.5 Overview
3.6 Deriving Back-Propagation
3.6.1 The Network Output and the Error
3.6.2 The Error of the Network
3.6.3 A Suitable Activation Function
3.6.4 Back-Propagation of Error
Further Reading
Practice Questions
4 Radial Basis Functions and Splines
4.1 Concepts
4.1.1 Weight Space
4.1.2 Receptive Fields
4.2 The Radial Basis Function (RBF) Network
4.2.1 Training the RBF Network
4.3 The Curse of Dimensionality
4.4 Interpolation and Basis Functions
4.4.1 Bases and Basis Functions
4.4.2 The Cubic Spline
4.4.3 Fitting the Spline to the Data
4.4.4 Smoothing Splines
4.4.5 Higher Dimensions
4.4.6 Beyond the Bounds
Further Reading
Practice Questions
5 Support Vector Machines
5.1 Optimal Separation
5.2 Kernels
5.2.1 Example: XOR
5.2.2 Extensions to the Support Vector Machine
Further Reading
Practice Questions
6 Learning with Trees
6.1 Using Decision Trees
6.2 Constructing Decision Trees
6.2.1 Quick Aside: Entropy in Information Theory
6.2.2 ID3
6.2.3 Implementing Trees and Graphs in Python
6.2.4 Implementation of the Decision Tree
6.2.5 Dealing with Continuous Variables
6.2.6 Computational Complexity
6.3 Classification and Regression Trees (CART)
6.3.1 Gini Impurity
6.3.2 Regression in Trees
6.4 Classification Example
Further Reading
Practice Questions
7 Decision by Committee: Ensemble Learning
7.1 Boosting
7.1.1 AdaBoost
7.1.2 Stumping
7.2 Bagging
7.2.1 Subagging
7.3 Different Ways to Combine Classifiers
Further Reading
Practice Questions
8 Probability and Learning
8.1 Turning Data into Probabilities
8.1.1 Minimising Risk
8.1.2 The Naive Bayes' Classifier
8.2 Some Basic Statistics
8.2.1 Averages
8.2.2 Variance and Covariance
8.2.3 The Gaussian
8.2.4 The Bias-Variance Tradeoff
8.3 Gaussian Mixture Models
8.3.1 The Expectation-Maximisation (EM) Algorithm
8.4 Nearest Neighbour Methods
8.4.1 Nearest Neighbour Smoothing
8.4.2 Efficient Distance Computations: the KD-Tree
8.4.3 Distance Measures
Further Reading
Practice Questions
9 Unsupervised Learning
9.1 The k-Means Algorithm
9.1.1 Dealing with Noise
9.1.2 The k-Means Neural Network
9.1.3 Normalisation
9.1.4 A Better Weight Update Rule
9.1.5 Example: The Iris Dataset Again
9.1.6 Using Competitive Learning for Clustering
9.2 Vector Quantisation
9.3 The Self-Organising Feature Map
9.3.1 The SOM Algorithm
9.3.2 Neighbourhood Connections
9.3.3 Self-Organisation
9.3.4 Network Dimensionality and Boundary Conditions
9.3.5 Examples of Using the SOM
Further Reading
Practice Questions
10 Dimensionality Reduction
10.1 Linear Discriminant Analysis (LDA)
10.2 Principal Components Analysis (PCA)
10.2.1 Relation with the Multi-Layer Perceptron
10.2.2 Kernel PCA
10.3 Factor Analysis
10.4 Independent Components Analysis (ICA)
10.5 Locally Linear Embedding
10.6 Isomap
10.6.1 Multi-Dimensional Scaling (MDS)
Further Reading
Practice Questions
11 Optimisation and Search
11.1 Going Downhill
11.2 Least-Squares Optimisation
11.2.1 Taylor Expansion
11.2.2 The Levenberg-Marquardt Algorithm
11.3 Conjugate Gradients
11.3.1 Conjugate Gradients Example
11.4 Search: Three Basic Approaches
11.4.1 Exhaustive Search
11.4.2 Greedy Search
11.4.3 Hill Climbing
11.5 Exploitation and Exploration
11.6 Simulated Annealing
11.6.1 Comparison
Further Reading
Practice Questions
12 Evolutionary Learning
12.1 The Genetic Algorithm (GA)
12.1.1 String Representation
12.1.2 Evaluating Fitness
12.1.3 Population
12.1.4 Generating Offspring: Parent Selection
12.2 Generating Offspring: Genetic Operators
12.2.1 Crossover
12.2.2 Mutation
12.2.3 Elitism, Tournaments, and Niching
12.3 Using Genetic Algorithms
12.3.1 Map Colouring
12.3.2 Punctuated Equilibrium
12.3.3 Example: The Knapsack Problem
12.3.4 Example: The Four Peaks Problem
12.3.5 Limitations of the GA
12.3.6 Training Neural Networks with Genetic Algorithms
12.4 Genetic Programming
12.5 Combining Sampling with Evolutionary Learning
Further Reading
Practice Questions
13 Reinforcement Learning
13.1 Overview
13.2 Example: Getting Lost
13.2.1 State and Action Spaces
13.2.2 Carrots and Sticks: the Reward Function
13.2.3 Discounting
13.2.4 Action Selection
13.2.5 Policy
13.3 Markov Decision Processes
13.3.1 The Markov Property
13.3.2 Probabilities in Markov Decision Processes
13.4 Values
13.5 Back on Holiday: Using Reinforcement Learning
13.6 The Difference between Sarsa and Q-Learning
13.7 Uses of Reinforcement Learning
Further Reading
Practice Questions
14 Markov Chain Monte Carlo (MCMC) Methods
14.1 Sampling
14.1.1 Random Numbers
14.1.2 Gaussian Random Numbers
14.2 Monte Carlo or Bust
14.3 The Proposal Distribution
14.4 Markov Chain Monte Carlo
14.4.1 Markov Chains
14.4.2 The Metropolis-Hastings Algorithm
14.4.3 Simulated Annealing (Again)
14.4.4 Gibbs Sampling
Further Reading
Practice Questions
15 Graphical Models
15.1 Bayesian Networks
15.1.1 Example: Exam Panic
15.1.2 Approximate Inference
15.1.3 Making Bayesian Networks
15.2 Markov Random Fields
15.3 Hidden Markov Models (HMMs)
15.3.1 The Forward Algorithm
15.3.2 The Viterbi Algorithm
15.3.3 The Baum-Welch or Forward-Backward Algorithm
15.4 Tracking Methods
15.4.1 The Kalman Filter
15.4.2 The Particle Filter
Further Reading
Practice Questions
16 Python
16.1 Installing Python and Other Packages
16.2 Getting Started
16.2.1 Python for MATLAB and R users
16.3 Code Basics
16.3.1 Writing and Importing Code
16.3.2 Control Flow
16.3.3 Functions
16.3.4 The doc String
16.3.5 map and lambda
16.3.6 Exceptions
16.3.7 Classes
16.4 Using NumPy and Matplotlib
16.4.1 Arrays
16.4.2 Random Numbers
16.4.3 Linear Algebra
16.4.4 Plotting
Further Reading
Practice Questions
Index