Praise for the First Edition

"... full of vivid and thought-provoking anecdotes ... needs to be read by anyone with a serious interest in research and marketing." - Research magazine

"Shmueli et al. have done a wonderful job in presenting the field of data mining - a welcome addition to the literature." - computingreviews.com

Incorporating a new focus on data visualization and time series forecasting, Data Mining for Business Intelligence, Second Edition continues to supply insightful, detailed guidance on fundamental data mining techniques. This new edition guides readers through the use of the Microsoft Office Excel® add-in XLMiner® for developing predictive models and techniques for describing and finding patterns in data.

From clustering customers into market segments and finding the characteristics of frequent flyers to learning what items are purchased with other items, the authors use interesting, real-world examples to build a theoretical and practical understanding of key data mining methods, including classification, prediction, and affinity analysis as well as data reduction, exploration, and visualization.

The Second Edition now features:

- Three new chapters on time series forecasting, introducing popular business forecasting methods, including moving average and exponential smoothing methods and regression-based models, along with topics such as explanatory vs. predictive modeling, two-level models, and ensembles
- A revised chapter on data visualization that now features interactive visualization principles and added assignments that demonstrate interactive visualization in practice
- Separate chapters that each treat k-nearest neighbors and Naïve Bayes methods
- Summaries at the start of each chapter that supply an outline of key topics

The book includes access to XLMiner®, allowing readers to work hands-on with the provided data. Throughout the book, applications of the discussed topics focus on the business problem as motivation and avoid unnecessary statistical theory.
Each chapter concludes with exercises that allow readers to assess their comprehension of the presented material. The final chapter includes a set of cases that require use of the different data mining techniques, and a related Web site features data sets, exercise solutions, PowerPoint® slides, and case solutions.

Data Mining for Business Intelligence, Second Edition is an excellent book for courses on data mining, forecasting, and decision support systems at the upper-undergraduate and graduate levels. It is also a one-of-a-kind resource for analysts, researchers, and practitioners working with quantitative methods in the fields of business, finance, marketing, computer science, and information technology.

Foreword
Preface
Acknowledgments

Introduction
  What Is Data Mining?
  Where Is Data Mining Used?
  The Origins of Data Mining
  The Rapid Growth of Data Mining
  Why Are There So Many Different Methods?
  Terminology and Notation
  Road Maps to This Book

Overview of the Data Mining Process
  Introduction
  Core Ideas in Data Mining
  Supervised and Unsupervised Learning
  The Steps in Data Mining
  Preliminary Steps
  Building a Model: Example with Linear Regression
  Using Excel for Data Mining
  Problems

Data Exploration and Dimension Reduction
  Introduction
  Practical Considerations
  House Prices in Boston
  Data Summaries
  Data Visualization
  Correlation Analysis
  Reducing the Number of Categories in Categorical Variables
  Principal Components Analysis
  Breakfast Cereals
  Principal Components
  Normalizing the Data
  Using Principal Components for Classification and Prediction
  Problems

Evaluating Classification and Predictive Performance
  Introduction
  Judging Classification Performance
  Accuracy Measures
  Cutoff for Classification
  Performance in Unequal Importance of Classes
  Asymmetric Misclassification Costs
  Oversampling and Asymmetric Costs
  Classification Using a Triage Strategy
  Evaluating Predictive Performance
  Problems

Multiple Linear Regression
  Introduction
  Explanatory vs. Predictive Modeling
  Estimating the Regression Equation and Prediction
  Example: Predicting the Price of Used Toyota Corolla Automobiles
  Variable Selection in Linear Regression
  Reducing the Number of Predictors
  How to Reduce the Number of Predictors
  Problems

Three Simple Classification Methods
  Introduction
  Predicting Fraudulent Financial Reporting
  Predicting Delayed Flights
  The Naive Rule
  Naive Bayes
  Conditional Probabilities and Pivot Tables
  A Practical Difficulty
  A Solution: Naive Bayes
  Advantages and Shortcomings of the Naive Bayes Classifier
  k-Nearest Neighbors
  Riding Mowers
  Choosing k
  k-NN for a Quantitative Response
  Advantages and Shortcomings of k-NN Algorithms
  Problems

Classification and Regression Trees
  Introduction
  Classification Trees
  Recursive Partitioning
  Example 1: Riding Mowers
  Measures of Impurity
  Evaluating the Performance of a Classification Tree
  Acceptance of Personal Loan
  Avoiding Overfitting
  Stopping Tree Growth: CHAID
  Pruning the Tree
  Classification Rules from Trees
  Regression Trees
  Prediction
  Measuring Impurity
  Evaluating Performance
  Advantages, Weaknesses, and Extensions
  Problems

Logistic Regression
  Introduction
  The Logistic Regression Model
  Example: Acceptance of Personal Loan
  Model with a Single Predictor
  Estimating the Logistic Model from Data: Computing Parameter Estimates
  Interpreting Results in Terms of Odds
  Why Linear Regression Is Inappropriate for a Categorical Response
  Evaluating Classification Performance
  Variable Selection
  Evaluating Goodness of Fit
  Example of Complete Analysis: Predicting Delayed Flights
  Data Preprocessing
  Model Fitting and Estimation
  Model Interpretation
  Model Performance
  Goodness of Fit
  Variable Selection
  Logistic Regression for More Than Two Classes
  Ordinal Classes
  Nominal Classes
  Problems

Neural Nets
  Introduction
  Concept and Structure of a Neural Network
  Fitting a Network to Data
  Tiny Dataset
  Computing Output of Nodes
  Preprocessing the Data
  Training the Model
  Classifying Accident Severity
  Avoiding Overfitting
  Using the Output for Prediction and Classification
  Required User Input
  Exploring the Relationship Between Predictors and Response
  Advantages and Weaknesses of Neural Networks
  Problems

Discriminant Analysis
  Introduction
  Example 1: Riding Mowers
  Example 2: Personal Loan Acceptance
  Distance of an Observation from a Class
  Fisher's Linear Classification Functions
  Classification Performance of Discriminant Analysis
  Prior Probabilities
  Unequal Misclassification Costs
  Classifying More Than Two Classes
  Medical Dispatch to Accident Scenes
  Advantages and Weaknesses
  Problems

Association Rules
  Introduction
  Discovering Association Rules in Transaction Databases
  Example 1: Synthetic Data on Purchases of Phone Faceplates
  Generating Candidate Rules
  The Apriori Algorithm
  Selecting Strong Rules
  Support and Confidence
  Lift Ratio
  Data Format
  The Process of Rule Selection
  Interpreting the Results
  Statistical Significance of Rules
  Example 2: Rules for Similar Book Purchases
  Summary
  Problems

Cluster Analysis
  Introduction
  Example: Public Utilities
  Measuring Distance Between Two Records
  Euclidean Distance
  Normalizing Numerical Measurements
  Other Distance Measures for Numerical Data
  Distance Measures for Categorical Data
  Distance Measures for Mixed Data
  Measuring Distance Between Two Clusters
  Hierarchical (Agglomerative) Clustering
  Minimum Distance (Single Linkage)
  Maximum Distance (Complete Linkage)
  Group Average (Average Linkage)
  Dendrograms: Displaying Clustering Process and Results
  Validating Clusters
  Limitations of Hierarchical Clustering
  Nonhierarchical Clustering: The k-Means Algorithm
  Initial Partition into k Clusters
  Problems

Cases
  Charles Book Club
  German Credit
  Tayko Software Cataloger
  Segmenting Consumers of Bath Soap
  Direct-Mail Fundraising
  Catalog Cross-Selling
  Predicting Bankruptcy

References
Index