Data Mining for Business Intelligence: Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner

Hardcover

Author: Galit Shmueli

ISBN-10: 0470084855

ISBN-13: 9780470084854

Category: Data Warehousing & Mining

Praise for the First Edition

"...full of vivid and thought-provoking anecdotes... needs to be read by anyone with a serious interest in research and marketing." (Research magazine)

"Shmueli et al. have done a wonderful job in presenting the field of data mining... a welcome addition to the literature." (computingreviews.com)

Incorporating a new focus on data visualization and time series forecasting, Data Mining for Business Intelligence, Second Edition continues to supply insightful, detailed guidance on fundamental data mining techniques. This new edition guides readers through the use of the Microsoft Office Excel® add-in XLMiner® for developing predictive models and techniques for describing and finding patterns in data.

From clustering customers into market segments and finding the characteristics of frequent flyers to learning what items are purchased with other items, the authors use interesting, real-world examples to build a theoretical and practical understanding of key data mining methods, including classification, prediction, and affinity analysis, as well as data reduction, exploration, and visualization.

The Second Edition now features:

- Three new chapters on time series forecasting, introducing popular business forecasting methods, including moving average and exponential smoothing methods; regression-based models; and topics such as explanatory vs. predictive modeling, two-level models, and ensembles
- A revised chapter on data visualization that now features interactive visualization principles and added assignments that demonstrate interactive visualization in practice
- Separate chapters on k-nearest neighbors and on Naïve Bayes methods
- Summaries at the start of each chapter that outline the key topics

The book includes access to XLMiner®, allowing readers to work hands-on with the provided data. Throughout the book, applications of the discussed topics focus on the business problem as motivation and avoid unnecessary statistical theory.

Each chapter concludes with exercises that allow readers to assess their comprehension of the presented material. The final chapter includes a set of cases that require the use of different data mining techniques, and a related Web site features data sets, exercise solutions, PowerPoint® slides, and case solutions.

Data Mining for Business Intelligence, Second Edition is an excellent book for courses on data mining, forecasting, and decision support systems at the upper-undergraduate and graduate levels. It is also a one-of-a-kind resource for analysts, researchers, and practitioners working with quantitative methods in the fields of business, finance, marketing, computer science, and information technology.

Table of Contents

Foreword
Preface
Acknowledgments

Introduction
  What Is Data Mining?
  Where Is Data Mining Used?
  The Origins of Data Mining
  The Rapid Growth of Data Mining
  Why Are There So Many Different Methods?
  Terminology and Notation
  Road Maps to This Book

Overview of the Data Mining Process
  Introduction
  Core Ideas in Data Mining
  Supervised and Unsupervised Learning
  The Steps in Data Mining
  Preliminary Steps
  Building a Model: Example with Linear Regression
  Using Excel for Data Mining
  Problems

Data Exploration and Dimension Reduction
  Introduction
  Practical Considerations
  House Prices in Boston
  Data Summaries
  Data Visualization
  Correlation Analysis
  Reducing the Number of Categories in Categorical Variables
  Principal Components Analysis
  Breakfast Cereals
  Principal Components
  Normalizing the Data
  Using Principal Components for Classification and Prediction
  Problems

Evaluating Classification and Predictive Performance
  Introduction
  Judging Classification Performance
  Accuracy Measures
  Cutoff for Classification
  Performance in Unequal Importance of Classes
  Asymmetric Misclassification Costs
  Oversampling and Asymmetric Costs
  Classification Using a Triage Strategy
  Evaluating Predictive Performance
  Problems

Multiple Linear Regression
  Introduction
  Explanatory vs. Predictive Modeling
  Estimating the Regression Equation and Prediction
  Example: Predicting the Price of Used Toyota Corolla Automobiles
  Variable Selection in Linear Regression
  Reducing the Number of Predictors
  How to Reduce the Number of Predictors
  Problems

Three Simple Classification Methods
  Introduction
  Predicting Fraudulent Financial Reporting
  Predicting Delayed Flights
  The Naive Rule
  Naive Bayes
  Conditional Probabilities and Pivot Tables
  A Practical Difficulty
  A Solution: Naive Bayes
  Advantages and Shortcomings of the Naive Bayes Classifier
  k-Nearest Neighbors
  Riding Mowers
  Choosing k
  k-NN for a Quantitative Response
  Advantages and Shortcomings of k-NN Algorithms
  Problems

Classification and Regression Trees
  Introduction
  Classification Trees
  Recursive Partitioning
  Example 1: Riding Mowers
  Measures of Impurity
  Evaluating the Performance of a Classification Tree
  Acceptance of Personal Loan
  Avoiding Overfitting
  Stopping Tree Growth: CHAID
  Pruning the Tree
  Classification Rules from Trees
  Regression Trees
  Prediction
  Measuring Impurity
  Evaluating Performance
  Advantages, Weaknesses, and Extensions
  Problems

Logistic Regression
  Introduction
  The Logistic Regression Model
  Example: Acceptance of Personal Loan
  Model with a Single Predictor
  Estimating the Logistic Model from Data: Computing Parameter Estimates
  Interpreting Results in Terms of Odds
  Why Linear Regression Is Inappropriate for a Categorical Response
  Evaluating Classification Performance
  Variable Selection
  Evaluating Goodness of Fit
  Example of Complete Analysis: Predicting Delayed Flights
  Data Preprocessing
  Model Fitting and Estimation
  Model Interpretation
  Model Performance
  Goodness of Fit
  Variable Selection
  Logistic Regression for More Than Two Classes
  Ordinal Classes
  Nominal Classes
  Problems

Neural Nets
  Introduction
  Concept and Structure of a Neural Network
  Fitting a Network to Data
  Tiny Dataset
  Computing Output of Nodes
  Preprocessing the Data
  Training the Model
  Classifying Accident Severity
  Avoiding Overfitting
  Using the Output for Prediction and Classification
  Required User Input
  Exploring the Relationship Between Predictors and Response
  Advantages and Weaknesses of Neural Networks
  Problems

Discriminant Analysis
  Introduction
  Example 1: Riding Mowers
  Example 2: Personal Loan Acceptance
  Distance of an Observation from a Class
  Fisher's Linear Classification Functions
  Classification Performance of Discriminant Analysis
  Prior Probabilities
  Unequal Misclassification Costs
  Classifying More Than Two Classes
  Medical Dispatch to Accident Scenes
  Advantages and Weaknesses
  Problems

Association Rules
  Introduction
  Discovering Association Rules in Transaction Databases
  Example 1: Synthetic Data on Purchases of Phone Faceplates
  Generating Candidate Rules
  The Apriori Algorithm
  Selecting Strong Rules
  Support and Confidence
  Lift Ratio
  Data Format
  The Process of Rule Selection
  Interpreting the Results
  Statistical Significance of Rules
  Example 2: Rules for Similar Book Purchases
  Summary
  Problems

Cluster Analysis
  Introduction
  Example: Public Utilities
  Measuring Distance Between Two Records
  Euclidean Distance
  Normalizing Numerical Measurements
  Other Distance Measures for Numerical Data
  Distance Measures for Categorical Data
  Distance Measures for Mixed Data
  Measuring Distance Between Two Clusters
  Hierarchical (Agglomerative) Clustering
  Minimum Distance (Single Linkage)
  Maximum Distance (Complete Linkage)
  Group Average (Average Linkage)
  Dendrograms: Displaying Clustering Process and Results
  Validating Clusters
  Limitations of Hierarchical Clustering
  Nonhierarchical Clustering: The k-Means Algorithm
  Initial Partition into k Clusters
  Problems

Cases
  Charles Book Club
  German Credit
  Tayko Software Cataloger
  Segmenting Consumers of Bath Soap
  Direct-Mail Fundraising
  Catalog Cross-Selling
  Predicting Bankruptcy

References
Index