Applied Data Mining for Business and Industry

Paperback

Author: Paolo Giudici

ISBN-10: 0470058870

ISBN-13: 9780470058879

Category: Economic Reference

The increasing availability of data in our current, information-overloaded society has led to the need for valid tools for its modelling and analysis. Data mining and applied statistical methods are the appropriate tools to extract knowledge from such data. This book provides an accessible introduction to data mining methods in a consistent, application-oriented statistical framework, using case studies drawn from real industry projects and highlighting the use of data mining methods in a variety of business applications.

Introduces data mining methods and applications.
Covers classical and Bayesian multivariate statistical methodology as well as machine learning and computational data mining methods.
Includes many recent developments such as association and sequence rules, graphical Markov models, lifetime value modelling, credit risk, operational risk and web mining.
Features detailed case studies based on applied projects within industry.
Incorporates discussion of data mining software, with case studies analysed using R.
Is accessible to anyone with a basic knowledge of statistics or data analysis.
Includes an extensive bibliography and pointers to further reading within the text.

Applied Data Mining for Business and Industry, 2nd edition, is aimed at advanced undergraduate and graduate students of data mining, applied statistics, database management, computer science and economics. The case studies will provide guidance to professionals working in industry on projects involving large volumes of data, such as customer relationship management, web design, risk management, marketing, economics and finance.

1 Introduction

Part I Methodology

2 Organisation of the data
2.1 Statistical units and statistical variables
2.2 Data matrices and their transformations
2.3 Complex data structures
2.4 Summary

3 Summary statistics
3.1 Univariate exploratory analysis
3.1.1 Measures of location
3.1.2 Measures of variability
3.1.3 Measures of heterogeneity
3.1.4 Measures of concentration
3.1.5 Measures of asymmetry
3.1.6 Measures of kurtosis
3.2 Bivariate exploratory analysis of quantitative data
3.3 Multivariate exploratory analysis of quantitative data
3.4 Multivariate exploratory analysis of qualitative data
3.4.1 Independence and association
3.4.2 Distance measures
3.4.3 Dependency measures
3.4.4 Model-based measures
3.5 Reduction of dimensionality
3.5.1 Interpretation of the principal components
3.6 Further reading

4 Model specification
4.1 Measures of distance
4.1.1 Euclidean distance
4.1.2 Similarity measures
4.1.3 Multidimensional scaling
4.2 Cluster analysis
4.2.1 Hierarchical methods
4.2.2 Evaluation of hierarchical methods
4.2.3 Non-hierarchical methods
4.3 Linear regression
4.3.1 Bivariate linear regression
4.3.2 Properties of the residuals
4.3.3 Goodness of fit
4.3.4 Multiple linear regression
4.4 Logistic regression
4.4.1 Interpretation of logistic regression
4.4.2 Discriminant analysis
4.5 Tree models
4.5.1 Division criteria
4.5.2 Pruning
4.6 Neural networks
4.6.1 Architecture of a neural network
4.6.2 The multilayer perceptron
4.6.3 Kohonen networks
4.7 Nearest-neighbour models
4.8 Local models
4.8.1 Association rules
4.8.2 Retrieval by content
4.9 Uncertainty measures and inference
4.9.1 Probability
4.9.2 Statistical models
4.9.3 Statistical inference
4.10 Non-parametric modelling
4.11 The normal linear model
4.11.1 Main inferential results
4.12 Generalised linear models
4.12.1 The exponential family
4.12.2 Definition of generalised linear models
4.12.3 The logistic regression model
4.13 Log-linear models
4.13.1 Construction of a log-linear model
4.13.2 Interpretation of a log-linear model
4.13.3 Graphical log-linear models
4.13.4 Log-linear model comparison
4.14 Graphical models
4.14.1 Symmetric graphical models
4.14.2 Recursive graphical models
4.14.3 Graphical models and neural networks
4.15 Survival analysis models
4.16 Further reading

5 Model evaluation
5.1 Criteria based on statistical tests
5.1.1 Distance between statistical models
5.1.2 Discrepancy of a statistical model
5.1.3 Kullback-Leibler discrepancy
5.2 Criteria based on scoring functions
5.3 Bayesian criteria
5.4 Computational criteria
5.5 Criteria based on loss functions
5.6 Further reading

Part II Business case studies

6 Describing website visitors
6.1 Objectives of the analysis
6.2 Description of the data
6.3 Exploratory analysis
6.4 Model building
6.4.1 Cluster analysis
6.4.2 Kohonen networks
6.5 Model comparison
6.6 Summary report

7 Market basket analysis
7.1 Objectives of the analysis
7.2 Description of the data
7.3 Exploratory data analysis
7.4 Model building
7.4.1 Log-linear models
7.4.2 Association rules
7.5 Model comparison
7.6 Summary report

8 Describing customer satisfaction
8.1 Objectives of the analysis
8.2 Description of the data
8.3 Exploratory data analysis
8.4 Model building
8.5 Summary

9 Predicting credit risk of small businesses
9.1 Objectives of the analysis
9.2 Description of the data
9.3 Exploratory data analysis
9.4 Model building
9.5 Model comparison
9.6 Summary report

10 Predicting e-learning student performance
10.1 Objectives of the analysis
10.2 Description of the data
10.3 Exploratory data analysis
10.4 Model specification
10.5 Model comparison
10.6 Summary report

11 Predicting customer lifetime value
11.1 Objectives of the analysis
11.2 Description of the data
11.3 Exploratory data analysis
11.4 Model specification
11.5 Model comparison
11.6 Summary report

12 Operational risk management
12.1 Context and objectives of the analysis
12.2 Exploratory data analysis
12.3 Model building
12.4 Model comparison
12.5 Summary conclusions

References
Index

From the Publisher

“If I had to recommend a good introduction to data mining, I would choose this one.” (Stat Papers, 2011)