Foundations of Predictive Analytics [Paperback]

3.50/5 (4 ratings from Goodreads)

Stephen Coggeshall, James Wu

Format: Paperback / softback, 338 pages, height x width: 234x156 mm, weight: 453 g
Publication date: 05-Sep-2019
Publisher: Chapman & Hall/CRC
ISBN-10: 0367381680
ISBN-13: 9780367381684

Other books on the subject:

Computer science - (Currently in store: 7 titles)
Data mining - (Currently in store: 1 title)
Automatic control engineering
Probability & statistics - (Currently in store: 2 titles)

Paperback
Price: 86,49 €
Delivery from the publisher takes approximately 2-4 weeks
Free delivery

Permalink: https://www.kriso.ee/db/9780367381684.html

Keywords:

Data mining

Drawing on the authors' two decades of experience in applied modeling and data mining, Foundations of Predictive Analytics presents the fundamental background required for analyzing data and building models for many practical applications, such as consumer behavior modeling, risk and marketing analytics, and other areas. It also discusses a variety of practical topics that are frequently missing from similar texts.

The book begins with the statistical and linear algebra/matrix foundation of modeling methods, from distributions to cumulant and copula functions to the Cornish-Fisher expansion and other useful but hard-to-find statistical techniques. It then describes common and unusual linear methods as well as popular nonlinear modeling approaches, including additive models, trees, support vector machines, fuzzy systems, clustering, naïve Bayes, and neural nets. The authors go on to cover methodologies used in time series and forecasting, such as ARIMA, GARCH, and survival analysis. They also present a range of optimization techniques and explore several special topics, such as Dempster-Shafer theory.

An in-depth collection of the most important fundamental material on predictive analytics, this self-contained book provides the necessary information for understanding various techniques for exploratory data analysis and modeling. It explains the algorithmic details behind each technique (including underlying assumptions and mathematical formulations) and shows how to prepare and encode data, select variables, use model goodness measures, normalize odds, and perform reject inference.

Web Resource

The book's website at www.DataMinerXL.com offers the DataMinerXL software for building predictive models. The site also includes more examples and information on modeling.

List of Figures
List of Tables
Preface

1 Introduction
1.1 What Is a Model?
1.2 What Is a Statistical Model?
1.3 The Modeling Process
1.4 Modeling Pitfalls
1.5 Characteristics of Good Modelers
1.6 The Future of Predictive Analytics

2 Properties of Statistical Distributions
2.1 Fundamental Distributions
2.1.1 Uniform Distribution
2.1.2 Details of the Normal (Gaussian) Distribution
2.1.3 Lognormal Distribution
2.1.4 Γ Distribution
2.1.5 Chi-Squared Distribution
2.1.6 Non-Central Chi-Squared Distribution
2.1.7 Student's t-Distribution
2.1.8 Multivariate t-Distribution
2.1.9 F-Distribution
2.1.10 Binomial Distribution
2.1.11 Poisson Distribution
2.1.12 Exponential Distribution
2.1.13 Geometric Distribution
2.1.14 Hypergeometric Distribution
2.1.15 Negative Binomial Distribution
2.1.16 Inverse Gaussian (IG) Distribution
2.1.17 Normal Inverse Gaussian (NIG) Distribution
2.2 Central Limit Theorem
2.3 Estimate of Mean, Variance, Skewness, and Kurtosis from Sample Data
2.4 Estimate of the Standard Deviation of the Sample Mean
2.5 (Pseudo) Random Number Generators
2.5.1 Mersenne Twister Pseudorandom Number Generator
2.5.2 Box-Muller Transform for Generating a Normal Distribution
2.6 Transformation of a Distribution Function
2.7 Distribution of a Function of Random Variables
2.7.1 Z = X + Y
2.7.2 Z = X · Y
2.7.3 (Z1, Z2, ..., Zn) = (X1, X2, ..., Xn) · Y
2.7.4 Z = X/Y
2.7.5 Z = max(X, Y)
2.7.6 Z = min(X, Y)
2.8 Moment Generating Function
2.8.1 Moment Generating Function of Binomial Distribution
2.8.2 Moment Generating Function of Normal Distribution
2.8.3 Moment Generating Function of the Γ Distribution
2.8.4 Moment Generating Function of Chi-Square Distribution
2.8.5 Moment Generating Function of the Poisson Distribution
2.9 Cumulant Generating Function
2.10 Characteristic Function
2.10.1 Relationship between Cumulative Function and Characteristic Function
2.10.2 Characteristic Function of Normal Distribution
2.10.3 Characteristic Function of Γ Distribution
2.11 Chebyshev's Inequality
2.12 Markov's Inequality
2.13 Gram-Charlier Series
2.14 Edgeworth Expansion
2.15 Cornish-Fisher Expansion
2.15.1 Lagrange Inversion Theorem
2.15.2 Cornish-Fisher Expansion
2.16 Copula Functions
2.16.1 Gaussian Copula
2.16.2 t-Copula
2.16.3 Archimedean Copula

3 Important Matrix Relationships
3.1 Pseudo-Inverse of a Matrix
3.2 A Lemma of Matrix Inversion
3.3 Identity for a Matrix Determinant
3.4 Inversion of Partitioned Matrix
3.5 Determinant of Partitioned Matrix
3.6 Matrix Sweep and Partial Correlation
3.7 Singular Value Decomposition (SVD)
3.8 Diagonalization of a Matrix
3.9 Spectral Decomposition of a Positive Semi-Definite Matrix
3.10 Normalization in Vector Space
3.11 Conjugate Decomposition of a Symmetric Definite Matrix
3.12 Cholesky Decomposition
3.13 Cauchy-Schwartz Inequality
3.14 Relationship of Correlation among Three Variables

4 Linear Modeling and Regression
4.1 Properties of Maximum Likelihood Estimators
4.1.1 Likelihood Ratio Test
4.1.2 Wald Test
4.1.3 Lagrange Multiplier Statistic
4.2 Linear Regression
4.2.1 Ordinary Least Squares (OLS) Regression
4.2.2 Interpretation of the Coefficients of Linear Regression
4.2.3 Regression on Weighted Data
4.2.4 Incrementally Updating a Regression Model with Additional Data
4.2.5 Partitioned Regression
4.2.6 How Does the Regression Change When Adding One More Variable?
4.2.7 Linearly Restricted Least Squares Regression
4.2.8 Significance of the Correlation Coefficient
4.2.9 Partial Correlation
4.2.10 Ridge Regression
4.3 Fisher's Linear Discriminant Analysis
4.4 Principal Component Regression (PCR)
4.5 Factor Analysis
4.6 Partial Least Squares Regression (PLSR)
4.7 Generalized Linear Model (GLM)
4.8 Logistic Regression: Binary
4.9 Logistic Regression: Multiple Nominal
4.10 Logistic Regression: Proportional Multiple Ordinal
4.11 Fisher Scoring Method for Logistic Regression
4.12 Tobit Model: A Censored Regression Model
4.12.1 Some Properties of the Normal Distribution
4.12.2 Formulation of the Tobit Model

5 Nonlinear Modeling
5.1 Naive Bayesian Classifier
5.2 Neural Network
5.2.1 Back Propagation Neural Network
5.3 Segmentation and Tree Models
5.3.1 Segmentation
5.3.2 Tree Models
5.3.3 Sweeping to Find the Best Cutpoint
5.3.4 Impurity Measure of a Population: Entropy and Gini Index
5.3.5 Chi-Square Splitting Rule
5.3.6 Implementation of Decision Trees
5.4 Additive Models
5.4.1 Boosted Tree
5.4.2 Least Squares Regression Boosting Tree
5.4.3 Binary Logistic Regression Boosting Tree
5.5 Support Vector Machine (SVM)
5.5.1 Wolfe Dual
5.5.2 Linearly Separable Problem
5.5.3 Linearly Inseparable Problem
5.5.4 Constructing Higher-Dimensional Space and Kernel
5.5.5 Model Output
5.5.6 C-Support Vector Classification (C-SVC) for Classification
5.5.7 ε-Support Vector Regression (ε-SVR) for Regression
5.5.8 The Probability Estimate
5.6 Fuzzy Logic System
5.6.1 A Simple Fuzzy Logic System
5.7 Clustering
5.7.1 K Means, Fuzzy C Means
5.7.2 Nearest Neighbor, K Nearest Neighbor (KNN)
5.7.3 Comments on Clustering Methods

6 Time Series Analysis
6.1 Fundamentals of Forecasting
6.1.1 Box-Cox Transformation
6.1.2 Smoothing Algorithms
6.1.3 Convolution of Linear Filters
6.1.4 Linear Difference Equation
6.1.5 The Autocovariance Function and Autocorrelation Function
6.1.6 The Partial Autocorrelation Function
6.2 ARIMA Models
6.2.1 MA(q) Process
6.2.2 AR(p) Process
6.2.3 ARMA(p, q) Process
6.3 Survival Data Analysis
6.3.1 Sampling Method
6.4 Exponentially Weighted Moving Average (EWMA) and GARCH(1, 1)
6.4.1 Exponentially Weighted Moving Average (EWMA)
6.4.2 ARCH and GARCH Models

7 Data Preparation and Variable Selection
7.1 Data Quality and Exploration
7.2 Variable Scaling and Transformation
7.3 How to Bin Variables
7.3.1 Equal Interval
7.3.2 Equal Population
7.3.3 Tree Algorithms
7.4 Interpolation in One and Two Dimensions
7.5 Weight of Evidence (WOE) Transformation
7.6 Variable Selection Overview
7.7 Missing Data Imputation
7.8 Stepwise Selection Methods
7.8.1 Forward Selection in Linear Regression
7.8.2 Forward Selection in Logistic Regression
7.9 Mutual Information, KL Distance
7.10 Detection of Multicollinearity

8 Model Goodness Measures
8.1 Training, Testing, Validation
8.2 Continuous Dependent Variable
8.2.1 Example: Linear Regression
8.3 Binary Dependent Variable (Two-Group Classification)
8.3.1 Kolmogorov-Smirnov (KS) Statistic
8.3.2 Confusion Matrix
8.3.3 Concordant and Discordant
8.3.4 R² for Logistic Regression
8.3.5 AIC and SBC
8.3.6 Hosmer-Lemeshow Goodness-of-Fit Test
8.3.7 Example: Logistic Regression
8.4 Population Stability Index Using Relative Entropy

9 Optimization Methods
9.1 Lagrange Multiplier
9.2 Gradient Descent Method
9.3 Newton-Raphson Method
9.4 Conjugate Gradient Method
9.5 Quasi-Newton Method
9.6 Genetic Algorithms (GA)
9.7 Simulated Annealing
9.8 Linear Programming
9.9 Nonlinear Programming (NLP)
9.9.1 General Nonlinear Programming (GNLP)
9.9.2 Lagrange Dual Problem
9.9.3 Quadratic Programming (QP)
9.9.4 Linear Complementarity Programming (LCP)
9.9.5 Sequential Quadratic Programming (SQP)
9.10 Nonlinear Equations
9.11 Expectation-Maximization (EM) Algorithm
9.12 Optimal Design of Experiment

10 Miscellaneous Topics
10.1 Multidimensional Scaling
10.2 Simulation
10.3 Odds Normalization and Score Transformation
10.4 Reject Inference
10.5 Dempster-Shafer Theory of Evidence
10.5.1 Some Properties in Set Theory
10.5.2 Basic Probability Assignment, Belief Function, and Plausibility Function
10.5.3 Dempster-Shafer's Rule of Combination
10.5.4 Applications of Dempster-Shafer Theory of Evidence: Multiple Classifier Function

Appendix A Useful Mathematical Relations
A.1 Information Inequality
A.2 Relative Entropy
A.3 Saddle-Point Method
A.4 Stirling's Formula
A.5 Convex Function and Jensen's Inequality

Appendix B DataMinerXL - Microsoft Excel Add-In for Building Predictive Models
B.1 Overview
B.2 Utility Functions
B.3 Data Manipulation Functions
B.4 Basic Statistical Functions
B.5 Modeling Functions for All Models
B.6 Weight of Evidence Transformation Functions
B.7 Linear Regression Functions
B.8 Partial Least Squares Regression Functions
B.9 Logistic Regression Functions
B.10 Time Series Analysis Functions
B.11 Naive Bayes Classifier Functions
B.12 Tree-Based Model Functions
B.13 Clustering and Segmentation Functions
B.14 Neural Network Functions
B.15 Support Vector Machine Functions
B.16 Optimization Functions
B.17 Matrix Operation Functions
B.18 Numerical Integration Functions
B.19 Excel Built-in Statistical Distribution Functions

Bibliography
Index

James Wu is a Fixed Income Quant with extensive expertise in a wide variety of applied analytical solutions in consumer behavior modeling and financial engineering. He previously worked at ID Analytics, Morgan Stanley, JPMorgan Chase, Los Alamos Computational Group, and CASA. He earned a PhD from the University of Idaho.

Stephen Coggeshall is the Chief Technology Officer of ID Analytics. He previously worked at Los Alamos Computational Group, Morgan Stanley, HNC Software, CASA, and Los Alamos National Laboratory. Over a career of more than 20 years, Dr. Coggeshall has helped teams of scientists develop practical solutions to difficult business problems using advanced analytics. He earned a PhD from the University of Illinois and was named 2008 Technology Executive of the Year by the San Diego Business Journal.
