E-book: Data Mining with SPSS Modeler: Theory, Exercises and Solutions

4.40/5 (8 ratings from Goodreads)

Sören Gröttrup, Tilo Wendler

Format: PDF+DRM
Publication date: 06-Jun-2016
Publisher: Springer International Publishing AG
Language: English
ISBN-13: 9783319287096

Price: 122,88 €*
* The price is final; no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it is not to be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Introducing the IBM SPSS Modeler, this book guides readers through data mining processes and presents relevant statistical methods. There is a special focus on step-by-step tutorials and well-documented examples that help demystify complex mathematical algorithms and computer programs. The variety of exercises and solutions as well as an accompanying website with data sets and SPSS Modeler streams are particularly valuable. While intended for students, the simplicity of the Modeler makes the book useful for anyone wishing to learn about basic and more advanced data mining, and put this knowledge into practice.

Preface.- Introduction.- Basic Functions of the SPSS Modeler.- Univariate Statistics.- Multivariate Statistics.- Regression Models.- Factor Analysis.- Cluster Analysis.- Classification Models.- Using R with the Modeler.- Data Sets Used in This Book.

1 Introduction
1.1 The Concept of the SPSS Modeler
1.2 Structure and Features of This Book
1.2.1 Prerequisites for Using This Book
1.2.2 Structure of the Book and the Exercise/Solution Concept
1.2.3 Using the Data and Streams Provided with the Book
1.2.4 Datasets Provided with This Book
1.2.5 Template Concept of This Book
1.3 Introducing the Modeling Process
1.3.1 Exercises
1.3.2 Solutions
Literature

2 Basic Functions of the SPSS Modeler
2.1 Defining Streams and Scrolling Through a Dataset
2.2 Switching Between Different Streams
2.3 Defining or Modifying Value Labels
2.4 Adding Comments to a Stream
2.5 Exercises
2.6 Solutions
2.7 Data Handling and Sampling Methods
2.7.1 Theory
2.7.2 Calculations
2.7.3 String Functions
2.7.4 Extracting/Selecting Records
2.7.5 Filtering Data
2.7.6 Data Standardization: Z-Transformation
2.7.7 Partitioning Datasets
2.7.8 Sampling Methods
2.7.9 Merge Datasets
2.7.10 Append Datasets
2.7.11 Exercises
2.7.12 Solutions
Literature

3 Univariate Statistics
3.1 Theory
3.1.1 Discrete Versus Continuous Variables
3.1.2 Scales of Measurement
3.1.3 Exercises
3.1.4 Solutions
3.2 Simple Data Examination Tasks
3.2.1 Theory
3.2.2 Frequency Distribution of Discrete Variables
3.2.3 Frequency Distribution of Continuous Variables
3.2.4 Distribution Analysis with the Data Audit Node
3.2.5 Concept of "SuperNodes" and Transforming a Variable to Normality
3.2.6 Reclassifying Values
3.2.7 Binning Continuous Data
3.2.8 Exercises
3.2.9 Solutions
Literature

4 Multivariate Statistics
4.1 Theory
4.2 Scatterplot
4.3 Scatterplot Matrix
4.4 Correlation
4.5 Correlation Matrix
4.6 Exclusion of Spurious Correlations
4.7 Contingency Tables
4.8 Exercises
4.9 Solutions
Literature

5 Regression Models
5.1 Introduction to Regression Models
5.1.1 Motivating Examples
5.1.2 Concept of the Modeling Process and Cross-Validation
5.2 Simple Linear Regression
5.2.1 Theory
5.2.2 Building the Stream in SPSS Modeler
5.2.3 Identification and Interpretation of the Model Parameters
5.2.4 Assessment of the Goodness of Fit
5.2.5 Predicting Unknown Values
5.2.6 Exercises
5.2.7 Solutions
5.3 Multiple Linear Regression
5.3.1 Theory
5.3.2 Building the Model in SPSS Modeler
5.3.3 Final MLR Model and Its Goodness of Fit
5.3.4 Prediction of Unknown Values
5.3.5 Cross-Validation of the Model
5.3.6 Boosting and Bagging (for Regression Models)
5.3.7 Exercises
5.3.8 Solutions
5.4 Generalized Linear (Mixed) Model
5.4.1 Theory
5.4.2 Building a Model with the GLMM Node
5.4.3 The Model Nugget
5.4.4 Cross-Validation and Fitting a Quadric Regression Model
5.4.5 Exercises
5.4.6 Solutions
5.5 The Auto Numeric Node
5.5.1 Building a Stream with the Auto Numeric Node
5.5.2 The Auto Numeric Model Nugget
5.5.3 Exercises
5.5.4 Solutions
Literature

6 Factor Analysis
6.1 Motivating Example
6.2 General Theory of Factor Analysis
6.3 Principal Component Analysis
6.3.1 Theory
6.3.2 Building a Model in SPSS Modeler
6.3.3 Exercises
6.3.4 Solutions
6.4 Principal Factor Analysis
6.4.1 Theory
6.4.2 Building a Model
6.4.3 Exercises
6.4.4 Solutions
Literature

7 Cluster Analysis
7.1 Motivating Examples
7.2 General Theory of Cluster Analysis
7.2.1 Exercises
7.2.2 Solutions
7.3 TwoStep Hierarchical Agglomerative Clustering
7.3.1 Theory of Hierarchical Clustering
7.3.2 Characteristics of the TwoStep Algorithm
7.3.3 Building a Model in SPSS Modeler
7.3.4 Exercises
7.3.5 Solutions
7.4 K-Means Partitioning Clustering
7.4.1 Theory
7.4.2 Building a Model in SPSS Modeler
7.4.3 Exercises
7.4.4 Solutions
7.5 Auto Clustering
7.5.1 Motivation and Implementation of the Auto Cluster Node
7.5.2 Building a Model in SPSS Modeler
7.5.3 Exercises
7.5.4 Solutions
7.6 Summary
Literature

8 Classification Models
8.1 Motivating Examples
8.2 General Theory of Classification Models
8.2.1 Process of Training and Using a Classification Model
8.2.2 Classification Algorithms
8.2.3 Classification vs. Clustering
8.2.4 Making a Decision and the Decision Boundary
8.2.5 Performance Measures of Classification Models
8.2.6 The Analysis Node
8.2.7 Exercises
8.2.8 Solutions
8.3 Logistic Regression
8.3.1 Theory
8.3.2 Building the Model in SPSS Modeler
8.3.3 Optional: Model Types and Variable Interactions
8.3.4 Final Model and Its Goodness of Fit
8.3.5 Classification of Unknown Values
8.3.6 Cross-Validation of the Model
8.3.7 Exercises
8.3.8 Solutions
8.4 Linear Discriminate Classification
8.4.1 Theory
8.4.2 Building the Model with SPSS Modeler
8.4.3 The Model Nugget and the Estimated Model Parameters
8.4.4 Exercises
8.4.5 Solutions
8.5 Support Vector Machine
8.5.1 Theory
8.5.2 Building the Model with SPSS Modeler
8.5.3 The Model Nugget
8.5.4 Exercises
8.5.5 Solutions
8.6 Neuronal Networks
8.6.1 Theory
8.6.2 Building a Network with SPSS Modeler
8.6.3 The Model Nugget
8.6.4 Exercises
8.6.5 Solutions
8.7 k-Nearest Neighbor
8.7.1 Theory
8.7.2 Building the Model with SPSS Modeler
8.7.3 The Model Nugget
8.7.4 Dimensional Reduction with PCA for Data Preprocessing
8.7.5 Exercises
8.7.6 Solutions
8.8 Decision Trees
8.8.1 Theory
8.8.2 Building a Decision Tree with the C5.0 Node
8.8.3 The Model Nugget
8.8.4 Building a Decision Tree with the CHAID Node
8.8.5 Exercises
8.8.6 Solutions
8.9 The Auto Classifier Node
8.9.1 Building a Stream with the Auto Classifier Node
8.9.2 The Auto Classifier Model Nugget
8.9.3 Exercises
8.9.4 Solutions
Literature

9 Using R with the Modeler
9.1 Advantages of R with the Modeler
9.2 Connecting with R
9.3 Test the SPSS Modeler Connection to R
9.4 Calculating New Variables in R
9.5 Model Building in R
9.6 Exercises
9.7 Solutions
Literature

10 Appendix
10.1 Data Sets Used in This Book
10.1.1 adult_income_data.txt
10.1.2 beer.sav
10.1.3 benchmark.xlsx
10.1.4 car_simple.sav
10.1.5 car_sales_modified.sav
10.1.6 chess_endgame_data.txt
10.1.7 customer_bank_data.csv
10.1.8 diabetes_data_reduced.sav
10.1.9 DRUG1n.sav
10.1.10 EEG_Sleep_Signals.csv
10.1.11 employee_dataset_001 and employee_dataset_002
10.1.12 England Payment Datasets
10.1.13 Features_eeg_signals.csv
10.1.14 gene_expression_leukemia.csv
10.1.15 gene_expression_leukemia_short.csv
10.1.16 gravity_constant_data.csv
10.1.17 Housing.data.txt
10.1.18 Iris.csv
10.1.19 IT-projects.txt
10.1.20 IT user satisfaction.sav
10.1.21 longley.csv
10.1.22 LPGA2009.csv
10.1.23 Mtcars.csv
10.1.24 nutrition_habites.sav
10.1.25 optdigits_training.txt, optdigits_test.txt
10.1.26 Orthodont.csv
10.1.27 Ozone.csv
10.1.28 pisa2012_math_q45.sav
10.1.29 sales_list.sav
10.1.30 ships.csv
10.1.31 test_scores.sav
10.1.32 Titanic.xlsx
10.1.33 tree_credit.sav
10.1.34 wine_data.txt
10.1.35 WisconsinBreastCancerData.csv
10.1.36 z_pm_customer1.sav
Literature

Prof. Dr. Tilo Wendler studied mathematics, physics and business information technology. In his doctoral thesis he examined determinants of user expectations in using information technology. He has taken a keen interest in applying complex statistical methods in the banking sector, especially in the field of rating methods. He has been teaching business statistics and data mining for ten years.

Dr. Sören Gröttrup studied mathematics and computer science with a focus on probability theory and statistics and received his Ph.D. for his research on biological models. In parallel with his doctoral studies, he worked at a research institute as a data analyst on genomic data sets. Today, he works as a data analyst and statistician in the industrial and marketing sector.

Permalink: https://www.kriso.ee/db/97833192870962e.html
