
Data Preparation for Data Mining [Paperback]

Dorian Pyle (Chief Scientist and Founder of PTI, Leominster, MA, USA)

Data Preparation for Data Mining addresses an issue unfortunately ignored by most authorities on data mining: data preparation. Thanks largely to its perceived difficulty, data preparation has traditionally taken a backseat to the more alluring question of how best to extract meaningful knowledge. But without adequate preparation of your data, the return on the resources invested in mining is certain to be disappointing.

Dorian Pyle corrects this imbalance. A twenty-five-year veteran of what has become the data mining industry, Pyle shares his own successful data preparation methodology, offering both a conceptual overview for managers and complete technical details for IT professionals. Apply his techniques and watch your mining efforts pay off, in the form of improved performance, reduced distortion, and more valuable results.

On the enclosed CD-ROM, you'll find a suite of programs as C source code and compiled into a command-line-driven toolkit. This code illustrates how the author's techniques can be applied to arrive at an automated preparation solution that works for you. Also included are demonstration versions of three commercial products that help with data preparation, along with sample data with which you can practice and experiment.

* Offers in-depth coverage of an essential but largely ignored subject.
* Goes far beyond theory, leading you step by step through the author's own data preparation techniques.
* Provides practical illustrations of the author's methodology using realistic sample data sets.
* Includes algorithms you can apply directly to your own project, along with instructions for understanding when automation is possible and when greater intervention is required.
* Explains how to identify and correct data problems that may be present in your application.
* Prepares miners, helping them head into preparation with a better understanding of data sets and their limitations.
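For a flavor of what the book covers, its contents include a section on softmax scaling, a transformation that squashes a numeric variable into the open interval (0, 1) so that values near the mean map almost linearly while extremes are compressed, keeping later out-of-range values inside bounds. A minimal sketch of one common formulation follows; the parameter name `lam` and the exact scaling constant are illustrative assumptions, not the book's own code:

```python
import math

def softmax_scale(values, lam=2.0):
    """Softmax-scale a list of numbers into the open interval (0, 1).

    Values near the mean map almost linearly; extreme values are
    squashed asymptotically, so inputs far outside the training
    range can never fall outside (0, 1). `lam` controls roughly how
    many standard deviations fall in the near-linear region (an
    illustrative choice, not a prescription from the book).
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(var) or 1.0  # guard against a constant variable
    scale = lam * std / (2 * math.pi)
    # Logistic squash of the linearly scaled value.
    return [1.0 / (1.0 + math.exp(-(v - mean) / scale)) for v in values]

# A gross outlier (100.0) lands near 1 instead of dominating the range.
scaled = softmax_scale([1.0, 2.0, 3.0, 4.0, 100.0])
print(all(0.0 < s < 1.0 for s in scaled))
```

Note how the transformation preserves ordering while bounding the output, which is the property that makes it attractive when a model trained today must digest tomorrow's unseen extremes.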


Additional information

Preface xvii
Introduction 1(8)
Data Exploration as a Process 9(36)
The Data Exploration Process 10(18)
Stage 1: Exploring the Problem Space 12(7)
Stage 2: Exploring the Solution Space 19(3)
Stage 3: Specifying the Implementation Method 22(1)
Stage 4: Mining the Data 22(6)
Exploration: Mining and Modeling 28(1)
Data Mining, Modeling, and Modeling Tools 28(9)
Ten Golden Rules 29(1)
Introducing Modeling Tools 30(2)
Types of Models 32(1)
Active and Passive Models 33(1)
Explanatory and Predictive Models 33(2)
Static and Continuously Learning Models 35(2)
Summary 37(2)
Supplemental Material 39(6)
A Continuously Learning Model Application 39(1)
How the Continuously Learning Model Worked 40(5)
The Nature of the World and Its Impact on Data Preparation 45(44)
Measuring the World 46(7)
Objects 46(1)
Capturing Measurements 47(1)
Errors of Measurement 48(5)
Typing Measurements to the Real World 53(1)
Types of Measurements 53(7)
Scalar Measurements 54(6)
Nonscalar Measurements 60(1)
Continua of Attributes of Variables 60(6)
The Qualitative-Quantitative Continuum 61(1)
The Discrete-Continuous Continuum 61(5)
Scale Measurement Example 66(1)
Transformations and Difficulties---Variables, Data, and Information 66(1)
Building Mineable Data Representations 67(19)
Data Representation 68(1)
Building Data---Dealing with Variables 69(8)
Building Mineable Data Sets 77(9)
Summary 86(1)
Supplemental Material 87(2)
Combinations 87(2)
Data Preparation as a Process 89(36)
Data Preparation: Inputs, Outputs, Models, and Decisions 90(10)
Step 1: Prepare the Data 92(5)
Step 2: Survey the Data 97(1)
Step 3: Model the Data 98(1)
Use the Model 98(2)
Modeling Tools and Data Preparation 100(12)
How Modeling Tools Drive Data Preparation 102(2)
Decision Trees 104(1)
Decision Lists 104(3)
Neural Networks 107(1)
Evolution Programs 107(1)
Modeling Data with the Tools 107(2)
Predictions and Rules 109(2)
Choosing Techniques 111(1)
Missing Data and Modeling Tools 111(1)
Stages of Data Preparation 112(10)
Stage 1: Accessing the Data 112(1)
Stage 2: Auditing the Data 113(1)
Stage 3: Enhancing and Enriching the Data 114(1)
Stage 4: Looking for Sampling Bias 114(1)
Stage 5: Determining Data Structure (Super-, Macro-, and Micro-) 115(1)
Stage 6: Building the PIE 116(5)
Stage 7: Surveying the Data 121(1)
Stage 8: Modeling the Data 122(1)
And the Result Is . . .? 122(3)
Getting the Data: Basic Preparation 125(30)
Data Discovery 127(2)
Data Access Issues 127(2)
Data Characterization 129(6)
Detail/Aggregation Level (Granularity) 129(2)
Consistency 131(1)
Pollution 132(1)
Objects 133(1)
Relationship 133(1)
Domain 133(1)
Defaults 134(1)
Integrity 134(1)
Concurrency 135(1)
Duplicate or Redundant Variables 135(1)
Data Set Assembly 135(6)
Reverse Pivoting 136(1)
Feature Extraction 137(1)
Physical or Behavioral Data Sets 138(1)
Explanatory Structure 138(1)
Data Enhancement or Enrichment 139(1)
Sampling Bias 140(1)
Example 1: Credit 141(8)
Looking at the Variables 141(5)
Relationships between Variables 146(3)
Example 2: Shoe 149(2)
Looking at the Variables 149(1)
Relationships between Variables 150(1)
The Data Assay 151(4)
Sampling, Variability, and Confidence 155(36)
Sampling, or First Catch Your Hare! 155(11)
How Much Data? 155(1)
Variability 156(3)
Converging on a Representative Sample 159(3)
Measuring Variability 162(1)
Variability and Deviation 162(4)
Confidence 166(1)
Variability of Numeric Variables 167(3)
Variability and Sampling 168(1)
Variability and Convergence 168(2)
Variability and Confidence in Alpha Variables 170(2)
Ordering and Rate of Discovery 171(1)
Measuring Confidence 172(6)
Modeling and Confidence with the Whole Population 172(1)
Testing for Confidence 173(3)
Confidence Tests and Variability 176(2)
Confidence in Capturing Variability 178(6)
A Brief Introduction to the Normal Distribution 178(2)
Normally Distributed Probabilities 180(1)
Capturing Normally Distributed Probabilities: An Example 181(1)
Capturing Confidence, Capturing Variance 182(2)
Problems and Shortcomings of Taking Samples Using Variability 184(4)
Missing Values 184(1)
Constants (Variables with Only One Value) 185(1)
Problems with Sampling 185(1)
Monotonic Variable Detection 186(1)
Interstitial Linearity 187(1)
Rate of Discovery 187(1)
Confidence and Instance Count 188(1)
Summary 188(1)
Supplemental Material 189(2)
Confidence Samples 189(2)
Handling Nonnumerical Variables 191(48)
Representing Alphas and Remapping 192(10)
One-of-n Remapping 193(1)
m-of-n Remapping 194(1)
Remapping to Eliminate Ordering 195(1)
Remapping One-to-Many Patterns, or Ill-Formed Problems 196(4)
Remapping Circular Discontinuity 200(2)
State Space 202(20)
Unit State Space 202(2)
Pythagoras in State Space 204(1)
Position in State Space 204(1)
Neighbors and Associates 205(1)
Density and Sparsity 206(5)
Nearby and Distant Nearest Neighbors 211(1)
Normalizing Measured Point Separation 211(2)
Contours, Peaks, and Valleys 213(1)
Mapping State Space 213(1)
Objects in State Space 213(1)
Phase Space 214(1)
Mapping Alpha Values 215(1)
Location, Location, Location! 216(1)
Numerics, Alphas, and the Montreal Canadiens 216(6)
Joint Distribution Tables 222(8)
Two-Way Tables 223(5)
More Values, More Variables, and Meaning of the Numeration 228(1)
Dealing with Low-Frequency Alpha Labels and Other Problems 229(1)
Dimensionality 230(5)
Multidimensional Scaling 230(1)
Squashing a Triangle 231(3)
Projecting Alpha Values 234(1)
Scree Plots 234(1)
Practical Consideration---Implementing Alpha Numeration in the Demonstration Code 235(3)
Implementing Neighborhoods 235(2)
Implementing Numeration in All Alpha Data Sets 237(1)
Implementing Dimensionality Reduction for Variables 237(1)
Summary 238(1)
Normalizing and Redistributing Variables 239(36)
Normalizing a Variable's Range 240(19)
Review of Data Preparation and Modeling (Training, Testing, and Execution) 241(1)
The Nature and Scope of the Out-of-Range Values Problem 242(1)
Discovering the Range of Values When Building the PIE 243(4)
Out-of-Range Values When Training 247(2)
Out-of-Range Values When Testing 249(1)
Out-of-Range Values When Executing 250(1)
Scaling Transformations 251(6)
Softmax Scaling 257(1)
Normalizing Ranges 258(1)
Redistributing Variable Values 259(10)
The Nature of Distributions 259(1)
Distributive Difficulties 260(1)
Adjusting Distributions 261(5)
Modified Distributions 266(3)
Summary 269(2)
Supplemental Material 271(4)
The Logistic Function 271(3)
Modifying the Linear Part of the Logistic Function Range 274(1)
Replacing Missing and Empty Values 275(24)
Retaining Information about Missing Values 275(3)
Missing-Value Patterns 276(1)
Capturing Patterns 277(1)
Replacing Missing Values 278(7)
Unbiased Estimators 279(1)
Variability Relationships 279(3)
Relationships between Variables 282(2)
Preserving Between-Variable Relationships 284(1)
Summary 285(1)
Supplemental Material 286(13)
Using Regression to Find Least Information-Damaging Missing Values 286(8)
Alternative Methods of Missing-Value Replacement 294(5)
Series Variables 299(52)
Here There Be Dragons! 300(1)
Types of Series 300(1)
Describing Series Data 301(19)
Constructing a Series 302(1)
Features of a Series 302(1)
Describing a Series---Fourier 303(4)
Describing a Series---Spectrum 307(7)
Describing a Series---Trend, Seasonality, Cycles, Noise 314(2)
Describing a Series---Autocorrelation 316(4)
Modeling Series Data 320(1)
Repairing Series Data Problems 320(5)
Missing Values 320(2)
Outliers 322(1)
Nonuniform Displacement 322(1)
Trend 323(2)
Tools 325(14)
Filtering 325(1)
Moving Averages 326(7)
Smoothing 1---PVM Smoothing 333(1)
Smoothing 2---Median Smoothing, Resmoothing, and Hanning 333(2)
Extraction 335(1)
Differencing 336(3)
Other Problems 339(5)
Numerating Alpha Values 341(1)
Distribution 341(3)
Normalization 344(1)
Preparing Series Data 344(4)
Looking at the Data 346(1)
Signposts on the Rocky Road 346(2)
Implementation Notes 348(3)
Preparing the Data Set 351(50)
Using Sparsely Populated Variables 351(4)
Increasing Information Density Using Sparsely Populated Variables 352(1)
Binning Sparse Numerical Values 353(1)
Present-Value Patterns (PVPs) 353(2)
Problems with High-Dimensionality Data Sets 355(5)
Information Representation 357(1)
Representing High-Dimensionality Data in Fewer Dimensions 358(2)
Introducing the Neural Network 360(16)
Training a Neural Network 361(1)
Neurons 362(1)
Reshaping the Logistic Curve 363(1)
Single-Input Neurons 363(3)
Multiple-Input Neurons 366(2)
Networking Neurons to Estimate a Function 368(1)
Network Learning 368(3)
Network Prediction---Hidden Layer 371(1)
Network Prediction---Output Layer 371(1)
Stochastic Network Performance 372(1)
Network Architecture 1---The Autoassociative Network 373(2)
Network Architecture 2---The Sparsely Connected Network 375(1)
Compressing Variables 376(2)
Using Compressed Dimensionality Data 376(2)
Removing Variables 378(5)
Estimating Variable Importance 1: What Doesn't Work 379(1)
Estimating Variable Importance 2: Clues 379(1)
Estimating Variable Importance 3: Configuring and Training the Network 380(3)
How Much Data Is Enough? 383(9)
Joint Distribution 384(6)
Capturing Joint Variability 390(1)
Degrees of Freedom 391(1)
Beyond Joint Distribution 392(4)
Enhancing the Data Set 393(3)
Data Sets in Perspective 396(1)
Implementation Notes 396(3)
Collapsing Extremely Sparsely Populated Variables 397(1)
Reducing Excessive Dimensionality 397(1)
Measuring Variable Importance 398(1)
Feature Enhancement 398(1)
Where Next? 399(2)
The Data Survey 401(82)
Introduction to the Data Survey 402(1)
Information and Communication 403(11)
Measuring Information: Signals and Dictionaries 405(1)
Measuring Information: Signals 406(1)
Measuring Information: Bits of Information 407(3)
Measuring Information: Surprise 410(1)
Measuring Information: Entropy 411(1)
Measuring Information: Dictionaries 412(2)
Mapping Using Entropy 414(9)
Whole Data Set Entropy 416(1)
Conditional Entropy between Inputs and Outputs 417(3)
Mutual Information 420(1)
Other Survey Uses for Entropy and Information 420(1)
Looking for Information 421(2)
Identifying Problems with a Data Survey 423(12)
Confidence and Sufficient Data 424(2)
Detecting Sparsity 426(1)
Manifold Definition 427(8)
Clusters 435(1)
Sampling Bias 436(3)
Making the Data Survey 439(3)
Novelty Detection 442(1)
Other Directions 443(3)
Supplemental Material 446(37)
Entropic Analysis---Example 446(5)
Surveying Data Sets 451(32)
Using Prepared Data 483(22)
Modeling Data 485(4)
Assumptions 485(1)
Models 485(1)
Data Mining vs. Exploratory Data Analysis 486(3)
Characterizing Data 489(5)
Decision Trees 490(1)
Clusters 491(1)
Nearest Neighbor 492(1)
Neural Networks and Regression 493(1)
Prepared Data and Modeling Algorithms 494(6)
Neural Networks and the Credit Data Set 494(5)
Decision Trees and the Credit Data Set 499(1)
Practical Use of Data Preparation and Prepared Data 500(1)
Looking at Present Modeling Tools and Future Directions 501(4)
Near Future 503(1)
Farther Out 504(1)
Appendix
Using the Demonstration Code on the CD-ROM 505(4)
Further Reading 509(4)
Index 513(24)
About the Author 537(2)
About the CD-ROM 539
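The contents above list one-of-n remapping among the techniques for handling nonnumerical (alpha) variables: each distinct label becomes its own 0/1 pseudo-variable, so a modeling tool sees no spurious ordering among the labels. As an illustration of the general idea only, not the book's demonstration code, a minimal sketch:

```python
def one_of_n(labels):
    """Remap an alpha (categorical) variable using one-of-n encoding.

    Returns the sorted list of distinct categories and, for each
    input label, a row with a 1 in that label's position and 0s
    elsewhere. Sorting is just to make the column order reproducible.
    """
    categories = sorted(set(labels))
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for label in labels:
        row = [0] * len(categories)
        row[index[label]] = 1
        rows.append(row)
    return categories, rows

cats, encoded = one_of_n(["red", "green", "red", "blue"])
print(cats)     # ['blue', 'green', 'red']
print(encoded)  # [[0, 0, 1], [0, 1, 0], [0, 0, 1], [1, 0, 0]]
```

The trade-off the book's chapter explores is that one-of-n multiplies the number of input columns by the number of labels, which is one reason it also covers alternatives such as m-of-n remapping and numeration.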


Dorian Pyle is Chief Scientist and Founder of PTI (www.pti.com), which develops and markets Powerhouse predictive and explanatory analytics software. Dorian has over 20 years' experience with the artificial intelligence and machine learning techniques used in what is known today as "data mining" or "predictive analytics". He has applied this knowledge as a consultant with Knowledge Stream Partners, Xchange, Naviant, Thinking Machines, and Data Miners, as well as with companies directly involved in credit card marketing for banks and with manufacturing companies using industrial automation. In 1976 he was involved in building artificially intelligent machine learning systems using the pioneering technologies now known as neural computing and associative memories. He is familiar with the most advanced technologies in data mining, including entropic analysis (information theory), chaotic and fractal decomposition, neural technologies, evolutionary and genetic optimization, algebra evolvers, case-based reasoning, concept induction, and other advanced statistical techniques.