Series Foreword  xxi
Foreword  xxiii
Preface  xxv

Part I: Introductory Material  1

1 Learning, Regularity, and Compression  3
1.1 Regularity and Learning  4
1.2 Regularity and Compression  4
1.3 Solomonoff's Breakthrough -- Kolmogorov Complexity  8
1.4 Making the Idea Applicable  10
1.5 Crude MDL, Refined MDL and Universal Coding  12
1.5.1 From Crude to Refined MDL  14
1.5.2 Universal Coding and Refined MDL  17
1.5.3 Refined MDL for Model Selection  18
1.5.4 Refined MDL for Prediction and Hypothesis Selection  20
1.6 Some Remarks on Model Selection  23
1.6.1 Model Selection among Non-Nested Models  23
1.6.2 Goals of Model vs. Point Hypothesis Selection  25
1.7 The MDL Philosophy  26
1.8 MDL, Occam's Razor, and the "True Model"  29
1.8.1 Answer to Criticism No. 1  30
1.8.2 Answer to Criticism No. 2  32
1.9 History and Forms of MDL  36
1.9.1  37
1.9.2  38
1.10  40

2 Probabilistic and Statistical Preliminaries  41
2.1 General Mathematical Preliminaries  41
2.2 Probabilistic Preliminaries  46
2.2.1 Definitions; Notational Conventions  46
2.2.2 Probabilistic Sources  53
2.2.3 Limit Theorems and Statements  55
2.2.4 Probabilistic Models  57
2.2.5 Probabilistic Model Classes  60
2.3 Kinds of Probabilistic Models*  62
2.4 Terminological Preliminaries  69
2.5 Modeling Preliminaries: Goals and Methods for Inductive Inference  71
2.5.1  71
2.5.2 Basic Concepts of Bayesian Statistics  74
2.6  78

3 Information-Theoretic Preliminaries  79
3.1  79
3.1.1 Restriction to Prefix Coding Systems; Descriptions as Messages  83
3.1.2 Different Kinds of Codes  86
3.1.3 Assessing the Efficiency of Description Methods  90
3.2 The Most Important Section of This Book: Probabilities and Code Lengths  90
3.2.1 The Kraft Inequality  91
3.2.2 Code Lengths "Are" Probabilities  95
3.2.3 Immediate Insights and Consequences  99
3.3 Probabilities and Code Lengths, Part II  101
3.3.1 (Relative) Entropy and the Information Inequality  103
3.3.2 Uniform Codes, Maximum Entropy, and Minimax Codelength  106
3.4 Summary, Outlook, Further Reading  106

4 Information-Theoretic Properties of Statistical Models  109
4.1  109
4.2 Likelihood and Observed Fisher Information  111
4.3 KL Divergence and Expected Fisher Information  117
4.4 Maximum Likelihood: Data vs. Parameters  124
4.5  130

5 Crude Two-Part Code MDL  131
5.1 Introduction: Making Two-Part MDL Precise  132
5.2 Two-Part Code MDL for Markov Chain Selection  133
5.2.1  135
5.2.2  137
5.2.3 Crude Two-Part Code MDL for Markov Chains  138
5.3 Simplistic Two-Part Code MDL Hypothesis Selection  139
5.4 Two-Part MDL for Tasks Other Than Hypothesis Selection  141
5.5 Behavior of Two-Part Code MDL  142
5.6 Two-Part Code MDL and Maximum Likelihood  144
5.6.1 The Maximum Likelihood Principle  144
5.6.2  147
5.6.3 MDL as a Maximum Probability Principle  148
5.7 Computing and Approximating Two-Part MDL in Practice  150
5.8 Justifying Crude MDL: Consistency and Code Design  152
5.8.1 A General Consistency Result  153
5.8.2 Code Design for Two-Part Code MDL  157
5.9  163
5.A Appendix: Proof of Theorem 5.1  163

Part II: Universal Coding  165

6 Universal Coding with Countable Models  171
6.1 Universal Coding: The Basic Idea  172
6.1.1 Two-Part Codes as Simple Universal Codes  174
6.1.2 From Universal Codes to Universal Models  175
6.1.3 Formal Definition of Universality  177
6.2  178
6.2.1 Minimax Regret and Normalized ML  179
6.2.2 NML vs. Two-Part vs. Bayes  182
6.3 The Countably Infinite Case  184
6.3.1 The Two-Part and Bayesian Codes  184
6.3.2  187
6.4 Prequential Universal Models  190
6.4.1 Distributions as Prediction Strategies  190
6.4.2 Bayes Is Prequential; NML and Two-Part Are Not  193
6.4.3 The Prequential Plug-In Model  197
6.5 Individual vs. Stochastic Universality*  199
6.5.1 Stochastic Redundancy  199
6.5.2 Uniformly Universal Models  201
6.6 Summary, Outlook and Further Reading  204

7 Parametric Models: Normalized Maximum Likelihood  207
7.1  207
7.1.1  208
7.2 Asymptotic Expansion of Parametric Complexity  211
7.3 The Meaning of ∫ √det I(θ) dθ  216
7.3.1 Complexity and Functional Form  217
7.3.2 KL Divergence and Distinguishability  219
7.3.3 Complexity and Volume  222
7.3.4 Complexity and the Number of Distinguishable Distributions*  224
7.4 Explicit and Simplified Computations  226

8 Parametric Models: Bayes  231
8.1  231
8.1.1 Basic Interpretation of Theorem 8.1  233
8.2 Bayes Meets Minimax -- Jeffreys' Prior  234
8.2.1 Jeffreys' Prior and the Boundary  237
8.3 How to Prove the Bayesian and NML Regret Theorems  239
8.3.1 Proof Sketch of Theorem 8.1  239
8.3.2 Beyond Exponential Families  241
8.3.3 Proof Sketch of Theorem 7.1  243
8.4 Stochastic Universality*  244
8.A Appendix: Proofs of Theorem 8.1 and Theorem 8.2  248

9 Parametric Models: Prequential Plug-in  257
9.1 Prequential Plug-in for Exponential Families  257
9.2 The Plug-in vs. the Bayes Universal Model  262
9.3 More Precise Asymptotics  265
9.4  269

10 Parametric Models: Two-Part  271
10.1 The Ordinary Two-Part Universal Model  271
10.1.1 Derivation of the Two-Part Code Regret  274
10.1.2 Proof Sketch of Theorem 10.1  277
10.1.3  282
10.2 The Conditional Two-Part Universal Code*  284
10.2.1 Conditional Two-Part Codes for Discrete Exponential Families  286
10.2.2 Distinguishability and the Phase Transition*  290
10.3  293

11 NML with Infinite Complexity  295
11.1  295
11.1.1 Examples of Undefined NML Distribution  298
11.1.2 Examples of Undefined Jeffreys' Prior  299
11.2  301
11.2.1 Constrained Parametric Complexity  302
11.2.2 Meta-Two-Part Coding  303
11.2.3 Renormalized Maximum Likelihood*  306
11.3  308
11.3.1 Asymptotic Expansion of LNML  312
11.4 Conditional Universal Models  316
11.4.1 Bayesian Approach with Jeffreys' Prior  317
11.4.2  320
11.4.3 Liang and Barron's Approach  325
11.5  329
11.A Appendix: Proof of Theorem 11.4  329

12 Linear Regression  335
12.1  336
12.1.1 Prelude: The Normal Location Family  338
12.2 Least-Squares Estimation  340
12.2.1 The Normal Equations  342
12.2.2 Composition of Experiments  345
12.2.3 Penalized Least-Squares  346
12.3  348
12.3.1 Bayesian Linear Model Mx with Gaussian Prior  354
12.3.2 Bayesian Linear Models Mx and Sx with Noninformative Priors  359
12.4 Universal Models for Linear Regression  363
12.4.1  363
12.4.2  364
12.4.3 Bayes-Jeffreys and CNML  365

13 Beyond Parametrics  369
13.1  370
13.2 CUP: Unions of Parametric Models  372
13.2.1 CUP vs. Parametric Models  375
13.3 Universal Codes Based on Histograms  376
13.3.1 Redundancy of Universal CUP Histogram Codes  380
13.4 Nonparametric Redundancy  383
13.4.1 Standard CUP Universal Codes  384
13.4.2 Minimax Nonparametric Redundancy  387
13.5 Gaussian Process Regression*  390
13.5.1 Kernelization of Bayesian Linear Regression  390
13.5.2 Gaussian Processes  394
13.5.3 Gaussian Processes as Universal Models  396
13.6 Conclusion and Further Reading  402

Part III: Refined MDL  403

14 MDL Model Selection  409
14.1  409
14.2 Simple Refined MDL Model Selection  411
14.2.1 Compression Interpretation  415
14.2.2 Counting Interpretation  416
14.2.3 Bayesian Interpretation  418
14.2.4 Prequential Interpretation  419
14.3 General Parametric Model Selection  420
14.3.1 Models with Infinite Complexities  420
14.3.2 Comparing Many or Infinitely Many Models  422
14.3.3 The General Picture  425
14.4 Practical Issues in MDL Model Selection  428
14.4.1 Calculating Universal Codelengths  428
14.4.2 Computational Efficiency and Practical Quality of Non-NML Universal Codes  429
14.4.3 Model Selection with Conditional NML and Plug-in Codes  431
14.4.4 General Warnings about Model Selection  435
14.5 MDL Model Selection for Linear Regression  438
14.5.1 Rissanen's RNML Approach  439
14.5.2 Hansen and Yu's gMDL Approach  443
14.5.3 Liang and Barron's Approach  446
14.5.4  448
14.6 Worst Case vs. Average Case*  451

15 MDL Prediction and Estimation  459
15.1  459
15.2 MDL for Prediction and Predictive Estimation  460
15.2.1 Prequential MDL Estimators  461
15.2.2 Prequential MDL Estimators Are Consistent  465
15.2.3 Parametric and Nonparametric Examples  469
15.2.4 Cesàro KL Consistency vs. KL Consistency*  472
15.3 Two-Part Code MDL for Point Hypothesis Selection  476
15.3.1 Discussion of Two-Part Consistency Theorem  478
15.4 MDL Parameter Estimation  483
15.4.1 MDL Estimators vs. Luckiness ML Estimators  487
15.4.2 What Estimator To Use?  491
15.4.3 Comparison to Bayesian Estimators*  493
15.5  498
15.A Appendix: Proof of Theorem 15.3  499

16 MDL Consistency and Convergence  501
16.1  501
16.1.1 The Scenarios Considered  501
16.2 Consistency: Prequential and Two-Part MDL Estimators  502
16.3 Consistency: MDL Model Selection  505
16.3.1 Selection between a Union of Parametric Models  505
16.3.2 Nonparametric Model Selection Based on CUP Model Class  508
16.4 MDL Consistency Peculiarities  511
16.5  515
16.5.1 Relations between Divergences and Risk Measures  517
16.5.2  519
16.6 MDL Rates of Convergence  520
16.6.1 Prequential and Two-Part MDL Estimators  520
16.6.2 MDL Model Selection  522

17 MDL in Context  523
17.1 MDL and Frequentist Paradigms  524
17.1.1 Sanity Check or Design Principle?  525
17.1.2 The Weak Prequential Principle  528
17.1.3 MDL vs. Frequentist Principles: Remaining Issues  529
17.2 MDL and Bayesian Inference  531
17.2.1 Luckiness Functions vs. Prior Distributions  534
17.2.2 MDL, Bayes, and Occam  539
17.2.3 MDL and Brands of Bayesian Statistics  544
17.2.4 Conclusion: A Common Future after All?  548
17.3 MDL, AIC and BIC  549
17.3.1  549
17.3.2  550
17.3.3 Combining the Best of AIC and BIC  552
17.4 MDL and MML  555
17.4.1 Strict Minimum Message Length  556
17.4.2  558
17.4.3 The Wallace-Freeman Estimator  560
17.5 MDL and Prequential Analysis  562
17.6 MDL and Cross-Validation  565
17.7 MDL and Maximum Entropy  567
17.8 Kolmogorov Complexity and Structure Function  570
17.9 MDL and Individual Sequence Prediction  573
17.10 MDL and Statistical Learning Theory  579
17.10.1 Structural Risk Minimization  581
17.10.2 PAC-Bayesian Approaches  585
17.10.3 PAC-Bayes and MDL  588
17.11  592

Part IV: Additional Background  597

18 The Exponential or "Maximum Entropy" Families  599
18.1  600
18.2 Definition and Overview  601
18.3  605
18.4 Mean-Value, Canonical, and Other Parameterizations  609
18.4.1 The Mean-Value Parameterization  609
18.4.2 Other Parameterizations  611
18.4.3 Relating Mean-Value and Canonical Parameters  613
18.5 Exponential Families of General Probabilistic Sources*  617
18.6 Fisher Information Definitions and Characterizations*  619

19 Information-Theoretic Properties of Exponential Families  623
19.1  624
19.2 Robustness of Exponential Family Codes  624
19.2.1 If Θmean Does Not Contain the Mean  627
19.3 Behavior at the ML Estimate β̂  629
19.4 Behavior of the ML Estimate β̂  632
19.4.1 Central Limit Theorem  633
19.4.2  634
19.5 Maximum Entropy and Minimax Codelength  637
19.5.1 Exponential Families and Maximum Entropy  638
19.5.2 Exponential Families and Minimax Codelength  641
19.5.3 The Compression Game  643
19.6 Likelihood Ratio Families and Rényi Divergences*  645
19.6.1 The Likelihood Ratio Family  647
19.7  650

References  651
List of Symbols  675
Subject Index  679