E-book: The Minimum Description Length Principle

4.08/5 (24 ratings from Goodreads)

Author: Peter D. Grünwald; foreword by Jorma Rissanen

Format: 736 pages
Series: Adaptive Computation and Machine Learning Series
Publication date: 14-May-2014
Publisher: MIT Press
ISBN-13: 9780262256292

Format: PDF+DRM
Price: 83,62 €*
* The price is final, i.e. no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you must install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

The minimum description length (MDL) principle is a powerful method of inductive inference, the basis of statistical modeling, pattern recognition, and machine learning. It holds that the best explanation, given a limited set of observed data, is the one that permits the greatest compression of the data. MDL methods are particularly well-suited for dealing with model selection, prediction, and estimation problems in situations where the models under consideration can be arbitrarily complex, and overfitting the data is a serious concern.
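As a rough illustration of the compression idea described above, the following minimal Python sketch selects the order of a binary Markov chain by minimizing a crude two-part code length: bits to describe the hypothesis plus bits to describe the data given that hypothesis. It is not taken from the book. The function names `two_part_codelength` and `select_order`, the (k/2) log2 n parameter cost, and the choice to send the first `order` symbols literally are all assumptions made for this example.

```python
import math
from collections import Counter


def two_part_codelength(bits: str, order: int) -> float:
    """Crude two-part code length, in bits, of a binary string under a
    Markov chain of the given order (illustrative sketch only)."""
    n = len(bits)
    if n == 0:
        raise ValueError("bits must be non-empty")

    # Count how often each context occurs and which symbol follows it.
    context_counts = Counter()
    transition_counts = Counter()
    for i in range(order, n):
        context = bits[i - order:i]
        context_counts[context] += 1
        transition_counts[(context, bits[i])] += 1

    # Part 2: data given the maximum-likelihood parameters.
    # Assumption: the first `order` symbols are sent literally, 1 bit each.
    data_bits = float(order)
    for (context, _symbol), count in transition_counts.items():
        data_bits -= count * math.log2(count / context_counts[context])

    # Part 1: the hypothesis itself.
    # Assumption: (k / 2) * log2(n) bits for k = 2**order free parameters.
    num_params = 2 ** order
    hypothesis_bits = 0.5 * num_params * math.log2(n)

    return hypothesis_bits + data_bits


def select_order(bits: str, max_order: int = 4) -> int:
    """Return the order whose total two-part code length is smallest."""
    return min(range(max_order + 1),
               key=lambda order: two_part_codelength(bits, order))
```

For example, `select_order("01" * 100)` picks order 1: a first-order chain predicts the alternation perfectly, while higher orders pay for extra parameters without shortening the data part any further.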

This extensive, step-by-step introduction to the MDL Principle provides a comprehensive reference (with an emphasis on conceptual issues) that is accessible to graduate students and researchers in statistics, pattern classification, machine learning, and data mining, to philosophers interested in the foundations of statistics, and to researchers in other applied sciences that involve model selection, including biology, econometrics, and experimental psychology. Part I provides a basic introduction to MDL and an overview of the concepts in statistics and information theory needed to understand MDL. Part II treats universal coding, the information-theoretic notion on which MDL is built, and Part III gives a formal treatment of MDL theory as a theory of inductive inference based on universal coding. Part IV provides a comprehensive overview of the statistical theory of exponential families with an emphasis on their information-theoretic properties. The text includes a number of summaries, paragraphs offering the reader a "fast track" through the material, and boxes highlighting the most important concepts.

List of Figures
Series Foreword
Foreword
Preface

I. Introductory Material
  Learning, Regularity, and Compression
    Regularity and Learning
    Regularity and Compression
    Solomonoff's Breakthrough -- Kolmogorov Complexity
    Making the Idea Applicable
    Crude MDL, Refined MDL and Universal Coding
      From Crude to Refined MDL
      Universal Coding and Refined MDL
      Refined MDL for Model Selection
      Refined MDL for Prediction and Hypothesis Selection
    Some Remarks on Model Selection
      Model Selection among Non-Nested Models
      Goals of Model vs. Point Hypothesis Selection
    The MDL Philosophy
    MDL, Occam's Razor, and the "True Model"
      Answer to Criticism No. 1
      Answer to Criticism No. 2
    History and Forms of MDL
      What Is MDL?
      MDL Literature
    Summary and Outlook
  Probabilistic and Statistical Preliminaries
    General Mathematical Preliminaries
    Probabilistic Preliminaries
      Definitions; Notational Conventions
      Probabilistic Sources
      Limit Theorems and Statements
      Probabilistic Models
      Probabilistic Model Classes
    Kinds of Probabilistic Models
    Terminological Preliminaries
    Modeling Preliminaries: Goals and Methods for Inductive Inference
      Consistency
      Basic Concepts of Bayesian Statistics
    Summary and Outlook
  Information-Theoretic Preliminaries
    Coding Preliminaries
      Restriction to Prefix Coding Systems; Descriptions as Messages
      Different Kinds of Codes
      Assessing the Efficiency of Description Methods
    The Most Important Section of This Book: Probabilities and Code Lengths
      The Kraft Inequality
      Code Lengths "Are" Probabilities
      Immediate Insights and Consequences
    Probabilities and Code Lengths, Part II
      (Relative) Entropy and the Information Inequality
      Uniform Codes, Maximum Entropy, and Minimax Codelength
    Summary, Outlook, Further Reading
  Information-Theoretic Properties of Statistical Models
    Introduction
    Likelihood and Observed Fisher Information
    KL Divergence and Expected Fisher Information
    Maximum Likelihood: Data vs. Parameters
    Summary and Outlook
  Crude Two-Part Code MDL
    Introduction: Making Two-Part MDL Precise
    Two-Part Code MDL for Markov Chain Selection
      The Code C2
      The Code C1
      Crude Two-Part Code MDL for Markov Chains
    Simplistic Two-Part Code MDL Hypothesis Selection
    Two-Part MDL for Tasks Other Than Hypothesis Selection
    Behavior of Two-Part Code MDL
    Two-Part Code MDL and Maximum Likelihood
      The Maximum Likelihood Principle
      MDL vs. ML
      MDL as a Maximum Probability Principle
    Computing and Approximating Two-Part MDL in Practice
    Justifying Crude MDL: Consistency and Code Design
      A General Consistency Result
      Code Design for Two-Part Code MDL
    Summary and Outlook
    Appendix: Proof of Theorem 5.1

II. Universal Coding
  Universal Coding with Countable Models
    Universal Coding: The Basic Idea
      Two-Part Codes as Simple Universal Codes
      From Universal Codes to Universal Models
      Formal Definition of Universality
    The Finite Case
      Minimax Regret and Normalized ML
      NML vs. Two-Part vs. Bayes
    The Countably Infinite Case
      The Two-Part and Bayesian Codes
      The NML Code
    Prequential Universal Models
      Distributions as Prediction Strategies
      Bayes Is Prequential; NML and Two-part Are Not
      The Prequential Plug-In Model
    Individual vs. Stochastic Universality
      Stochastic Redundancy
      Uniformly Universal Models
    Summary, Outlook and Further Reading
  Parametric Models: Normalized Maximum Likelihood
    Introduction
      Preliminaries
    Asymptotic Expansion of Parametric Complexity
    The Meaning of ∫Θ √det I(θ) dθ
      Complexity and Functional Form
      KL Divergence and Distinguishability
      Complexity and Volume
      Complexity and the Number of Distinguishable Distributions
    Explicit and Simplified Computations
  Parametric Models: Bayes
    The Bayesian Regret
      Basic Interpretation of Theorem 8.1
    Bayes Meets Minimax -- Jeffreys' Prior
      Jeffreys' Prior and the Boundary
    How to Prove the Bayesian and NML Regret Theorems
      Proof Sketch of Theorem 8.1
      Beyond Exponential Families
      Proof Sketch of Theorem 7.1
    Stochastic Universality
    Appendix: Proofs of Theorem 8.1 and Theorem 8.2
  Parametric Models: Prequential Plug-in
    Prequential Plug-in for Exponential Families
    The Plug-in vs. the Bayes Universal Model
    More Precise Asymptotics
    Summary
  Parametric Models: Two-Part
    The Ordinary Two-Part Universal Model
      Derivation of the Two-Part Code Regret
      Proof Sketch of Theorem 10.1
      Discussion
    The Conditional Two-Part Universal Code
      Conditional Two-Part Codes for Discrete Exponential Families
      Distinguishability and the Phase Transition
    Summary and Outlook
  NML With Infinite Complexity
    Introduction
      Examples of Undefined NML Distribution
      Examples of Undefined Jeffreys' Prior
    Metauniversal Codes
      Constrained Parametric Complexity
      Meta-Two-Part Coding
      Renormalized Maximum Likelihood
    NML with Luckiness
      Asymptotic Expansion of LNML
    Conditional Universal Models
      Bayesian Approach with Jeffreys' Prior
      Conditional NML
      Liang and Barron's Approach
    Summary and Remarks
    Appendix: Proof of Theorem 11.4
  Linear Regression
    Introduction
      Prelude: The Normal Location Family
    Least-Squares Estimation
      The Normal Equations
      Composition of Experiments
      Penalized Least-Squares
    The Linear Model
      Bayesian Linear Model Mx with Gaussian Prior
      Bayesian Linear Models Mx and Sx with Noninformative Priors
    Universal Models for Linear Regression
      NML
      Bayes and LNML
      Bayes-Jeffreys and CNML
  Beyond Parametrics
    Introduction
    CUP: Unions of Parametric Models
      CUP vs. Parametric Models
    Universal Codes Based on Histograms
      Redundancy of Universal CUP Histogram Codes
    Nonparametric Redundancy
      Standard CUP Universal Codes
      Minimax Nonparametric Redundancy
    Gaussian Process Regression
      Kernelization of Bayesian Linear Regression
      Gaussian Processes
      Gaussian Processes as Universal Models
    Conclusion and Further Reading

III. Refined MDL
  MDL Model Selection
    Introduction
    Simple Refined MDL Model Selection
      Compression Interpretation
      Counting Interpretation
      Bayesian Interpretation
      Prequential Interpretation
    General Parametric Model Selection
      Models with Infinite Complexities
      Comparing Many or Infinitely Many Models
      The General Picture
    Practical Issues in MDL Model Selection
      Calculating Universal Codelengths
      Computational Efficiency and Practical Quality of Non-NML Universal Codes
      Model Selection with Conditional NML and Plug-in Codes
      General Warnings about Model Selection
    MDL Model Selection for Linear Regression
      Rissanen's RNML Approach
      Hansen and Yu's gMDL Approach
      Liang and Barron's Approach
      Discussion
    Worst Case vs. Average Case
  MDL Prediction and Estimation
    Introduction
    MDL for Prediction and Predictive Estimation
      Prequential MDL Estimators
      Prequential MDL Estimators Are Consistent
      Parametric and Nonparametric Examples
      Cesaro KL consistency vs. KL consistency
    Two-Part Code MDL for Point Hypothesis Selection
      Discussion of Two-Part Consistency Theorem
    MDL Parameter Estimation
      MDL Estimators vs. Luckiness ML Estimators
      What Estimator To Use?
      Comparison to Bayesian Estimators
    Summary and Outlook
    Appendix: Proof of Theorem 15.3
  MDL Consistency and Convergence
    Introduction
      The Scenarios Considered
    Consistency: Prequential and Two-Part MDL Estimators
    Consistency: MDL Model Selection
      Selection between a Union of Parametric Models
      Nonparametric Model Selection Based on CUP Model Class
    MDL Consistency Peculiarities
    Risks and Rates
      Relations between Divergences and Risk Measures
      Minimax Rates
    MDL Rates of Convergence
      Prequential and Two-Part MDL Estimators
      MDL Model Selection
  MDL in Context
    MDL and Frequentist Paradigms
      Sanity Check or Design Principle?
      The Weak Prequential Principle
      MDL vs. Frequentist Principles: Remaining Issues
    MDL and Bayesian Inference
      Luckiness Functions vs. Prior Distributions
      MDL, Bayes, and Occam
      MDL and Brands of Bayesian Statistics
      Conclusion: a Common Future after All?
    MDL, AIC and BIC
      BIC
      AIC
      Combining the Best of AIC and BIC
    MDL and MML
      Strict Minimum Message Length
      Comparison to MDL
      The Wallace-Freeman Estimator
    MDL and Prequential Analysis
    MDL and Cross-Validation
    MDL and Maximum Entropy
    Kolmogorov Complexity and Structure Function
    MDL and Individual Sequence Prediction
    MDL and Statistical Learning Theory
      Structural Risk Minimization
      PAC-Bayesian Approaches
      PAC-Bayes and MDL
    The Road Ahead

IV. Additional Background
  The Exponential or "Maximum Entropy" Families
    Introduction
    Definition and Overview
    Basic Properties
    Mean-Value, Canonical, and Other Parameterizations
      The Mean Value Parameterization
      Other Parameterizations
      Relating Mean-Value and Canonical Parameters
    Exponential Families of General Probabilistic Sources
    Fisher Information Definitions and Characterizations
  Information-Theoretic Properties of Exponential Families
    Introduction
    Robustness of Exponential Family Codes
      If Θmean Does Not Contain the Mean
    Behavior at the ML Estimate β
    Behavior of the ML Estimate β
      Central Limit Theorem
      Large Deviations
    Maximum Entropy and Minimax Codelength
      Exponential Families and Maximum Entropy
      Exponential Families and Minimax Codelength
      The Compression Game
    Likelihood Ratio Families and Renyi Divergences
      The Likelihood Ratio Family
    Summary
References
List of Symbols
Subject Index

Permalink: https://www.kriso.ee/db/97802622562922e.html

Keywords:

Minimum description length (Information theory)
