E-book: Statistical Learning for Biomedical Data

  • Format: PDF+DRM
  • Price: €48.15*
  • * the price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID; more information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you must install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you must install Adobe Digital Editions. (This is a free application designed specifically for reading e-books. It should not be confused with Adobe Reader, which is probably already installed on your computer.)

    This e-book cannot be read on an Amazon Kindle.

This book is for anyone with biomedical data who needs to identify variables that predict a two-group outcome, such as tumor/not-tumor, survival/death, or response to treatment. Statistical learning machines are ideally suited to these types of prediction problems, especially when the variables being studied may not meet the assumptions of traditional techniques. Learning machines come from the worlds of probability and computer science but are not yet widely used in biomedical research. This introduction brings learning machine techniques to the biomedical world in an accessible way, explaining the underlying principles in nontechnical language and using extensive examples and figures. The authors connect these new methods to familiar techniques by showing how to use the learning machine models to generate smaller, more easily interpretable traditional models. Coverage includes single decision trees, multiple-tree techniques such as Random Forests™, neural nets, support vector machines, nearest neighbors and boosting.

Biomedical researchers need machine learning techniques to make predictions such as survival/death or response to treatment when data sets are large and complex. This highly motivating introduction to these machines explains underlying principles in nontechnical language, using many examples and figures, and connects these new methods to familiar techniques.

Reviews

'The book is well written and provides nice graphics and numerous applications.' Michael R. Chernick, Technometrics

Other information

Preface xi
Acknowledgments xii
Part I Introduction 1(88)
1 Prologue 3(11)
1.1 Machines that learn - some recent history 3(4)
1.2 Twenty canonical questions 7(2)
1.3 Outline of the book 9(2)
1.4 A comment about example datasets 11(1)
1.5 Software 12(1)
Note 13(1)
2 The landscape of learning machines 14(27)
2.1 Introduction 14(1)
2.2 Types of data for learning machines 15(2)
2.3 Will that be supervised or unsupervised? 17(1)
2.4 An unsupervised example 18(2)
2.5 More lack of supervision - where are the parents? 20(1)
2.6 Engines, complex and primitive 20(2)
2.7 Model richness means what, exactly? 22(3)
2.8 Membership or probability of membership? 25(2)
2.9 A taxonomy of machines? 27(3)
2.10 A note of caution - one of many 30(1)
2.11 Highlights from the theory 30(6)
Notes 36(5)
3 A mangle of machines 41(16)
3.1 Introduction 41(1)
3.2 Linear regression 41(1)
3.3 Logistic regression 42(1)
3.4 Linear discriminant 43(2)
3.5 Bayes classifiers – regular and naive 45(2)
3.6 Logic regression 47(1)
3.7 k-Nearest neighbors 48(2)
3.8 Support vector machines 50(3)
3.9 Neural networks 53(1)
3.10 Boosting 54(1)
3.11 Evolutionary and genetic algorithms 55(1)
Notes 56(1)
4 Three examples and several machines 57(32)
4.1 Introduction 57(1)
4.2 Simulated cholesterol data 58(3)
4.3 Lupus data 61(1)
4.4 Stroke data 62(1)
4.5 Biomedical means unbalanced 63(1)
4.6 Measures of machine performance 64(2)
4.7 Linear analysis of cholesterol data 66(1)
4.8 Nonlinear analysis of cholesterol data 67(3)
4.9 Analysis of the lupus data 70(5)
4.10 Analysis of the stroke data 75(4)
4.11 Further analysis of the lupus and stroke data 79(8)
Notes 87(2)
Part II A machine toolkit 89(66)
5 Logistic regression 91(27)
5.1 Introduction 91(1)
5.2 Inside and around the model 92(1)
5.3 Interpreting the coefficients 93(1)
5.4 Using logistic regression as a decision rule 94(1)
5.5 Logistic regression applied to the cholesterol data 94(4)
5.6 A cautionary note 98(3)
5.7 Another cautionary note 101(1)
5.8 Probability estimates and decision rules 102(1)
5.9 Evaluating the goodness-of-fit of a logistic regression model 103(3)
5.10 Calibrating a logistic regression 106(5)
5.11 Beyond calibration 111(2)
5.12 Logistic regression and reference models 113(2)
Notes 115(3)
6 A single decision tree 118(19)
6.1 Introduction 118(1)
6.2 Dropping down trees 118(2)
6.3 Growing a tree 120(1)
6.4 Selecting features, making splits 120(1)
6.5 Good split, bad split 121(3)
6.6 Finding good features for making splits 124(1)
6.7 Misreading trees 125(2)
6.8 Stopping and pruning rules 127(1)
6.9 Using functions of the features 128(1)
6.10 Unstable trees? 129(3)
6.11 Variable importance - growing on trees? 132(2)
6.12 Permuting for importance 134(1)
6.13 The continuing mystery of trees 135(2)
7 Random Forests - trees everywhere 137(18)
7.1 Random Forests in less than five minutes 137(1)
7.2 Random treks through the data 138(1)
7.3 Random treks through the features 139(1)
7.4 Walking through the forest 140(1)
7.5 Weighted and unweighted voting 140(2)
7.6 Finding subsets in the data using proximities 142(2)
7.7 Applying Random Forests to the Stroke data 144(7)
7.8 Random Forests in the universe of machines 151(2)
Notes 153(2)
Part III Analysis fundamentals 155(90)
8 Merely two variables 157(14)
8.1 Introduction 157(1)
8.2 Understanding correlations 158(1)
8.3 Hazards of correlations 159(4)
8.4 Correlations big and small 163(5)
Notes 168(3)
9 More than two variables 171(27)
9.1 Introduction 171(1)
9.2 Tiny problems, large consequences 172(2)
9.3 Mathematics to the rescue? 174(2)
9.4 Good models need not be unique 176(3)
9.5 Contexts and coefficients 179(2)
9.6 Interpreting and testing coefficients in models 181(5)
9.7 Merging models, pooling lists, ranking features 186(4)
Notes 190(8)
10 Resampling methods 198(17)
10.1 Introduction 198(1)
10.2 The bootstrap 198(3)
10.3 When the bootstrap works 201(1)
10.4 When the bootstrap doesn't work 202(1)
10.5 Resampling from a single group in different ways 203(1)
10.6 Resampling from groups with unequal sizes 204(2)
10.7 Resampling from small datasets 206(1)
10.8 Permutation methods 207(3)
10.9 Still more on permutation methods 210(4)
Note 214(1)
11 Error analysis and model validation 215(30)
11.1 Introduction 215(2)
11.2 Errors? What errors? 217(1)
11.3 Unbalanced data, unbalanced errors 218(1)
11.4 Error analysis for a single machine 219(3)
11.5 Cross-validation error estimation 222(2)
11.6 Cross-validation or cross-training? 224(2)
11.7 The leave-one-out method 226(1)
11.8 The out-of-bag method 227(1)
11.9 Intervals for error estimates for a single machine 228(2)
11.10 Tossing random coins into the abyss 230(2)
11.11 Error estimates for unbalanced data 232(1)
11.12 Confidence intervals for comparing error values 233(3)
11.13 Other measures of machine accuracy 236(2)
11.14 Benchmarking and winning the lottery 238(1)
11.15 Error analysis for predicting continuous outcomes 239(1)
Notes 240(5)
Part IV Machine strategies 245(18)
12 Ensemble methods — let's take a vote 247(8)
12.1 Pools of machines 247(1)
12.2 Weak correlation with outcome can be good enough 247(3)
12.3 Model averaging 250(4)
Notes 254(1)
13 Summary and conclusions 255(8)
13.1 Where have we been? 255(2)
13.2 So many machines 257(2)
13.3 Binary decision or probability estimate? 259(1)
13.4 Survival machines? Risk machines? 259(1)
13.5 And where are we going? 260(3)
Appendix 263(8)
References 271(10)
Index 281
James D. Malley is a Research Mathematical Statistician in the Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience, Center for Information Technology, at the National Institutes of Health.

Karen G. Malley is president of Malley Research Programming, Inc. in Rockville, Maryland, providing statistical programming services to the pharmaceutical industry and the National Institutes of Health. She also serves on the global council of the Clinical Data Interchange Standards Consortium (CDISC) user network, and the steering committee of the Washington, DC area CDISC user network.

Sinisa Pajevic is a Staff Scientist in the Mathematical and Statistical Computing Laboratory, Division of Computational Bioscience, Center for Information Technology, at the National Institutes of Health.