
Classification Analysis of DNA Microarrays

By Leif E. Peterson (Department of Medicine and Department of Molecular and Human Genetics, Baylor College of Medicine). Series edited by Yi Pan (Department of Computer Science, Georgia State University) and Albert Y. Zomaya (University of Western Australia)
  • Format: Multiple-component retail product, part(s) enclosed, 736 pages, height x width x thickness: 234x155x41 mm, weight: 1157 g, contains 1 Hardback and 1 CD-ROM
  • Series: Wiley Series in Bioinformatics
  • Publication date: 17-May-2013
  • Publisher: John Wiley & Sons Inc
  • ISBN-10: 0470170816
  • ISBN-13: 9780470170816
Wiley Series in Bioinformatics: Computational Techniques and Engineering. Yi Pan and Albert Y. Zomaya, Series Editors.

Wide coverage of traditional unsupervised and supervised methods, along with newer contemporary approaches, helps researchers handle the rapid growth of classification methods in DNA microarray studies.

Proliferating classification methods in DNA microarray studies have resulted in a body of information scattered throughout the literature, conference proceedings, and elsewhere. This book unites many of these classification methods in a single volume. In addition to traditional statistical methods, it covers newer machine-learning approaches such as fuzzy methods, artificial neural networks, evolutionary genetic algorithms, support vector machines, and swarm intelligence based on particle swarm optimization, among others.
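To give the flavor of one of the simpler class-prediction methods mentioned above, here is a minimal k-nearest-neighbor sketch in Python/NumPy. It is an illustration written for this page, not code from the book or its CD-ROM; the synthetic data shapes, variable names, and function name are assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict class labels for X_test using k nearest neighbors
    (Euclidean distance) in X_train. Rows are samples (arrays),
    columns are features (genes)."""
    preds = []
    for x in X_test:
        # Euclidean distance from x to every training sample
        dists = np.linalg.norm(X_train - x, axis=1)
        # Labels of the k closest training samples
        nearest = y_train[np.argsort(dists)[:k]]
        # Majority vote among the k neighbors
        values, counts = np.unique(nearest, return_counts=True)
        preds.append(values[np.argmax(counts)])
    return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic "expression" data: 40 samples x 50 genes, two classes
    # whose means differ in the first 5 genes.
    X = rng.normal(size=(40, 50))
    y = np.repeat([0, 1], 20)
    X[y == 1, :5] += 2.0
    print(knn_predict(X[:30], y[:30], X[30:], k=3))
```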



Classification Analysis of DNA Microarrays provides highly detailed pseudo-code, rich graphical programming features, and ready-to-run source code. Along with primary methods spanning traditional and contemporary classification, it offers supplementary tools and data-preparation routines for standardization and fuzzification; dimensional reduction via crisp and fuzzy c-means, PCA, and nonlinear manifold learning; computational linguistics via text analytics and n-gram analysis; recursive feature elimination during ANN training; kernel-based methods; and ensemble classifier fusion.
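As a rough illustration of the standardization and PCA-based dimensional reduction referred to above, the following sketch standardizes a samples-by-genes matrix and projects it onto the leading principal components of the correlation matrix. This is an assumed example for this page, not the book's source code; function names and data shapes are illustrative.

```python
import numpy as np

def standardize(X):
    """Z-score each column (gene) to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_scores(X, n_components=2):
    """Project standardized data onto its leading principal components,
    obtained from the eigendecomposition of the correlation matrix."""
    Z = standardize(X)
    R = np.corrcoef(Z, rowvar=False)        # gene-by-gene correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)    # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # re-sort descending
    V = eigvecs[:, order[:n_components]]    # leading eigenvectors
    return Z @ V                            # PC scores for each sample

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(30, 100))          # 30 arrays x 100 genes (synthetic)
    scores = pca_scores(X, n_components=2)
    print(scores.shape)                     # (30, 2)
```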

This powerful new resource:
  • Provides information on the use of classification analysis for DNA microarrays used in large-scale, high-throughput transcriptional studies
  • Serves as a historical repository of general-use supervised classification methods as well as newer contemporary methods
  • Brings the reader quickly up to speed on the various classification methods by implementing the programming pseudo-code and source code provided in the book (a minimal cross-validation sketch follows this list)
  • Describes implementation methods that help shorten discovery times
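Because cross-validated results recur throughout the class-prediction chapters listed below, this sketch shows one plausible way to estimate V-fold cross-validated accuracy for any plug-in classifier. It is an assumption for illustration only, not code shipped with the book; the toy nearest-centroid classifier and all names are hypothetical.

```python
import numpy as np

def v_fold_accuracy(X, y, fit_predict, V=5, seed=0):
    """Estimate classification accuracy by V-fold cross-validation.

    fit_predict(X_train, y_train, X_test) must return predicted labels
    for X_test; any classifier can be plugged in this way.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, V)
    accuracies = []
    for v in range(V):
        test = folds[v]
        train = np.concatenate([folds[i] for i in range(V) if i != v])
        preds = fit_predict(X[train], y[train], X[test])
        accuracies.append(np.mean(preds == y[test]))
    return float(np.mean(accuracies))

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    X = rng.normal(size=(60, 20))
    y = np.repeat([0, 1], 30)
    X[y == 1, :3] += 1.5

    # Toy nearest-centroid classifier used only to exercise the loop.
    def nearest_centroid(X_tr, y_tr, X_te):
        classes = np.unique(y_tr)
        centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
        d = np.linalg.norm(X_te[:, None, :] - centroids[None, :, :], axis=2)
        return classes[np.argmin(d, axis=1)]

    print(v_fold_accuracy(X, y, nearest_centroid, V=5))
```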

Classification Analysis of DNA Microarrays is useful for professionals and graduate students in computer science, bioinformatics, biostatistics, systems biology, and many related fields.
Preface xix
Abbreviations xxiii
1 Introduction
1(12)
1.1 Class Discovery
2(2)
1.2 Dimensional Reduction
4(1)
1.3 Class Prediction
4(1)
1.4 Classification Rules of Thumb
5(4)
1.5 DNA Microarray Datasets Used
9(2)
References
11(2)
Part I Class Discovery 13(146)
2 Crisp K-Means Cluster Analysis
15(32)
2.1 Introduction
15(1)
2.2 Algorithm
16(2)
2.3 Implementation
18(2)
2.4 Distance Metrics
20(4)
2.5 Cluster Validity
24(11)
2.5.1 Davies-Bouldin Index
25(1)
2.5.2 Dunn's Index
25(1)
2.5.3 Intracluster Distance
26(1)
2.5.4 Intercluster Distance
27(3)
2.5.5 Silhouette Index
30(1)
2.5.6 Hubert's Γ Statistic
31(1)
2.5.7 Randomization Tests for Optimal Value of K
31(4)
2.6 V-Fold Cross-Validation
35(2)
2.7 Cluster Initialization
37(7)
2.7.1 K Randomly Selected Microarrays
37(3)
2.7.2 K Random Partitions
40(1)
2.7.3 Prototype Splitting
41(3)
2.8 Cluster Outliers
44(1)
2.9 Summary
44(1)
References
45(2)
3 Fuzzy K-Means Cluster Analysis
47(10)
3.1 Introduction
47(1)
3.2 Fuzzy K-Means Algorithm
47(2)
3.3 Implementation
49(5)
3.4 Summary
54(1)
References
54(3)
4 Self-Organizing Maps
57(24)
4.1 Introduction
57(1)
4.2 Algorithm
57(6)
4.2.1 Feature Transformation and Reference Vector Initialization
59(1)
4.2.2 Learning
60(1)
4.2.3 Conscience
61(2)
4.3 Implementation
63(4)
4.3.1 Feature Transformation and Reference Vector Initialization
63(3)
4.3.2 Reference Vector Weight Learning
66(1)
4.4 Cluster Visualization
67(4)
4.4.1 Crisp K-Means Cluster Analysis
67(1)
4.4.2 Adjacency Matrix Method
68(1)
4.4.3 Cluster Connectivity Method
69(1)
4.4.4 Hue-Saturation-Value (HSV) Color Normalization
69(2)
4.5 Unified Distance Matrix (U Matrix)
71(1)
4.6 Component Map
71(2)
4.7 Map Quality
73(2)
4.8 Nonlinear Dimension Reduction
75(4)
References
79(2)
5 Unsupervised Neural Gas
81(10)
5.1 Introduction
81(1)
5.2 Algorithm
82(1)
5.3 Implementation
82(3)
5.3.1 Feature Transformation and Prototype Initialization
82(1)
5.3.2 Prototype Learning
83(2)
5.4 Nonlinear Dimension Reduction
85(2)
5.5 Summary
87(1)
References
88(3)
6 Hierarchical Cluster Analysis
91(16)
6.1 Introduction
91(1)
6.2 Methods
91(5)
6.2.1 General Programming Methods
91(1)
6.2.2 Step 1: Cluster-Analyzing Arrays as Objects with Genes as Attributes
92(2)
6.2.3 Step 2: Cluster-Analyzing Genes as Objects with Arrays as Attributes
94(2)
6.3 Algorithm
96(1)
6.4 Implementation
96(9)
6.4.1 Heatmap Color Control
96(1)
6.4.2 User Choices for Clustering Arrays and Genes
97(1)
6.4.3 Distance Matrices and Agglomeration Sequences
98(6)
6.4.4 Drawing Dendrograms and Heatmaps
104(1)
References
105(2)
7 Model-Based Clustering
107(12)
7.1 Introduction
107(3)
7.2 Algorithm
110(1)
7.3 Implementation
111(5)
7.4 Summary
116(1)
References
117(2)
8 Text Mining: Document Clustering
119(20)
8.1 Introduction
119(1)
8.2 Duo-Mining
119(1)
8.3 Streams and Documents
120(1)
8.4 Lexical Analysis
120(1)
8.4.1 Automatic Indexing
120(1)
8.4.2 Removing Stopwords
121(1)
8.5 Stemming
121(1)
8.6 Term Weighting
121(3)
8.7 Concept Vectors
124(1)
8.8 Main Terms Representing Concept Vectors
124(1)
8.9 Algorithm
125(2)
8.10 Preprocessing
127(10)
8.11 Summary
137(1)
References
137(2)
9 Text Mining: N-Gram Analysis
139(20)
9.1 Introduction
139(1)
9.2 Algorithm
140(1)
9.3 Implementation
141(13)
9.4 Summary
154(2)
References
156(3)
Part II Dimension Reduction 159(46)
10 Principal Components Analysis
161(28)
10.1 Introduction
161(1)
10.2 Multivariate Statistical Theory
161(9)
10.2.1 Matrix Definitions
162(1)
10.2.2 Principal Component Solution of R
163(1)
10.2.3 Extraction of Principal Components
164(2)
10.2.4 Varimax Orthogonal Rotation of Components
166(2)
10.2.5 Principal Component Score Coefficients
168(1)
10.2.6 Principal Component Scores
169(1)
10.3 Algorithm
170(1)
10.4 When to Use Loadings and PC Scores
170(1)
10.5 Implementation
171(11)
10.5.1 Correlation Matrix R
171(1)
10.5.2 Eigenanalysis of Correlation Matrix R
172(2)
10.5.3 Determination of Loadings and Varimax Rotation
174(2)
10.5.4 Calculating Principal Component (PC) Scores
176(6)
10.6 Rules of Thumb for PCA
182(4)
10.7 Summary
186(1)
References
187(2)
11 Nonlinear Manifold Learning
189(16)
11.1 Introduction
189(1)
11.2 Correlation-Based PCA
190(1)
11.3 Kernel PCA
191(1)
11.4 Diffusion Maps
192(1)
11.5 Laplacian Eigenmaps
192(1)
11.6 Local Linear Embedding
193(1)
11.7 Locality Preserving Projections
194(1)
11.8 Sammon Mapping
195(1)
11.9 NLML Prior to Classification Analysis
195(2)
11.10 Classification Results
197(3)
11.11 Summary
200(3)
References
203(2)
Part III Class Prediction 205(420)
12 Feature Selection
207(66)
12.1 Introduction
207(1)
12.2 Filtering versus Wrapping
208(1)
12.3 Data
209(2)
12.3.1 Numbers
209(1)
12.3.2 Responses
209(1)
12.3.3 Measurement Scales
210(1)
12.3.4 Variables
211(1)
12.4 Data Arrangement
211(2)
12.5 Filtering
213(41)
12.5.1 Continuous Features
213(6)
12.5.2 Best Rank Filters
219(17)
12.5.3 Randomization Tests
236(1)
12.5.4 Multitesting Problem
237(5)
12.5.5 Filtering Qualitative Features
242(4)
12.5.6 Multiclass Gini Diversity Index
246(1)
12.5.7 Class Comparison Techniques
247(3)
12.5.8 Generation of Nonredundant Gene List
250(4)
12.6 Selection Methods
254(5)
12.6.1 Greedy Plus Takeaway (Greedy PTA)
254(4)
12.6.2 Best Ranked Genes
258(1)
12.7 Multicollinearity
259(11)
12.8 Summary
270(1)
References
270(3)
13 Classifier Performance
273(24)
13.1 Introduction
273(1)
13.2 Input-Output, Speed, and Efficiency
273(4)
13.3 Training, Testing, and Validation
277(3)
13.4 Ensemble Classifier Fusion
280(3)
13.5 Sensitivity and Specificity
283(1)
13.6 Bias
284(1)
13.7 Variance
285(1)
13.8 Receiver-Operator Characteristic (ROC) Curves
286(9)
References
295(2)
14 Linear Regression
297(14)
14.1 Introduction
297(2)
14.2 Algorithm
299(1)
14.3 Implementation
299(1)
14.4 Cross-Validation Results
300(3)
14.5 Bootstrap Bias
303(3)
14.6 Multiclass ROC Curves
306(2)
14.7 Decision Boundaries
308(2)
14.8 Summary
310(1)
References
310(1)
15 Decision Tree Classification
311(20)
15.1 Introduction
311(3)
15.2 Features Used
314(1)
15.3 Terminal Nodes and Stopping Criteria
315(1)
15.4 Algorithm
315(1)
15.5 Implementation
315(3)
15.6 Cross-Validation Results
318(8)
15.7 Decision Boundaries
326(1)
15.8 Summary
327(2)
References
329(2)
16 Random Forests
331(30)
16.1 Introduction
331(2)
16.2 Algorithm
333(1)
16.3 Importance Scores
334(4)
16.4 Strength and Correlation
338(4)
16.5 Proximity and Supervised Clustering
342(3)
16.6 Unsupervised Clustering
345(3)
16.7 Class Outlier Detection
348(2)
16.8 Implementation
350(1)
16.9 Parameter Effects
350(7)
16.10 Summary
357(1)
References
358(3)
17 K Nearest Neighbor
361(18)
17.1 Introduction
361(1)
17.2 Algorithm
362(1)
17.3 Implementation
363(1)
17.4 Cross-Validation Results
364(5)
17.5 Bootstrap Bias
369(4)
17.6 Multiclass ROC Curves
373(1)
17.7 Decision Boundaries
374(3)
17.8 Summary
377(1)
References
378(1)
18 Naïve Bayes Classifier
379(14)
18.1 Introduction
379(1)
18.2 Algorithm
380(1)
18.3 Cross-Validation Results
380(4)
18.4 Bootstrap Bias
384(2)
18.5 Multiclass ROC Curves
386(1)
18.6 Decision Boundaries
386(3)
18.7 Summary
389(2)
References
391(2)
19 Linear Discriminant Analysis
393(22)
19.1 Introduction
393(1)
19.2 Multivariate Matrix Definitions
394(2)
19.3 Linear Discriminant Analysis
396(7)
19.3.1 Algorithm
397(1)
19.3.2 Cross-Validation Results
397(4)
19.3.3 Bootstrap Bias
401(1)
19.3.4 Multiclass ROC Curves
402(1)
19.3.5 Decision Boundaries
403(1)
19.4 Quadratic Discriminant Analysis
403(3)
19.5 Fisher's Discriminant Analysis
406(5)
19.6 Summary
411(1)
References
412(3)
20 Learning Vector Quantization
415(18)
20.1 Introduction
415(2)
20.2 Cross-Validation Results
417(1)
20.3 Bootstrap Bias
417(9)
20.4 Multiclass ROC Curves
426(2)
20.5 Decision Boundaries
428(1)
20.6 Summary
428(2)
References
430(3)
21 Logistic Regression
433(16)
21.1 Introduction
433(1)
21.2 Binary Logistic Regression
434(5)
21.3 Polytomous Logistic Regression
439(4)
21.4 Cross-Validation Results
443(1)
21.5 Decision Boundaries
444(1)
21.6 Summary
444(3)
References
447(2)
22 Support Vector Machines
449(38)
22.1 Introduction
449(1)
22.2 Hard-Margin SVM for Linearly Separable Classes
449(3)
22.3 Kernel Mapping into Nonlinear Feature Space
452(1)
22.4 Soft-Margin SVM for Nonlinearly Separable Classes
452(2)
22.5 Gradient Ascent Soft-Margin SVM
454(11)
22.5.1 Cross-Validation Results
455(2)
22.5.2 Bootstrap Bias
457(8)
22.5.3 Multiclass ROC Curves
465(1)
22.5.4 Decision Boundaries
465(1)
22.6 Least-Squares Soft-Margin SVM
465(16)
22.6.1 Cross-Validation Results
470(7)
22.6.2 Bootstrap Bias
477(1)
22.6.3 Multiclass ROC Curves
477(1)
22.6.4 Decision Boundaries
477(4)
22.7 Summary
481(2)
References
483(4)
23 Artificial Neural Networks
487(38)
23.1 Introduction
487(1)
23.2 ANN Architecture
488(1)
23.3 Basics of ANN Training
488(9)
23.3.1 Backpropagation Learning
493(3)
23.3.2 Resilient Backpropagation (RPROP) Learning
496(1)
23.3.3 Cycles and Epochs
496(1)
23.4 ANN Training Methods
497(5)
23.4.1 Method 1: Gene Dimensional Reduction and Recursive Feature Elimination for Large Gene Lists
497(5)
23.4.2 Method 2: Gene Filtering and Selection
502(1)
23.5 Algorithm
502(2)
23.6 Batch versus Online Training
504(1)
23.7 ANN Testing
504(1)
23.8 Cross-Validation Results
504(2)
23.9 Bootstrap Bias
506(1)
23.10 Multiclass ROC Curves
506(7)
23.11 Decision Boundaries
513(1)
23.12 RPROP versus Backpropagation
513(9)
23.13 Summary
522(1)
References
522(3)
24 Kernel Regression
525(18)
24.1 Introduction
525(2)
24.2 Algorithm
527(1)
24.3 Cross-Validation Results
527(1)
24.4 Bootstrap Bias
528(8)
24.5 Multiclass ROC Curves
536(1)
24.6 Decision Boundaries
537(3)
24.7 Summary
540(2)
References
542(1)
25 Neural Adaptive Learning with Metaheuristics
543(30)
25.1 Multilayer Perceptrons
544(1)
25.2 Genetic Algorithms
544(5)
25.3 Covariance Matrix Self-Adaptation Evolution Strategies
549(7)
25.4 Particle Swarm Optimization
556(4)
25.5 Ant Colony Optimization
560(7)
25.5.1 Classification
560(2)
25.5.2 Continuous-Function Approximation
562(5)
25.6 Summary
567(1)
References
567(6)
26 Supervised Neural Gas
573(18)
26.1 Introduction
573(1)
26.2 Algorithm
574(1)
26.3 Cross-Validation Results
574(8)
26.4 Bootstrap Bias
582(1)
26.5 Multiclass ROC Curves
582(2)
26.6 Class Decision Boundaries
584(2)
26.7 Summary
586(2)
References
588(3)
27 Mixture of Experts
591(10)
27.1 Introduction
591(4)
27.2 Algorithm
595(1)
27.3 Cross-Validation Results
596(1)
27.4 Decision Boundaries
597(1)
27.5 Summary
597(2)
References
599(2)
28 Covariance Matrix Filtering
601(24)
28.1 Introduction
601(1)
28.2 Covariance and Correlation Matrices
601(1)
28.3 Random Matrices
602(6)
28.4 Component Subtraction
608(2)
28.5 Covariance Matrix Shrinkage
610(3)
28.6 Covariance Matrix Filtering
613(8)
28.7 Summary
621(1)
References
622(3)
Appendixes 625(78)
A Probability Primer
627(12)
A.1 Choices
627(1)
A.2 Permutations
628(2)
A.3 Combinations
630(2)
A.4 Probability
632(7)
A.4.1 Addition Rule
633(1)
A.4.2 Multiplication Rule and Conditional Probabilities
634(1)
A.4.3 Multiplication Rule for Independent Events
635(1)
A.4.4 Elimination Rule (Disease Prevalence)
636(1)
A.4.5 Bayes' Rule (Pathway Probabilities)
637(2)
B Matrix Algebra
639(16)
B.1 Vectors
639(3)
B.2 Matrices
642(5)
B.3 Sample Mean, Covariance, and Correlation
647(1)
B.4 Diagonal Matrices
648(1)
B.5 Identity Matrices
649(1)
B.6 Trace of a Matrix
650(1)
B.7 Eigenanalysis
650(1)
B.8 Symmetric Eigenvalue Problem
650(1)
B.9 Generalized Eigenvalue Problem
651(1)
B.10 Matrix Properties
652(3)
C Mathematical Functions
655(10)
C.1 Inequalities
655(1)
C.2 Laws of Exponents
655(1)
C.3 Laws of Radicals
656(1)
C.4 Absolute Value
656(1)
C.5 Logarithms
656(1)
C.6 Product and Summation Operators
657(1)
C.7 Partial Derivatives
657(1)
C.8 Likelihood Functions
658(7)
D Statistical Primitives
665(14)
D.1 Rules of Thumb
665(3)
D.2 Primitives
668(10)
References
678(1)
E Probability Distributions
679(20)
E.1 Basics of Hypothesis Testing
679(3)
E.2 Probability Functions: Source of p Values
682(1)
E.3 Normal Distribution
682(4)
E.4 Gamma Function
686(3)
E.5 Beta Function
689(3)
E.6 Pseudo-Random-Number Generation
692(6)
E.6.1 Standard Uniform Distribution
692(1)
E.6.2 Normal Distribution
693(1)
E.6.3 Lognormal Distribution
694(1)
E.6.4 Binomial Distribution
695(1)
E.6.5 Poisson Distribution
696(1)
E.6.6 Triangle Distribution
697(1)
E.6.7 Log-Triangle Distribution
698(1)
References
698(1)
F Symbols and Notation
699(4)
Index 703
LEIF E. PETERSON, PhD, is Associate Professor of Public Health at Weill Cornell Medical College, Cornell University, and is with the Center for Biostatistics at The Methodist Hospital Research Institute (Houston). He is a member of the IEEE Computational Intelligence Society and Editor-in-Chief of the BioMed Central journal Source Code for Biology and Medicine.