Preface |
|
xiii | |
Acknowledgments |
|
xvii | |
|
|
1 | (16) |
|
|
2 | (12) |
|
1.1.1 The Central Dogma of Molecular Biology |
|
|
2 | (1) |
|
|
3 | (1) |
|
|
4 | (1) |
|
1.1.4 DNA (Deoxyribonucleic Acid) |
|
|
5 | (1) |
|
1.1.5 RNA (Ribonucleic Acid) |
|
|
6 | (1) |
|
1.1.6 mRNA (messenger RNA) |
|
|
7 | (1) |
|
|
7 | (2) |
|
|
9 | (3) |
|
1.1.9 Gene Expression and the Gene Expression Level |
|
|
12 | (1) |
|
|
13 | (1) |
|
1.2 Overlapping Areas of Research |
|
|
14 | (3) |
|
|
14 | (1) |
|
|
14 | (1) |
|
|
14 | (1) |
|
1.2.4 Transcriptomics and Other-omics |
|
|
14 | (1) |
|
|
15 | (2) |
|
2 Basic Analysis of Gene Expression Microarray Data |
|
|
17 | (78) |
|
|
17 | (1) |
|
2.2 Microarray Technology |
|
|
18 | (7) |
|
2.2.1 Spotted Microarrays |
|
|
19 | (1) |
|
2.2.2 Affymetrix GeneChip® Microarrays |
|
|
20 | (4) |
|
2.2.3 Bead-Based Microarrays |
|
|
24 | (1) |
|
2.3 Low-Level Preprocessing of Affymetrix Microarrays |
|
|
25 | (9) |
|
|
27 | (4) |
|
|
31 | (2) |
|
|
33 | (1) |
|
|
34 | (1) |
|
2.4 Public Repositories of Microarray Data |
|
|
34 | (4) |
|
2.4.1 Microarray Gene Expression Data Society (MGED) Standards |
|
|
34 | (3) |
|
|
37 | (1) |
|
2.4.2.1 Gene Expression Omnibus (GEO) |
|
|
37 | (1) |
|
|
38 | (1) |
|
2.5 Gene Expression Matrix |
|
|
38 | (5) |
|
2.5.1 Elements of Gene Expression Microarray Data Analysis |
|
|
42 | (1) |
|
2.6 Additional Preprocessing, Quality Assessment, and Filtering |
|
|
43 | (9) |
|
|
45 | (5) |
|
|
50 | (2) |
|
2.7 Basic Exploratory Data Analysis |
|
|
52 | (12) |
|
|
54 | (1) |
|
2.7.1.1 t Test for Equal Variances |
|
|
55 | (1) |
|
2.7.1.2 t Test for Unequal Variances |
|
|
55 | (1) |
|
|
56 | (1) |
|
|
57 | (2) |
|
|
59 | (1) |
|
2.7.5 Adjustment for Multiple Comparisons |
|
|
59 | (2) |
|
2.7.5.1 Single-Step Bonferroni Procedure |
|
|
61 | (1) |
|
2.7.5.2 Single-Step Sidak Procedure |
|
|
61 | (1) |
|
2.7.5.3 Step-Down Holm Procedure |
|
|
61 | (1) |
|
2.7.5.4 Step-Up Benjamini and Hochberg Procedure |
|
|
62 | (1) |
|
2.7.5.5 Permutation Based Multiplicity Adjustment |
|
|
63 | (1) |
|
2.8 Unsupervised Learning (Taxonomy-Related Analysis) |
|
|
64 | (26) |
|
|
65 | (2) |
|
2.8.1.1 Measures of Similarity or Distance |
|
|
67 | (3) |
|
2.8.1.2 k-Means Clustering |
|
|
70 | (1) |
|
2.8.1.3 Hierarchical Clustering |
|
|
71 | (7) |
|
2.8.1.4 Two-Way Clustering and Related Methods |
|
|
78 | (2) |
|
2.8.2 Principal Component Analysis |
|
|
80 | (5) |
|
2.8.3 Self-Organizing Maps |
|
|
85 | (5) |
|
|
90 | (5) |
|
3 Biomarker Discovery and Classification |
|
|
95 | (106) |
|
|
95 | (24) |
|
3.1.1 Gene Expression Matrix...Again |
|
|
98 | (2) |
|
3.1.2 Biomarker Discovery |
|
|
100 | (5) |
|
3.1.3 Classification Systems |
|
|
105 | (1) |
|
3.1.3.1 Parametric and Nonparametric Learning Algorithms |
|
|
106 | (1) |
|
3.1.3.2 Terms Associated with Common Assumptions Underlying Parametric Learning Algorithms |
|
|
106 | (4) |
|
3.1.3.3 Visualization of Classification Results |
|
|
110 | (1) |
|
3.1.4 Validation of the Classification Model |
|
|
111 | (1) |
|
|
111 | (1) |
|
3.1.4.2 Leave-One-Out and K-Fold Cross-Validation |
|
|
111 | (1) |
|
3.1.4.3 External and Internal Cross-Validation |
|
|
112 | (1) |
|
3.1.4.4 Holdout Method of Validation |
|
|
113 | (1) |
|
3.1.4.5 Ensemble-Based Validation (Using Out-of-Bag Samples) |
|
|
113 | (1) |
|
3.1.4.6 Validation on an Independent Data Set |
|
|
114 | (1) |
|
3.1.5 Reporting Validation Results |
|
|
114 | (1) |
|
3.1.5.1 Binary Classifiers |
|
|
115 | (2) |
|
3.1.5.2 Multiclass Classifiers |
|
|
117 | (2) |
|
3.1.6 Identifying Biological Processes Underlying the Class Differentiation |
|
|
119 | (1) |
|
|
119 | (17) |
|
|
119 | (2) |
|
3.2.2 Univariate Versus Multivariate Approaches |
|
|
121 | (2) |
|
3.2.3 Supervised Versus Unsupervised Methods |
|
|
123 | (3) |
|
3.2.4 Taxonomy of Feature Selection Methods |
|
|
126 | (1) |
|
3.2.4.1 Filters, Wrappers, Hybrid, and Embedded Models |
|
|
126 | (5) |
|
3.2.4.2 Strategy: Exhaustive, Complete, Sequential, Random, and Hybrid Searches |
|
|
131 | (2) |
|
3.2.4.3 Subset Evaluation Criteria |
|
|
133 | (1) |
|
3.2.4.4 Search-Stopping Criteria |
|
|
133 | (1) |
|
3.2.5 Feature Selection for Multiclass Discrimination |
|
|
133 | (1) |
|
3.2.6 Regularization and Feature Selection |
|
|
134 | (1) |
|
3.2.7 Stability of Biomarkers |
|
|
135 | (1) |
|
3.3 Discriminant Analysis |
|
|
136 | (13) |
|
|
136 | (3) |
|
|
139 | (8) |
|
3.3.3 A Stepwise Hybrid Feature Selection with T² |
|
|
147 | (2) |
|
3.4 Support Vector Machines |
|
|
149 | (19) |
|
3.4.1 Hard-Margin Support Vector Machines |
|
|
150 | (7) |
|
3.4.2 Soft-Margin Support Vector Machines |
|
|
157 | (3) |
|
|
160 | (5) |
|
3.4.4 SVMs and Multiclass Discrimination |
|
|
165 | (1) |
|
3.4.4.1 One-Versus-the-Rest Approach |
|
|
165 | (1) |
|
3.4.4.2 Pairwise Approach |
|
|
165 | (1) |
|
3.4.4.3 All-Classes-Simultaneously Approach |
|
|
166 | (1) |
|
3.4.5 SVMs and Feature Selection: Recursive Feature Elimination |
|
|
166 | (1) |
|
|
167 | (1) |
|
|
168 | (9) |
|
|
168 | (4) |
|
3.5.2 Random Forests Learning Algorithm |
|
|
172 | (2) |
|
3.5.3 Random Forests and Feature Selection |
|
|
174 | (2) |
|
|
176 | (1) |
|
3.6 Ensemble Classifiers, Bootstrap Methods, and the Modified Bagging Schema |
|
|
177 | (5) |
|
3.6.1 Ensemble Classifiers |
|
|
177 | (1) |
|
3.6.1.1 Parallel Approach |
|
|
177 | (1) |
|
|
177 | (1) |
|
3.6.1.3 Ensemble Classifiers and Biomarker Discovery |
|
|
177 | (1) |
|
|
178 | (1) |
|
3.6.3 Bootstrap and Linear Discriminant Analysis |
|
|
179 | (1) |
|
3.6.4 The Modified Bagging Schema |
|
|
180 | (2) |
|
3.7 Other Learning Algorithms |
|
|
182 | (15) |
|
3.7.1 k-Nearest Neighbor Classifiers |
|
|
183 | (2) |
|
3.7.2 Artificial Neural Networks |
|
|
185 | (1) |
|
|
186 | (1) |
|
3.7.2.2 Multilayer Feedforward Neural Networks |
|
|
187 | (5) |
|
3.7.2.3 Training the Network (Supervised Learning) |
|
|
192 | (5) |
|
3.8 Eight Commandments of Gene Expression Analysis (for Biomarker Discovery) |
|
|
197 | (1) |
|
|
198 | (3) |
|
4 The Informative Set of Genes |
|
|
201 | (18) |
|
|
201 | (1) |
|
|
202 | (1) |
|
|
202 | (9) |
|
4.3.1 Identification of the Informative Set of Genes |
|
|
203 | (5) |
|
4.3.2 Primary Expression Patterns of the Informative Set of Genes |
|
|
208 | (3) |
|
4.3.3 The Most Frequently Used Genes of the Primary Expression Patterns |
|
|
211 | (1) |
|
4.4 Using the Informative Set of Genes to Identify Robust Multivariate Biomarkers |
|
|
211 | (1) |
|
|
212 | (3) |
|
|
215 | (4) |
|
5 Analysis of Protein Expression Data |
|
|
219 | (34) |
|
|
219 | (3) |
|
5.2 Protein Chip Technology |
|
|
222 | (4) |
|
5.2.1 Antibody Microarrays |
|
|
223 | (2) |
|
5.2.2 Peptide Microarrays |
|
|
225 | (1) |
|
5.2.3 Protein Microarrays |
|
|
225 | (1) |
|
5.2.4 Reverse Phase Microarrays |
|
|
226 | (1) |
|
5.3 Two-Dimensional Gel Electrophoresis |
|
|
226 | (2) |
|
5.4 MALDI-TOF and SELDI-TOF Mass Spectrometry |
|
|
228 | (4) |
|
5.4.1 MALDI-TOF Mass Spectrometry |
|
|
229 | (1) |
|
5.4.2 SELDI-TOF Mass Spectrometry |
|
|
230 | (2) |
|
5.5 Preprocessing of Mass Spectrometry Data |
|
|
232 | (5) |
|
|
232 | (2) |
|
5.5.2 Elements of Preprocessing of SELDI-TOF Mass Spectrometry Data |
|
|
234 | (1) |
|
5.5.2.1 Quality Assessment |
|
|
234 | (1) |
|
|
235 | (1) |
|
5.5.2.3 Baseline Correction |
|
|
235 | (1) |
|
5.5.2.4 Noise Reduction and Smoothing |
|
|
235 | (1) |
|
|
235 | (1) |
|
5.5.2.6 Intensity Normalization |
|
|
236 | (1) |
|
5.5.2.7 Peak Alignment Across Spectra |
|
|
237 | (1) |
|
5.6 Analysis of Protein Expression Data |
|
|
237 | (7) |
|
5.6.1 Additional Preprocessing |
|
|
239 | (1) |
|
5.6.2 Basic Exploratory Data Analysis |
|
|
239 | (1) |
|
5.6.3 Unsupervised Learning |
|
|
240 | (2) |
|
5.6.4 Supervised Learning: Feature Selection and Biomarker Discovery |
|
|
242 | (1) |
|
5.6.5 Supervised Learning: Classification Systems |
|
|
243 | (1) |
|
5.7 Associating Biomarker Peaks with Proteins |
|
|
244 | (7) |
|
|
244 | (2) |
|
5.7.2 The Universal Protein Resource (UniProt) |
|
|
246 | (1) |
|
|
247 | (2) |
|
5.7.4 Tandem Mass Spectrometry |
|
|
249 | (2) |
|
|
251 | (2) |
|
6 Sketches for Selected Exercises |
|
|
253 | (36) |
|
|
253 | (1) |
|
6.2 Multiclass Discrimination (Exercise 3.2) |
|
|
254 | (11) |
|
6.2.1 Data Set Selection, Downloading, and Consolidation |
|
|
254 | (2) |
|
6.2.2 Filtering Probe Sets |
|
|
256 | (1) |
|
6.2.3 Designing a Multistage Classification Schema |
|
|
257 | (8) |
|
6.3 Identifying the Informative Set of Genes (Exercises 4.2-4.6) |
|
|
265 | (6) |
|
6.3.1 The Informative Set of Genes |
|
|
266 | (1) |
|
6.3.2 Primary Expression Patterns of the Informative Set |
|
|
267 | (3) |
|
6.3.3 The Most Frequently Used Genes of the Primary Expression Patterns |
|
|
270 | (1) |
|
6.4 Using the Informative Set of Genes to Identify Robust Multivariate Markers (Exercise 4.8) |
|
|
271 | (1) |
|
6.5 Validating Biomarkers on an Independent Test Data Set (Exercise 4.8) |
|
|
272 | (2) |
|
6.6 Using a Training Set that Combines More than One Data Set (Exercises 3.5 and 4.1-4.8) |
|
|
274 | (15) |
|
6.6.1 Combining the Two Data Sets into a Single Training Set |
|
|
275 | (1) |
|
6.6.2 Filtering Probe Sets of the Combined Data |
|
|
276 | (1) |
|
6.6.3 Assessing the Discriminatory Power of the Biomarkers and Their Generalization |
|
|
276 | (1) |
|
6.6.4 Identifying the Informative Set of Genes |
|
|
276 | (4) |
|
6.6.5 Primary Expression Patterns of the Informative Set of Genes |
|
|
280 | (1) |
|
6.6.6 The Most Frequently Used Genes of the Primary Expression Patterns |
|
|
281 | (4) |
|
6.6.7 Using the Informative Set of Genes to Identify Robust Multivariate Markers |
|
|
285 | (2) |
|
6.6.8 Validating Biomarkers on an Independent Test Data Set |
|
|
287 | (2) |
References |
|
289 | (18) |
Index |
|
307 | |