Preface  xv
Notation Used  xviii
|
|
1.1 Statistics and Machine Learning A  1
1.2 Environmental Data Science A  6
1.3 A Simple Example of Curve Fitting A  9
1.4 Main Types of Data Problems A  12
1.4.1 Supervised Learning A  15
1.4.2 Unsupervised Learning A  16
1.4.3 Reinforcement Learning A  17
1.5 Curse of Dimensionality A  17
|
|
2.3 Probability Density A  22
2.4 Expectation and Mean A  24
2.5 Variance and Standard Deviation A  25
2.7 Online Algorithms for Mean, Variance and Covariance C  27
2.8 Median and Median Absolute Deviation A  29
2.10 Skewness and Kurtosis B  32
2.11.1 Pearson Correlation A  33
2.11.2 Serial Correlation A  36
2.11.3 Spearman Rank Correlation A  38
2.11.4 Kendall Rank Correlation A  39
2.11.5 Biweight Midcorrelation B  40
2.12 Exploratory Data Analysis A  41
2.12.2 Quantile--Quantile (Q--Q) Plots B  42
2.13 Mahalanobis Distance A  47
2.13.1 Mahalanobis Distance and Principal Component Analysis B  47
2.17 Information Theory B  56
2.17.2 Joint Entropy and Conditional Entropy B  59
2.17.3 Relative Entropy B  60
2.17.4 Mutual Information B  61
|
3 Probability Distributions  65
3.1 Binomial Distribution A  66
3.2 Poisson Distribution B  68
3.3 Multinomial Distribution B  68
3.4 Gaussian Distribution A  69
3.5 Maximum Likelihood Estimation A  73
3.6 Multivariate Gaussian Distribution B  75
3.7 Conditional and Marginal Gaussian Distributions C  77
3.10 Von Mises Distribution C  82
3.11 Extreme Value Distributions C  83
3.12 Gaussian Mixture Model B  86
3.12.1 Expectation-Maximization (EM) Algorithm B  90
3.13 Kernel Density Estimation B  91
3.14 Re-expressing Data A  93
3.15 Student t-distribution B  95
3.16 Chi-squared Distribution B  97
|
|
4.2.1 One-Sample t-test A  105
4.2.2 Independent Two-Sample t-test A  105
4.2.3 Dependent t-test for Paired Samples A  107
4.2.4 Serial Correlation A  107
4.2.5 Significance Test for Correlation A  109
4.3 Non-parametric Alternatives to t-test B  111
4.3.1 Wilcoxon--Mann--Whitney Test C  111
4.3.2 Wilcoxon Signed-Rank Test C  114
4.4 Confidence Interval A  115
4.4.1 Confidence Interval for Population Mean B  116
4.4.2 Confidence Interval for Correlation B  118
4.5 Goodness-of-Fit Tests B  119
4.5.1 One-Sample Goodness-of-Fit Tests B  119
4.5.2 Two-Sample Goodness-of-Fit Tests B  121
4.7 Mann--Kendall Trend Test B  125
|
|
5.1 Simple Linear Regression A  137
5.1.1 Partition of Sums of Squares A  139
5.1.2 Confidence Interval for Regression Parameters B  141
5.1.3 Confidence Interval and Prediction Interval for the Response Variable B  143
5.1.4 Serial Correlation B  144
5.2 Multiple Linear Regression A  145
5.2.1 Gauss--Markov Theorem B  147
5.2.2 Partition of Sums of Squares B  148
5.2.3 Standardized Predictors A  148
5.2.4 Analysis of Variance (ANOVA) B  149
5.2.5 Confidence and Prediction Intervals B  151
5.3 Multivariate Linear Regression B  153
5.4 Online Learning with Linear Regression C  154
5.5 Circular and Categorical Data A  156
5.6 Predictor Selection C  158
5.9 Quantile Regression C  167
5.10 Generalized Least Squares C  168
5.10.1 Optimal Fingerprinting in Climate Change C  170
|
|
6.1 McCulloch and Pitts Model B  174
6.2.1 Limitation of Perceptrons B  178
6.3 Multi-layer Perceptrons A  180
6.3.1 Comparison with Polynomials B  185
6.3.3 Monotonic Multi-layer Perceptron Model C  187
6.4 Extreme Learning Machines A  189
6.4.2 Random Vector Functional Link C  194
6.5 Radial Basis Functions B  195
6.6 Modelling Conditional Distributions B  199
6.6.1 Mixture Density Network B  201
6.7 Quantile Regression C  204
6.8 Historical Development of NN in Environmental Science B  207
6.8.3 Atmospheric Science B  213
|
7 Non-linear Optimization  216
7.1 Extrema and Saddle Points A  216
7.2 Gradient Vector in Optimization A  219
7.5 Gradient Descent Method A  225
7.6 Stochastic Gradient Descent B  227
7.7 Conjugate Gradient Method C  229
7.8 Quasi-Newton Methods C  232
7.9 Non-linear Least Squares Methods C  234
7.10 Evolutionary Algorithms B  236
7.13 Differential Evolution C  241
|
8 Learning and Generalization  245
8.1 Mean Squared Error and Maximum Likelihood A  245
8.2 Objective Functions and Robustness A  247
8.3 Variance and Bias Errors A  250
8.6 Hyperparameter Tuning A  258
8.7.2 Error of Ensemble B  263
8.7.3 Unequal Weighting of Ensemble Members C  266
8.9 Maximum Norm Constraint B  271
8.10 Bayesian Model Selection B  272
8.11 Information Criterion B  273
8.11.1 Bayesian Information Criterion C  273
8.11.2 Akaike Information Criterion C  275
8.12 Bayesian Model Averaging C  278
|
9 Principal Components and Canonical Correlation  283
9.1 Principal Component Analysis (PCA) A  283
9.1.1 Geometric Approach to PCA A  284
9.1.2 Eigenvector Approach to PCA A  284
9.1.3 Real and Complex Data C  288
9.1.4 Orthogonality Relations A  289
9.1.5 PCA of the Tropical Pacific Climate Variability A  290
9.1.6 Scaling the PCs and Eigenvectors A  298
9.1.7 Degeneracy of Eigenvalues A  299
9.1.8 A Smaller Covariance Matrix A  299
9.1.9 How Many Modes to Retain B  300
9.1.10 Temporal and Spatial Mean Removal B  302
9.1.11 Singular Value Decomposition B  302
9.2.3 Advantages and Disadvantages of Rotation A  315
9.3 PCA for Two-Dimensional Vectors C  317
9.4 Canonical Correlation Analysis (CCA) B  320
9.4.2 Pre-filter with PCA B  324
9.5 Maximum Covariance Analysis B  326
|
|
10.1.1 Distance Measure B  331
10.1.2 Model Evaluation B  332
10.2 Non-hierarchical Clustering B  337
10.2.1 K-means Clustering B  337
10.2.2 Nucleated Agglomerative Clustering C  338
10.2.3 Gaussian Mixture Model C  339
10.3 Hierarchical Clustering C  339
10.4 Self-Organizing Map C  343
10.4.1 Applications of SOM C  345
10.6 Non-linear Principal Component Analysis B  349
10.7 Other Non-linear Dimensionality Reduction Methods C  363
10.8 Non-linear Canonical Correlation Analysis C  365
|
|
11.1.2 Discrete Fourier Transform A  374
11.1.3 Continuous-Time Fourier Transform B  375
11.1.4 Discrete-Time Fourier Transform B  376
11.3.1 Effects of Window Functions A  380
11.3.3 Nyquist Frequency and Aliasing A  382
11.3.4 Smoothing the Spectrum A  384
11.3.5 Confidence Interval B  385
11.3.7 Fast Fourier Transform B  386
11.3.8 Relation with Auto-covariance B  387
11.3.9 Rotary Spectrum for 2-D Vectors C  387
11.5.1 Periodic Signals A  396
11.5.3 Finite Impulse Response Filters B  398
11.6.1 Moving Average Filters A  402
11.6.2 Grid-Scale Noise C  402
11.6.3 Linearization from Time-Averaging B  404
11.7 Singular Spectrum Analysis B  405
11.7.1 Multichannel Singular Spectrum Analysis B  407
11.8 Auto-regressive Process B  410
11.9 Box--Jenkins Models C  414
11.9.1 Moving Average (MA) Process C  414
11.9.2 Auto-regressive Moving Average (ARMA) Model C  414
11.9.3 Auto-regressive Integrated Moving Average (ARIMA) Model C  415
|
|
12.1 Linear Discriminant Analysis A  419
12.1.1 Fisher Linear Discriminant B  421
12.2 Logistic Regression A  424
12.2.1 Multiclass Logistic Regression B  425
12.3 Naive Bayes Classifier B  427
12.4 K-nearest Neighbours B  428
12.5 Extreme Learning Machine Classifier A  430
12.7 Multi-layer Perceptron Classifier A  434
|
|
13.1 From Neural Networks to Kernel Methods B  441
13.2 Primal and Dual Solutions for Linear Regression B  442
13.4 Kernel Ridge Regression B  448
13.5 Advantages and Disadvantages B  449
13.7 Support Vector Machines (SVM) B  453
13.7.1 Linearly Separable Case B  454
13.7.2 Linearly Non-separable Case B  458
13.7.3 Non-linear Classification by SVM B  460
13.7.4 Multi-class Classification by SVM C  461
13.7.5 Support Vector Regression C  462
13.8 Gaussian Processes B  463
13.8.1 Learning the Hyperparameters B  466
13.8.2 Other Common Kernels C  467
13.9 Kernel Principal Component Analysis C  469
|
14 Decision Trees, Random Forests and Boosting  473
14.1 Classification and Regression Trees (CART) A  474
14.1.1 Relative Importance of Predictors B  479
14.1.2 Surrogate Splits B  480
14.2.1 Extremely Randomized Trees (Extra Trees) B  487
14.3.1 Gradient Boosting B  488
|
|
15.2 Convolutional Neural Network B  499
15.2.1 Convolution Operation B  499
15.2.4 Residual Neural Network (ResNet) B  505
15.2.5 Data Augmentation C  506
15.2.6 Applications in Environmental Science B  506
15.3 Encoder-Decoder Network B  507
15.4.1 Long Short-Term Memory (LSTM) Network C  510
15.4.2 Temporal Convolutional Network C  513
15.5 Generative Adversarial Network C  514
|
16 Forecast Verification and Post-processing  518
16.1.1 Skill Scores for Binary Classes B  524
16.3 Probabilistic Forecasts for Binary Classes B  528
16.3.1 Reliability Diagram B  529
16.4 Probabilistic Forecasts for Multiple Classes B  531
16.5 Continuous Variables B  532
16.6 Probabilistic Forecasts for Continuous Variables B  535
16.11.1 Reduced Variance C  546
|
17 Merging of Machine Learning and Physics  549
17.1 Physics Emulation and Hybrid Models C  550
17.1.1 Radiation in Atmospheric Models C  550
17.1.3 Turbulent Fluxes C  553
17.1.4 Hybrid Coupled Atmosphere-Ocean Modelling C  554
17.1.5 Wind Wave Modelling C  555
17.2 Physics-Informed Machine Learning A  556
17.3 Data Assimilation and ML C  560
17.3.3 Neural Networks in 4D-Var C  564
References  573
Index  613