
Machine Learning and Big Data with kdb+/q [Hardcover]

  • Format: Hardback, 640 pages, height x width x thickness: 246x168x41 mm, weight: 1225 g
  • Series: Wiley Finance
  • Publication date: 21-Nov-2019
  • Publisher: John Wiley & Sons Inc
  • ISBN-10: 1119404754
  • ISBN-13: 9781119404750

Upgrade your programming language to more effectively handle high-frequency data

Machine Learning and Big Data with KDB+/Q offers quants, programmers and algorithmic traders a practical entry into the powerful but non-intuitive kdb+ database and q programming language. Ideally designed to handle the speed and volume of high-frequency financial data at sell- and buy-side institutions, these tools have become the de facto standard; this book provides the foundational knowledge practitioners need to work effectively with this rapidly evolving approach to analytical trading.

The discussion follows the natural progression of working strategy development to allow hands-on learning in a familiar sphere, illustrating the contrast in efficiency and capability between the q language and other programming approaches. Rather than an all-encompassing “bible”-type reference, this book is designed with a focus on real-world practicality to help you quickly get up to speed and become productive with the language.

  • Understand why kdb+/q is the ideal solution for high-frequency data
  • Delve into the “meat” of q programming to solve practical economic problems
  • Perform everyday operations including basic regressions, cointegration, volatility estimation, modelling and more
  • Learn advanced techniques from market impact and microstructure analyses to machine learning techniques including neural networks

The kdb+ database and its underlying programming language q offer unprecedented speed and capability. As trading algorithms and financial models grow ever more complex against the markets they seek to predict, they encompass an ever-larger swath of data – more variables, more metrics, more responsiveness and altogether more “moving parts.”

Traditional programming languages are increasingly failing to accommodate the growing speed and volume of data, and lack the necessary flexibility that cutting-edge financial modelling demands. Machine Learning and Big Data with KDB+/Q opens up the technology and flattens the learning curve to help you quickly adopt a more effective set of tools.   
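To give a flavour of the terseness the description refers to, here is a minimal, illustrative q sketch (the table, column names and values below are invented for this example, not taken from the book): it builds a toy trade table and computes a volume-weighted average price per symbol in a single query.

  / toy trade table: symbol, price and size columns (illustrative data only)
  trade:([] sym:`AAPL`AAPL`MSFT`MSFT; price:100.1 100.3 50.2 50.4; size:200 300 100 400)
  / volume-weighted average price per symbol, using the built-in wavg aggregator
  select vwap:size wavg price by sym from trade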

Preface xvii
About the Authors xxiii
PART ONE Language Fundamentals
Chapter 1 Fundamentals of the q Programming Language 3
1.1 The (Not So Very) First Steps in q 3
1.2 Atoms and Lists 5
1.2.1 Casting Types 11
1.3 Basic Language Constructs 14
1.3.1 Assigning, Equality and Matching 14
1.3.2 Arithmetic Operations and Right-to-Left Evaluation: Introduction to q Philosophy 17
1.4 Basic Operators 19
1.5 Difference between Strings and Symbols 31
1.5.1 Enumeration 31
1.6 Matrices and Basic Linear Algebra in q 33
1.7 Launching the Session: Additional Options 35
1.8 Summary and How-To's 38
Chapter 2 Dictionaries and Tables: The q Fundamentals 41
2.1 Dictionary 41
2.2 Table 44
2.3 The Truth about Tables 48
2.4 Keyed Tables are Dictionaries 50
2.5 From a Vector Language to an Algebraic Language 51
Chapter 3 Functions 57
3.1 Namespace 59
3.1.0.1 .quantQ. Namespace 60
3.2 The Six Adverbs 60
3.2.1 Each 60
3.2.1.1 Each 61
3.2.1.2 Each-left \: 61
3.2.1.3 Each-right /: 62
3.2.1.4 Cross Product /: \: 62
3.2.1.5 Each-both ' 63
3.2.2 Each-prior ': 66
3.2.3 Compose (') 67
3.2.4 Over and Fold 67
3.2.5 Scan 68
3.2.5.1 EMA: The Exponential Moving Average 69
3.2.6 Converge 70
3.2.6.1 Converge-repeat 70
3.2.6.2 Converge-iterate 71
3.3 Apply 72
3.3.1 @ (apply) 72
3.3.2 . (apply) 73
3.4 Protected Evaluations 75
3.5 Vector Operations 76
3.5.1 Aggregators 76
3.5.1.1 Simple Aggregators 76
3.5.1.2 Weighted Aggregators 77
3.5.2 Uniform Functions 77
3.5.2.1 Running Functions 77
3.5.2.2 Window Functions 78
3.6 Convention for User-Defined Functions 79
Chapter 4 Editors and Other Tools 81
4.1 Console 81
4.2 Jupyter Notebook 82
4.3 GUIs 84
4.3.1 qStudio 85
4.3.2 Q Insight Pad 88
4.4 IDEs: IntelliJ IDEA 90
4.5 Conclusion 92
Chapter 5 Debugging q Code 93
5.1 Introduction to Making It Wrong: Errors 93
5.1.1 Syntax Errors 94
5.1.2 Runtime Errors 94
5.1.2.1 The Type Error 95
5.1.2.2 Other Errors 98
5.2 Debugging the Code 100
5.3 Debugging Server-Side 102
PART TWO Data Operations
Chapter 6 Splayed and Partitioned Tables 107
6.1 Introduction 107
6.2 Saving a Table as a Single Binary File 108
6.3 Splayed Tables 110
6.4 Partitioned Tables 113
6.5 Conclusion 119
Chapter 7 Joins 121
7.1 Comma Operator 121
7.2 Join Functions 125
7.2.1 ij 125
7.2.2 ej 126
7.2.3 lj 126
7.2.4 pj 127
7.2.5 upsert 128
7.2.6 uj 129
7.2.7 aj 131
7.2.8 aj0 134
7.2.8.1 The Next Valid Join 135
7.2.9 asof 138
7.2.10 wj 140
7.3 Advanced Example: Running TWAP 144
Chapter 8 Parallelisation 151
8.1 Parallel Vector Operations 152
8.2 Parallelisation over Processes 155
8.3 Map-Reduce 155
8.4 Advanced Topic: Parallel File/Directory Access 158
Chapter 9 Data Cleaning and Filtering 161
9.1 Predicate Filtering 161
9.1.1 The Where Clause 161
9.1.2 Aggregation Filtering 163
9.2 Data Cleaning, Normalising and APIs 163
Chapter 10 Parse Trees 165
10.1 Definition 166
10.1.1 Evaluation 166
10.1.2 Parse Tree Creation 170
10.1.3 Read-Only Evaluation 170
10.2 Functional Queries 171
10.2.1 Functional Select 174
10.2.2 Functional Exec 178
10.2.3 Functional Update 179
10.2.4 Functional Delete 180
Chapter 11 A Few Use Cases 181
11.1 Rolling VWAP 181
11.1.1 N Tick VWAP 181
11.1.2 Time Window VWAP 182
11.2 Weighted Mid for N Levels of an Order Book 183
11.3 Consecutive Runs of a Rule 185
11.4 Real-Time Signals and Alerts 186
PART THREE Data Science
Chapter 12 Basic Overview of Statistics 191
12.1 Histogram 191
12.2 First Moments 196
12.3 Hypothesis Testing 198
12.3.1 Normal p-values 198
12.3.2 Correlation 201
12.3.2.1 Implementation 202
12.3.3 t-test: One Sample 202
12.3.3.1 Implementation 204
12.3.4 t-test: Two Samples 204
12.3.4.1 Implementation 205
12.3.5 Sign Test 206
12.3.5.1 Implementation of the Test 208
12.3.5.2 Median Test 211
12.3.6 Wilcoxon Signed-Rank Test 212
12.3.7 Rank Correlation and Somers' D 214
12.3.7.1 Implementation 216
12.3.8 Multiple Hypothesis Testing 221
12.3.8.1 Bonferroni Correction 224
12.3.8.2 Sidak's Correction 224
12.3.8.3 Holm's Method 225
12.3.8.4 Example 226
Chapter 13 Linear Regression 229
13.1 Linear Regression 230
13.2 Ordinary Least Squares 231
13.3 The Geometric Representation of Linear Regression 233
13.3.1 Moore-Penrose Pseudoinverse 235
13.3.2 Adding Intercept 237
13.4 Implementation of the OLS 240
13.5 Significance of Parameters 243
13.6 How Good is the Fit: R² 244
13.6.1 Adjusted R-squared 247
13.7 Relationship with Maximum Likelihood Estimation and AIC with Small Sample Correction 248
13.8 Estimation Suite 252
13.9 Comparing Two Nested Models: Towards a Stopping Rule 254
13.9.1 Comparing Two General Models 256
13.10 In-/Out-of-Sample Operations 257
13.11 Cross-validation 262
13.12 Conclusion 264
Chapter 14 Time Series Econometrics 265
14.1 Autoregressive and Moving Average Processes 265
14.1.1 Introduction 265
14.1.2 AR(p) Process 266
14.1.2.1 Simulation 266
14.1.2.2 Estimation of AR(p) Parameters 268
14.1.2.3 Least Square Method 268
14.1.2.4 Example 269
14.1.2.5 Maximum Likelihood Estimator 269
14.1.2.6 Yule-Walker Technique 269
14.1.3 MA(q) Process 271
14.1.3.1 Estimation of MA(q) Parameters 272
14.1.3.2 Simulation 272
14.1.3.3 Example 273
14.1.4 ARMA(p, q) Process 273
14.1.4.1 Invertibility of the ARMA(p, q) Process 274
14.1.4.2 Hannan-Rissanen Algorithm: Two-Step Regression Estimation 274
14.1.4.3 Yule-Walker Estimation 274
14.1.4.4 Maximum Likelihood Estimation 275
14.1.4.5 Simulation 275
14.1.4.6 Forecast 276
14.1.5 ARIMA(p, d, q) Process 276
14.1.6 Code 276
14.1.6.1 Simulation 277
14.1.6.2 Estimation 278
14.1.6.3 Forecast 282
14.2 Stationarity and Granger Causality 285
14.2.1 Stationarity 285
14.2.2 Test of Stationarity – Dickey-Fuller and Augmented Dickey-Fuller Tests 286
14.2.3 Granger Causality 286
14.3 Vector Autoregression 287
14.3.1 VAR(p) Process 288
14.3.1.1 Notation 288
14.3.1.2 Estimator 288
14.3.1.3 Example 289
14.3.1.4 Code 293
14.3.2 VARX(p, q) Process 297
14.3.2.1 Estimator 297
14.3.2.2 Code 298
Chapter 15 Fourier Transform 301
15.1 Complex Numbers 301
15.1.1 Properties of Complex Numbers 302
15.2 Discrete Fourier Transform 308
15.3 Addendum: Quaternions 314
15.4 Addendum: Fractals 321
Chapter 16 Eigensystem and PCA 325
16.1 Theory 325
16.2 Algorithms 327
16.2.1 QR Decomposition 328
16.2.2 QR Algorithm for Eigenvalues 330
16.2.3 Inverse Iteration 331
16.3 Implementation of Eigensystem Calculation 332
16.3.1 QR Decomposition 333
16.3.2 Inverse Iteration 337
16.4 The Data Matrix and the Principal Component Analysis 341
16.4.1 The Data Matrix 341
16.4.2 PCA: The First Principal Component 344
16.4.3 Second Principal Component 345
16.4.4 Terminology and Explained Variance 347
16.4.5 Dimensionality Reduction 349
16.4.6 PCA Regression (PCR) 350
16.5 Implementation of PCA 351
16.6 Appendix: Determinant 354
16.6.1 Theory 354
16.6.2 Techniques to Calculate a Determinant 355
16.6.3 Implementation of the Determinant 356
Chapter 17 Outlier Detection 359
17.1 Local Outlier Factor 360
Chapter 18 Simulating Asset Prices 369
18.1 Stochastic Volatility Process with Price Jumps 369
18.2 Towards the Numerical Example 371
18.2.1 Numerical Preliminaries 371
18.2.2 Implementing Stochastic Volatility Process with Jumps 374
18.3 Conclusion 378
PART FOUR Machine Learning
Chapter 19 Basic Principles of Machine Learning 381
19.1 Non-Numeric Features and Normalisation 381
19.1.1 Non-Numeric Features 381
19.1.1.1 Ordinal Features 382
19.1.1.2 Categorical Features 383
19.1.2 Normalisation 383
19.1.2.1 Normal Score 384
19.1.2.2 Range Scaling 385
19.2 Iteration: Constructing Machine Learning Algorithms 386
19.2.1 Iteration 386
19.2.2 Constructing Machine Learning Algorithms 389
Chapter 20 Linear Regression with Regularisation 391
20.1 Bias-Variance Trade-off 392
20.2 Regularisation 393
20.3 Ridge Regression 394
20.4 Implementation of the Ridge Regression 396
20.4.1 Optimisation of the Regularisation Parameter 401
20.5 Lasso Regression 403
20.6 Implementation of the Lasso Regression 405
Chapter 21 Nearest Neighbours 419
21.1 k-Nearest Neighbours Classifier 419
21.2 Prototype Clustering 423
21.3 Feature Selection: Local Nearest Neighbours Approach 429
21.3.1 Implementation 430
Chapter 22 Neural Networks 437
22.1 Theoretical Introduction 437
22.1.1 Calibration 440
22.1.1.1 Backpropagation 441
22.1.2 The Learning Rate Parameter 443
22.1.3 Initialisation 443
22.1.4 Overfitting 444
22.1.5 Dimension of the Hidden Layer(s) 444
22.2 Implementation of Neural Networks 445
22.2.1 Multivariate Encoder 445
22.2.2 Neurons 446
22.2.3 Training the Neural Network 448
22.3 Examples 451
22.3.1 Binary Classification 451
22.3.2 M-class Classification 454
22.3.3 Regression 457
22.4 Possible Suggestions 463
Chapter 23 AdaBoost with Stumps 465
23.1 Boosting 465
23.2 Decision Stumps 466
23.3 AdaBoost 467
23.4 Implementation of AdaBoost 468
23.5 Recommendation for Readers 474
Chapter 24 Trees 477
24.1 Introduction to Trees 477
24.2 Regression Trees 479
24.2.1 Cost-Complexity Pruning 481
24.3 Classification Tree 482
24.4 Miscellaneous 484
24.5 Implementation of Trees 485
Chapter 25 Forests 495
25.1 Bootstrap 495
25.2 Bagging 498
25.2.1 Out-of-Bag 499
25.3 Implementation 500
25.3.1 Prediction 503
25.3.2 Feature Selection 505
Chapter 26 Unsupervised Machine Learning: The Apriori Algorithm 509
26.1 Apriori Algorithm 510
26.2 Implementation of the Apriori Algorithm 511
Chapter 27 Processing Information 523
27.1 Information Retrieval 523
27.1.1 Corpus: Leonardo da Vinci 523
27.1.2 Frequency Counting 524
27.1.3 tf-idf 528
27.2 Information as Features 532
27.2.1 Sample: Simulated Proteins 533
27.2.2 Kernels and Metrics for Proteins 535
27.2.3 Implementation of Inner Products and Nearest Neighbours Principles 535
27.2.4 Further Topics 539
Chapter 28 Towards AI – Monte Carlo Tree Search 541
28.1 Multi-Armed Bandit Problem 541
28.1.1 Analytic Solutions 543
28.1.2 Greedy Algorithms 543
28.1.3 Confidence-Based Algorithms 544
28.1.4 Bayesian Algorithms 546
28.1.5 Online Gradient Descent Algorithms 547
28.1.6 Implementation of Some Learning Algorithms 547
28.2 Monte Carlo Tree Search 558
28.2.1 Selection Step 561
28.2.2 Expansion Step 562
28.2.3 Simulation Step 563
28.2.4 Back Propagation Step 563
28.2.5 Finishing the Algorithm 563
28.2.6 Remarks and Extensions 564
28.3 Monte Carlo Tree Search Implementation – Tic-tac-toe 565
28.3.1 Random Games 566
28.3.2 Towards the MCTS 570
28.3.3 Case Study 579
28.4 Monte Carlo Tree Search – Additional Comments 579
28.4.1 Policy and Value Networks 579
28.4.2 Reinforcement Learning 581
Chapter 29 Econophysics: The Agent-Based Computational Models 583
29.1 Agent-Based Modelling 584
29.1.1 Agent-Based Models in Society 584
29.1.2 Agent-Based Models in Finance 586
29.2 Ising Agent-Based Model for Financial Markets 587
29.2.1 Ising Model in Physics 587
29.2.2 Ising Model of Interacting Agents 587
29.2.3 Numerical Implementation 588
29.3 Conclusion 592
Chapter 30 Epilogue: Art 595
Bibliography 601
Index 607
JAN NOVOTNY is an eFX quant trader at Deutsche Bank. Previously, he worked at the Centre for Econometric Analysis on high-frequency econometric models. He holds a PhD from CERGE-EI, Charles University, Prague.

PAUL A. BILOKON is CEO and founder of Thalesians Ltd and an expert in algorithmic trading. He previously worked at Nomura, Lehman Brothers, and Morgan Stanley. Paul was educated at Christ Church College, Oxford, and Imperial College.

ARIS GALIOTOS is the global technical lead for the eFX kdb+ team at HSBC, where he helps develop a big data installation processing billions of real-time records per day. Aris holds an MSc in Financial Mathematics with Distinction from the University of Edinburgh.

FRÉDÉRIC DÉLÈZE is an independent algorithmic trader and consultant. He has designed automated trading strategies for hedge funds and developed quantitative risk models for investment banks. He holds a PhD in Finance from Hanken School of Economics, Helsinki.