Foundations of Reinforcement Learning with Applications in Finance [Hardcover]

Ashwin Rao (Stanford University, USA), Tikhon Jelvis
  • Format: Hardback, 500 pages, height x width: 254x178 mm, weight: 1300 g, 2 Tables, black and white; 83 Line drawings, black and white; 83 Illustrations, black and white
  • Series: Chapman & Hall/CRC Mathematics and Artificial Intelligence Series
  • Publication date: 16-Dec-2022
  • Publisher: Chapman & Hall/CRC
  • ISBN-10: 1032124121
  • ISBN-13: 9781032124124

Foundations of Reinforcement Learning with Applications in Finance aims to demystify Reinforcement Learning, and to make it a practically useful tool for those studying and working in applied areas — especially finance.

Reinforcement Learning is emerging as a viable and powerful technique for solving a variety of complex problems across industries that involve Sequential Optimal Decisioning under Uncertainty. Its penetration in high-profile problems like self-driving cars, robotics, and strategy games points to a future where Reinforcement Learning algorithms will have decisioning abilities far superior to those of humans. But when it comes to getting educated in this area, there seems to be a reluctance to jump right in, because Reinforcement Learning appears to have acquired a reputation for being mysterious and technically challenging. Even technical people will often claim that the subject involves "advanced math" and "complicated engineering", erecting a psychological barrier to entry for otherwise interested students. This book seeks to overcome that barrier by introducing the foundations of Reinforcement Learning in a way that balances depth of understanding with clear, minimally technical delivery.

This book strives to impart a lucid and insightful understanding of the topic by emphasizing the foundational mathematics and implementing models and algorithms in well-designed Python code, along with robust coverage of several financial trading problems that can be solved with Reinforcement Learning. The book is the product of years of iterative experimentation with the pedagogy of these topics, taught to university students as well as industry practitioners.
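
To give a flavor of the code-design style described above (the table of contents for Chapter 2 lists "A Distribution Interface" and "A Concrete Distribution"), here is a minimal, purely illustrative Python sketch of a sampling-based distribution abstraction. The names and signatures below are hypothetical and are not taken from the book or its code repository.

    from abc import ABC, abstractmethod
    from dataclasses import dataclass
    from typing import Callable, Generic, TypeVar
    import random

    A = TypeVar("A")


    class Distribution(ABC, Generic[A]):
        """A probability distribution we can draw samples from."""

        @abstractmethod
        def sample(self) -> A:
            """Draw one outcome from the distribution."""

        def expectation(self, f: Callable[[A], float], n: int = 10000) -> float:
            """Estimate E[f(X)] by averaging over n Monte-Carlo samples."""
            return sum(f(self.sample()) for _ in range(n)) / n


    @dataclass(frozen=True)
    class Gaussian(Distribution[float]):
        """A concrete distribution: a normal random variable."""
        mu: float
        sigma: float

        def sample(self) -> float:
            return random.gauss(self.mu, self.sigma)


    if __name__ == "__main__":
        # Hypothetical usage: estimate the expected payoff of max(S - K, 0)
        # under a Gaussian price model with mean 100 and standard deviation 10
        # (illustrative numbers only).
        price = Gaussian(mu=100.0, sigma=10.0)
        print(price.expectation(lambda s: max(s - 105.0, 0.0)))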

Features

  • Focus on the foundational theory underpinning Reinforcement Learning and software design of the corresponding models and algorithms
  • Suitable as a primary text for courses in Reinforcement Learning, but also as supplementary reading for applied/financial mathematics, programming, and other related courses
  • Suitable for a professional audience of quantitative analysts or data scientists
  • Blends theory/mathematics, programming/algorithms and real-world financial nuances while always striving to maintain simplicity and to build intuitive understanding
  • To access the code base for this book, please go to: https://github.com/TikhonJelvis/RL-book

Reviews

This book is a nice addition to the literature on Reinforcement Learning (RL), offering comprehensive coverage of both foundational RL techniques and their applications in the field of finance. It has the potential to be a foundational reference for both practitioners and researchers in finance. The book delves into essential RL concepts such as Markov Decision Processes (MDPs), Dynamic Programming, Policy Optimization, Actor-Critic models, Multi-armed Bandits, and Regret Bounds.

Despite its finance-oriented approach, individuals without an extensive financial background but possessing a decent machine learning (ML) background will find it easy to read this book.

By encompassing all of the major asset classes, including equities, fixed income and derivatives, the book caters to a broad range of readers, enabling them to apply RL techniques to diverse financial scenarios. In summary, this book is an outstanding resource that combines RL fundamentals with practical applications in finance.
Natesh Pillai, Department of Statistics, Harvard University, United States of America

Table of Contents

Preface
Author Biographies
Summary of Notation

Chapter 1 Overview
1.1 Learning Reinforcement Learning
1.2 What You Will Learn From This Book
1.3 Expected Background To Read This Book
1.4 Decluttering The Jargon Linked To Reinforcement Learning
1.5 Introduction To The Markov Decision Process (MDP) Framework
1.6 Real-World Problems That Fit The MDP Framework
1.7 The Inherent Difficulty In Solving MDPs
1.8 Value Function, Bellman Equations, Dynamic Programming And RL
1.9 Outline Of Chapters
1.9.1 Module I: Processes and Planning Algorithms
1.9.2 Module II: Modeling Financial Applications
1.9.3 Module III: Reinforcement Learning Algorithms
1.9.4 Module IV: Finishing Touches
1.9.5 Short Appendix Chapters

Chapter 2 Programming and Design
2.1 Code Design
2.2 Environment Setup
2.3 Classes and Interfaces
2.3.1 A Distribution Interface
2.3.2 A Concrete Distribution
2.3.2.1 Dataclasses
2.3.2.2 Immutability
2.3.3 Checking Types
2.3.3.1 Static Typing
2.3.4 Type Variables
2.3.5 Functionality
2.4 Abstracting Over Computation
2.4.1 First-Class Functions
2.4.1.1 Lambdas
2.4.2 Iterative Algorithms
2.4.2.1 Iterators and Generators
2.5 Key Takeaways From This Chapter

Module I Processes and Planning Algorithms

Chapter 3 Markov Processes
3.1 The Concept Of State In A Process
3.2 Understanding Markov Property From Stock Price Examples
3.3 Formal Definitions For Markov Processes
3.3.1 Starting States
3.3.2 Terminal States
3.3.3 Markov Process Implementation
3.4 Stock Price Examples Modeled As Markov Processes
3.5 Finite Markov Processes
3.6 Simple Inventory Example
3.7 Stationary Distribution Of A Markov Process
3.8 Formalism Of Markov Reward Processes
3.9 Simple Inventory Example As A Markov Reward Process
3.10 Finite Markov Reward Processes
3.11 Simple Inventory Example As A Finite Markov Reward Process
3.12 Value Function Of A Markov Reward Process
3.13 Summary Of Key Learnings From This Chapter

Chapter 4 Markov Decision Processes
4.1 Simple Inventory Example: How Much To Order?
4.2 The Difficulty Of Sequential Decisioning Under Uncertainty
4.3 Formal Definition Of A Markov Decision Process
4.4 Policy
4.5 [Markov Decision Process, Policy] := Markov Reward Process
4.6 Simple Inventory Example With Unlimited Capacity (Infinite State/Action Space)
4.7 Finite Markov Decision Processes
4.8 Simple Inventory Example As A Finite Markov Decision Process
4.9 MDP Value Function For A Fixed Policy
4.10 Optimal Value Function And Optimal Policies
4.11 Variants And Extensions Of MDPs
4.11.1 Size of Spaces and Discrete versus Continuous
4.11.1.1 State Space
4.11.1.2 Action Space
4.11.1.3 Time Steps
4.11.2 Partially-Observable Markov Decision Processes (POMDPs)
4.12 Summary Of Key Learnings From This Chapter

Chapter 5 Dynamic Programming Algorithms
5.1 Planning Versus Learning
5.2 Usage Of The Term Dynamic Programming
5.3 Fixed-Point Theory
5.4 Bellman Policy Operator And Policy Evaluation Algorithm
5.5 Greedy Policy
5.6 Policy Improvement
5.7 Policy Iteration Algorithm
5.8 Bellman Optimality Operator And Value Iteration Algorithm
5.9 Optimal Policy From Optimal Value Function
5.10 Revisiting The Simple Inventory Example
5.11 Generalized Policy Iteration
5.12 Asynchronous Dynamic Programming
5.13 Finite-Horizon Dynamic Programming: Backward Induction
5.14 Dynamic Pricing For End-Of-Life/End-Of-Season Of A Product
5.15 Generalization To Non-Tabular Algorithms
5.16 Summary Of Key Learnings From This Chapter

Chapter 6 Function Approximation and Approximate Dynamic Programming
6.1 Function Approximation
6.2 Linear Function Approximation
6.3 Neural Network Function Approximation
6.4 Tabular As A Form Of FunctionApprox
6.5 Approximate Policy Evaluation
6.6 Approximate Value Iteration
6.7 Finite-Horizon Approximate Policy Evaluation
6.8 Finite-Horizon Approximate Value Iteration
6.9 Finite-Horizon Approximate Q-Value Iteration
6.10 How To Construct The Non-Terminal States Distribution
6.11 Key Takeaways From This Chapter

Module II Modeling Financial Applications

Chapter 7 Utility Theory
7.1 Introduction To The Concept Of Utility
7.2 A Simple Financial Example
7.3 The Shape Of The Utility Function
7.4 Calculating The Risk-Premium
7.5 Constant Absolute Risk-Aversion (CARA)
7.6 A Portfolio Application Of CARA
7.7 Constant Relative Risk-Aversion (CRRA)
7.8 A Portfolio Application Of CRRA
7.9 Key Takeaways From This Chapter

Chapter 8 Dynamic Asset-Allocation and Consumption
8.1 Optimization Of Personal Finance
8.2 Merton's Portfolio Problem and Solution
8.3 Developing Intuition For The Solution To Merton's Portfolio Problem
8.4 A Discrete-Time Asset-Allocation Example
8.5 Porting To Real-World
8.6 Key Takeaways From This Chapter

Chapter 9 Derivatives Pricing and Hedging
9.1 A Brief Introduction To Derivatives
9.1.1 Forwards
9.1.2 European Options
9.1.3 American Options
9.2 Notation For The Single-Period Simple Setting
9.3 Portfolios, Arbitrage And Risk-Neutral Probability Measure
9.4 First Fundamental Theorem Of Asset Pricing (1st FTAP)
9.5 Second Fundamental Theorem Of Asset Pricing (2nd FTAP)
9.6 Derivatives Pricing In Single-Period Setting
9.6.1 Derivatives Pricing When Market Is Complete
9.6.2 Derivatives Pricing When Market Is Incomplete
9.6.3 Derivatives Pricing When Market Has Arbitrage
9.7 Derivatives Pricing In Multi-Period/Continuous-Time
9.7.1 Multi-Period Complete-Market Setting
9.7.2 Continuous-Time Complete-Market Setting
9.8 Optimal Exercise Of American Options Cast As A Finite MDP
9.9 Generalizing To Optimal-Stopping Problems
9.10 Pricing/Hedging In An Incomplete Market Cast As An MDP
9.11 Key Takeaways From This Chapter

Chapter 10 Order-Book Trading Algorithms
10.1 Basics Of Order Book And Price Impact
10.2 Optimal Execution Of A Market Order
10.2.1 Simple Linear Price Impact Model with No Risk-Aversion
10.2.2 Paper by Bertsimas and Lo on Optimal Order Execution
10.2.3 Incorporating Risk-Aversion and Real-World Considerations
10.3 Optimal Market-Making
10.3.1 Avellaneda-Stoikov Continuous-Time Formulation
10.3.2 Solving the Avellaneda-Stoikov Formulation
10.3.3 Analytical Approximation to the Solution to Avellaneda-Stoikov Formulation
10.3.4 Real-World Market-Making
10.4 Key Takeaways From This Chapter

Module III Reinforcement Learning Algorithms

Chapter 11 Monte-Carlo and Temporal-Difference for Prediction
11.1 Overview Of The Reinforcement Learning Approach
11.2 RL For Prediction
11.3 Monte-Carlo (MC) Prediction
11.4 Temporal-Difference (TD) Prediction
11.5 TD Versus MC
11.5.1 TD Learning Akin to Human Learning
11.5.2 Bias, Variance and Convergence
11.5.3 Fixed-Data Experience Replay on TD versus MC
11.5.4 Bootstrapping and Experiencing
11.6 TD(λ) Prediction
11.6.1 N-Step Bootstrapping Prediction Algorithm
11.6.2 λ-Return Prediction Algorithm
11.6.3 Eligibility Traces
11.6.4 Implementation of the TD(λ) Prediction Algorithm
11.7 Key Takeaways From This Chapter

Chapter 12 Monte-Carlo and Temporal-Difference for Control
12.1 Refresher On Generalized Policy Iteration (GPI)
12.2 GPI With Evaluation As Monte-Carlo
12.3 GLIE Monte-Carlo Control
12.4 SARSA
12.5 SARSA(λ)
12.6 Off-Policy Control
12.6.1 Q-Learning
12.6.2 Windy Grid
12.6.3 Importance Sampling
12.7 Conceptual Linkage Between DP And TD Algorithms
12.8 Convergence Of RL Algorithms
12.9 Key Takeaways From This Chapter

Chapter 13 Batch RL, Experience-Replay, DQN, LSPI, Gradient TD
13.1 Batch RL And Experience-Replay
13.2 A Generic Implementation Of Experience-Replay
13.3 Least-Squares RL Prediction
13.3.1 Least-Squares Monte-Carlo (LSMC)
13.3.2 Least-Squares Temporal-Difference (LSTD)
13.3.3 LSTD(λ)
13.3.4 Convergence of Least-Squares Prediction
13.4 Q-Learning With Experience-Replay
13.4.1 Deep Q-Networks (DQN) Algorithm
13.5 Least-Squares Policy Iteration (LSPI)
13.5.1 Saving Your Village from a Vampire
13.5.2 Least-Squares Control Convergence
13.6 RL For Optimal Exercise Of American Options
13.6.1 LSPI for American Options Pricing
13.6.2 Deep Q-Learning for American Options Pricing
13.7 Value Function Geometry
13.7.1 Notation and Definitions
13.7.2 Bellman Policy Operator and Projection Operator
13.7.3 Vectors of Interest in the Φ Subspace
13.8 Gradient Temporal-Difference (Gradient TD)
13.9 Key Takeaways From This Chapter

Chapter 14 Policy Gradient Algorithms
14.1 Advantages And Disadvantages Of Policy Gradient Algorithms
14.2 Policy Gradient Theorem
14.2.1 Notation and Definitions
14.2.2 Statement of the Policy Gradient Theorem
14.2.3 Proof of the Policy Gradient Theorem
14.3 Score Function For Canonical Policy Functions
14.3.1 Canonical π(s, a; θ) for Finite Action Spaces
14.3.2 Canonical π(s, a; θ) for Single-Dimensional Continuous Action Spaces
14.4 REINFORCE Algorithm (Monte-Carlo Policy Gradient)
14.5 Optimal Asset Allocation (Revisited)
14.6 Actor-Critic And Variance Reduction
14.7 Overcoming Bias With Compatible Function Approximation
14.8 Policy Gradient Methods In Practice
14.8.1 Natural Policy Gradient
14.8.2 Deterministic Policy Gradient
14.9 Evolutionary Strategies
14.10 Key Takeaways From This Chapter

Module IV Finishing Touches

Chapter 15 Multi-Armed Bandits: Exploration versus Exploitation
15.1 Introduction To The Multi-Armed Bandit Problem
15.1.1 Some Examples of Explore-Exploit Dilemma
15.1.2 Problem Definition
15.1.3 Regret
15.1.4 Counts and Gaps
15.2 Simple Algorithms
15.2.1 Greedy and ε-Greedy
15.2.2 Optimistic Initialization
15.2.3 Decaying εt-Greedy Algorithm
15.3 Lower Bound
15.4 Upper Confidence Bound Algorithms
15.4.1 Hoeffding's Inequality
15.4.2 UCB1 Algorithm
15.4.3 Bayesian UCB
15.5 Probability Matching
15.5.1 Thompson Sampling
15.6 Gradient Bandits
15.7 Horse Race
15.8 Information State Space MDP
15.9 Extending To Contextual Bandits And RL Control
15.10 Key Takeaways From This Chapter

Chapter 16 Blending Learning and Planning
16.1 Planning Versus Learning
16.1.1 Planning the Solution of Prediction/Control
16.1.2 Learning the Solution of Prediction/Control
16.1.3 Advantages and Disadvantages of Planning versus Learning
16.1.4 Blending Planning and Learning
16.2 Decision-Time Planning
16.3 Monte-Carlo Tree-Search (MCTS)
16.4 Adaptive Multi-Stage Sampling
16.5 Summary Of Key Learnings From This Chapter

Chapter 17 Summary and Real-World Considerations
17.1 Summary Of Key Learnings From This Book
17.2 RL In The Real-World

Appendix A Moment Generating Function and Its Applications
A.1 The Moment Generating Function (MGF)
A.2 MGF For Linear Functions Of Random Variables
A.3 MGF For The Normal Distribution
A.4 Minimizing The MGF
A.4.1 Minimizing the MGF When x Follows a Normal Distribution
A.4.2 Minimizing the MGF When x Follows a Symmetric Binary Distribution

Appendix B Portfolio Theory
B.1 Setting and Notation
B.2 Portfolio Returns
B.3 Derivation Of Efficient Frontier Curve
B.4 Global Minimum Variance Portfolio (GMVP)
B.5 Orthogonal Efficient Portfolios
B.6 Two-Fund Theorem
B.7 An Example Of The Efficient Frontier For 16 Assets
B.8 CAPM: Linearity Of Covariance Vector W.R.T. Mean Returns
B.9 Useful Corollaries Of CAPM
B.10 Cross-Sectional Variance
B.11 Efficient Set With A Risk-Free Asset

Appendix C Introduction to and Overview of Stochastic Calculus Basics
C.1 Simple Random Walk
C.2 Brownian Motion As Scaled Random Walk
C.3 Continuous-Time Stochastic Processes
C.4 Properties Of Brownian Motion Sample Traces
C.5 Ito Integral
C.6 Ito's Lemma
C.7 A Lognormal Process
C.8 A Mean-Reverting Process

Appendix D The Hamilton-Jacobi-Bellman (HJB) Equation
D.1 HJB As A Continuous-Time Version Of Bellman Optimality Equation
D.2 HJB With State Transitions As An Ito Process

Appendix E Black-Scholes Equation and Its Solution for Call/Put Options
E.1 Assumptions
E.2 Derivation Of The Black-Scholes Equation
E.3 Solution Of The Black-Scholes Equation For Call/Put Options

Appendix F Function Approximations as Affine Spaces
F.1 Vector Space
F.2 Function Space
F.3 Linear Map Of Vector Spaces
F.4 Affine Space
F.5 Affine Map
F.6 Function Approximations
F.6.1 D[R] as an Affine Space P
F.6.2 Delegator Space R
F.7 Stochastic Gradient Descent
F.8 SGD Update For Linear Function Approximations

Appendix G Conjugate Priors for Gaussian and Bernoulli Distributions
G.1 Conjugate Prior For Gaussian Distribution
G.2 Conjugate Prior For Bernoulli Distribution

Bibliography
Index

Authors

Ashwin Rao is the Chief Science Officer of Wayfair, an e-commerce company, where he and his team develop mathematical models and algorithms for supply-chain and logistics, merchandising, marketing, search, personalization, pricing and customer service. Ashwin is also an Adjunct Professor at Stanford University, focusing his research and teaching on Stochastic Control, particularly Reinforcement Learning algorithms with applications in Finance and Retail. Previously, Ashwin was a Managing Director at Morgan Stanley and a Trading Strategist at Goldman Sachs. Ashwin holds a Bachelor's degree in Computer Science and Engineering from IIT-Bombay and a Ph.D. in Computer Science from the University of Southern California, where he specialized in Algorithms Theory and Abstract Algebra.

Tikhon Jelvis is a programmer who specializes in bringing ideas from programming languages and functional programming to machine learning and data science. He has developed inventory optimization, simulation and demand forecasting systems as a Principal Scientist at Target, and he is a speaker and open-source contributor in the Haskell community, where he serves on the board of directors for Haskell.org.