Foreword  xvii
Preface  xix
Authors  xxiii
Contributors  xxv

Chapter 1 Deep Learning and Transformers: An Introduction  1 (10)
1.1 Deep Learning: A Historic Perspective  1 (3)
1.2 Transformers and Taxonomy  4 (4)
1.2.1 Modified Transformer Architecture  4 (4)
1.2.1.1 Transformer block changes  4 (1)
1.2.1.2 Transformer sublayer changes  5 (3)
1.2.2 Pre-training Methods and Applications  8 (1)
1.3 ...  8 (3)
1.3.1 Libraries and Implementations  8 (1)
1.3.2 ...  9 (1)
1.3.3 Courses, Tutorials, and Lectures  9 (1)
1.3.4 Case Studies and Details  10 (1)

Chapter 2 Transformers: Basics and Introduction  11 (32)
2.1 Encoder-Decoder Architecture  11 (1)
2.2 ...  12 (2)
2.2.1 ...  12 (1)
2.2.2 ...  13 (1)
2.2.3 ...  14 (1)
2.2.4 Issues with RNN-Based Encoder-Decoder  14 (1)
2.3 ...  14 (5)
2.3.1 ...  14 (2)
2.3.2 Types of Score-Based Attention  16 (2)
2.3.2.1 Dot product (multiplicative)  17 (1)
2.3.2.2 Scaled dot product or multiplicative  17 (1)
2.3.2.3 Linear, MLP, or Additive  17 (1)
2.3.3 Attention-Based Sequence-to-Sequence  18 (1)
2.4 ...  19 (8)
2.4.1 Source and Target Representation  20 (2)
2.4.1.1 ...  20 (1)
2.4.1.2 Positional encoding  20 (2)
2.4.2 ...  22 (4)
2.4.2.1 ...  22 (2)
2.4.2.2 Multi-head attention  24 (1)
2.4.2.3 Masked multi-head attention  25 (1)
2.4.2.4 Encoder-decoder multi-head attention  26 (1)
2.4.3 Residuals and Layer Normalization  26 (1)
2.4.4 Positionwise Feed-forward Networks  26 (1)
...  27 (1)
...  27 (1)
2.5 Case Study: Machine Translation  27 (16)
2.5.1 ...  27 (1)
2.5.2 Data, Tools, and Libraries  27 (1)
2.5.3 Experiments, Results, and Analysis  28 (15)
2.5.3.1 Exploratory data analysis  28 (1)
2.5.3.2 ...  29 (6)
2.5.3.3 ...  35 (3)
2.5.3.4 Results and analysis  38 (1)
2.5.3.5 ...  38 (5)

Chapter 3 Bidirectional Encoder Representations from Transformers (BERT)  43 (28)
3.1 ...  43 (5)
3.1.1 ...  43 (2)
...  45 (1)
...  46 (2)
...  48 (1)
...  48 (1)
...  49 (2)
...  49 (1)
...  50 (1)
3.4 ...  51 (2)
3.4.1 BERT Sentence Representation  51 (1)
3.4.2 ...  52 (1)
3.5 Case Study: Topic Modeling with Transformers  53 (10)
3.5.1 ...  53 (1)
3.5.2 Data, Tools, and Libraries  53 (2)
3.5.2.1 ...  54 (1)
3.5.2.2 Compute embeddings  54 (1)
3.5.3 Experiments, Results, and Analysis  55 (8)
3.5.3.1 ...  55 (1)
3.5.3.2 Topic size distribution  55 (1)
3.5.3.3 Visualization of topics  56 (1)
3.5.3.4 Content of topics  57 (6)
3.6 Case Study: Fine-Tuning BERT  63 (8)
3.6.1 ...  63 (1)
3.6.2 Data, Tools, and Libraries  63 (1)
3.6.3 Experiments, Results, and Analysis  64 (7)

Chapter 4 Multilingual Transformer Architectures  71 (38)
4.1 Multilingual Transformer Architectures  72 (18)
4.1.1 Basic Multilingual Transformer  72 (2)
4.1.2 Single-Encoder Multilingual NLU  74 (11)
4.1.2.1 ...  74 (1)
4.1.2.2 ...  75 (2)
4.1.2.3 ...  77 (1)
4.1.2.4 ...  77 (1)
4.1.2.5 ...  78 (2)
4.1.2.6 ...  80 (1)
4.1.2.7 ...  81 (1)
4.1.2.8 ...  82 (2)
4.1.2.9 ...  84 (1)
4.1.3 Dual-Encoder Multilingual NLU  85 (4)
4.1.3.1 ...  85 (2)
4.1.3.2 ...  87 (2)
4.1.4 ...  89 (1)
4.2 ...  90 (3)
4.2.1 ...  90 (1)
4.2.2 Multilingual Benchmarks  91 (2)
4.2.2.1 ...  91 (1)
4.2.2.2 Structure prediction  92 (1)
4.2.2.3 Question answering  92 (1)
4.2.2.4 Semantic retrieval  92 (1)
4.3 Multilingual Transfer Learning Insights  93 (4)
4.3.1 Zero-Shot Cross-Lingual Learning  93 (3)
4.3.1.1 ...  93 (1)
4.3.1.2 Model architecture factors  94 (1)
4.3.1.3 Model tasks factors  95 (1)
4.3.2 Language-Agnostic Cross-Lingual Representations  96 (1)
4.4 ...  97 (12)
4.4.1 ...  97 (1)
4.4.2 Data, Tools, and Libraries  98 (1)
4.4.3 Experiments, Results, and Analysis  98 (11)
4.4.3.1 Data preprocessing  99 (2)
4.4.3.2 ...  101 (8)

Chapter 5 Transformer Modifications  109 (46)
5.1 Transformer Block Modifications  109 (11)
5.1.1 Lightweight Transformers  109 (5)
5.1.1.1 Funnel-transformer  109 (3)
5.1.1.2 ...  112 (2)
5.1.2 Connections between Transformer Blocks  114 (1)
5.1.2.1 ...  114 (1)
5.1.3 Adaptive Computation Time  115 (1)
5.1.3.1 Universal transformers (UT)  115 (1)
5.1.4 Recurrence Relations between Transformer Blocks  116 (4)
5.1.4.1 ...  116 (4)
5.1.5 Hierarchical Transformers  120 (1)
5.2 Transformers with Modified Multi-Head Self-Attention  120 (25)
5.2.1 Structure of Multi-Head Self-Attention  120 (4)
5.2.1.1 Multi-head self-attention  122 (1)
5.2.1.2 Space and time complexity  123 (1)
5.2.2 Reducing Complexity of Self-Attention  124 (13)
5.2.2.1 ...  124 (2)
5.2.2.2 ...  126 (5)
5.2.2.3 ...  131 (1)
5.2.2.4 ...  132 (5)
5.2.3 Improving Multi-Head Attention  137 (3)
5.2.3.1 Talking-heads attention  137 (3)
5.2.4 Biasing Attention with Priors  140 (1)
5.2.5 ...  140 (1)
5.2.5.1 Clustered attention  140 (1)
5.2.6 Compressed Key-Value Memory  141 (2)
5.2.6.1 Luna: Linear Unified Nested Attention  141 (2)
5.2.7 Low-Rank Approximations  143 (2)
5.2.7.1 ...  143 (2)
5.3 Modifications for Training Task Efficiency  145 (1)
5.3.1 ...  145 (1)
5.3.1.1 Replaced token detection  145 (1)
...  146 (1)
5.4 Transformer Submodule Changes  146 (2)
5.4.1 ...  146 (2)
5.5 Case Study: Sentiment Analysis  148 (7)
5.5.1 ...  148 (1)
5.5.2 Data, Tools, and Libraries  148 (2)
5.5.3 Experiments, Results, and Analysis  150 (5)
5.5.3.1 Visualizing attention head weights  150 (2)
5.5.3.2 ...  152 (3)

Chapter 6 Pre-trained and Application-Specific Transformers  155 (32)
6.1 ...  155 (8)
6.1.1 Domain-Specific Transformers  155 (2)
6.1.1.1 ...  155 (1)
6.1.1.2 ...  156 (1)
6.1.1.3 ...  156 (1)
6.1.2 Text-to-Text Transformers  157 (1)
6.1.2.1 ...  157 (1)
6.1.3 ...  158 (5)
6.1.3.1 GPT: Generative pre-training  158 (2)
6.1.3.2 ...  160 (1)
6.1.3.3 ...  161 (2)
6.2 ...  163 (1)
6.2.1 ...  163 (1)
6.3 Automatic Speech Recognition  164 (2)
6.3.1 ...  165 (1)
6.3.2 ...  165 (1)
6.3.3 HuBERT: Hidden Units BERT  166 (1)
6.4 Multimodal and Multitasking Transformer  166 (3)
6.4.1 Vision-and-Language BERT (VilBERT)  167 (1)
6.4.2 Unified Transformer (UniT)  168 (1)
6.5 Video Processing with Timesformer  169 (3)
6.5.1 ...  169 (1)
6.5.2 ...  170 (2)
6.5.2.1 Spatiotemporal self-attention  171 (1)
6.5.2.2 Spatiotemporal attention blocks  171 (1)
6.6 ...  172 (5)
6.6.1 Positional Encodings in a Graph  173 (1)
6.6.1.1 Laplacian positional encodings  173 (1)
6.6.2 Graph Transformer Input  173 (4)
6.6.2.1 Graphs without edge attributes  174 (1)
6.6.2.2 Graphs with edge attributes  175 (2)
6.7 Reinforcement Learning  177 (3)
6.7.1 Decision Transformer  178 (2)
6.8 Case Study: Automatic Speech Recognition  180 (7)
6.8.1 ...  180 (1)
6.8.2 Data, Tools, and Libraries  180 (1)
6.8.3 Experiments, Results, and Analysis  180 (10)
6.8.3.1 Preprocessing speech data  180 (1)
6.8.3.2 ...  181 (6)

Chapter 7 Interpretability and Explainability Techniques for Transformers  187 (34)
7.1 Traits of Explainable Systems  187 (2)
7.2 Related Areas That Impact Explainability  189 (1)
7.3 Explainable Methods Taxonomy  190 (12)
7.3.1 Visualization Methods  190 (5)
7.3.1.1 Backpropagation-based  190 (4)
7.3.1.2 Perturbation-based  194 (1)
7.3.2 ...  195 (3)
7.3.2.1 Local approximation  195 (3)
7.3.2.2 Model translation  198 (1)
7.3.3 ...  198 (4)
7.3.3.1 Probing mechanism  198 (3)
7.3.3.2 ...  201 (1)
7.4 Attention and Explanation  202 (6)
7.4.1 Attention is Not an Explanation  202 (3)
7.4.1.1 Attention weights and feature importance  202 (2)
7.4.1.2 Counterfactual experiments  204 (1)
7.4.2 Attention is Not Not an Explanation  205 (3)
7.4.2.1 Is attention necessary for all tasks?  206 (1)
7.4.2.2 Searching for adversarial models  207 (1)
7.4.2.3 Attention probing  208 (1)
7.5 Quantifying Attention Flow  208 (2)
7.5.1 Information Flow as DAG  208 (1)
7.5.2 ...  209 (1)
7.5.3 ...  209 (1)
7.6 Case Study: Text Classification with Explainability  210 (11)
7.6.1 ...  210 (1)
7.6.2 Data, Tools, and Libraries  211 (1)
7.6.3 Experiments, Results, and Analysis  211 (10)
7.6.3.1 Exploratory data analysis  211 (1)
7.6.3.2 ...  211 (1)
7.6.3.3 Error analysis and explainability  212 (9)

Bibliography  221 (34)
Index  255