  • Format - EPUB+DRM
  • Price: 58.49 €*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste): not allowed

  • Printing: not allowed

  • Usage:

    Digital Rights Management (DRM)
    The publisher has issued this e-book in encrypted form, which means that you must install special software to read it. You also need to create an Adobe ID. More information here. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android).

    To read on a PC or Mac, you need to install Adobe Digital Editions (this is a free application designed specifically for reading e-books; do not confuse it with Adobe Reader, which is most likely already installed on your computer).

    This e-book cannot be read on an Amazon Kindle.

Transformers are becoming a core part of many neural network architectures, employed in a wide range of applications such as NLP, Speech Recognition, Time Series, and Computer Vision. Transformers have gone through many adaptations and alterations, resulting in newer techniques and methods. Transformers for Machine Learning: A Deep Dive is the first comprehensive book on transformers.

Key Features:

  • A comprehensive reference book with detailed explanations of every algorithm and technique related to transformers.
  • 60+ transformer architectures covered in a comprehensive manner.
  • A book for understanding how to apply transformer techniques in speech, text, time series, and computer vision.
  • Practical tips and tricks for each architecture and how to use it in the real world.
  • Hands-on case studies and code snippets for theory and practical real-world analysis using the tools and libraries, all ready to run in Google Colab.

The theoretical explanations of the state-of-the-art transformer architectures will appeal to postgraduate students and researchers (academic and industry), as they provide a single entry point with deep discussions of a quickly moving field. The practical hands-on case studies and code will appeal to undergraduate students, practitioners, and professionals, as they allow for quick experimentation and lower the barrier to entry into the field.
Foreword xvii
Preface xix
Authors xxiii
Contributors xxv
Chapter 1 Deep Learning and Transformers: An Introduction 1(10)
1.1 Deep Learning: A Historic Perspective 1(3)
1.2 Transformers And Taxonomy 4(4)
1.2.1 Modified Transformer Architecture 4(4)
1.2.1.1 Transformer block changes 4(1)
1.2.1.2 Transformer sublayer changes 5(3)
1.2.2 Pre-training Methods and Applications 8(1)
1.3 Resources 8(3)
1.3.1 Libraries and Implementations 8(1)
1.3.2 Books 9(1)
1.3.3 Courses, Tutorials, and Lectures 9(1)
1.3.4 Case Studies and Details 10(1)
Chapter 2 Transformers: Basics and Introduction 11(32)
2.1 Encoder-Decoder Architecture 11(1)
2.2 Sequence-To-Sequence 12(2)
2.2.1 Encoder 12(1)
2.2.2 Decoder 13(1)
2.2.3 Training 14(1)
2.2.4 Issues with RNN-Based Encoder-Decoder 14(1)
2.3 Attention Mechanism 14(5)
2.3.1 Background 14(2)
2.3.2 Types of Score-Based Attention 16(2)
2.3.2.1 Dot product (multiplicative) 17(1)
2.3.2.2 Scaled dot product or multiplicative 17(1)
2.3.2.3 Linear, MLP, or Additive 17(1)
2.3.3 Attention-Based Sequence-to-Sequence 18(1)
2.4 Transformer 19(8)
2.4.1 Source and Target Representation 20(2)
2.4.1.1 Word embedding 20(1)
2.4.1.2 Positional encoding 20(2)
2.4.2 Attention Layers 22(4)
2.4.2.1 Self-attention 22(2)
2.4.2.2 Multi-head attention 24(1)
2.4.2.3 Masked multi-head attention 25(1)
2.4.2.4 Encoder-decoder multi-head attention 26(1)
2.4.3 Residuals and Layer Normalization 26(1)
2.4.4 Positionwise Feed-forward Networks 26(1)
2.4.5 Encoder 27(1)
2.4.6 Decoder 27(1)
2.5 Case Study: Machine Translation 27(16)
2.5.1 Goal 27(1)
2.5.2 Data, Tools, and Libraries 27(1)
2.5.3 Experiments, Results, and Analysis 28(15)
2.5.3.1 Exploratory data analysis 28(1)
2.5.3.2 Attention 29(6)
2.5.3.3 Transformer 35(3)
2.5.3.4 Results and analysis 38(1)
2.5.3.5 Explainability 38(5)
Chapter 3 Bidirectional Encoder Representations from Transformers (BERT) 43(28)
3.1 BERT 43(5)
3.1.1 Architecture 43(2)
3.1.2 Pre-Training 45(1)
3.1.3 Fine-Tuning 46(2)
3.2 BERT Variants 48(1)
3.2.1 RoBERTa 48(1)
3.3 Applications 49(2)
3.3.1 TaBERT 49(1)
3.3.2 BERTopic 50(1)
3.4 BERT Insights 51(2)
3.4.1 BERT Sentence Representation 51(1)
3.4.2 BERTology 52(1)
3.5 Case Study: Topic Modeling With Transformers 53(10)
3.5.1 Goal 53(1)
3.5.2 Data, Tools, and Libraries 53(2)
3.5.2.1 Data 54(1)
3.5.2.2 Compute embeddings 54(1)
3.5.3 Experiments, Results, and Analysis 55(8)
3.5.3.1 Building topics 55(1)
3.5.3.2 Topic size distribution 55(1)
3.5.3.3 Visualization of topics 56(1)
3.5.3.4 Content of topics 57(6)
3.6 Case Study: Fine-Tuning BERT 63(8)
3.6.1 Goal 63(1)
3.6.2 Data, Tools, and Libraries 63(1)
3.6.3 Experiments, Results, and Analysis 64(7)
Chapter 4 Multilingual Transformer Architectures 71(38)
4.1 Multilingual Transformer Architectures 72(18)
4.1.1 Basic Multilingual Transformer 72(2)
4.1.2 Single-Encoder Multilingual NLU 74(11)
4.1.2.1 mBERT 74(1)
4.1.2.2 XLM 75(2)
4.1.2.3 XLM-RoBERTa 77(1)
4.1.2.4 ALM 77(1)
4.1.2.5 Unicoder 78(2)
4.1.2.6 INFOXLM 80(1)
4.1.2.7 AMBER 81(1)
4.1.2.8 ERNIE-M 82(2)
4.1.2.9 HITCL 84(1)
4.1.3 Dual-Encoder Multilingual NLU 85(4)
4.1.3.1 LaBSE 85(2)
4.1.3.2 mUSE 87(2)
4.1.4 Multilingual NLG 89(1)
4.2 Multilingual Data 90(3)
4.2.1 Pre-Training Data 90(1)
4.2.2 Multilingual Benchmarks 91(2)
4.2.2.1 Classification 91(1)
4.2.2.2 Structure prediction 92(1)
4.2.2.3 Question answering 92(1)
4.2.2.4 Semantic retrieval 92(1)
4.3 Multilingual Transfer Learning Insights 93(4)
4.3.1 Zero-Shot Cross-Lingual Learning 93(3)
4.3.1.1 Data factors 93(1)
4.3.1.2 Model architecture factors 94(1)
4.3.1.3 Model tasks factors 95(1)
4.3.2 Language-Agnostic Cross-Lingual Representations 96(1)
4.4 Case Study 97(12)
4.4.1 Goal 97(1)
4.4.2 Data, Tools, and Libraries 98(1)
4.4.3 Experiments, Results, and Analysis 98(11)
4.4.3.1 Data preprocessing 99(2)
4.4.3.2 Experiments 101(8)
Chapter 5 Transformer Modifications 109(46)
5.1 Transformer Block Modifications 109(11)
5.1.1 Lightweight Transformers 109(5)
5.1.1.1 Funnel-transformer 109(3)
5.1.1.2 DeLighT 112(2)
5.1.2 Connections between Transformer Blocks 114(1)
5.1.2.1 RealFormer 114(1)
5.1.3 Adaptive Computation Time 115(1)
5.1.3.1 Universal transformers (UT) 115(1)
5.1.4 Recurrence Relations between Transformer Blocks 116(4)
5.1.4.1 Transformer-XL 116(4)
5.1.5 Hierarchical Transformers 120(1)
5.2 Transformers With Modified Multi-Head Self-Attention 120(25)
5.2.1 Structure of Multi-Head Self-Attention 120(4)
5.2.1.1 Multi-head self-attention 122(1)
5.2.1.2 Space and time complexity 123(1)
5.2.2 Reducing Complexity of Self-Attention 124(13)
5.2.2.1 Longformer 124(2)
5.2.2.2 Reformer 126(5)
5.2.2.3 Performer 131(1)
5.2.2.4 Big Bird 132(5)
5.2.3 Improving Multi-Head-Attention 137(3)
5.2.3.1 Talking-heads attention 137(3)
5.2.4 Biasing Attention with Priors 140(1)
5.2.5 Prototype Queries 140(1)
5.2.5.1 Clustered attention 140(1)
5.2.6 Compressed Key-Value Memory 141(2)
5.2.6.1 Luna: Linear Unified Nested Attention 141(2)
5.2.7 Low-Rank Approximations 143(2)
5.2.7.1 Linformer 143(2)
5.3 Modifications For Training Task Efficiency 145(1)
5.3.1 ELECTRA 145(1)
5.3.1.1 Replaced token detection 145(1)
5.3.2 T5 146(1)
5.4 Transformer Submodule Changes 146(2)
5.4.1 Switch Transformer 146(2)
5.5 Case Study: Sentiment Analysis 148(7)
5.5.1 Goal 148(1)
5.5.2 Data, Tools, and Libraries 148(2)
5.5.3 Experiments, Results, and Analysis 150(5)
5.5.3.1 Visualizing attention head weights 150(2)
5.5.3.2 Analysis 152(3)
Chapter 6 Pre-trained and Application-Specific Transformers 155(32)
6.1 Text Processing 155(8)
6.1.1 Domain-Specific Transformers 155(2)
6.1.1.1 BioBERT 155(1)
6.1.1.2 SciBERT 156(1)
6.1.1.3 FinBERT 156(1)
6.1.2 Text-to-Text Transformers 157(1)
6.1.2.1 ByT5 157(1)
6.1.3 Text Generation 158(5)
6.1.3.1 GPT: Generative pre-training 158(2)
6.1.3.2 GPT-2 160(1)
6.1.3.3 GPT-3 161(2)
6.2 Computer Vision 163(1)
6.2.1 Vision Transformer 163(1)
6.3 Automatic Speech Recognition 164(2)
6.3.1 Wav2vec 2.0 165(1)
6.3.2 Speech2Text2 165(1)
6.3.3 HuBERT: Hidden Units BERT 166(1)
6.4 Multimodal And Multitasking Transformer 166(3)
6.4.1 Vision-and-Language BERT (VilBERT) 167(1)
6.4.2 Unified Transformer (UniT) 168(1)
6.5 Video Processing With Timesformer 169(3)
6.5.1 Patch Embeddings 169(1)
6.5.2 Self-Attention 170(2)
6.5.2.1 Spatiotemporal self-attention 171(1)
6.5.2.2 Spatiotemporal attention blocks 171(1)
6.6 Graph Transformers 172(5)
6.6.1 Positional Encodings in a Graph 173(1)
6.6.1.1 Laplacian positional encodings 173(1)
6.6.2 Graph Transformer Input 173(4)
6.6.2.1 Graphs without edge attributes 174(1)
6.6.2.2 Graphs with edge attributes 175(2)
6.7 Reinforcement Learning 177(3)
6.7.1 Decision Transformer 178(2)
6.8 Case Study: Automatic Speech Recognition 180(7)
6.8.1 Goal 180(1)
6.8.2 Data, Tools, and Libraries 180(1)
6.8.3 Experiments, Results, and Analysis 180(10)
6.8.3.1 Preprocessing speech data 180(1)
6.8.3.2 Evaluation 181(6)
Chapter 7 Interpretability and Explainability Techniques for Transformers 187(34)
7.1 Traits Of Explainable Systems 187(2)
7.2 Related Areas That Impact Explainability 189(1)
7.3 Explainable Methods Taxonomy 190(12)
7.3.1 Visualization Methods 190(5)
7.3.1.1 Backpropagation-based 190(4)
7.3.1.2 Perturbation-based 194(1)
7.3.2 Model Distillation 195(3)
7.3.2.1 Local approximation 195(3)
7.3.2.2 Model translation 198(1)
7.3.3 Intrinsic Methods 198(4)
7.3.3.1 Probing mechanism 198(3)
7.3.3.2 Joint training 201(1)
7.4 Attention And Explanation 202(6)
7.4.1 Attention is Not an Explanation 202(3)
7.4.1.1 Attention weights and feature importance 202(2)
7.4.1.2 Counterfactual experiments 204(1)
7.4.2 Attention is Not Not an Explanation 205(3)
7.4.2.1 Is attention necessary for all tasks? 206(1)
7.4.2.2 Searching for adversarial models 207(1)
7.4.2.3 Attention probing 208(1)
7.5 Quantifying Attention Flow 208(2)
7.5.1 Information Flow as DAG 208(1)
7.5.2 Attention Rollout 209(1)
7.5.3 Attention Flow 209(1)
7.6 Case Study: Text Classification With Explainability 210(11)
7.6.1 Goal 210(1)
7.6.2 Data, Tools, and Libraries 211(1)
7.6.3 Experiments, Results, and Analysis 211(10)
7.6.3.1 Exploratory data analysis 211(1)
7.6.3.2 Experiments 211(1)
7.6.3.3 Error analysis and explainability 212(9)
Bibliography 221(34)
Index 255
Uday Kamath has spent more than two decades developing analytics products and combines this experience with training in statistics, optimization, machine learning, bioinformatics, and evolutionary computing. Uday has contributed to many journals, conferences, and books, and is the author of books including XAI: An Introduction to Interpretable XAI, Deep Learning for NLP and Speech Recognition, Mastering Java Machine Learning, and Machine Learning: End-to-End Guide for Java Developers. He has held many senior roles: Chief Analytics Officer for Digital Reasoning, Advisor for Falkonry, and Chief Data Scientist for BAE Systems Applied Intelligence. Uday holds many patents and has built commercial products using AI in domains such as compliance, cybersecurity, financial crime, and bioinformatics. Uday currently works as the Chief Analytics Officer for Smarsh, where he is responsible for data science and for research on analytical products that employ deep learning, transformers, explainable AI, and modern speech and text techniques for the financial and healthcare domains.

Wael Emara has two decades of experience in academia and industry. Wael has a PhD in Computer Engineering and Computer Science with an emphasis on machine learning and artificial intelligence. His technical background and research span signal and image processing, computer vision, medical imaging, social media analytics, machine learning, and natural language processing. Wael has 10 research publications on various machine learning topics, and he is active in the technical community in the greater New York area. Wael currently works as a Senior Research Engineer for Digital Reasoning, where he does research on state-of-the-art artificial intelligence NLP systems.

Kenneth L. Graham has two decades of experience solving quantitative problems in multiple domains, including Monte Carlo simulation, NLP, anomaly detection, cybersecurity, and behavioral profiling. For the past nine years, he has focused on building scalable NLP solutions for government and industry, including entity coreference resolution, text classification, active learning, and temporal normalization. Kenneth currently works at Smarsh as a Principal Research Engineer, researching how to move state-of-the-art NLP methods out of research and into production. Kenneth has five patents for his work in natural language processing, seven research publications, and a Ph.D. in Condensed Matter Physics.