Preface  xi
Reading Guide  xiii

1 The Translation Problem  3

2 Uses of Machine Translation  19
    2.2 Aiding Human Translators  20
    2.4 Natural Language Processing Pipelines  26
    2.5 Multimodal Machine Translation  27

    4.1 Task-Based Evaluation  41

    5.5 Back-Propagation Training  72
    5.6 Exploiting Parallel Processing  79
    5.7 Hands On: Neural Networks in Python  80

    6.1 Neural Networks as Computation Graphs  89
    6.2 Gradient Computations  91
    6.3 Hands On: Deep Learning Frameworks  95

    7.1 Feed-Forward Neural Language Models  103
    7.3 Noise Contrastive Estimation  109
    7.4 Recurrent Neural Language Models  110
    7.5 Long Short-Term Memory Models  112
    7.6 Gated Recurrent Units  115
    7.8 Hands On: Neural Language Models in PyTorch  118

8 Neural Translation Models  125
    8.1 Encoder-Decoder Approach  125
    8.2 Adding an Alignment Model  126
    8.5 Hands On: Neural Translation Models in PyTorch  136

    9.6 Hands On: Decoding in Python  158

10 Machine Learning Tricks  171
    10.1 Failures in Machine Learning  171
    10.3 Adjusting the Learning Rate  176
    10.4 Avoiding Local Optima  179
    10.5 Addressing Vanishing and Exploding Gradients  182
    10.6 Sentence-Level Optimization  186

11 Alternate Architectures  193
    11.1 Components of Neural Networks  193
    11.3 Convolutional Machine Translation  203
    11.4 Convolutional Neural Networks with Attention  205
    11.5 Self-Attention: Transformer  207

    12.2 Multilingual Word Embeddings  220
    12.4 Character-Based Models  229

14 Beyond Parallel Corpora  263
    14.1 Using Monolingual Data  264
    14.2 Multiple Language Pairs  269
    14.3 Training on Related Tasks  272

    15.1 Guided Alignment Training  282
    15.3 Adding Linguistic Annotation  287

    16.2 Amount of Training Data  295

17 Analysis and Visualization  311
    17.3 Probing Representations  328
    17.4 Identifying Neurons  329
    17.5 Tracing Decisions Back to Inputs  332

Bibliography  343
Author Index  375
Index  385