Foreword | xvii
Acknowledgments | xix
Introduction | xxi
What Is Data Science? | xxii
Why Data Science Matters for Security | xxii
Applying Data Science to Malware | xxiii
Who Should Read This Book? | xxiv
About This Book | xxiv
How to Use the Sample Code and Data | xxv
|
1 Basic Static Malware Analysis | 1
The Microsoft Windows Portable Executable Format | 2
| 3
| 3
| 4
Dissecting the PE Format Using pefile | 5
| 7
Examining Malware Strings | 8
Using the strings Program | 8
Analyzing Your strings Dump | 9
| 10
|
2 Beyond Basic Static Analysis: x86 Disassembly | 11
| 12
Basics of x86 Assembly Language | 12
| 13
| 15
Data Movement Instructions | 15
Disassembling ircbot.exe Using pefile and capstone | 20
Factors That Limit Static Analysis | 21
| 21
| 22
Anti-disassembly Techniques | 22
Dynamically Downloaded Data | 22
| 23
|
3 A Brief Introduction to Dynamic Analysis | 25
Why Use Dynamic Analysis? | 26
Dynamic Analysis for Malware Data Science | 26
Basic Tools for Dynamic Analysis | 27
Typical Malware Behaviors | 27
Loading a File on malwr.com | 27
Analyzing Results on malwr.com | 28
Limitations of Basic Dynamic Analysis | 33
| 34
|
4 Identifying Attack Campaigns Using Malware Networks | 35
| 37
| 37
Visualizing Malware Networks | 39
| 39
Force-Directed Algorithms | 40
Building Networks with NetworkX | 40
| 41
| 42
| 42
Network Visualization with GraphViz | 43
Using Parameters to Adjust Networks | 44
The GraphViz Command Line Tools | 44
Adding Visual Attributes to Nodes and Edges | 48
Building Malware Networks | 51
Building a Shared Image Relationship Network | 54
| 58
|
5 Shared Code Analysis | 59
Preparing Samples for Comparison by Extracting Features | 62
How Bag of Features Models Work | 62
| 63
Using the Jaccard Index to Quantify Similarity | 64
Using Similarity Matrices to Evaluate Malware Shared Code Estimation Methods | 66
Instruction Sequence-Based Similarity | 67
| 70
Import Address Table-Based Similarity | 71
Dynamic API Call-Based Similarity | 72
Building a Similarity Graph | 73
Scaling Similarity Comparisons | 77
| 77
| 78
Building a Persistent Malware Similarity Search System | 79
Running the Similarity Search System | 85
| 87
|
6 Understanding Machine Learning-Based Malware Detectors | 89
Steps for Building a Machine Learning-Based Detector | 90
Gathering Training Examples | 91
| 91
| 92
Training Machine Learning Systems | 92
Testing Machine Learning Systems | 93
Understanding Feature Spaces and Decision Boundaries | 93
What Makes Models Good or Bad: Overfitting and Underfitting | 98
Major Types of Machine Learning Algorithms | 101
| 102
| 105
| 109
| 115
| 117
|
7 Evaluating Malware Detection Systems | 119
Four Possible Detection Outcomes | 120
True and False Positive Rates | 120
Relationship Between True and False Positive Rates | 121
| 123
Considering Base Rates in Your Evaluation | 124
How Base Rate Affects Precision | 124
Estimating Precision in a Deployment Environment | 125
| 126
|
8 Building Machine Learning Detectors | 127
| 128
Building a Toy Decision Tree-Based Detector | 129
Training Your Decision Tree Classifier | 130
Visualizing the Decision Tree | 131
| 133
Building Real-World Machine Learning Detectors with sklearn | 134
Real-World Feature Extraction | 134
Why You Can't Use All Possible Features | 137
Using the Hashing Trick to Compress Features | 138
Building an Industrial-Strength Detector | 141
| 141
| 142
Running the Detector on New Binaries | 144
What We've Implemented So Far | 144
Evaluating Your Detector's Performance | 146
Using ROC Curves to Evaluate Detector Efficacy | 147
| 147
Splitting Data into Training and Test Sets | 148
| 149
| 150
| 153
| 154
|
9 Visualizing Malware Trends | 155
Why Visualizing Malware Data Is Important | 156
Understanding Our Malware Dataset | 158
| 158
Working with a pandas DataFrame | 159
Filtering Data Using Conditions | 161
Using matplotlib to Visualize Data | 162
Plotting the Relationship Between Malware Size and Detection | 162
Plotting Ransomware Detection Rates | 164
Plotting Ransomware and Worm Detection Rates | 165
Using seaborn to Visualize Data | 168
Plotting the Distribution of Antivirus Detections | 169
| 172
| 174
|
10 Deep Learning Basics | 175
| 176
| 177
| 177
| 180
Universal Approximation Theorem | 181
Building Your Own Neural Network | 182
Adding Another Neuron to the Network | 186
Automatic Feature Generation | 188
| 189
Using Backpropagation to Optimize a Neural Network | 190
| 192
| 192
| 193
Feed-Forward Neural Network | 193
Convolutional Neural Network | 193
Autoencoder Neural Network | 194
Generative Adversarial Network | 195
| 196
| 196
| 197
|
11 Building a Neural Network Malware Detector with Keras | 199
Defining a Model's Architecture | 200
| 202
| 203
| 203
Creating a Data Generator | 204
Incorporating Validation Data | 207
Saving and Loading the Model | 209
| 209
Enhancing the Model Training Process with Callbacks | 211
Using a Built-in Callback | 212
| 213
| 214
|
12 Becoming a Data Scientist | 215
Paths to Becoming a Security Data Scientist | 216
A Day in the Life of a Security Data Scientist | 216
Traits of an Effective Security Data Scientist | 218
| 218
| 218
| 219
| 219
| 219
|
Appendix: An Overview of Datasets and Tools | 221
| 222
Chapter 1 Basic Static Malware Analysis | 222
Chapter 2 Beyond Basic Static Analysis: x86 Disassembly | 222
Chapter 3 A Brief Introduction to Dynamic Analysis | 222
Chapter 4 Identifying Attack Campaigns Using Malware Networks | 222
Chapter 5 Shared Code Analysis | 223
Chapter 6 Understanding Machine Learning-Based Malware Detectors and Chapter 7 Evaluating Malware Detection Systems | 223
Chapter 8 Building Machine Learning Detectors | 224
Chapter 9 Visualizing Malware Trends | 224
Chapter 10 Deep Learning Basics | 224
Chapter 11 Building a Neural Network Malware Detector with Keras | 224
Chapter 12 Becoming a Data Scientist | 224
Tool Implementation Guide | 225
Shared Hostname Network Visualization | 225
Shared Image Network Visualization | 226
Malware Similarity Visualization | 227
Malware Similarity Search System | 229
Machine Learning Malware Detection System | 230

Index | 233