Preface | xv | |
List of Figures | xix | |
1 Introduction | 1 | (20) |
1.1 What is 'Knowledge Discovery'? | 2 | (4) |
1.1.1 Main Forms of Knowledge Discovery | 2 | (2) |
| 4 | (2) |
1.2 What is an Adversarial Setting? | 6 | (5) |
1.3 Algorithmic Knowledge Discovery | 11 | (5) |
1.3.1 What is Different about Adversarial Knowledge Discovery? | 14 | (2) |
| 16 | (5) |
2 Data | 21 | (20) |
| 21 | (5) |
| 22 | (1) |
| 22 | (1) |
2.1.3 Data about Connections | 23 | (1) |
| 24 | (1) |
| 25 | (1) |
| 26 | (1) |
2.2.1 Slow Changes in the Underlying Situation | 26 | (1) |
2.2.2 Change is the Important Property | 26 | (1) |
| 27 | (1) |
2.3 Fusion of Different Kinds of Data | 27 | (3) |
2.4 How is Data Collected? | 30 | (3) |
2.4.1 Transaction Endpoints | 30 | (1) |
2.4.2 Interaction Endpoints | 30 | (1) |
2.4.3 Observation Endpoints | 31 | (1) |
2.4.4 Human Data Collection | 31 | (1) |
2.4.5 Reasons for Data Collection | 32 | (1) |
| 33 | (2) |
2.5.1 The Problem of Noise | 35 | (1) |
| 35 | (6) |
2.6.1 Data Interoperability | 37 | (1) |
| 38 | (3) |
3 High-Level Principles | 41 | (26) |
| 41 | (7) |
3.2 Subverting Knowledge Discovery | 48 | (5) |
3.2.1 Subverting the Data-Collection Phase | 48 | (1) |
3.2.2 Subverting the Analysis Phase | 49 | (1) |
3.2.3 Subverting the Decision-and-Action Phase | 50 | (1) |
3.2.4 The Difficulty of Fabricating Data | 50 | (3) |
3.3 Effects of Technology Properties | 53 | (3) |
3.4 Sensemaking and Situational Awareness | 56 | (4) |
| 58 | (2) |
3.5 Taking Account of the Adversarial Setting over Time | 60 | (1) |
3.6 Does This Book Help Adversaries? | 61 | (1) |
| 62 | (5) |
4 Looking for Risk - Prediction and Anomaly Detection | 67 | (60) |
| 67 | (8) |
| 67 | (1) |
4.1.2 The Problem of Human Variability | 68 | (1) |
4.1.3 The Problem of Computational Difficulty | 68 | (1) |
4.1.4 The Problem of Rarity | 68 | (2) |
4.1.5 The Problem of Justifiable Preemption | 70 | (1) |
4.1.6 The Problem of Hindsight Bias | 71 | (1) |
4.1.7 What are the Real Goals? | 72 | (3) |
4.2 Outline of Prediction Technology | 75 | (7) |
4.2.1 Building Predictors | 75 | (2) |
| 77 | (1) |
| 77 | (1) |
4.2.4 Reasons for a Prediction | 78 | (1) |
| 78 | (2) |
| 80 | (1) |
| 80 | (2) |
4.2.8 Prediction with an Associated Confidence | 82 | (1) |
4.3 Concealment Opportunities | 82 | (2) |
| 84 | (24) |
| 84 | (4) |
4.4.2 Ensembles of Predictors | 88 | (5) |
| 93 | (2) |
4.4.4 Support Vector Machines | 95 | (6) |
| 101 | (2) |
| 103 | (2) |
4.4.7 Attribute Selection | 105 | (2) |
4.4.8 Distributed Prediction | 107 | (1) |
4.4.9 Symbiotic Prediction | 107 | (1) |
| 108 | (5) |
4.6 Extending the Process | 113 | (1) |
4.7 Special Case: Looking for Matches | 114 | (1) |
4.8 Special Case: Looking for Outliers | 115 | (4) |
4.9 Special Case: Frequency Ranking | 119 | (4) |
| 120 | (1) |
4.9.2 Records Seen Before | 121 | (1) |
4.9.3 Records Similar to Those Seen Before | 122 | (1) |
4.9.4 Records with Some Other Frequency | 123 | (1) |
4.10 Special Case: Discrepancy Detection | 123 | (4) |
5 Looking for Similarity - Clustering | 127 | (30) |
| 128 | (2) |
5.2 Outline of Clustering Technology | 130 | (4) |
5.3 Concealment Opportunities | 134 | (1) |
| 135 | (16) |
5.4.1 Distance-Based Clustering | 136 | (2) |
5.4.2 Density-Based Clustering | 138 | (1) |
5.4.3 Distribution-Based Clustering | 139 | (2) |
5.4.4 Decomposition-Based Clustering | 141 | (3) |
5.4.5 Hierarchical Clustering | 144 | (2) |
| 146 | (4) |
5.4.7 Clusters and Prediction | 150 | (1) |
5.4.8 Symbiotic Clustering | 150 | (1) |
| 151 | (1) |
5.6 Special Case - Looking for Outliers Revisited | 152 | (5) |
6 Looking Inside Groups - Relationship Discovery | 157 | (44) |
| 158 | (1) |
6.2 Outline of Relationship-Discovery Technology | 159 | (8) |
| 162 | (3) |
6.2.2 Selection in Graphs | 165 | (1) |
| 166 | (1) |
6.3 Concealment Opportunities | 167 | (3) |
| 170 | (24) |
6.4.1 Social Network Analysis | 170 | (2) |
| 172 | (4) |
6.4.3 Pattern Matching/Information Retrieval | 176 | (3) |
6.4.4 Single-Node Exploration | 179 | (3) |
6.4.5 Unusual-Region Detection | 182 | (1) |
| 183 | (2) |
| 185 | (4) |
| 189 | (1) |
6.4.9 Anomalous-Substructure Discovery | 189 | (5) |
6.4.10 Graphs and Prediction | 194 | (1) |
| 194 | (7) |
7 Discovery from Public Textual Data | 201 | (46) |
7.1 Text as it Reveals Internal State | 203 | (1) |
| 204 | (5) |
7.2.1 Finding Texts of Interest | 205 | (1) |
| 206 | (2) |
| 208 | (1) |
7.2.4 Finding Author Properties | 208 | (1) |
7.2.5 Finding Metainformation | 208 | (1) |
7.3 Outline of Textual-Analysis Technology | 209 | (5) |
| 210 | (3) |
7.3.2 Exploring Authorship | 213 | (1) |
7.3.3 Exploring Metainformation | 213 | (1) |
7.4 Concealment Opportunities | 214 | (2) |
| 216 | (22) |
7.5.1 Collecting Textual Data | 216 | (2) |
7.5.2 Extracting Interesting Documents from a Large Set | 218 | (2) |
7.5.3 Extracting Named Entities | 220 | (1) |
7.5.4 Extracting Concepts | 221 | (2) |
7.5.5 Extracting Relationships | 223 | (1) |
| 224 | (2) |
7.5.7 Extracting Narrative and Sensemaking | 226 | (1) |
| 227 | (2) |
7.5.9 Extracting Intention and Slant | 229 | (1) |
| 229 | (1) |
7.5.11 Authorship from Large Samples | 230 | (1) |
7.5.12 Authorship from Small Samples | 231 | (4) |
7.5.13 Detecting Author Properties | 235 | (2) |
7.5.14 Extracting Metainformation | 237 | (1) |
| 238 | (9) |
7.6.1 Playing the Adversaries | 238 | (2) |
7.6.2 Playing the Audience | 240 | (1) |
7.6.3 Fusing Data from Different Contexts | 241 | (6) |
8 Discovery in Private Communication | 247 | (26) |
8.1 The Impact of Obfuscation | 249 | (1) |
| 249 | (1) |
8.3 Concealment Opportunities | 249 | (4) |
| 250 | (1) |
| 251 | (2) |
| 253 | (16) |
8.4.1 Selection of Interesting Communication | 253 | (9) |
8.4.2 Content Extraction after Substitution | 262 | (6) |
8.4.3 Authorship after Substitution | 268 | (1) |
8.4.4 Metainformation after Substitution | 268 | (1) |
| 269 | (4) |
8.5.1 Using a Multifaceted Process for Selection | 269 | (4) |
9 Discovering Mental and Emotional State | 273 | (28) |
9.1 Frame Analysis for Intentions | 274 | (4) |
| 275 | (1) |
9.1.2 Frame-Analysis Detection Technology | 276 | (1) |
9.1.3 Concealment Opportunities | 277 | (1) |
9.1.4 Tactics and Process | 277 | (1) |
| 278 | (3) |
| 278 | (1) |
9.2.2 Sentiment-Analysis Technology | 279 | (2) |
9.2.3 Concealment Opportunities | 281 | (1) |
9.3 Mental-State Extraction | 281 | (11) |
| 282 | (1) |
9.3.2 Mental-State Extraction Technology | 283 | (8) |
9.3.3 Concealment Opportunities | 291 | (1) |
9.4 Systemic Functional Linguistics | 292 | (9) |
9.4.1 Introduction to Systemic Functional Linguistics | 293 | (3) |
9.4.2 SFL for Intention Detection | 296 | (1) |
9.4.3 SFL for Sentiment Analysis | 296 | (1) |
9.4.4 SFL for Mental-State Extraction | 297 | (4) |
10 The Bottom Line | 301 | (16) |
| 301 | (4) |
| 305 | (2) |
10.3 Applying the Process | 307 | (4) |
| 311 | (6) |
10.4.1 Process Improvements | 311 | (1) |
10.4.2 Straightforward Technical Problems | 312 | (2) |
10.4.3 More-Difficult Technical Advances | 314 | (3) |
Bibliography | 317 | |
Index | 32 | |