Preface xii
Acknowledgments xiv
About this book xv
About the authors xviii
About the cover illustration xx
|
Part 1 Basics of privacy-preserving machine learning with differential privacy 1
|
1 Privacy considerations in machine learning 3
|
1.1 Privacy complications in the AI era 4
1.2 The threat of learning beyond the intended purpose 5
    Use of private data on the fly 6
    How data is processed inside ML algorithms 6
    Why privacy protection in ML is important 7
    Regulatory requirements and the utility vs. privacy tradeoff 7
|
1.3 Threats and attacks for ML systems 8
    The problem of private data in the clear 9
|
|
    Reconstruction attacks 9
    Model inversion attacks 12
|
    Membership inference attacks 13
    De-anonymization or re-identification attacks 15
    Challenges of privacy protection in big data analytics 15
|
1.4 Securing privacy while learning from data: Privacy-preserving machine learning 16
    Use of differential privacy 17
    Local differential privacy 18
    Privacy-preserving synthetic data generation 18
    Privacy-preserving data mining techniques 19
|
|
    Compressive privacy 21
|
1.5 How is this book structured? 22
|
2 Differential privacy for machine learning 25
2.1 What is differential privacy? 26
    The concept of differential privacy 27
    How differential privacy works 30
|
2.2 Mechanisms of differential privacy 35
    Binary mechanism (randomized response) 35
|
|
    Laplace mechanism 38
    Exponential mechanism 43
|
2.3 Properties of differential privacy 48
    Postprocessing property of differential privacy 48
    Group privacy property of differential privacy 50
    Composition properties of differential privacy 51
|
3 Advanced concepts of differential privacy for machine learning 56
3.1 Applying differential privacy in machine learning 57
|
|
    Input perturbation 58
    Algorithm perturbation 59
    Output perturbation 60
    Objective perturbation 60
|
3.2 Differentially private supervised learning algorithms 62
    Differentially private naive Bayes classification 62
    Differentially private logistic regression 68
    Differentially private linear regression 72
|
3.3 Differentially private unsupervised learning algorithms 74
    Differentially private k-means clustering 74
|
3.4 Case study: Differentially private principal component analysis 77
    The privacy of PCA over horizontally partitioned data 78
    Designing differentially private PCA over horizontally partitioned data 80
    Experimentally evaluating the performance of the protocol 85
|
Part 2 Local differential privacy and synthetic data generation 93
|
4 Local differential privacy for machine learning 95
4.1 What is local differential privacy? 96
    The concept of local differential privacy 97
    Randomized response for local differential privacy 101
|
4.2 The mechanisms of local differential privacy 104
|
|
    Direct encoding 104
    Histogram encoding 110
    Unary encoding 117
|
5 Advanced LDP mechanisms for machine learning 123
5.1 A quick recap of local differential privacy 124
5.2 Advanced LDP mechanisms 125
    The Laplace mechanism for LDP 126
    Duchi's mechanism for LDP 127
    The Piecewise mechanism for LDP 129
|
5.3 A case study implementing LDP naive Bayes classification 131
    Using naive Bayes with ML classification 131
    Using LDP naive Bayes with discrete features 132
    Using LDP naive Bayes with continuous features 137
    Evaluating the performance of different LDP protocols 140
|
6 Privacy-preserving synthetic data generation 146
6.1 Overview of synthetic data generation 148
    What is synthetic data? Why is it important? 148
    Application aspects of using synthetic data for privacy preservation 149
    Generating synthetic data 150
|
6.2 Assuring privacy via data anonymization 151
    Private information sharing vs. privacy concerns 151
    Using k-anonymity against re-identification attacks 152
    Anonymization beyond k-anonymity 154
|
6.3 DP for privacy-preserving synthetic data generation 155
    DP synthetic histogram representation generation 156
    DP synthetic tabular data generation 160
    DP synthetic multi-marginal data generation 162
|
6.4 Case study on private synthetic data release via feature-level micro-aggregation 168
    Using hierarchical clustering and micro-aggregation 168
    Generating synthetic data 169
    Evaluating the performance of the generated synthetic data 171
|
Part 3 Building privacy-assured machine learning applications 177
|
7 Privacy-preserving data mining techniques 179
7.1 The importance of privacy preservation in data mining and management 180
7.2 Privacy protection in data processing and mining 183
    What is data mining and how is it used? 183
    Consequences of privacy regulatory requirements 184
|
7.3 Protecting privacy by modifying the input 185
    Applications and limitations 186
7.4 Protecting privacy when publishing data 186
    Implementing data sanitization operations in Python 189
|
|
    k-anonymity 193
|
    Implementing k-anonymity in Python 198
|
8 Privacy-preserving data management and operations 202
8.1 A quick recap of privacy protection in data processing and mining 203
8.2 Privacy protection beyond k-anonymity 204
|
|
    l-diversity 205
    t-closeness 208
|
    Implementing privacy models with Python 211
|
8.3 Protecting privacy by modifying the data mining output 216
|
|
    Association rule hiding 217
|
    Reducing the accuracy of data mining operations 218
    Inference control in statistical databases 219
|
8.4 Privacy protection in data management systems 220
    Database security and privacy: Threats and vulnerabilities 221
    How likely is a modern database system to leak private information? 222
    Attacks on database systems 222
    Privacy-preserving techniques in statistical database systems 225
    What to consider when designing a customizable privacy-preserving database system 228
|
9 Compressive privacy for machine learning 233
9.1 Introducing compressive privacy 235
9.2 The mechanisms of compressive privacy 237
    Principal component analysis (PCA) 237
    Other dimensionality reduction methods 238
|
9.3 Using compressive privacy for ML applications 239
    Implementing compressive privacy 240
    The accuracy of the utility task 246
    The effect of p' in DCA for privacy and utility 249
|
9.4 Case study: Privacy-preserving PCA and DCA on horizontally partitioned data 251
    Achieving privacy preservation on horizontally partitioned data 253
    Recapping dimensionality reduction approaches 254
    Using additive homomorphic encryption 255
    Overview of the proposed approach 256
    How privacy-preserving computation works 258
    Evaluating the efficiency and accuracy of the privacy-preserving PCA and DCA 263
|
10 Putting it all together: Designing a privacy-enhanced platform (DataHub) 268
10.1 The significance of a research data protection and sharing platform 270
    The motivation behind the DataHub platform 270
    DataHub's important features 271
|
10.2 Understanding the research collaboration workspace 272
|
|
|
    Blending different trust models 276
    Configuring access control mechanisms 278
|
10.3 Integrating privacy and security technologies into DataHub 280
    Data storage with a cloud-based secure NoSQL database 280
    Privacy-preserving data collection with local differential privacy 282
    Privacy-preserving machine learning 284
    Privacy-preserving query processing 286
    Using synthetic data generation in the DataHub platform 287
Appendix: More details about differential privacy 291
References 299
Index 305