Introduction

Chapter 1 Users: The Who of Social Media
Measuring Variations in User Behavior in Wikipedia
The Diversity of User Activities
The Origin of the User Activity Distribution
The Consequences of the Power Law
The Long Tail in Human Activities
Long Tails Everywhere: The 80/20 Rule (p/q Rule)
Online Behavior on Twitter
Retrieving Tweets for Users
User Activities on Twitter

Chapter 2 Networks: The How of Social Media
Types and Properties of Social Networks
When Users Create the Connections: Explicit Networks
Directed Versus Undirected Graphs
Creating Graphs from Activities: Implicit Networks
Degrees: The Winner Takes All
Counting the Number of Connections
The Long Tail in User Connections
Beyond the Idealized Network Model
Capturing Correlations: Triangles, Clustering, and Assortativity
Local Triangles and Clustering

Chapter 3 Temporal Processes: The When of Social Media
What Traditional Models Tell You About Events in Time
When Events Happen Uniformly in Time
Comparing to a Memoryless Process
Deviations from Memorylessness
Periodicities in Time in User Activities
Bursty Activities of Individuals
Forecasting Metrics in Time
Forecasting Time Series with ARIMA
The Autoregressive Part ("AR")
The Moving Average Part ("MA")
The Full ARIMA (p, d, q) Model

Chapter 4 Content: The What of Social Media
Defining Content: Focus on Text and Unstructured Data
Creating Features from Text: The Basics of Natural Language Processing
The Basic Statistics of Term Occurrences in Text
Using Content Features to Identify Topics
How Diverse Are Individual Users' Interests?
Extracting Low-Dimensional Information from High-Dimensional Text
Unsupervised Topic Modeling
Supervised Topic Modeling
Relational Topic Modeling

Chapter 5 Processing Large Datasets
MapReduce: Structuring Parallel and Sequential Operations
Skew: The Curse of the Last Reducer
Multi-Stage MapReduce Flows
Joining Against Small Datasets
Models of Large-Scale MapReduce
Patterns in MapReduce Programming
PageRank for Ranking in Graphs
Incremental MapReduce Jobs
Challenges with Processing Long-Tailed Social Media Data
Sampling and Approximations: Getting Results with Less Computation
HyperLogLog on the Stack Exchange Dataset
Performance of HLL on Large Datasets
Bloom Filter as Pre-Computed Membership Knowledge
Bloom Filters on Large Social Datasets
Count-Min Sketch: Heavy Hitters Example
Count-Min Sketch: Top Percentage Example
Aggregating Approximate Data Structures
Summary of Approximations
Executing on a Hadoop Cluster (Amazon EC2)
Installing a CDH Cluster on Amazon EC2
Providing IAM Access to Collaborators
Adding On-Demand Cluster Capabilities

Chapter 6 Learn, Map, and Recommend
Social Media Services Online
Interactions with the Real World
Regularizing in Matrix Factorization
Non-Negative Matrix Factorization and Sparsity
Demonstration on Movie Ratings
Interpreting the Learned Stereotypes
Prediction and Recommendation
Overview of Methodologies
Nearest Neighbor-Based Approaches
Approaches Based on Supervised Learning
Predicting Movie Ratings with Logistic Regression
Common Issues with Features
Domain-Specific Applications

The Surprising Stability of Human Interaction Patterns
Averages, Standard Deviations, and Sampling

Index