E-book: Social Media Data Mining and Analytics [Wiley Online]

  • Format: 352 pages
  • Publication date: 30-Nov-2018
  • Publisher: John Wiley & Sons Inc
  • ISBN-10: 1119183510
  • ISBN-13: 9781119183518
  • Wiley Online
  • Price: 47.58 €*
  • * Price covers access for an unlimited number of concurrent users for an unlimited time
Harness the power of social media to predict customer behavior and improve sales

Social media is the biggest source of Big Data. Because of this, 90% of Fortune 500 companies are investing in Big Data initiatives that will help them predict consumer behavior to produce better sales results. Written by Dr. Gabor Szabo, a Senior Data Scientist at Twitter, and Dr. Oscar Boykin, a Software Engineer at Twitter, Social Media Data Mining and Analytics shows analysts how to use sophisticated techniques to mine social media data, obtaining the information they need to generate amazing results for their businesses.

Social Media Data Mining and Analytics isn't just another book on the business case for social media. Rather, this book provides hands-on examples for applying state-of-the-art tools and technologies to mine social media - examples include Twitter, Facebook, Pinterest, Wikipedia, Reddit, Flickr, Web hyperlinks, and other rich data sources. In it, you will learn:

  • The four key characteristics of online services: users, social networks, actions, and content
  • The full data discovery lifecycle: data extraction, storage, analysis, and visualization
  • How to work with code and extract data to create solutions
  • How to use Big Data to make accurate customer predictions

Szabo and Boykin wrote this book to provide businesses with the competitive advantage they need to harness the rich data that is available from social media platforms.

Introduction

Chapter 1 Users: The Who of Social Media
  Measuring Variations in User Behavior in Wikipedia
  The Diversity of User Activities
  The Origin of the User Activity Distribution
  The Consequences of the Power Law
  The Long Tail in Human Activities
  Long Tails Everywhere: The 80/20 Rule (p/q Rule)
  Online Behavior on Twitter
  Retrieving Tweets for Users
  Logarithmic Binning
  User Activities on Twitter
  Summary

Chapter 2 Networks: The How of Social Media
  Types and Properties of Social Networks
  When Users Create the Connections: Explicit Networks
  Directed Versus Undirected Graphs
  Node and Edge Properties
  Weighted Graphs
  Creating Graphs from Activities: Implicit Networks
  Visualizing Networks
  Degrees: The Winner Takes All
  Counting the Number of Connections
  The Long Tail in User Connections
  Beyond the Idealized Network Model
  Capturing Correlations: Triangles, Clustering, and Assortativity
  Local Triangles and Clustering
  Assortativity
  Summary

Chapter 3 Temporal Processes: The When of Social Media
  What Traditional Models Tell You About Events in Time
  When Events Happen Uniformly in Time
  Inter-Event Times
  Comparing to a Memoryless Process
  Autocorrelations
  Deviations from Memorylessness
  Periodicities in Time in User Activities
  Bursty Activities of Individuals
  Correlations and Bursts
  Reservoir Sampling
  Forecasting Metrics in Time
  Finding Trends
  Finding Seasonality
  Forecasting Time Series with ARIMA
  The Autoregressive Part ("AR")
  The Moving Average Part ("MA")
  The Full ARIMA (p, d, q) Model
  Summary

Chapter 4 Content: The What of Social Media
  Defining Content: Focus on Text and Unstructured Data
  Creating Features from Text: The Basics of Natural Language Processing
  The Basic Statistics of Term Occurrences in Text
  Using Content Features to Identify Topics
  The Popularity of Topics
  How Diverse Are Individual Users' Interests?
  Extracting Low-Dimensional Information from High-Dimensional Text
  Topic Modeling
  Unsupervised Topic Modeling
  Supervised Topic Modeling
  Relational Topic Modeling
  Summary

Chapter 5 Processing Large Datasets
  MapReduce: Structuring Parallel and Sequential Operations
  Counting Words
  Skew: The Curse of the Last Reducer
  Multi-Stage MapReduce Flows
  Fan-Out
  Merging Data Streams
  Joining Two Data Sources
  Joining Against Small Datasets
  Models of Large-Scale MapReduce
  Patterns in MapReduce Programming
  Static MapReduce Jobs
  Iterative MapReduce Jobs
  PageRank for Ranking in Graphs
  k-means Clustering
  Incremental MapReduce Jobs
  Temporal MapReduce Jobs
  Rollups and Data Cubing
  Expanding Rollup Jobs
  Challenges with Processing Long-Tailed Social Media Data
  Sampling and Approximations: Getting Results with Less Computation
  HyperLogLog
  HyperLogLog Example
  HyperLogLog on the Stack Exchange Dataset
  Performance of HLL on Large Datasets
  Bloom Filters
  A Bloom Filter Example
  Bloom Filter as Pre-Computed Membership Knowledge
  Bloom Filters on Large Social Datasets
  Count-Min Sketch
  Count-Min Sketch: Heavy Hitters Example
  Count-Min Sketch: Top Percentage Example
  Aggregating Approximate Data Structures
  Summary of Approximations
  Executing on a Hadoop Cluster (Amazon EC2)
  Installing a CDH Cluster on Amazon EC2
  Providing IAM Access to Collaborators
  Adding On-Demand Cluster Capabilities
  Summary

Chapter 6 Learn, Map, and Recommend
  Social Media Services Online
  Search Engines
  Content Engagement
  Interactions with the Real World
  Interactions with People
  Problem Formulation
  Learning and Mapping
  Matrix Factorization
  Learning, Training
  Under- and Overfitting
  Regularizing in Matrix Factorization
  Non-Negative Matrix Factorization and Sparsity
  Demonstration on Movie Ratings
  Interpreting the Learned Stereotypes
  Exploratory Analysis
  Prediction and Recommendation
  Evaluation
  Overview of Methodologies
  Nearest Neighbor-Based Approaches
  Approaches Based on Supervised Learning
  Predicting Movie Ratings with Logistic Regression
  Common Issues with Features
  Domain-Specific Applications
  Summary

Chapter 7 Conclusions
  The Surprising Stability of Human Interaction Patterns
  Averages, Standard Deviations, and Sampling
  Removing Outliers

Index

GABOR SZABO, PHD, is a Senior Staff Software Engineer at Tesla and a former data scientist at Twitter, where he focused on predicting user behavior and content popularity in crowdsourced online services, and on modeling large-scale content dynamics. He also authored the PyCascading data processing library.

GUNGOR POLATKAN, PHD, is a Tech Lead/Engineering Manager designing and implementing end-to-end machine learning and artificial intelligence offline/online pipelines for the LinkedIn Learning relevance backend. He was previously a machine learning scientist at Twitter, where he worked on topics such as ad targeting and user modeling.

P. OSCAR BOYKIN, PHD, is a software engineer at Stripe where he works on machine learning infrastructure. He was previously a Senior Staff Engineer at Twitter, where he worked on data infrastructure problems. He is coauthor of the Scala big-data libraries Algebird, Scalding and Summingbird.

ANTONIOS CHALKIOPOULOS, MSC, is a Distributed Systems Specialist. A system engineer who has delivered fast/big data projects in media, betting, and finance, he is now leading the effort on the Lenses platform for data streaming as a co-founder and CEO at https://lenses.stream.