E-book: Social Media Data Mining and Analytics [Wiley Online]

Antonios Chalkiopoulos, Gabor Szabo, P. Oscar Boykin, Gungor Polatkan

Format: 352 pages
Publication date: 30-Nov-2018
Publisher: John Wiley & Sons Inc
ISBN-10: 1119183510
ISBN-13: 9781119183518

Wiley Online
Price: 47,58 €*
* This price grants access for an unlimited number of simultaneous users for an unlimited time.

Book homepage: https://onlinelibrary.wiley.com/doi/book/10.1002/9781119183518

Harness the power of social media to predict customer behavior and improve sales

Social media is the biggest source of Big Data. Because of this, 90% of Fortune 500 companies are investing in Big Data initiatives that will help them predict consumer behavior to produce better sales results. Written by Dr. Gabor Szabo, a Senior Data Scientist at Twitter, and Dr. Oscar Boykin, a Software Engineer at Twitter, Social Media Data Mining and Analytics shows analysts how to use sophisticated techniques to mine social media data, obtaining the information they need to generate amazing results for their businesses.

Social Media Data Mining and Analytics isn't just another book on the business case for social media. Rather, this book provides hands-on examples for applying state-of-the-art tools and technologies to mine social media. The examples draw on Twitter, Facebook, Pinterest, Wikipedia, Reddit, Flickr, Web hyperlinks, and other rich data sources. In it, you will learn:

The four key characteristics of online services: users, social networks, actions, and content
The full data discovery lifecycle: data extraction, storage, analysis, and visualization
How to work with code and extract data to create solutions
How to use Big Data to make accurate customer predictions

Szabo and Boykin wrote this book to provide businesses with the competitive advantage they need to harness the rich data that is available from social media platforms.

Introduction

Chapter 1 Users: The Who of Social Media

Measuring Variations in User Behavior in Wikipedia
The Diversity of User Activities
The Origin of the User Activity Distribution
The Consequences of the Power Law
The Long Tail in Human Activities
Long Tails Everywhere: The 80/20 Rule (p/q Rule)
Online Behavior on Twitter
Retrieving Tweets for Users
Logarithmic Binning
User Activities on Twitter
Summary

Chapter 2 Networks: The How of Social Media

Types and Properties of Social Networks
When Users Create the Connections: Explicit Networks
Directed Versus Undirected Graphs
Node and Edge Properties
Weighted Graphs
Creating Graphs from Activities: Implicit Networks
Visualizing Networks
Degrees: The Winner Takes All
Counting the Number of Connections
The Long Tail in User Connections
Beyond the Idealized Network Model
Capturing Correlations: Triangles, Clustering, and Assortativity
Local Triangles and Clustering
Assortativity
Summary

Chapter 3 Temporal Processes: The When of Social Media

What Traditional Models Tell You About Events in Time
When Events Happen Uniformly in Time
Inter-Event Times
Comparing to a Memoryless Process
Autocorrelations
Deviations from Memorylessness
Periodicities in Time in User Activities
Bursty Activities of Individuals
Correlations and Bursts
Reservoir Sampling
Forecasting Metrics in Time
Finding Trends
Finding Seasonality
Forecasting Time Series with ARIMA
The Autoregressive Part ("AR")
The Moving Average Part ("MA")
The Full ARIMA (p, d, q) Model
Summary

Chapter 4 Content: The What of Social Media

Defining Content: Focus on Text and Unstructured Data
Creating Features from Text: The Basics of Natural Language Processing
The Basic Statistics of Term Occurrences in Text
Using Content Features to Identify Topics
The Popularity of Topics
How Diverse Are Individual Users' Interests?
Extracting Low-Dimensional Information from High-Dimensional Text
Topic Modeling
Unsupervised Topic Modeling
Supervised Topic Modeling
Relational Topic Modeling
Summary

Chapter 5 Processing Large Datasets

MapReduce: Structuring Parallel and Sequential Operations
Counting Words
Skew: The Curse of the Last Reducer
Multi-Stage MapReduce Flows
Fan-Out
Merging Data Streams
Joining Two Data Sources
Joining Against Small Datasets
Models of Large-Scale MapReduce
Patterns in MapReduce Programming
Static MapReduce Jobs
Iterative MapReduce Jobs
PageRank for Ranking in Graphs
k-means Clustering
Incremental MapReduce Jobs
Temporal MapReduce Jobs
Rollups and Data Cubing
Expanding Rollup Jobs
Challenges with Processing Long-Tailed Social Media Data
Sampling and Approximations: Getting Results with Less Computation
HyperLogLog
HyperLogLog Example
HyperLogLog on the Stack Exchange Dataset
Performance of HLL on Large Datasets
Bloom Filters
A Bloom Filter Example
Bloom Filter as Pre-Computed Membership Knowledge
Bloom Filters on Large Social Datasets
Count-Min Sketch
Count-Min Sketch: Heavy Hitters Example
Count-Min Sketch: Top Percentage Example
Aggregating Approximate Data Structures
Summary of Approximations
Executing on a Hadoop Cluster (Amazon EC2)
Installing a CDH Cluster on Amazon EC2
Providing IAM Access to Collaborators
Adding On-Demand Cluster Capabilities
Summary

Chapter 6 Learn, Map, and Recommend

Social Media Services Online
Search Engines
Content Engagement
Interactions with the Real World
Interactions with People
Problem Formulation
Learning and Mapping
Matrix Factorization
Learning, Training
Under- and Overfitting
Regularizing in Matrix Factorization
Non-Negative Matrix Factorization and Sparsity
Demonstration on Movie Ratings
Interpreting the Learned Stereotypes
Exploratory Analysis
Prediction and Recommendation
Evaluation
Overview of Methodologies
Nearest Neighbor-Based Approaches
Approaches Based on Supervised Learning
Predicting Movie Ratings with Logistic Regression
Common Issues with Features
Domain-Specific Applications
Summary

Chapter 7 Conclusions

The Surprising Stability of Human Interaction Patterns
Averages, Standard Deviations, and Sampling
Removing Outliers

Index

GABOR SZABO, PHD, is a Senior Staff Software Engineer at Tesla and a former data scientist at Twitter, where he focused on predicting user behavior and content popularity in crowdsourced online services, and on modeling large-scale content dynamics. He also authored the PyCascading data processing library.

GUNGOR POLATKAN, PHD, is a Tech Lead/Engineering Manager designing and implementing end-to-end machine learning and artificial intelligence offline/online pipelines for the LinkedIn Learning relevance backend. He was previously a machine learning scientist at Twitter, where he worked on topics such as ad targeting and user modeling.

P. OSCAR BOYKIN, PHD, is a software engineer at Stripe where he works on machine learning infrastructure. He was previously a Senior Staff Engineer at Twitter, where he worked on data infrastructure problems. He is coauthor of the Scala big-data libraries Algebird, Scalding and Summingbird.

ANTONIOS CHALKIOPOULOS, MSC, is a Distributed Systems Specialist. A system engineer who has delivered fast/big data projects in media, betting, and finance, he is now leading the effort on the Lenses platform for data streaming as a co-founder and CEO at https://lenses.stream.

Permalink: https://www.kriso.ee/db/9781119183518_pe.html
