
E-book: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data

3.95/5 (64 ratings from Goodreads)

Bing Liu

Format: PDF+DRM
Series: Data-Centric Systems and Applications
Publication date: 25-Jun-2011
Publisher: Springer-Verlag Berlin and Heidelberg GmbH & Co. K
Language: eng
ISBN-13: 9783642194603


Price: 61.74 €*
* The price is final; no further discounts apply.
This e-book is intended for personal use only. E-books cannot be returned.


DRM restrictions

Copying (copy/paste): not allowed
Printing: not allowed
Usage:

Digital rights management (DRM)
The publisher has issued this e-book in encrypted form, which means that you need to install special software to read it. You also need to create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

Required software
To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is probably already installed on your computer).

This e-book cannot be read on an Amazon Kindle.

Now in its second, updated edition, this authoritative and coherent text contains a rich blend of theory and practice and covers all the essential concepts and algorithms from relevant fields such as data mining, machine learning, and text processing.

Web mining aims to discover useful information and knowledge from Web hyperlinks, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining due to the semi-structured and unstructured nature of the Web data. The field has also developed many of its own algorithms and techniques.

Liu has written a comprehensive text on Web mining, which consists of two parts. The first part covers the data mining and machine learning foundations, where all the essential concepts and algorithms of data mining and machine learning are presented. The second part covers the key topics of Web mining, where Web crawling, search, social network analysis, structured data extraction, information integration, opinion mining and sentiment analysis, Web usage mining, query log mining, computational advertising, and recommender systems are all treated both in breadth and in depth. His book thus brings all the related concepts and algorithms together to form an authoritative and coherent text.

The book offers a rich blend of theory and practice. It is suitable for students, researchers and practitioners interested in Web mining and data mining both as a learning text and as a reference book. Professors can readily use it for classes on data mining, Web mining, and text mining. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.

Reviews

From the reviews:

"This is a textbook about data mining and its application to the Web. [...] Liu succeeds in helping readers appreciate the key role that data mining and machine learning play in Web applications. [...] It also motivates the student by adding immediacy and relevance to the concepts and algorithms described. I liked the way the concepts are introduced in a stepwise manner. [...] I also appreciated the bibliographical notes at the end of each chapter." ACM Computing Reviews, W. Hu, January 2009

From the reviews of the second edition:

"Liu (Univ. of Illinois, Chicago) discusses all three types of Web mining--structure, content, and usage--in the technology's efforts to glean information from hyperlinks, Web page content, and usage logs. [...] Practical examples complement the discussions throughout the text, and each chapter includes useful Bibliographic Notes and an extensive bibliography. [...] Liu states that his intended audience includes both undergraduate and graduate students, but notes that researchers and Web programmers could benefit from this text as well. Summing Up: Recommended. Upper-division undergraduates through professionals." J. Johnson, Choice, Vol. 49 (5), January 2012

"[...] Liu's book provides a comprehensive, self-contained introduction to the major data mining techniques and their use in Web data mining. [...] Professionals and researchers alike will find this excellent book handy as a reference. Its extensive lists of references at the end of each chapter provide hundreds of pointers for further reading. As a textbook, it is also suitable for advanced undergraduate and graduate courses on Web mining; it is highly self-contained and includes many easy-to-understand examples that will help readers grasp the key ideas behind current Web data mining techniques." ACM Computing Reviews, Fernando Berzal, February 2012

1 Introduction
1.1 What is the World Wide Web?
1.2 A Brief History of the Web and the Internet
1.3 Web Data Mining
1.3.1 What is Data Mining?
1.3.2 What is Web Mining?
1.4 Summary of Chapters
1.5 How to Read this Book
Bibliographic Notes
Bibliography

Part I Data Mining Foundations

2 Association Rules and Sequential Patterns
2.1 Basic Concepts of Association Rules
2.2 Apriori Algorithm
2.2.1 Frequent Itemset Generation
2.2.2 Association Rule Generation
2.3 Data Formats for Association Rule Mining
2.4 Mining with Multiple Minimum Supports
2.4.1 Extended Model
2.4.2 Mining Algorithm
2.4.3 Rule Generation
2.5 Mining Class Association Rules
2.5.1 Problem Definition
2.5.2 Mining Algorithm
2.5.3 Mining with Multiple Minimum Supports
2.6 Basic Concepts of Sequential Patterns
2.7 Mining Sequential Patterns Based on GSP
2.7.1 GSP Algorithm
2.7.2 Mining with Multiple Minimum Supports
2.8 Mining Sequential Patterns Based on PrefixSpan
2.8.1 PrefixSpan Algorithm
2.8.2 Mining with Multiple Minimum Supports
2.9 Generating Rules from Sequential Patterns
2.9.1 Sequential Rules
2.9.2 Label Sequential Rules
2.9.3 Class Sequential Rules
Bibliographic Notes
Bibliography

3 Supervised Learning
3.1 Basic Concepts
3.2 Decision Tree Induction
3.2.1 Learning Algorithm
3.2.2 Impurity Function
3.2.3 Handling of Continuous Attributes
3.2.4 Some Other Issues
3.3 Classifier Evaluation
3.3.1 Evaluation Methods
3.3.2 Precision, Recall, F-score and Breakeven Point
3.3.3 Receiver Operating Characteristic Curve
3.3.4 Lift Curve
3.4 Rule Induction
3.4.1 Sequential Covering
3.4.2 Rule Learning: Learn-One-Rule Function
3.4.3 Discussion
3.5 Classification Based on Associations
3.5.1 Classification Using Class Association Rules
3.5.2 Class Association Rules as Features
3.5.3 Classification Using Normal Association Rules
3.6 Naive Bayesian Classification
3.7 Naive Bayesian Text Classification
3.7.1 Probabilistic Framework
3.7.2 Naive Bayesian Model
3.7.3 Discussion
3.8 Support Vector Machines
3.8.1 Linear SVM: Separable Case
3.8.2 Linear SVM: Non-Separable Case
3.8.3 Nonlinear SVM: Kernel Functions
3.9 K-Nearest Neighbor Learning
3.10 Ensemble of Classifiers
3.10.1 Bagging
3.10.2 Boosting
Bibliographic Notes
Bibliography

4 Unsupervised Learning
4.1 Basic Concepts
4.2 K-means Clustering
4.2.1 K-means Algorithm
4.2.2 Disk Version of the K-means Algorithm
4.2.3 Strengths and Weaknesses
4.3 Representation of Clusters
4.3.1 Common Ways of Representing Clusters
4.3.2 Clusters of Arbitrary Shapes
4.4 Hierarchical Clustering
4.4.1 Single-Link Method
4.4.2 Complete-Link Method
4.4.3 Average-Link Method
4.4.4 Strengths and Weaknesses
4.5 Distance Functions
4.5.1 Numeric Attributes
4.5.2 Binary and Nominal Attributes
4.5.3 Text Documents
4.6 Data Standardization
4.7 Handling of Mixed Attributes
4.8 Which Clustering Algorithm to Use?
4.9 Cluster Evaluation
4.10 Discovering Holes and Data Regions
Bibliographic Notes
Bibliography

5 Partially Supervised Learning
5.1 Learning from Labeled and Unlabeled Examples
5.1.1 EM Algorithm with Naive Bayesian Classification
5.1.2 Co-Training
5.1.3 Self-Training
5.1.4 Transductive Support Vector Machines
5.1.5 Graph-Based Methods
5.1.6 Discussion
5.2 Learning from Positive and Unlabeled Examples
5.2.1 Applications of PU Learning
5.2.2 Theoretical Foundation
5.2.3 Building Classifiers: Two-Step Approach
5.2.4 Building Classifiers: Biased-SVM
5.2.5 Building Classifiers: Probability Estimation
5.2.6 Discussion
Appendix: Derivation of EM for Naive Bayesian Classification
Bibliographic Notes
Bibliography

Part II Web Mining

6 Information Retrieval and Web Search
6.1 Basic Concepts of Information Retrieval
6.2 Information Retrieval Models
6.2.1 Boolean Model
6.2.2 Vector Space Model
6.2.3 Statistical Language Model
6.3 Relevance Feedback
6.4 Evaluation Measures
6.5 Text and Web Page Pre-Processing
6.5.1 Stopword Removal
6.5.2 Stemming
6.5.3 Other Pre-Processing Tasks for Text
6.5.4 Web Page Pre-Processing
6.5.5 Duplicate Detection
6.6 Inverted Index and Its Compression
6.6.1 Inverted Index
6.6.2 Search Using an Inverted Index
6.6.3 Index Construction
6.6.4 Index Compression
6.7 Latent Semantic Indexing
6.7.1 Singular Value Decomposition
6.7.2 Query and Retrieval
6.7.3 An Example
6.7.4 Discussion
6.8 Web Search
6.9 Meta-Search: Combining Multiple Rankings
6.9.1 Combination Using Similarity Scores
6.9.2 Combination Using Rank Positions
6.10 Web Spamming
6.10.1 Content Spamming
6.10.2 Link Spamming
6.10.3 Hiding Techniques
6.10.4 Combating Spam
Bibliographic Notes
Bibliography

7 Social Network Analysis
7.1 Social Network Analysis
7.1.1 Centrality
7.1.2 Prestige
7.2 Co-Citation and Bibliographic Coupling
7.2.1 Co-Citation
7.2.2 Bibliographic Coupling
7.3 PageRank
7.3.1 PageRank Algorithm
7.3.2 Strengths and Weaknesses of PageRank
7.3.3 Timed PageRank and Recency Search
7.4 HITS
7.4.1 HITS Algorithm
7.4.2 Finding Other Eigenvectors
7.4.3 Relationships with Co-Citation and Bibliographic Coupling
7.4.4 Strengths and Weaknesses of HITS
7.5 Community Discovery
7.5.1 Problem Definition
7.5.2 Bipartite Core Communities
7.5.3 Maximum Flow Communities
7.5.4 Email Communities Based on Betweenness
7.5.5 Overlapping Communities of Named Entities
Bibliographic Notes
Bibliography

8 Web Crawling
8.1 A Basic Crawler Algorithm
8.1.1 Breadth-First Crawlers
8.1.2 Preferential Crawlers
8.2 Implementation Issues
8.2.1 Fetching
8.2.2 Parsing
8.2.3 Stopword Removal and Stemming
8.2.4 Link Extraction and Canonicalization
8.2.5 Spider Traps
8.2.6 Page Repository
8.2.7 Concurrency
8.3 Universal Crawlers
8.3.1 Scalability
8.3.2 Coverage vs. Freshness vs. Importance
8.4 Focused Crawlers
8.5 Topical Crawlers
8.5.1 Topical Locality and Cues
8.5.2 Best-First Variations
8.5.3 Adaptation
8.6 Evaluation
8.7 Crawler Ethics and Conflicts
8.8 Some New Developments
Bibliographic Notes
Bibliography

9 Structured Data Extraction: Wrapper Generation
9.1 Preliminaries
9.1.1 Two Types of Data Rich Pages
9.1.2 Data Model
9.1.3 HTML Mark-Up Encoding of Data Instances
9.2 Wrapper Induction
9.2.1 Extraction from a Page
9.2.2 Learning Extraction Rules
9.2.3 Identifying Informative Examples
9.2.4 Wrapper Maintenance
9.3 Instance-Based Wrapper Learning
9.4 Automatic Wrapper Generation: Problems
9.4.1 Two Extraction Problems
9.4.2 Patterns as Regular Expressions
9.5 String Matching and Tree Matching
9.5.1 String Edit Distance
9.5.2 Tree Matching
9.6 Multiple Alignment
9.6.1 Center Star Method
9.6.2 Partial Tree Alignment
9.7 Building DOM Trees
9.8 Extraction Based on a Single List Page: Flat Data Records
9.8.1 Two Observations about Data Records
9.8.2 Mining Data Regions
9.8.3 Identifying Data Records in Data Regions
9.8.4 Data Item Alignment and Extraction
9.8.5 Making Use of Visual Information
9.8.6 Some Other Techniques
9.9 Extraction Based on a Single List Page: Nested Data Records
9.10 Extraction Based on Multiple Pages
9.10.1 Using Techniques in Previous Sections
9.10.2 RoadRunner Algorithm
9.11 Some Other Issues
9.11.1 Extraction from Other Pages
9.11.2 Disjunction or Optional
9.11.3 A Set Type or a Tuple Type
9.11.4 Labeling and Integration
9.11.5 Domain Specific Extraction
9.12 Discussion
Bibliographic Notes
Bibliography

10 Information Integration
10.1 Introduction to Schema Matching
10.2 Pre-Processing for Schema Matching
10.3 Schema-Level Matching
10.3.1 Linguistic Approaches
10.3.2 Constraint Based Approaches
10.4 Domain and Instance-Level Matching
10.5 Combining Similarities
10.6 1:m Match
10.7 Some Other Issues
10.7.1 Reuse of Previous Match Results
10.7.2 Matching a Large Number of Schemas
10.7.3 Schema Match Results
10.7.4 User Interactions
10.8 Integration of Web Query Interfaces
10.8.1 A Clustering Based Approach
10.8.2 A Correlation Based Approach
10.8.3 An Instance Based Approach
10.9 Constructing a Unified Global Query Interface
10.9.1 Structural Appropriateness and the Merge Algorithm
10.9.2 Lexical Appropriateness
10.9.3 Instance Appropriateness
Bibliographic Notes
Bibliography

11 Opinion Mining and Sentiment Analysis
11.1 The Problem of Opinion Mining
11.1.1 Problem Definitions
11.1.2 Aspect-Based Opinion Summary
11.2 Document Sentiment Classification
11.2.1 Classification Based on Supervised Learning
11.2.2 Classification Based on Unsupervised Learning
11.3 Sentence Subjectivity and Sentiment Classification
11.4 Opinion Lexicon Expansion
11.5 Aspect-Based Opinion Mining
11.5.1 Aspect Sentiment Classification
11.5.2 Basic Rules of Opinions
11.5.3 Aspect Extraction
11.5.4 Simultaneous Opinion Lexicon Expansion and Aspect Extraction
11.6 Mining Comparative Opinions
11.6.1 Problem Definitions
11.6.2 Identification of Comparative Sentences
11.6.3 Identification of Preferred Entities
11.7 Some Other Problems
11.8 Opinion Search and Retrieval
11.9 Opinion Spam Detection
11.9.1 Types of Spam and Spammers
11.9.2 Hiding Techniques
11.9.3 Spam Detection Based on Supervised Learning
11.9.4 Spam Detection Based on Abnormal Behaviors
11.9.5 Group Spam Detection
11.10 Utility of Reviews
Bibliographic Notes
Bibliography

12 Web Usage Mining
12.1 Data Collection and Pre-Processing
12.1.1 Sources and Types of Data
12.1.2 Key Elements of Web Usage Data Pre-Processing
12.2 Data Modeling for Web Usage Mining
12.3 Discovery and Analysis of Web Usage Patterns
12.3.1 Session and Visitor Analysis
12.3.2 Cluster Analysis and Visitor Segmentation
12.3.3 Association and Correlation Analysis
12.3.4 Analysis of Sequential and Navigational Patterns
12.3.5 Classification and Prediction based on Web User Transactions
12.4 Recommender Systems and Collaborative Filtering
12.4.1 The Recommendation Problem
12.4.2 Content-Based Recommendation
12.4.3 Collaborative Filtering: K-Nearest Neighbor (KNN)
12.4.4 Collaborative Filtering: Using Association Rules
12.4.5 Collaborative Filtering: Matrix Factorization
12.5 Query Log Mining
12.5.1 Data Sources, Characteristics, and Challenges
12.5.2 Query Log Data Preparation
12.5.3 Query Log Data Models
12.5.4 Query Log Feature Extraction
12.5.5 Query Log Mining Applications
12.5.6 Query Log Mining Methods
12.6 Computational Advertising
12.7 Discussion and Outlook
Bibliographic Notes
Bibliography

Subject Index


Bing Liu is a professor of Computer Science at the University of Illinois at Chicago (UIC). He received his PhD in Artificial Intelligence from the University of Edinburgh. Before joining UIC, he was with the National University of Singapore. His current research interests include opinion mining and sentiment analysis, text and Web mining, data mining, and machine learning. He has published extensively in top journals and conferences in these fields. Several of his publications are considered seminal papers in their fields and are highly cited. He has also given more than 30 keynote and invited talks in academia and industry. In professional service, Liu has served as an associate editor of IEEE Transactions on Knowledge and Data Engineering (TKDE), Journal of Data Mining and Knowledge Discovery (DMKD), and SIGKDD Explorations, and is on the editorial boards of several other journals. He has also served as program chair of the IEEE International Conference on Data Mining (ICDM-2010), ACM Conference on Web Search and Data Mining (WSDM-2010), ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008), SIAM Conference on Data Mining (SDM-2007), ACM Conference on Information and Knowledge Management (CIKM-2006), and Pacific Asia Conference on Data Mining (PAKDD-2002). Additionally, Liu has served extensively as an area chair and program committee member for leading conferences on data mining, Web mining, natural language processing, and machine learning. More information about him can be found at http://www.cs.uic.edu/~liub.


Permalink: https://www.kriso.ee/db/97836421946032e.html
