Muutke küpsiste eelistusi

E-raamat: Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data

  • Formaat - PDF+DRM
  • Hind: 61,74 €*
  • * hind on lõplik, st. muud allahindlused enam ei rakendu
  • Lisa ostukorvi
  • Lisa soovinimekirja
  • See e-raamat on mõeldud ainult isiklikuks kasutamiseks. E-raamatuid ei saa tagastada.

DRM piirangud

  • Kopeerimine (copy/paste):

    ei ole lubatud

  • Printimine:

    ei ole lubatud

  • Kasutamine:

    Digitaalõiguste kaitse (DRM)
    Kirjastus on väljastanud selle e-raamatu krüpteeritud kujul, mis tähendab, et selle lugemiseks peate installeerima spetsiaalse tarkvara. Samuti peate looma endale  Adobe ID Rohkem infot siin. E-raamatut saab lugeda 1 kasutaja ning alla laadida kuni 6'de seadmesse (kõik autoriseeritud sama Adobe ID-ga).

    Vajalik tarkvara
    Mobiilsetes seadmetes (telefon või tahvelarvuti) lugemiseks peate installeerima selle tasuta rakenduse: PocketBook Reader (iOS / Android)

    PC või Mac seadmes lugemiseks peate installima Adobe Digital Editionsi (Seeon tasuta rakendus spetsiaalselt e-raamatute lugemiseks. Seda ei tohi segamini ajada Adober Reader'iga, mis tõenäoliselt on juba teie arvutisse installeeritud )

    Seda e-raamatut ei saa lugeda Amazon Kindle's. 

Now in its second, updated edition, this authoritative and coherent text contains a rich blend of theory and practice and covers all the essential concepts and algorithms from relevant fields such as data mining, machine learning, and text processing.



Web mining aims to discover useful information and knowledge from Web hyperlinks, page contents, and usage data. Although Web mining uses many conventional data mining techniques, it is not purely an application of traditional data mining due to the semi-structured and unstructured nature of the Web data. The field has also developed many of its own algorithms and techniques.

Liu has written a comprehensive text on Web mining, which consists of two parts. The first part covers the data mining and machine learning foundations, where all the essential concepts and algorithms of data mining and machine learning are presented. The second part covers the key topics of Web mining, where Web crawling, search, social network analysis, structured data extraction, information integration, opinion mining and sentiment analysis, Web usage mining, query log mining, computational advertising, and recommender systems are all treated both in breadth and in depth. His book thus brings all the related concepts and algorithms together to form an authoritative and coherent text.

The book offers a rich blend of theory and practice. It is suitable for students, researchers and practitioners interested in Web mining and data mining both as a learning text and as a reference book. Professors can readily use it for classes on data mining, Web mining, and text mining. Additional teaching materials such as lecture slides, datasets, and implemented algorithms are available online.

Arvustused

From the reviews:

"This is a textbook about data mining and its application to the Web. [ ] Liu succeeds in helping readers appreciate the key role that data mining and machine learning play in Web applications. [ ] It also motivates the student by adding immediacy and relevance to the concepts and algorithms described. I liked the way the concepts are introduced in a stepwise manner. [ ] I also appreciated the bibliographical notes at the end of each chapter." ACM Computing Reviews, W. Hu, , January 2009

From the reviews of the second edition:

Liu (Univ. of Illinois, Chicago) discusses all three types of Web mining--structure, content, and usage--in the technologys efforts to glean information from hyperlinks, Web page content, and usage logs. [ ] Practical examples complement the discussions throughout the text, and each chapter includes useful Bibliographic Notes and an extensive bibliography. [ ] Liu states that his intended audience includes bothundergraduate and graduate students, but notes that researchers and Web programmers could benefit from this text as well. Summing Up: Recommended. Upper-division undergraduates through professionals. J. Johnson, Choice, Vol. 49 (5), January 2012

"[ ...] Liu's book provides a comprehensive, self-contained introduction to the major data mining techniques and their use in Web data mining. [ ...] Professionals and researchers alike will find this excellent book handy as a reference. Its extensive lists of references at the end of each chapter provide hundreds of pointers for further reading. As a textbook, it is also suitable for advanced undergraduate and graduate courses on Web mining; it is highly selfcontained and includes many easy-to-understand examples that will help readers grasp the key ideas behind current Web data mining techniques." ACM Computing Reviews, Fernando Berzal, February 2012

1 Introduction
1(16)
1.1 What is the World Wide Web?
1(1)
1.2 A Brief History of the Web and the Internet
2(2)
1.3 Web Data Mining
4(4)
1.3.1 What is Data Mining?
6(1)
1.3.2 What is Web Mining?
7(1)
1.4 Summary of
Chapters
8(3)
1.5 How to Read this Book
11(1)
Bibliographic Notes
12(1)
Bibliography
13(4)
Part I Data Mining Foundations
2 Association Rules and Sequential Patterns
17(46)
2.1 Basic Concepts of Association Rules
17(3)
2.2 Apriori Algorithm
20(6)
2.2.1 Frequent Itemset Generation
20(4)
2.2.2 Association Rule Generation
24(2)
2.3 Data Formats for Association Rule Mining
26(1)
2.4 Mining with Multiple Minimum Supports
26(10)
2.4.1 Extended Model
28(2)
2.4.2 Mining Algorithm
30(5)
2.4.3 Rule Generation
35(1)
2.5 Mining Class Association Rules
36(5)
2.5.1 Problem Definition
36(2)
2.5.2 Mining Algorithm
38(3)
2.5.3 Mining with Multiple Minimum Supports
41(1)
2.6 Basic Concepts of Sequential Patterns
41(2)
2.7 Mining Sequential Patterns Based on GSP
43(6)
2.7.1 GSP Algorithm
43(2)
2.7.2 Mining with Multiple Minimum Supports
45(4)
2.8 Mining Sequential Patterns Based on PrefixSpan
49(4)
2.8.1 PrefixSpan Algorithm
50(2)
2.8.2 Mining with Multiple Minimum Supports
52(1)
2.9 Generating Rules from Sequential Patterns
53(3)
2.9.1 Sequential Rules
54(1)
2.9.2 Label Sequential Rules
54(1)
2.9.3 Class Sequential Rules
55(1)
Bibliographic Notes
56(2)
Bibliography
58(5)
3 Supervised Learning
63(70)
3.1 Basic Concepts
63(4)
3.2 Decision Tree Induction
67(12)
3.2.1 Learning Algorithm
70(1)
3.2.2 Impurity Function
71(4)
3.2.3 Handling of Continuous Attributes
75(1)
3.2.4 Some Other Issues
76(3)
3.3 Classifier Evaluation
79(8)
3.3.1 Evaluation Methods
79(2)
3.3.2 Precision, Recall, F-score and Breakeven Point
81(2)
3.3.3 Receiver Operating Characteristic Curve
83(3)
3.3.4 Lift Curve
86(1)
3.4 Rule Induction
87(6)
3.4.1 Sequential Covering
87(3)
3.4.2 Rule Learning: Learn-One-Rule Function
90(3)
3.4.3 Discussion
93(1)
3.5 Classification Based on Associations
93(7)
3.5.1 Classification Using Class Association Rules
94(4)
3.5.2 Class Association Rules as Features
98(1)
3.5.3 Classification Using Normal Association Rules
99(1)
3.6 Naive Bayesian Classification
100(3)
3.7 Naive Bayesian Text Classification
103(6)
3.7.1 Probabilistic Framework
104(1)
3.7.2 Naive Bayesian Model
105(3)
3.7.3 Discussion
108(1)
3.8 Support Vector Machines
109(15)
3.8.1 Linear SVM: Separable Case
111(6)
3.8.2 Linear SVM: Non-Separable Case
117(3)
3.8.3 Nonlinear SVM: Kernel Functions
120(4)
3.9 K-Nearest Neighbor Learning
124(2)
3.10 Ensemble of Classifiers
126(2)
3.10.1 Bagging
126(1)
3.10.2 Boosting
126(2)
Bibliographic Notes
128(1)
Bibliography
129(4)
4 Unsupervised Learning
133(38)
4.1 Basic Concepts
133(3)
4.2 K-means Clustering
136(8)
4.2.1 K-means Algorithm
136(3)
4.2.2 Disk Version of the K-means Algorithm
139(1)
4.2.3 Strengths and Weaknesses
140(4)
4.3 Representation of Clusters
144(3)
4.3.1 Common Ways of Representing Clusters
145(1)
4.3.2 Clusters of Arbitrary Shapes
146(1)
4.4 Hierarchical Clustering
147(4)
4.4.1 Single-Link Method
149(1)
4.4.2 Complete-Link Method
149(1)
4.4.3 Average-Link Method
150(1)
4.4.4 Strengths and Weaknesses
150(1)
4.5 Distance Functions
151(4)
4.5.1 Numeric Attributes
151(1)
4.5.2 Binary and Nominal Attributes
152(2)
4.5.3 Text Documents
154(1)
4.6 Data Standardization
155(2)
4.7 Handling of Mixed Attributes
157(2)
4.8 Which Clustering Algorithm to Use?
159(1)
4.9 Cluster Evaluation
159(3)
4.10 Discovering Holes and Data Regions
162(3)
Bibliographic Notes
165(1)
Bibliography
166(5)
5 Partially Supervised Learning
171(40)
5.1 Learning from Labeled and Unlabeled Examples
171(13)
5.1.1 EM Algorithm with Naive Bayesian Classification
173(3)
5.1.2 Co-Training
176(2)
5.1.3 Self-Training
178(1)
5.1.4 Transductive Support Vector Machines
179(1)
5.1.5 Graph-Based Methods
180(3)
5.1.6 Discussion
183(1)
5.2 Learning from Positive and Unlabeled Examples
184(18)
5.2.1 Applications of PU Learning
185(2)
5.2.2 Theoretical Foundation
187(3)
5.2.3 Building Classifiers: Two-Step Approach
190(7)
5.2.4 Building Classifiers: Biased-SVM
197(2)
5.2.5 Building Classifiers: Probability Estimation
199(2)
5.2.6 Discussion
201(1)
Appendix: Derivation of EM for Naive Bayesian Classification
202(2)
Bibliographic Notes
204(2)
Bibliography
206(5)
Part II Web Mining
6 Information Retrieval and Web Search
211(58)
6.1 Basic Concepts of Information Retrieval
212(3)
6.2 Information Retrieval Models
215(5)
6.2.1 Boolean Model
216(1)
6.2.2 Vector Space Model
217(2)
6.2.3 Statistical Language Model
219(1)
6.3 Relevance Feedback
220(3)
6.4 Evaluation Measures
223(4)
6.5 Text and Web Page Pre-Processing
227(5)
6.5.1 Stopword Removal
227(1)
6.5.2 Stemming
228(1)
6.5.3 Other Pre-Processing Tasks for Text
228(1)
6.5.4 Web Page Pre-Processing
229(2)
6.5.5 Duplicate Detection
231(1)
6.6 Inverted Index and Its Compression
232(10)
6.6.1 Inverted Index
232(2)
6.6.2 Search Using an Inverted Index
234(1)
6.6.3 Index Construction
235(1)
6.6.4 Index Compression
236(6)
6.7 Latent Semantic Indexing
242(7)
6.7.1 Singular Value Decomposition
243(2)
6.7.2 Query and Retrieval
245(1)
6.7.3 An Example
246(3)
6.7.4 Discussion
249(1)
6.8 Web Search
249(3)
6.9 Meta-Search: Combining Multiple Rankings
252(5)
6.9.1 Combination Using Similarity Scores
254(1)
6.9.2 Combination Using Rank Positions
255(2)
6.10 Web Spamming
257(6)
6.10.1 Content Spamming
258(1)
6.10.2 Link Spamming
259(1)
6.10.3 Hiding Techniques
260(1)
6.10.4 Combating Spam
261(2)
Bibliographic Notes
263(1)
Bibliography
264(5)
7 Social Network Analysis
269(42)
7.1 Social Network Analysis
270(5)
7.1.1 Centrality
270(3)
7.1.2 Prestige
273(2)
7.2 Co-Citation and Bibliographic Coupling
275(2)
7.2.1 Co-Citation
276(1)
7.2.2 Bibliographic Coupling
277(1)
7.3 PageRank
277(11)
7.3.1 PageRank Algorithm
278(7)
7.3.2 Strengths and Weaknesses of PageRank
285(1)
7.3.3 Timed PageRank and Recency Search
286(2)
7.4 HITS
288(6)
7.4.1 HITS Algorithm
289(2)
7.4.2 Finding Other Eigenvectors
291(1)
7.4.3 Relationships with Co-Citation and Bibliographic Coupling
292(1)
7.4.4 Strengths and Weaknesses of HITS
293(1)
7.5 Community Discovery
294(10)
7.5.1 Problem Definition
295(2)
7.5.2 Bipartite Core Communities
297(1)
7.5.3 Maximum Flow Communities
298(3)
7.5.4 Email Communities Based on Betweenness
301(2)
7.5.5 Overlapping Communities of Named Entities
303(1)
Bibliographic Notes
304(1)
Bibliography
305(6)
8 Web Crawling
311(52)
8.1 A Basic Crawler Algorithm
312(3)
8.1.1 Breadth-First Crawlers
313(1)
8.1.2 Preferential Crawlers
314(1)
8.2 Implementation Issues
315(8)
8.2.1 Fetching
315(1)
8.2.2 Parsing
316(2)
8.2.3 Stopword Removal and Stemming
318(1)
8.2.4 Link Extraction and Canonicalization
318(2)
8.2.5 Spider Traps
320(1)
8.2.6 Page Repository
321(1)
8.2.7 Concurrency
322(1)
8.3 Universal Crawlers
323(4)
8.3.1 Scalability
324(2)
8.3.2 Coverage vs. Freshness vs. Importance
326(1)
8.4 Focused Crawlers
327(3)
8.5 Topical Crawlers
330(18)
8.5.1 Topical Locality and Cues
332(6)
8.5.2 Best-First Variations
338(3)
8.5.3 Adaptation
341(7)
8.6 Evaluation
348(5)
8.7 Crawler Ethics and Conflicts
353(3)
8.8 Some New Developments
356(2)
Bibliographic Notes
358(1)
Bibliography
359(4)
9 Structured Data Extraction: Wrapper Generation
363(62)
9.1 Preliminaries
364(6)
9.1.1 Two Types of Data Rich Pages
364(2)
9.1.2 Data Model
366(2)
9.1.3 HTML Mark-Up Encoding of Data Instances
368(2)
9.2 Wrapper Induction
370(8)
9.2.1 Extraction from a Page
370(3)
9.2.2 Learning Extraction Rules
373(4)
9.2.3 Identifying Informative Examples
377(1)
9.2.4 Wrapper Maintenance
378(1)
9.3 Instance-Based Wrapper Learning
378(3)
9.4 Automatic Wrapper Generation: Problems
381(3)
9.4.1 Two Extraction Problems
382(1)
9.4.2 Patterns as Regular Expressions
383(1)
9.5 String Matching and Tree Matching
384(6)
9.5.1 String Edit Distance
384(2)
9.5.2 Tree Matching
386(4)
9.6 Multiple Alignment
390(6)
9.6.1 Center Star Method
390(1)
9.6.2 Partial Tree Alignment
391(5)
9.7 Building DOM Trees
396(1)
9.8 Extraction Based on a Single List Page: Flat Data Records
397(10)
9.8.1 Two Observations about Data Records
398(1)
9.8.2 Mining Data Regions
399(5)
9.8.3 Identifying Data Records in Data Regions
404(1)
9.8.4 Data Item Alignment and Extraction
405(1)
9.8.5 Making Use of Visual Information
406(1)
9.8.6 Some Other Techniques
406(1)
9.9 Extraction Based on a Single List Page: Nested Data Records
407(6)
9.10 Extraction Based on Multiple Pages
413(2)
9.10.1 Using Techniques in Previous Sections
413(1)
9.10.2 RoadRunner Algorithm
414(1)
9.11 Some Other Issues
415(4)
9.11.1 Extraction from Other Pages
416(1)
9.11.2 Disjunction or Optional
416(1)
9.11.3 A Set Type or a Tuple Type
417(1)
9.11.4 Labeling and Integration
418(1)
9.11.5 Domain Specific Extraction
418(1)
9.12 Discussion
419(1)
Bibliographic Notes
419(2)
Bibliography
421(4)
10 Information Integration
425(34)
10.1 Introduction to Schema Matching
426(2)
10.2 Pre-Processing for Schema Matching
428(1)
10.3 Schema-Level Matching
429(2)
10.3.1 Linguistic Approaches
429(1)
10.3.2 Constraint Based Approaches
430(1)
10.4 Domain and Instance-Level Matching
431(3)
10.5 Combining Similarities
434(1)
10.6 1:m Match
435(1)
10.7 Some Other Issues
436(2)
10.7.1 Reuse of Previous Match Results
436(1)
10.7.2 Matching a Large Number of Schemas
437(1)
10.7.3 Schema Match Results
437(1)
10.7.4 User Interactions
438(1)
10.8 Integration of Web Query Interfaces
438(12)
10.8.1 A Clustering Based Approach
441(3)
10.8.2 A Correlation Based Approach
444(3)
10.8.3 An Instance Based Approach
447(3)
10.9 Constructing a Unified Global Query Interface
450(4)
10.9.1 Structural Appropriateness and the Merge Algorithm
451(2)
10.9.2 Lexical Appropriateness
453(1)
10.9.3 Instance Appropriateness
454(1)
Bibliographic Notes
454(1)
Bibliography
455(4)
11 Opinion Mining and Sentiment Analysis
459(68)
1.1.1 The Problem of Opinion Mining
460(1)
11.1.1 Problem Definitions
460(7)
11.1.2 Aspect-Based Opinion Summary
467(2)
11.2 Document Sentiment Classification
469(5)
11.2.1 Classification Based on Supervised Learning
470(2)
11.2.2 Classification Based on Unsupervised Learning
472(2)
11.3 Sentence Subjectivity and Sentiment Classification
474(3)
11.4 Opinion Lexicon Expansion
477(3)
11.5 Aspect-Based Opinion Mining
480(13)
11.5.1 Aspect Sentiment Classification
481(2)
11.5.2 Basic Rules of Opinions
483(3)
11.5.3 Aspect Extraction
486(4)
11.5.4 Simultaneous Opinion Lexicon Expansion and Aspect Extraction
490(3)
11.6 Mining Comparative Opinions
493(5)
11.6.1 Problem Definitions
493(2)
11.6.2 Identification of Comparative Sentences
495(1)
11.6.3 Identification of Preferred Entities
496(2)
11.7 Some Other Problems
498(5)
11.8 Opinion Search and Retrieval
503(3)
11.9 Opinion Spam Detection
506(8)
11.9.1 Types of Spam and Spammers
506(2)
11.9.2 Hiding Techniques
508(1)
11.9.3 Spam Detection Based on Supervised Learning
509(2)
11.9.4 Spam Detection Based on Abnormal Behaviors
511(2)
11.9.5 Group Spam Detection
513(1)
11.10 Utility of Reviews
514(1)
Bibliographic Notes
515(2)
Bibliography
517(10)
12 Web Usage Mining
527(78)
12.1 Data Collection and Pre-Processing
528(12)
12.1.1 Sources and Types of Data
530(3)
12.1.2 Key Elements of Web Usage Data Pre-Processing
533(7)
12.2 Data Modeling for Web Usage Mining
540(4)
12.3 Discovery and Analysis of Web Usage Patterns
544(11)
12.3.1 Session and Visitor Analysis
544(1)
12.3.2 Cluster Analysis and Visitor Segmentation
545(4)
12.3.3 Association and Correlation Analysis
549(1)
12.3.4 Analysis of Sequential and Navigational Patterns
550(4)
12.3:5 Classification and Prediction based on Web User Transactions
554(1)
12.4 Recommender Systems and Collaborative Filtering
555(16)
12.4.1 The Recommendation Problem
556(1)
12.4.2 Content-Based Recommendation
557(2)
12.4.3 Collaborative Filtering: K-Nearest Neighbor (KNN)
559(2)
12.4.4 Collaborative Filtering: Using Association Rules
561(4)
12.4.5 Collaborative Filtering: Matrix Factorization
565(6)
12.5 Query Log Mining
571(18)
12.5.1 Data Sources, Characteristics, and Challenges
573(1)
12.5.2 Query Log Data Preparation
574(3)
12.5.3 Query Log Data Models
577(5)
12.5.4 Query Log Feature Extraction
582(1)
12.5.5 Query Log Mining Applications
583(3)
12.5.6 Query Log Mining Methods
586(3)
12.6 Computational Advertising
589(4)
12.7 Discussion and Outlook
593(1)
Bibliographic Notes
593(1)
Bibliography
594(11)
Subject Index 605
Bing Liu is a professor of Computer Science at the University of Illinois at Chicago (UIC). He received his PhD in Artificial Intelligence from the University of Edinburgh. Before joining UIC, he was with the National University of Singapore. His current research interests include opinion mining and sentiment analysis, text and Web mining, data mining, and machine learning. He has published extensively in top journals and conferences in these fields. Several of his publications are considered seminal papers of the fields and are highly cited. He has also given more than 30 keynote and invited talks in academia and in industry. On professional services, Liu has served as associate editors of IEEE Transactions on Knowledge and Data Engineering (TKDE), Journal of Data Mining and Knowledge Discovery (DMKD), and SIGKDD Explorations, and is on the editorial boards of several other journals. He has also served as program chairs of IEEE International Conference on Data Mining (ICDM-2010), ACM Conference on Web Search and Data Mining (WSDM-2010), ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2008), SIAM Conference on Data Mining (SDM-2007), ACM Conference on Information and Knowledge Management (CIKM-2006), and Pacific Asia Conference on Data Mining (PAKDD-2002). Additionally, Liu has served extensively as area chairs and program committee members of leading conferences on data mining, Web mining, natural language processing, and machine learning. More information about him can be found from http://www.cs.uic.edu/~liub.