Preface |
|
xiii | |
Acknowledgment |
|
xv | |
|
|
xvii | |
|
|
xxi | |
|
|
xxiii | |
|
|
xxv | |
|
|
1 | (36) |
|
1.1 Basic Notion of Data Mining |
|
|
2 | (1) |
|
1.2 Knowledge Discovery: The Very Rationale Behind Data Mining |
|
|
2 | (2) |
|
1.3 Challenges in the Development of Data Mining |
|
|
4 | (2) |
|
|
4 | (1) |
|
1.3.2 High Dimensionality |
|
|
4 | (1) |
|
1.3.3 Heterogeneous and Complex Data |
|
|
5 | (1) |
|
1.3.4 Data Ownership and Distribution |
|
|
5 | (1) |
|
1.3.5 Non-Traditional Analysis |
|
|
5 | (1) |
|
1.4 Importance of Data Mining |
|
|
6 | (2) |
|
1.5 Classification of Data Mining Systems |
|
|
8 | (2) |
|
1.5.1 The Databases Mined |
|
|
9 | (1) |
|
1.5.2 The Knowledge Mined |
|
|
10 | (1) |
|
1.5.3 The Techniques Utilized |
|
|
10 | (1) |
|
1.5.4 The Application Adopted |
|
|
10 | (1) |
|
1.6 Generic Architecture of Data Mining System |
|
|
10 | (2) |
|
1.7 Major Issues in Data Mining |
|
|
12 | (2) |
|
1.7.1 Mining Methodology and User Interaction Issues |
|
|
12 | (1) |
|
|
13 | (1) |
|
1.7.3 Issues Relating to the Diversity of Database Types |
|
|
14 | (1) |
|
1.8 Data Mining Strategies |
|
|
14 | (4) |
|
|
15 | (1) |
|
|
16 | (1) |
|
|
17 | (1) |
|
1.8.3.1 k-Means algorithm |
|
|
17 | (1) |
|
|
18 | (1) |
|
1.9 Data Mining: Ever Increasing Range of Applications |
|
|
18 | (7) |
|
|
18 | (1) |
|
|
18 | (2) |
|
1.9.3 Science and Engineering |
|
|
20 | (1) |
|
|
21 | (1) |
|
1.9.5 Medical Data Mining |
|
|
21 | (1) |
|
1.9.6 Spatial Data Mining |
|
|
22 | (1) |
|
1.9.7 Challenges in Spatial Mining |
|
|
22 | (1) |
|
1.9.8 Temporal Data Mining |
|
|
23 | (1) |
|
|
23 | (1) |
|
1.9.10 Visual Data Mining |
|
|
24 | (1) |
|
|
24 | (1) |
|
|
24 | (1) |
|
1.9.13 Subject-based Data Mining |
|
|
25 | (1) |
|
|
25 | (1) |
|
1.10 Trends in Data Mining |
|
|
25 | (3) |
|
1.10.1 Application Exploration |
|
|
25 | (1) |
|
1.10.2 Scalable and Interactive Data Mining Methods |
|
|
26 | (1) |
|
1.10.3 Integration of Data Mining with Database Systems, Data Warehouse Systems, and Web Database Systems |
|
|
26 | (1) |
|
1.10.4 Standardization of Data Mining Query Language |
|
|
26 | (1) |
|
1.10.5 Visual Data Mining |
|
|
26 | (1) |
|
1.10.6 New Methods for Mining Complex Types of Data |
|
|
26 | (1) |
|
1.10.7 Biological Data Mining |
|
|
27 | (1) |
|
1.10.8 Data Mining and Software Engineering |
|
|
27 | (1) |
|
|
27 | (1) |
|
1.10.10 Distributed Data Mining |
|
|
27 | (1) |
|
1.10.11 Real-Time Data Mining |
|
|
27 | (1) |
|
1.10.12 Multi-Database Data Mining |
|
|
28 | (1) |
|
1.10.13 Privacy Protection and Information Security in Data Mining |
|
|
28 | (1) |
|
1.11 Classification Techniques in Data Mining |
|
|
28 | (3) |
|
1.11.1 Definition of the Classification |
|
|
29 | (1) |
|
1.11.2 Issues Regarding Classification |
|
|
29 | (1) |
|
1.11.3 Evaluation Methods for Classification |
|
|
29 | (1) |
|
1.11.4 Classification Techniques |
|
|
30 | (1) |
|
|
30 | (1) |
|
1.11.4.2 Rule-based algorithm |
|
|
31 | (1) |
|
1.11.4.3 Distance-based algorithms |
|
|
31 | (1) |
|
1.11.4.4 Neural networks-based algorithms |
|
|
31 | (1) |
|
1.11.4.5 Statistical-based algorithms |
|
|
31 | (1) |
|
1.12 Applications of Classification |
|
|
31 | (1) |
|
|
31 | (1) |
|
|
32 | (1) |
|
1.12.3 Supervised Event Detection |
|
|
32 | (1) |
|
1.12.4 Multimedia Data Analysis |
|
|
32 | (1) |
|
1.12.5 Biological Data Analysis |
|
|
32 | (1) |
|
1.12.6 Document Categorization and Filtering |
|
|
32 | (1) |
|
1.12.7 Social Network Analysis |
|
|
32 | (1) |
|
1.13 WEKA: An Effective Tool for Data Mining |
|
|
32 | (3) |
|
1.13.1 Main Features of Weka |
|
|
33 | (1) |
|
|
33 | (1) |
|
1.13.3 Weka for Classification |
|
|
34 | (1) |
|
1.13.3.1 Selecting a classifier |
|
|
34 | (1) |
|
|
34 | (1) |
|
1.14 What We Aim to Cover Through the Present Book |
|
|
35 | (2) |
|
2 Current Literature Assessment in Data and Web Mining |
|
|
37 | (18) |
|
2.1 Big Data and Its Mining |
|
|
37 | (1) |
|
2.2 Data-Processing Basics |
|
|
38 | (1) |
|
|
38 | (2) |
|
|
40 | (1) |
|
2.5 Algorithms Used in Data Mining |
|
|
41 | (2) |
|
2.6 Classification and Mining |
|
|
43 | (1) |
|
2.7 Performance Metrics of Classification/Mining |
|
|
43 | (2) |
|
|
45 | (1) |
|
2.9 Categories of Web Data Mining |
|
|
45 | (2) |
|
2.10 Radial Basis Function Networks |
|
|
47 | (1) |
|
|
48 | (1) |
|
|
49 | (1) |
|
2.13 Support Vector Machine (SVM) |
|
|
49 | (1) |
|
2.14 Conclusion and Way Forward |
|
|
49 | (6) |
|
3 Dataset Creation for Web Mining |
|
|
55 | (34) |
|
|
56 | (1) |
|
3.2 Web Mining: Emerging Model of Business |
|
|
56 | (3) |
|
3.2.1 Introduction to Web Mining |
|
|
56 | (3) |
|
3.3 Tools Used for Acquisition of Parameters |
|
|
59 | (17) |
|
|
63 | (3) |
|
|
66 | (2) |
|
|
68 | (2) |
|
|
70 | (2) |
|
|
72 | (4) |
|
3.4 Difficulties Encountered |
|
|
76 | (1) |
|
|
76 | (1) |
|
3.4.2 Preparation and Selection of Websites |
|
|
76 | (1) |
|
3.4.3 Difficulty in Selecting Analysis Tool |
|
|
76 | (1) |
|
3.4.4 Unavailability of Data |
|
|
76 | (1) |
|
|
77 | (1) |
|
|
78 | (10) |
|
|
78 | (1) |
|
3.6.1.1 Data Preprocessing Techniques |
|
|
79 | (1) |
|
3.6.2 Preprocessing and Filtering |
|
|
80 | (1) |
|
3.6.2.1 Preprocessed and Filtered Overall Data |
|
|
80 | (1) |
|
3.6.2.2 Preprocessed and Filtered Web Accessibility Data |
|
|
80 | (1) |
|
3.6.2.3 Preprocessed and Filtered Design Data |
|
|
80 | (2) |
|
3.6.2.4 Preprocessed and Filtered Texts Data |
|
|
82 | (2) |
|
3.6.2.5 Preprocessed and Filtered Multimedia Data |
|
|
84 | (1) |
|
3.6.2.6 Preprocessed and Filtered Networking Data |
|
|
84 | (4) |
|
|
88 | (1) |
|
4 Classification of Websites |
|
|
89 | (110) |
|
|
89 | (4) |
|
|
90 | (1) |
|
|
90 | (1) |
|
|
91 | (1) |
|
|
92 | (1) |
|
|
92 | (1) |
|
4.2 Classification of Websites on Accessibility |
|
|
93 | (13) |
|
|
93 | (1) |
|
|
93 | (2) |
|
4.2.3 Clustered Instances |
|
|
95 | (1) |
|
4.2.4 Classification Via Clustering |
|
|
95 | (1) |
|
4.2.4.1 Classification via clustering using J48 algorithm |
|
|
96 | (2) |
|
4.2.4.2 Classification via clustering using RBFNetwork algorithm |
|
|
98 | (3) |
|
4.2.4.3 Classification via clustering using NaiveBayes algorithm |
|
|
101 | (2) |
|
4.2.4.4 Classification via clustering using SMO algorithm |
|
|
103 | (3) |
|
4.2.4.5 Comparison of above classification algorithms |
|
|
106 | (1) |
|
4.3 Classification Based on Website Design |
|
|
106 | (19) |
|
4.3.1 Attribute Selection |
|
|
109 | (1) |
|
|
109 | (3) |
|
|
112 | (1) |
|
4.3.4 Classification Through Clustering |
|
|
113 | (1) |
|
4.3.4.1 Classification via clustering using J48 algorithm |
|
|
113 | (3) |
|
4.3.4.2 Classification via clustering using RBFNetwork algorithm |
|
|
116 | (1) |
|
4.3.4.3 Classification via clustering using NaiveBayes algorithm |
|
|
117 | (2) |
|
4.3.4.4 Classification via clustering using SMO algorithm |
|
|
119 | (4) |
|
4.3.4.5 Comparison of above classification algorithms |
|
|
123 | (2) |
|
4.4 Classification Based on Text |
|
|
125 | (17) |
|
|
125 | (2) |
|
|
127 | (2) |
|
|
129 | (1) |
|
4.4.4 Classification Through Clustering |
|
|
129 | (1) |
|
4.4.4.1 Classification via clustering using J48 algorithm |
|
|
130 | (3) |
|
4.4.4.2 Classification via clustering using RBFNetwork algorithm |
|
|
133 | (2) |
|
4.4.4.3 Classification via clustering using NaiveBayes algorithm |
|
|
135 | (2) |
|
4.4.4.4 Classification via clustering using SMO algorithm |
|
|
137 | (3) |
|
4.4.4.5 Comparison of above classification algorithms |
|
|
140 | (2) |
|
4.5 Classification Based on Multimedia Content of Websites |
|
|
142 | (16) |
|
|
143 | (1) |
|
|
143 | (3) |
|
|
146 | (1) |
|
4.5.4 Classification Through Clustering |
|
|
147 | (1) |
|
4.5.4.1 Classification via clustering using J48 algorithm |
|
|
147 | (3) |
|
4.5.4.2 Classification via clustering using RBFNetwork algorithm |
|
|
150 | (2) |
|
4.5.4.3 Classification via clustering using NaiveBayes algorithm |
|
|
152 | (2) |
|
4.5.4.4 Classification via clustering using SMO algorithm |
|
|
154 | (2) |
|
4.5.4.5 Comparison of above classification algorithms |
|
|
156 | (2) |
|
4.6 Classification Based on Network Analysis of Webpage |
|
|
158 | (17) |
|
|
159 | (2) |
|
|
161 | (2) |
|
|
163 | (2) |
|
4.6.4 Classification Through Clustering |
|
|
165 | (1) |
|
4.6.4.1 Classification via clustering using J48 algorithm |
|
|
165 | (3) |
|
4.6.4.2 Classification via clustering using RBFNetwork algorithm |
|
|
168 | (1) |
|
4.6.4.3 Classification via clustering using NaiveBayes algorithm |
|
|
168 | (4) |
|
4.6.4.4 Classification via clustering using SMO algorithm |
|
|
172 | (2) |
|
4.6.4.5 Comparison of the above classification algorithms |
|
|
174 | (1) |
|
4.7 Classification of Websites Using Overall Performance |
|
|
175 | (17) |
|
|
176 | (2) |
|
|
178 | (1) |
|
4.7.3 Classification Via Clustering |
|
|
179 | (1) |
|
4.7.3.1 Classification via clustering using J48 algorithm |
|
|
179 | (4) |
|
4.7.3.2 Classification via clustering using RBFNetwork algorithm |
|
|
183 | (3) |
|
4.7.3.3 Classification via clustering using NaiveBayes algorithm |
|
|
186 | (2) |
|
4.7.3.4 Classification via clustering using SMO algorithm |
|
|
188 | (2) |
|
4.7.3.5 Comparison of the above classification algorithms |
|
|
190 | (2) |
|
4.8 Results at a Glance and Conclusion |
|
|
192 | (3) |
|
4.9 Summary and Future Directions |
|
|
195 | (4) |
Index |
|
199 | (4) |
About the Authors |
|
203 | |