Preface |
|
xi | |
Authors |
|
xiii | |
|
Part I Introduction to Data Science |
|
|
|
1 Importance of Data Science |
|
|
3 | (14) |
|
1.1 Need for Data Science |
|
|
3 | (4) |
|
1.2 What Is Data Science? |
|
|
7 | (2) |
|
|
9 | (1) |
|
1.4 Business Intelligence and Data Science |
|
|
10 | (1) |
|
1.5 Prerequisites for a Data Scientist |
|
|
11 | (1) |
|
1.6 Components of Data Science |
|
|
11 | (1) |
|
1.7 Tools and Skills Needed |
|
|
12 | (1) |
|
|
13 | (2) |
|
|
15 | (2) |
|
2 Statistics and Probability |
|
|
17 | (28) |
|
|
17 | (1) |
|
|
18 | (1) |
|
|
19 | (3) |
|
2.4 Sampling Techniques and Probability |
|
|
22 | (2) |
|
2.5 Information Gain and Entropy |
|
|
24 | (7) |
|
|
31 | (2) |
|
|
33 | (3) |
|
2.8 Probability Distribution Functions |
|
|
36 | (2) |
|
|
38 | (1) |
|
2.10 Inferential Statistics |
|
|
39 | (4) |
|
|
43 | (1) |
|
|
44 | (1) |
|
3 Databases for Data Science |
|
|
45 | (42) |
|
3.1 SQL - Tool for Data Science |
|
|
45 | (32) |
|
3.1.1 Basic Statistics with SQL |
|
|
45 | (2) |
|
3.1.2 Data Munging with SQL |
|
|
47 | (1) |
|
3.1.3 Filtering, Joins, and Aggregation |
|
|
48 | (9) |
|
3.1.4 Window Functions and Ordered Data |
|
|
57 | (15) |
|
3.1.5 Preparing Data for Analytics Tool |
|
|
72 | (5) |
|
3.2 Advanced NoSQL for Data Science |
|
|
77 | (2) |
|
|
77 | (1) |
|
3.2.2 Document Databases for Data Science |
|
|
77 | (1) |
|
3.2.3 Wide-Column Databases for Data Science |
|
|
78 | (1) |
|
3.2.4 Graph Databases for Data Science |
|
|
79 | (1) |
|
|
79 | (5) |
|
|
84 | (3) |
|
Part II Data Modeling and Analytics |
|
|
|
4 Data Science Methodology |
|
|
87 | (16) |
|
4.1 Analytics for Data Science |
|
|
87 | (2) |
|
4.2 Examples of Data Analytics |
|
|
89 | (1) |
|
4.3 Data Analytics Life Cycle |
|
|
90 | (9) |
|
|
91 | (1) |
|
|
91 | (3) |
|
|
94 | (2) |
|
|
96 | (2) |
|
4.3.5 Communicate Results |
|
|
98 | (1) |
|
|
99 | (1) |
|
|
99 | (1) |
|
|
100 | (3) |
|
5 Data Science Methods and Machine Learning |
|
|
103 | (26) |
|
|
103 | (11) |
|
|
103 | (6) |
|
5.1.2 Logistic Regression |
|
|
109 | (2) |
|
5.1.3 Multinomial Logistic Regression |
|
|
111 | (2) |
|
|
113 | (1) |
|
|
114 | (12) |
|
|
114 | (2) |
|
|
116 | (1) |
|
5.2.3 Support Vector Machines |
|
|
117 | (2) |
|
5.2.4 Nearest Neighbor Learning |
|
|
119 | (1) |
|
|
120 | (2) |
|
|
122 | (4) |
|
|
126 | (1) |
|
|
126 | (3) |
|
6 Data Analytics and Text Mining |
|
|
129 | (18) |
|
|
129 | (6) |
|
6.1.1 Major Text Mining Areas |
|
|
130 | (1) |
|
6.1.1.1 Information Retrieval |
|
|
131 | (1) |
|
|
131 | (1) |
|
6.1.1.3 Natural Language Processing (NLP) |
|
|
131 | (4) |
|
|
135 | (3) |
|
6.2.1 Text Analysis Subtasks |
|
|
135 | (1) |
|
6.2.1.1 Cleaning and Parsing |
|
|
135 | (1) |
|
6.2.1.2 Searching and Retrieval |
|
|
136 | (1) |
|
|
136 | (1) |
|
6.2.1.4 Part-of-Speech Tagging |
|
|
136 | (1) |
|
|
136 | (1) |
|
|
137 | (1) |
|
6.2.2 Basic Text Analysis Steps |
|
|
137 | (1) |
|
6.3 Introduction to Natural Language Processing |
|
|
138 | (4) |
|
6.3.1 Major Components of NLP |
|
|
139 | (1) |
|
|
140 | (1) |
|
6.3.3 Statistical Processing of Natural Language |
|
|
141 | (1) |
|
6.3.3.1 Document Preprocessing |
|
|
141 | (1) |
|
|
141 | (1) |
|
6.3.4 Applications of NLP |
|
|
141 | (1) |
|
|
142 | (1) |
|
|
142 | (5) |
|
Part III Platforms for Data Science |
|
|
|
7 Data Science Tool: Python |
|
|
147 | (40) |
|
7.1 Basics of Python for Data Science |
|
|
147 | (6) |
|
7.2 Python Libraries: DataFrame Manipulation with pandas and NumPy |
|
|
153 | (6) |
|
7.3 Exploratory Data Analysis with Python |
|
|
159 | (2) |
|
|
161 | (2) |
|
7.5 Clustering with Python |
|
|
163 | (5) |
|
|
168 | (2) |
|
7.7 Dimensionality Reduction |
|
|
170 | (4) |
|
7.8 Python for Machine Learning |
|
|
174 | (3) |
|
7.9 KNN/Decision Tree/Random Forest/SVM |
|
|
177 | (5) |
|
7.10 Python IDEs for Data Science |
|
|
182 | (1) |
|
|
183 | (1) |
|
|
184 | (3) |
|
|
187 | (22) |
|
8.1 Reading and Getting Data into R |
|
|
187 | (3) |
|
8.1.1 Reading Data into R |
|
|
187 | (2) |
|
8.1.2 Writing Data into Files |
|
|
189 | (1) |
|
|
190 | (1) |
|
|
190 | (1) |
|
8.2 Ordered and Unordered Factors |
|
|
190 | (2) |
|
|
192 | (4) |
|
|
192 | (1) |
|
8.3.1.1 Creating an Array |
|
|
192 | (1) |
|
8.3.1.2 Accessing Elements in an Array |
|
|
193 | (1) |
|
8.3.1.3 Array Manipulation |
|
|
193 | (1) |
|
|
194 | (1) |
|
8.3.2.1 Creating a Matrix |
|
|
194 | (1) |
|
|
194 | (1) |
|
8.3.2.3 Eigenvalues and Eigenvectors |
|
|
195 | (1) |
|
8.3.2.4 Matrix Concatenation |
|
|
195 | (1) |
|
8.4 Lists and Data Frames |
|
|
196 | (2) |
|
|
196 | (1) |
|
|
196 | (1) |
|
8.4.1.2 Concatenation of Lists |
|
|
196 | (1) |
|
|
197 | (1) |
|
8.4.2.1 Creating a Data Frame |
|
|
197 | (1) |
|
8.4.2.2 Accessing the Data Frame |
|
|
197 | (1) |
|
8.4.2.3 Adding Rows and Columns |
|
|
198 | (1) |
|
8.5 Probability Distributions |
|
|
198 | (3) |
|
8.5.1 Normal Distribution |
|
|
199 | (2) |
|
8.6 Statistical Models in R |
|
|
201 | (2) |
|
|
202 | (1) |
|
|
203 | (1) |
|
|
203 | (3) |
|
|
203 | (1) |
|
|
204 | (1) |
|
|
204 | (1) |
|
|
205 | (1) |
|
|
206 | (1) |
|
8.8.1 Visualizing Distributions |
|
|
206 | (1) |
|
8.8.2 Statistics in Distributions |
|
|
206 | (1) |
|
|
207 | (1) |
|
|
208 | (1) |
|
9 Data Science Tool: MATLAB |
|
|
209 | (24) |
|
9.1 Data Science Workflow with MATLAB |
|
|
209 | (2) |
|
|
211 | (5) |
|
|
211 | (2) |
|
9.2.2 How MATLAB Represents Data |
|
|
213 | (1) |
|
|
214 | (1) |
|
9.2.4 Automating the Import Process |
|
|
215 | (1) |
|
9.3 Visualizing and Filtering Data |
|
|
216 | (4) |
|
9.3.1 Plotting Data Contained in Tables |
|
|
217 | (1) |
|
9.3.2 Selecting Data from Tables |
|
|
218 | (1) |
|
9.3.3 Accessing and Creating Table Variables |
|
|
219 | (1) |
|
9.4 Performing Calculations |
|
|
220 | (10) |
|
9.4.1 Basic Mathematical Operations |
|
|
220 | (2) |
|
|
222 | (1) |
|
|
223 | (1) |
|
9.4.4 Calculating Summary Statistics |
|
|
224 | (2) |
|
9.4.5 Correlations between Variables |
|
|
226 | (1) |
|
9.4.6 Accessing Subsets of Data |
|
|
226 | (2) |
|
9.4.7 Performing Calculations by Category |
|
|
228 | (2) |
|
|
230 | (1) |
|
|
231 | (2) |
|
10 GNU Octave as a Data Science Tool |
|
|
233 | (16) |
|
10.1 Vectors and Matrices |
|
|
233 | (5) |
|
10.2 Arithmetic Operations |
|
|
238 | (2) |
|
|
240 | (2) |
|
|
242 | (5) |
|
|
247 | (1) |
|
|
248 | (1) |
|
11 Data Visualization Using Tableau |
|
|
249 | (20) |
|
11.1 Introduction to Data Visualization |
|
|
249 | (1) |
|
11.2 Introduction to Tableau |
|
|
250 | (2) |
|
11.3 Dimensions and Measures, Descriptive Statistics |
|
|
252 | (4) |
|
|
256 | (3) |
|
11.5 Dashboard Design & Principles |
|
|
259 | (2) |
|
|
261 | (3) |
|
11.7 Integrate Tableau with Google Sheets |
|
|
264 | (1) |
|
|
265 | (2) |
|
|
267 | (2) |
Index |
|
269 | |