Preface | xiii | |
Scope | xiv | |
|
| 1 | (6) |
| 1 | (1) |
| 2 | (1) |
| 2 | (2) |
| 3 | (1) |
| 3 | (1) |
1.3 Notable Points in SAS and R Languages | 4 | (1) |
1.4 Some Important Functions with Side-by-Side Comparisons | 4 | (1) |
| 5 | (1) |
| 5 | (2) |
| 6 | (1) |
|
2 Data Input, Import and Print | 7 | (12) |
| 7 | (1) |
| 7 | (1) |
2.2 Importing Data in SAS | 8 | (2) |
| 8 | (2) |
2.2.2 Using PROC IMPORT to Import a Raw File | 10 | (1) |
2.2.3 Creating a Temporary Dataset from a Permanent One Using SET | 10 | (1) |
| 10 | (3) |
2.3.1 Importing from Comma-Separated Values (CSV) Files | 11 | (1) |
2.3.2 Importing from Excel Files | 11 | (1) |
| 12 | (1) |
2.3.4 Importing from SPSS and Stata | 12 | (1) |
2.3.5 Assigning Imported Values to a Data Object in R | 12 | (1) |
| 13 | (1) |
| 13 | (1) |
2.4.1.1 Using the c() function, the simplest way to create a vector in R | 13 | (1) |
2.4.1.2 Providing missing values to the vector | 13 | (1) |
2.4.1.3 Inputting multiple columns of data | 14 | (1) |
2.4.1.4 Using loops to input data | 14 | (1) |
| 14 | (2) |
| 16 | (1) |
| 16 | (1) |
| 16 | (1) |
| 17 | (1) |
| 17 | (2) |
| 18 | (1) |
|
3 Data Inspection and Cleaning | 19 | (14) |
| 19 | (1) |
| 19 | (3) |
3.2.1 Data Inspection in SAS | 19 | (1) |
3.2.2 Data Inspection in R | 20 | (2) |
| 22 | (7) |
3.3.1 Missing Values in SAS | 22 | (4) |
3.3.2 Missing Values in R | 26 | (3) |
| 29 | (2) |
3.4.1 Data Cleaning in SAS | 29 | (2) |
| 31 | (1) |
| 31 | (2) |
| 32 | (1) |
|
4 Handling Dates, Strings, Numbers | 33 | (18) |
4.1 Working with Numeric Data | 33 | (4) |
4.1.1 Handling Numbers in SAS | 33 | (2) |
| 35 | (2) |
4.2 Working with Date Data | 37 | (5) |
4.2.1 Handling Dates in SAS | 37 | (2) |
4.2.2 Handling Dates in R | 39 | (3) |
4.3 Handling String Data | 42 | (6) |
4.3.1 Handling String Data in SAS | 42 | (4) |
4.3.2 Handling String Data in R | 46 | (2) |
| 48 | (3) |
| 49 | (2) |
|
5 Numerical Summary and Groupby Analysis | 51 | (24) |
5.1 Numerical Summary and Groupby Analysis | 51 | (1) |
5.2 Numerical Summary and Groupby Analysis in SAS | 51 | (7) |
5.3 Numerical Summary and Groupby Analysis in R | 58 | (13) |
5.3.1 Hmisc and data.table Packages | 60 | (3) |
| 63 | (8) |
| 71 | (4) |
| 72 | (3) |
|
6 Frequency Distributions and Cross Tabulations | 75 | (10) |
6.1 Frequency Distributions in SAS | 75 | (3) |
6.2 Frequency Distributions in R | 78 | (4) |
6.2.1 Frequency Tabulations in R | 78 | (3) |
6.2.2 Frequency Tabulations in R with Statistics of Other Variables | 81 | (1) |
| 82 | (3) |
| 82 | (3) |
|
7 Using SQL with SAS and R | 85 | (34) |
| 85 | (1) |
| 85 | (1) |
| 85 | (1) |
| 86 | (1) |
| 86 | (26) |
| 89 | (1) |
| 89 | (1) |
7.2.3 AND, OR, NOT in SQL | 90 | (3) |
7.2.4 SQL SELECT DISTINCT | 93 | (1) |
| 94 | (2) |
| 96 | (1) |
7.2.7 SQL Aggregate Functions | 97 | (1) |
| 98 | (1) |
| 99 | (1) |
| 100 | (2) |
| 102 | (1) |
7.2.12 SQL LIKE and BETWEEN | 103 | (1) |
| 104 | (1) |
| 105 | (1) |
7.2.15 SQL CREATE TABLE and SQL Constraints | 106 | (2) |
| 108 | (2) |
| 110 | (2) |
| 112 | (5) |
| 117 | (1) |
| 117 | (2) |
| 118 | (1) |
|
8 Functions, Loops, Arrays, Macros | 119 | (10) |
| 119 | (1) |
| 119 | (2) |
| 121 | (1) |
| 122 | (4) |
| 126 | (3) |
| 127 | (2) |
|
| 129 | (22) |
9.1 Importance of Data Visualization | 129 | (1) |
9.2 Data Visualization in SAS | 130 | (13) |
9.3 Data Visualization in R | 143 | (5) |
| 148 | (3) |
| 149 | (2) |
|
| 151 | (8) |
| 151 | (2) |
| 153 | (3) |
| 156 | (3) |
| 157 | (2) |
|
11 Statistics for Data Scientists | 159 | (24) |
| 159 | (1) |
11.2 Statistical Methods for Data Analysis | 160 | (1) |
| 160 | (1) |
11.4 Descriptive Statistics | 161 | (1) |
11.4.1 Measures of Central Tendency: Measures of Location That Give an Overall Idea of the Dataset | 161 | (1) |
11.4.2 Measures of Dispersion | 161 | (1) |
11.4.3 Skewness and Kurtosis | 162 | (1) |
11.4.4 Central Limit Theorem | 162 | (1) |
11.5 Inferential Statistics | 162 | (4) |
11.5.1 Hypothesis Testing | 163 | (2) |
| 165 | (1) |
| 166 | (1) |
11.6 Algorithms in Data Science | 166 | (15) |
| 167 | (1) |
11.6.2 Types of Regression | 167 | (1) |
11.6.3 Metrics to Evaluate Regression | 168 | (1) |
11.6.4 Types of Classification | 169 | (2) |
11.6.5 Metrics to Evaluate Classification | 171 | (3) |
11.6.6 Types of Clustering | 174 | (3) |
11.6.7 Types of Time Series Analysis | 177 | (3) |
11.6.8 Types of Dimensionality Reduction | 180 | (1) |
11.6.9 Types of Text Mining | 180 | (1) |
| 181 | (2) |
| 181 | (2) |
Further Reading | 183 | (2) |
Index | 185 | |