E-book: SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis

(HelioCampus)
  • Format: EPUB+DRM
  • Publication date: 17-Aug-2021
  • Publisher: John Wiley & Sons Inc
  • Language: English
  • ISBN-13: 9781119669395
  • Price: 37,04 €*
  • * The price is final, i.e. no further discounts apply
  • This e-book is intended for personal use only. E-books cannot be returned.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    Digital rights management (DRM)
    The publisher has released this e-book in encrypted form, which means that you must install special software to read it. You must also create an Adobe ID. The e-book can be read by 1 user and downloaded to up to 6 devices (all authorized with the same Adobe ID).

    Required software
    To read on a mobile device (phone or tablet), you need to install this free app: PocketBook Reader (iOS / Android)

    To read on a PC or Mac, you need to install Adobe Digital Editions (a free application designed specifically for reading e-books; it should not be confused with Adobe Reader, which is most likely already installed on your computer)

    This e-book cannot be read on an Amazon Kindle.

Jump-start your career as a data scientist—learn to develop datasets for exploration, analysis, and machine learning

SQL for Data Scientists: A Beginner's Guide for Building Datasets for Analysis is a resource that’s dedicated to the Structured Query Language (SQL) and dataset design skills that data scientists use most. Aspiring data scientists will learn how to construct datasets for exploration, analysis, and machine learning. You can also discover how to approach query design and develop SQL code to extract data insights while avoiding common pitfalls.

You may be one of many people who are entering the field of Data Science from a range of professions and educational backgrounds, such as business analytics, social science, physics, economics, and computer science. Like many of them, you may have conducted analyses using spreadsheets as data sources, but never retrieved and engineered datasets from a relational database using SQL, which is a programming language designed for managing databases and extracting data.

This guide for data scientists differs from other instructional guides on the subject. It doesn’t cover SQL broadly. Instead, you’ll learn the subset of SQL skills that data analysts and data scientists use frequently. You’ll also gain practical advice and direction on "how to think about constructing your dataset."

  • Gain an understanding of relational database structure, query design, and SQL syntax
  • Develop queries to construct datasets for use in applications like interactive reports and machine learning algorithms
  • Review strategies and approaches so you can design analytical datasets
  • Practice your techniques with the provided database and SQL code

In this book, author Renée Teate shares knowledge gained during a 15-year career working with data, in roles ranging from database developer to data analyst to data scientist. She guides you through SQL code and dataset design concepts from an industry practitioner’s perspective, moving your data scientist career forward!

Introduction
Chapter 1 Data Sources
  • Data Sources
  • Tools for Connecting to Data Sources and Editing SQL
  • Relational Databases
  • Dimensional Data Warehouses
  • Asking Questions About the Data Source
  • Introduction to the Farmer's Market Database
  • A Note on Machine Learning Dataset Terminology
  • Exercises
Chapter 2 The SELECT Statement
  • The SELECT Statement
  • The Fundamental Syntax Structure of a SELECT Query
  • Selecting Columns and Limiting the Number of Rows Returned
  • The ORDER BY Clause: Sorting Results
  • Introduction to Simple Inline Calculations
  • More Inline Calculation Examples: Rounding
  • More Inline Calculation Examples: Concatenating Strings
  • Evaluating Query Output
  • SELECT Statement Summary
  • Exercises Using the Included Database
Chapter 3 The WHERE Clause
  • The WHERE Clause
  • Filtering SELECT Statement Results
  • Filtering on Multiple Conditions
  • Multi-Column Conditional Filtering
  • More Ways to Filter
      • BETWEEN
      • IN
      • LIKE
      • IS NULL
      • A Warning About Null Comparisons
  • Filtering Using Subqueries
  • Exercises Using the Included Database
Chapter 4 CASE Statements
  • CASE Statement Syntax
  • Creating Binary Flags Using CASE
  • Grouping or Binning Continuous Values Using CASE
  • Categorical Encoding Using CASE
  • CASE Statement Summary
  • Exercises Using the Included Database
Chapter 5 SQL JOINs
  • Database Relationships and SQL JOINs
  • A Common Pitfall when Filtering Joined Data
  • JOINs with More than Two Tables
  • Exercises Using the Included Database
Chapter 6 Aggregating Results for Analysis
  • GROUP BY Syntax
  • Displaying Group Summaries
  • Performing Calculations Inside Aggregate Functions
  • MIN and MAX
  • COUNT and COUNT DISTINCT
  • Average
  • Filtering with HAVING
  • CASE Statements Inside Aggregate Functions
  • Exercises Using the Included Database
Chapter 7 Window Functions and Subqueries
  • ROW NUMBER
  • RANK and DENSE RANK
  • NTILE
  • Aggregate Window Functions
  • LAG and LEAD
  • Exercises Using the Included Database
Chapter 8 Date and Time Functions
  • Setting datetime Field Values
  • EXTRACT and DATE_PART
  • DATE_ADD and DATE_SUB
  • DATEDIFF
  • TIMESTAMPDIFF
  • Date Functions in Aggregate Summaries and Window Functions
  • Exercises
Chapter 9 Exploratory Data Analysis with SQL
  • Demonstrating Exploratory Data Analysis with SQL
  • Exploring the Products Table
  • Exploring Possible Column Values
  • Exploring Changes Over Time
  • Exploring Multiple Tables Simultaneously
  • Exploring Inventory vs. Sales
  • Exercises
Chapter 10 Building SQL Datasets for Analytical Reporting
  • Thinking Through Analytical Dataset Requirements
  • Using Custom Analytical Datasets in SQL: CTEs and Views
  • Taking SQL Reporting Further
  • Exercises
Chapter 11 More Advanced Query Structures
  • UNIONs
  • Self-Join to Determine To-Date Maximum
  • Counting New vs. Returning Customers by Week
  • Summary
  • Exercises
Chapter 12 Creating Machine Learning Datasets Using SQL
  • Datasets for Time Series Models
  • Datasets for Binary Classification
      • Creating the Dataset
      • Expanding the Feature Set
      • Feature Engineering
  • Taking Things to the Next Level
  • Exercises
Chapter 13 Analytical Dataset Development Examples
  • What Factors Correlate with Fresh Produce Sales?
  • How Do Sales Vary by Customer Zip Code, Market Distance, and Demographic Data?
  • How Does Product Price Distribution Affect Market Sales?
Chapter 14 Storing and Modifying Data
  • Storing SQL Datasets as Tables and Views
  • Adding a Timestamp Column
  • Inserting Rows and Updating Values in Database Tables
  • Using SQL Inside Scripts
  • In Closing
  • Exercises
Appendix Answers to Exercises
Index

RENÉE M. P. TEATE is the Director of Data Science at HelioCampus, a higher ed tech startup based in the Washington, DC area. She prepares datasets with SQL, develops predictive models with Python, and designs interactive dashboards in Tableau for university decision-makers. She created the Becoming a Data Scientist podcast, helped build the data science learning community on Twitter, and is a sought-after speaker at industry conferences.