
Practical Python Data Wrangling and Data Quality: Getting Started with Reading, Cleaning, and Analyzing Data [Paperback]

  • Format: Paperback / softback, 500 pages, height x width: 233x178 mm
  • Publication date: 31-Dec-2021
  • Publisher: O'Reilly Media
  • ISBN-10: 1492091502
  • ISBN-13: 9781492091509
  • Price: 75,81 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 89,19 €
  • You save 15%
  • Delivery from the publisher takes approximately 2-4 weeks
  • Free shipping

There are awesome discoveries to be made and valuable stories to be told in datasets--and this book will help you uncover them. Whether you already work with data or just want to understand its possibilities, the techniques and advice in this practical book will help you learn how to better clean, evaluate, and analyze data to generate meaningful insights and compelling visualizations.

Through foundational concepts and worked examples, author Susan McGregor provides the tools you need to evaluate and analyze all kinds of data and communicate your findings effectively. This book provides a methodical, jargon-free way for practitioners of all levels to harness the power of data.

  • Use Python 3.8+ to read, write, and transform data from a variety of sources
  • Understand and use programming basics in Python to wrangle data at scale
  • Organize, document, and structure your code using best practices
  • Complete exercises either on your own machine or on the web
  • Collect data from structured data files, web pages, and APIs
  • Perform basic statistical analysis to make meaning from data sets
  • Visualize and present data in clear and compelling ways
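To give a concrete taste of the kind of task those bullets describe, here is a minimal sketch (not taken from the book) that reads a small delimited file with Python's built-in csv module, cleans up a numeric column, and reports a summary statistic. The file name rides.csv and its column names are hypothetical.

    # Minimal illustrative sketch, not code from the book.
    # Assumes a hypothetical file "rides.csv" with a "trip_minutes" column.
    import csv
    from statistics import median

    trip_lengths = []
    with open("rides.csv", newline="", encoding="utf-8") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            # Strip stray whitespace and skip rows with a missing value
            raw_value = (row.get("trip_minutes") or "").strip()
            if not raw_value:
                continue
            try:
                trip_lengths.append(float(raw_value))
            except ValueError:
                # Skip values that can't be parsed as numbers
                continue

    if trip_lengths:
        print(f"Rows kept: {len(trip_lengths)}")
        print(f"Median trip length: {median(trip_lengths):.1f} minutes")

Using only the standard library keeps the sketch dependency-free; the same read-clean-summarize pattern carries over to the larger datasets and richer tools the chapters below work through.
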
Preface ix
1 Introduction to Data Wrangling and Data Quality 1
    What Is "Data Wrangling"? 2
    What Is "Data Quality"? 3
    Data Integrity 4
    Data "Fit" 5
    Why Python? 6
    Versatility 6
    Accessibility 7
    Readability 7
    Community 7
    Python Alternatives 8
    Writing and "Running" Python 8
    Working with Python on Your Own Device 11
    Getting Started with the Command Line 11
    Installing Python, Jupyter Notebook, and a Code Editor 14
    Working with Python Online 19
    Hello World! 20
    Using Atom to Create a Standalone Python File 20
    Using Jupyter to Create a New Python Notebook 21
    Using Google Colab to Create a New Python Notebook 22
    Adding the Code 23
    In a Standalone File 23
    In a Notebook 23
    Running the Code 23
    In a Standalone File 23
    In a Notebook 24
    Documenting, Saving, and Versioning Your Work 24
    Documenting 24
    Saving 25
    Versioning 26
    Conclusion 35
2 Introduction to Python 37
    The Programming "Parts of Speech" 38
    Nouns = Variables 39
    Verbs = Functions 42
    Cooking with Custom Functions 46
    Libraries: Borrowing Custom Functions from Other Coders 47
    Taking Control: Loops and Conditionals 47
    In the Loop 48
    One Condition 51
    Understanding Errors 55
    Syntax Snafus 56
    Runtime Runaround 58
    Logic Loss 60
    Hitting the Road with Citi Bike Data 62
    Starting with Pseudocode 63
    Seeking Scale 69
    Conclusion 70
3 Understanding Data Quality 71
    Assessing Data Fit 73
    Validity 74
    Reliability 76
    Representativeness 77
    Assessing Data Integrity 79
    Necessary, but Not Sufficient 81
    Important 82
    Achievable 85
    Improving Data Quality 88
    Data Cleaning 88
    Data Augmentation 89
    Conclusion 90
4 Working with File-Based and Feed-Based Data in Python 91
    Structured Versus Unstructured Data 93
    Working with Structured Data 97
    File-Based, Table-Type Data--Take It to Delimit 97
    Wrangling Table-Type Data with Python 99
    Real-World Data Wrangling: Understanding Unemployment 105
    XLSX, ODS, and All the Rest 107
    Finally, Fixed-Width 114
    Feed-Based Data--Web-Driven Live Updates 118
    Wrangling Feed-Type Data with Python 120
    Working with Unstructured Data 134
    Image-Based Text: Accessing Data in PDFs 134
    Wrangling PDFs with Python 135
    Accessing PDF Tables with Tabula 139
    Conclusion 140
5 Accessing Web-Based Data 141
    Accessing Online XML and JSON 143
    Introducing APIs 145
    Basic APIs: A Search Engine Example 146
    Specialized APIs: Adding Basic Authentication 148
    Getting a FRED API Key 149
    Using Your API Key to Request Data 150
    Reading API Documentation 151
    Protecting Your API Key When Using Python 153
    Creating Your "Credentials" File 155
    Using Your Credentials in a Separate Script 155
    Getting Started with gitignore 157
    Specialized APIs: Working with OAuth 159
    Applying for a Twitter Developer Account 160
    Creating Your Twitter "App" and Credentials 162
    Encoding Your API Key and Secret 167
    Requesting an Access Token and Data from the Twitter API 168
    API Ethics 172
    Web Scraping: The Data Source of Last Resort 173
    Carefully Scraping the MTA 176
    Using Browser Inspection Tools 178
    The Python Web Scraping Solution: Beautiful Soup 180
    Conclusion 184
6 Assessing Data Quality 185
    The Pandemic and the PPP 187
    Assessing Data Integrity 187
    Is It of Known Pedigree? 188
    Is It Timely? 189
    Is It Complete? 189
    Is It Well-Annotated? 201
    Is It High Volume? 206
    Is It Consistent? 208
    Is It Multivariate? 211
    Is It Atomic? 213
    Is It Clear? 213
    Is It Dimensionally Structured? 215
    Assessing Data Fit 215
    Validity 216
    Reliability 219
    Representativeness 220
    Conclusion 222
7 Cleaning, Transforming, and Augmenting Data 225
    Selecting a Subset of Citi Bike Data 226
    A Simple Split 227
    Regular Expressions: Supercharged String Matching 229
    Making a Date 233
    De-crufting Data Files 236
    Decrypting Excel Dates 239
    Generating True CSVs from Fixed-Width Data 242
    Correcting for Spelling Inconsistencies 244
    The Circuitous Path to "Simple" Solutions 250
    Gotchas That Will Get Ya! 252
    Augmenting Your Data 253
    Conclusion 256
8 Structuring and Refactoring Your Code 257
    Revisiting Custom Functions 258
    Will You Use It More Than Once? 258
    Is It Ugly and Confusing? 258
    Do You Just Really Hate the Default Functionality? 259
    Understanding Scope 259
    Defining the Parameters for Function "Ingredients" 262
    What Are Your Options? 263
    Getting Into Arguments? 263
    Return Values 264
    Climbing the "Stack" 265
    Refactoring for Fun and Profit 267
    A Function for Identifying Weekdays 267
    Metadata Without the Mess 270
    Documenting Your Custom Scripts and Functions with pydoc 277
    The Case for Command-Line Arguments 281
    Where Scripts and Notebooks Diverge 284
    Conclusion 285
9 Introduction to Data Analysis 287
    Context Is Everything 288
    Same but Different 289
    What's Typical? Evaluating Central Tendency 290
    What's That Mean? 290
    Embrace the Median 291
    Think Different: Identifying Outliers 292
    Visualization for Data Analysis 292
    What's Our Data's Shape? Understanding Histograms 296
    The Significance of Symmetry 297
    Counting "Clusters" 305
    The $2 Million Question 306
    Proportional Response 317
    Conclusion 321
10 Presenting Your Data 323
    Foundations for Visual Eloquence 324
    Making Your Data Statement 326
    Charts, Graphs, and Maps: Oh My! 327
    Pie Charts 328
    Bar and Column Charts 330
    Line Charts 335
    Scatter Charts 339
    Maps 342
    Elements of Eloquent Visuals 345
    The "Finicky" Details Really Do Make a Difference 345
    Trust Your Eyes (and the Experts) 345
    Selecting Scales 347
    Choosing Colors 347
    Above All, Annotate! 348
    From Basic to Beautiful: Customizing a Visualization with seaborn and matplotlib 349
    Beyond the Basics 354
    Conclusion 355
11 Beyond Python 357
    Additional Tools for Data Review 358
    Spreadsheet Programs 358
    Open Refine 359
    Additional Tools for Sharing and Presenting Data 361
    Image Editing for JPGs, PNGs, and GIFs 361
    Software for Editing SVGs and Other Vector Formats 362
    Reflecting on Ethics 363
    Conclusion 364
A More Python Programming Resources 365
B A Bit More About Git 369
C Finding Data 375
D Resources for Visualization and Information Design 381
Index 383
Susan McGregor is the Assistant Director of the Tow Center for Digital Journalism and has been teaching journalists and other non-programmers to code for more than a decade. With a background in computer science, journalism, and information visualization, McGregor loves solving problems that help people achieve greater agency. Following several years as the Senior Programmer of the Online News Graphics team at The Wall Street Journal, McGregor spent nearly a decade at Columbia University, where she taught classes on everything from introductory data journalism to advanced algorithmic investigation and analysis.