E-book: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython

  • Format: 522 pages
  • Publication date: 25-Sep-2017
  • Publisher: O'Reilly Media, Inc, USA
  • ISBN-13: 9781491957615
  • Format: EPUB+DRM
  • Price: 43.47 EUR*
  • * The price is final, i.e. no further discounts apply.
  • This e-book is intended for personal use only.

DRM restrictions

  • Copying (copy/paste):

    not allowed

  • Printing:

    not allowed

  • Usage:

    To read the e-book, you need to create an Adobe ID and install Adobe Digital Editions on your computer. The e-book can be read and downloaded on up to 6 devices.
    The e-book cannot be read on an Amazon Kindle. All other e-readers offered in our e-shop can read Adobe ID-protected e-books.

Get complete instructions for manipulating, processing, cleaning, and crunching datasets in Python. Updated for Python 3.6, the second edition of this hands-on guide is packed with practical case studies that show you how to solve a broad set of data analysis problems effectively. You'll learn the latest versions of pandas, NumPy, IPython, and Jupyter in the process.

Written by Wes McKinney, the creator of the Python pandas project, this book is a practical, modern introduction to data science tools in Python. It's ideal for analysts new to Python and for Python programmers new to data science and scientific computing. Data files and related material are available on GitHub.

  • Use the IPython shell and Jupyter notebook for exploratory computing
  • Learn basic and advanced features in NumPy (Numerical Python)
  • Get started with data analysis tools in the pandas library
  • Use flexible tools to load, clean, transform, merge, and reshape data
  • Create informative visualizations with matplotlib
  • Apply the pandas groupby facility to slice, dice, and summarize datasets
  • Analyze and manipulate regular and irregular time series data
  • Learn how to solve real-world data analysis problems with thorough, detailed examples
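As a small illustration of the split-apply-combine idea behind the pandas groupby item in the description, here is a minimal sketch. It is not taken from the book: the column names `key` and `value` and the numbers are invented for illustration, and it uses only the standard `DataFrame.groupby` and `agg` calls.

```python
import pandas as pd

# Toy data, invented purely for illustration
df = pd.DataFrame({
    "key": ["a", "b", "a", "b", "a"],
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Split rows by "key", apply mean and sum to each group's "value",
# then combine the per-group results into one table indexed by key
summary = df.groupby("key")["value"].agg(["mean", "sum"])
print(summary)
```

Group "a" holds 1, 3, and 5 (mean 3.0, sum 9.0), and group "b" holds 2 and 4 (mean 3.0, sum 6.0).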
Table of Contents
Preface
1 Preliminaries
  1.1 What Is This Book About?
    What Kinds of Data?
  1.2 Why Python for Data Analysis?
    Python as Glue
    Solving the "Two-Language" Problem
    Why Not Python?
  1.3 Essential Python Libraries
    NumPy
    pandas
    matplotlib
    IPython and Jupyter
    SciPy
    scikit-learn
    statsmodels
  1.4 Installation and Setup
    Windows
    Apple (OS X, macOS)
    GNU/Linux
    Installing or Updating Python Packages
    Python 2 and Python 3
    Integrated Development Environments (IDEs) and Text Editors
  1.5 Community and Conferences
  1.6 Navigating This Book
    Code Examples
    Data for Examples
    Import Conventions
    Jargon
2 Python Language Basics, IPython, and Jupyter Notebooks
  2.1 The Python Interpreter
  2.2 IPython Basics
    Running the IPython Shell
    Running the Jupyter Notebook
    Tab Completion
    Introspection
    The %run Command
    Executing Code from the Clipboard
    Terminal Keyboard Shortcuts
    About Magic Commands
    Matplotlib Integration
  2.3 Python Language Basics
    Language Semantics
    Scalar Types
    Control Flow
3 Built-in Data Structures, Functions, and Files
  3.1 Data Structures and Sequences
    Tuple
    List
    Built-in Sequence Functions
    dict
    set
    List, Set, and Dict Comprehensions
  3.2 Functions
    Namespaces, Scope, and Local Functions
    Returning Multiple Values
    Functions Are Objects
    Anonymous (Lambda) Functions
    Currying: Partial Argument Application
    Generators
    Errors and Exception Handling
  3.3 Files and the Operating System
    Bytes and Unicode with Files
  3.4 Conclusion
4 NumPy Basics: Arrays and Vectorized Computation
  4.1 The NumPy ndarray: A Multidimensional Array Object
    Creating ndarrays
    Data Types for ndarrays
    Arithmetic with NumPy Arrays
    Basic Indexing and Slicing
    Boolean Indexing
    Fancy Indexing
    Transposing Arrays and Swapping Axes
  4.2 Universal Functions: Fast Element-Wise Array Functions
  4.3 Array-Oriented Programming with Arrays
    Expressing Conditional Logic as Array Operations
    Mathematical and Statistical Methods
    Methods for Boolean Arrays
    Sorting
    Unique and Other Set Logic
  4.4 File Input and Output with Arrays
  4.5 Linear Algebra
  4.6 Pseudorandom Number Generation
  4.7 Example: Random Walks
    Simulating Many Random Walks at Once
  4.8 Conclusion
5 Getting Started with pandas
  5.1 Introduction to pandas Data Structures
    Series
    DataFrame
    Index Objects
  5.2 Essential Functionality
    Reindexing
    Dropping Entries from an Axis
    Indexing, Selection, and Filtering
    Integer Indexes
    Arithmetic and Data Alignment
    Function Application and Mapping
    Sorting and Ranking
    Axis Indexes with Duplicate Labels
  5.3 Summarizing and Computing Descriptive Statistics
    Correlation and Covariance
    Unique Values, Value Counts, and Membership
  5.4 Conclusion
6 Data Loading, Storage, and File Formats
  6.1 Reading and Writing Data in Text Format
    Reading Text Files in Pieces
    Writing Data to Text Format
    Working with Delimited Formats
    JSON Data
    XML and HTML: Web Scraping
  6.2 Binary Data Formats
    Using HDF5 Format
    Reading Microsoft Excel Files
  6.3 Interacting with Web APIs
  6.4 Interacting with Databases
  6.5 Conclusion
7 Data Cleaning and Preparation
  7.1 Handling Missing Data
    Filtering Out Missing Data
    Filling In Missing Data
  7.2 Data Transformation
    Removing Duplicates
    Transforming Data Using a Function or Mapping
    Replacing Values
    Renaming Axis Indexes
    Discretization and Binning
    Detecting and Filtering Outliers
    Permutation and Random Sampling
    Computing Indicator/Dummy Variables
  7.3 String Manipulation
    String Object Methods
    Regular Expressions
    Vectorized String Functions in pandas
  7.4 Conclusion
8 Data Wrangling: Join, Combine, and Reshape
  8.1 Hierarchical Indexing
    Reordering and Sorting Levels
    Summary Statistics by Level
    Indexing with a DataFrame's columns
  8.2 Combining and Merging Datasets
    Database-Style DataFrame Joins
    Merging on Index
    Concatenating Along an Axis
    Combining Data with Overlap
  8.3 Reshaping and Pivoting
    Reshaping with Hierarchical Indexing
    Pivoting "Long" to "Wide" Format
    Pivoting "Wide" to "Long" Format
  8.4 Conclusion
9 Plotting and Visualization
  9.1 A Brief matplotlib API Primer
    Figures and Subplots
    Colors, Markers, and Line Styles
    Ticks, Labels, and Legends
    Annotations and Drawing on a Subplot
    Saving Plots to File
    matplotlib Configuration
  9.2 Plotting with pandas and seaborn
    Line Plots
    Bar Plots
    Histograms and Density Plots
    Scatter or Point Plots
    Facet Grids and Categorical Data
  9.3 Other Python Visualization Tools
  9.4 Conclusion
10 Data Aggregation and Group Operations
  10.1 GroupBy Mechanics
    Iterating Over Groups
    Selecting a Column or Subset of Columns
    Grouping with Dicts and Series
    Grouping with Functions
    Grouping by Index Levels
  10.2 Data Aggregation
    Column-Wise and Multiple Function Application
    Returning Aggregated Data Without Row Indexes
  10.3 Apply: General split-apply-combine
    Suppressing the Group Keys
    Quantile and Bucket Analysis
    Example: Filling Missing Values with Group-Specific Values
    Example: Random Sampling and Permutation
    Example: Group Weighted Average and Correlation
    Example: Group-Wise Linear Regression
  10.4 Pivot Tables and Cross-Tabulation
    Cross-Tabulations: Crosstab
  10.5 Conclusion
11 Time Series
  11.1 Date and Time Data Types and Tools
    Converting Between String and Datetime
  11.2 Time Series Basics
    Indexing, Selection, Subsetting
    Time Series with Duplicate Indices
  11.3 Date Ranges, Frequencies, and Shifting
    Generating Date Ranges
    Frequencies and Date Offsets
    Shifting (Leading and Lagging) Data
  11.4 Time Zone Handling
    Time Zone Localization and Conversion
    Operations with Time Zone-Aware Timestamp Objects
    Operations Between Different Time Zones
  11.5 Periods and Period Arithmetic
    Period Frequency Conversion
    Quarterly Period Frequencies
    Converting Timestamps to Periods (and Back)
    Creating a PeriodIndex from Arrays
  11.6 Resampling and Frequency Conversion
    Downsampling
    Upsampling and Interpolation
    Resampling with Periods
  11.7 Moving Window Functions
    Exponentially Weighted Functions
    Binary Moving Window Functions
    User-Defined Moving Window Functions
  11.8 Conclusion
12 Advanced pandas
  12.1 Categorical Data
    Background and Motivation
    Categorical Type in pandas
    Computations with Categoricals
    Categorical Methods
  12.2 Advanced GroupBy Use
    Group Transforms and "Unwrapped" GroupBys
    Grouped Time Resampling
  12.3 Techniques for Method Chaining
    The pipe Method
  12.4 Conclusion
13 Introduction to Modeling Libraries in Python
  13.1 Interfacing Between pandas and Model Code
  13.2 Creating Model Descriptions with Patsy
    Data Transformations in Patsy Formulas
    Categorical Data and Patsy
  13.3 Introduction to statsmodels
    Estimating Linear Models
    Estimating Time Series Processes
  13.4 Introduction to scikit-learn
  13.5 Continuing Your Education
14 Data Analysis Examples
  14.1 1.USA.gov Data from Bitly
    Counting Time Zones in Pure Python
    Counting Time Zones with pandas
  14.2 MovieLens 1M Dataset
    Measuring Rating Disagreement
  14.3 US Baby Names 1880-2010
    Analyzing Naming Trends
  14.4 USDA Food Database
  14.5 2012 Federal Election Commission Database
    Donation Statistics by Occupation and Employer
    Bucketing Donation Amounts
    Donation Statistics by State
  14.6 Conclusion
A Advanced NumPy
  A.1 ndarray Object Internals
    NumPy dtype Hierarchy
  A.2 Advanced Array Manipulation
    Reshaping Arrays
    C Versus Fortran Order
    Concatenating and Splitting Arrays
    Repeating Elements: tile and repeat
    Fancy Indexing Equivalents: take and put
  A.3 Broadcasting
    Broadcasting Over Other Axes
    Setting Array Values by Broadcasting
  A.4 Advanced ufunc Usage
    ufunc Instance Methods
    Writing New ufuncs in Python
  A.5 Structured and Record Arrays
    Nested dtypes and Multidimensional Fields
    Why Use Structured Arrays?
  A.6 More About Sorting
    Indirect Sorts: argsort and lexsort
    Alternative Sort Algorithms
    Partially Sorting Arrays
    numpy.searchsorted: Finding Elements in a Sorted Array
  A.7 Writing Fast NumPy Functions with Numba
    Creating Custom numpy.ufunc Objects with Numba
  A.8 Advanced Array Input and Output
    Memory-Mapped Files
    HDF5 and Other Array Storage Options
  A.9 Performance Tips
    The Importance of Contiguous Memory
B More on the IPython System
  B.1 Using the Command History
    Searching and Reusing the Command History
    Input and Output Variables
  B.2 Interacting with the Operating System
    Shell Commands and Aliases
    Directory Bookmark System
  B.3 Software Development Tools
    Interactive Debugger
    Timing Code: %time and %timeit
    Basic Profiling: %prun and %run -p
    Profiling a Function Line by Line
  B.4 Tips for Productive Code Development Using IPython
    Reloading Module Dependencies
    Code Design Tips
  B.5 Advanced IPython Features
    Making Your Own Classes IPython-Friendly
    Profiles and Configuration
  B.6 Conclusion
Index
Wes McKinney is the main author of pandas, the popular open source Python library for data analysis. Wes is an active speaker and participant in the Python and open source communities. He worked as a quantitative analyst at AQR Capital Management and as a Python consultant before founding DataPad, a data analytics company, in 2013. He graduated from MIT with an S.B. in Mathematics.