
Data Science at the Command Line [Paperback]

  • Format: Paperback / softback, 212 pages, height x width x thickness: 233x176x16 mm, weight: 28 g
  • Publication date: 07-Oct-2014
  • Publisher: O'Reilly Media
  • ISBN-10: 1491947853
  • ISBN-13: 9781491947852

This hands-on guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You’ll learn how to combine small, yet powerful, command-line tools to quickly obtain, scrub, explore, and model your data.
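The kind of pipeline the book builds chains a handful of small standard Unix tools with pipes. As a minimal sketch (an illustrative example, not one taken from the book), here is an obtain-scrub-explore one-liner that ranks the most frequent words in a text stream:

```shell
# Split text into one word per line, tally occurrences, rank by count.
echo "to be or not to be that is the question" |
  tr ' ' '\n' |   # scrub: one word per line
  sort |          # group identical words together
  uniq -c |       # count each distinct word
  sort -rn |      # rank by count, descending
  head -n 3       # keep the top three
```

Each tool does one small job; the pipe operator composes them, which is exactly the flexibility the book is about.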

To get you started—whether you’re on Windows, OS X, or Linux—author Jeroen Janssens introduces the Data Science Toolbox, an easy-to-install virtual environment packed with over 80 command-line tools.
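Getting the Data Science Toolbox running comes down to a few Vagrant commands once VirtualBox and Vagrant are installed; the box name below is the one published for the first edition (a setup sketch, to be checked against Chapter 2):

```shell
# Initialize, boot, and log in to the Data Science Toolbox VM.
vagrant init data-science-toolbox/dst   # create a Vagrantfile for the box
vagrant up                              # download the box and boot the VM
vagrant ssh                             # log in to the running VM
```

When you are done, `vagrant halt` shuts the VM down without destroying it.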

Discover why the command line is an agile, scalable, and extensible technology. Even if you’re already comfortable processing data with, say, Python or R, you’ll greatly improve your data science workflow by also leveraging the power of the command line.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on plain text, CSV, HTML/XML, and JSON
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow using Drake
  • Create reusable tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines using GNU Parallel
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms
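To give a flavor of the "reusable tools from one-liners" item: Chapter 4 walks through wrapping a pipeline in a script, making it executable, and parameterizing it. A condensed sketch in the spirit of the book's word-counting example (the exact script contents here are abbreviated, not the book's):

```shell
# Promote a word-counting one-liner to a reusable, parameterized tool.
cat > top-words.sh << 'EOF'
#!/usr/bin/env bash
# Usage: top-words.sh [NUM] -- print the NUM most frequent words on stdin.
NUM="${1:-5}"
tr '[:upper:]' '[:lower:]' |   # normalize case
  tr -cs '[:alpha:]' '\n' |    # one word per line
  sort | uniq -c | sort -rn |  # tally and rank
  head -n "$NUM"
EOF
chmod +x top-words.sh          # step 2: add permission to execute
echo "Bye bye big data" | ./top-words.sh 2
```

Because the script reads standard input and takes its limit as an argument, it composes with any other pipeline just like the built-in tools do.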
Preface xi
1 Introduction 1(12)
  Overview 2(1)
  Data Science Is OSEMN 2(2)
    Obtaining Data 2(1)
    Scrubbing Data 3(1)
    Exploring Data 3(1)
    Modeling Data 3(1)
    Interpreting Data 4(1)
  Intermezzo Chapters 4(1)
  What Is the Command Line? 5(2)
  Why Data Science at the Command Line? 7(2)
    The Command Line Is Agile 7(1)
    The Command Line Is Augmenting 7(1)
    The Command Line Is Scalable 8(1)
    The Command Line Is Extensible 8(1)
    The Command Line Is Ubiquitous 9(1)
  A Real-World Use Case 9(3)
  Further Reading 12(1)
2 Getting Started 13(16)
  Overview 13(1)
  Setting Up Your Data Science Toolbox 13(4)
    Step 1: Download and Install VirtualBox 14(1)
    Step 2: Download and Install Vagrant 14(1)
    Step 3: Download and Start the Data Science Toolbox 15(1)
    Step 4: Log In (on Linux and Mac OS X) 16(1)
    Step 4: Log In (on Microsoft Windows) 17(1)
    Step 5: Shut Down or Start Anew 17(1)
  Essential Concepts and Tools 17(10)
    The Environment 18(1)
    Executing a Command-Line Tool 19(1)
    Five Types of Command-Line Tools 20(3)
    Combining Command-Line Tools 23(1)
    Redirecting Input and Output 24(1)
    Working with Files 24(1)
    Help! 25(2)
  Further Reading 27(2)
3 Obtaining Data 29(12)
  Overview 29(1)
  Copying Local Files to the Data Science Toolbox 30(1)
    Local Version of Data Science Toolbox 30(1)
    Remote Version of Data Science Toolbox 30(1)
  Decompressing Files 31(1)
  Converting Microsoft Excel Spreadsheets 32(2)
  Querying Relational Databases 34(1)
  Downloading from the Internet 35(2)
  Calling Web APIs 37(2)
  Further Reading 39(2)
4 Creating Reusable Command-Line Tools 41(14)
  Overview 42(1)
  Converting One-Liners into Shell Scripts 42(7)
    Step 1: Copy and Paste 44(1)
    Step 2: Add Permission to Execute 45(1)
    Step 3: Define Shebang 46(1)
    Step 4: Remove Fixed Input 47(1)
    Step 5: Parameterize 47(1)
    Step 6: Extend Your PATH 48(1)
  Creating Command-Line Tools with Python and R 49(4)
    Porting the Shell Script 50(2)
    Processing Streaming Data from Standard Input 52(1)
  Further Reading 53(2)
5 Scrubbing Data 55(26)
  Overview 56(1)
  Common Scrub Operations for Plain Text 56(6)
    Filtering Lines 57(3)
    Extracting Values 60(2)
    Replacing and Deleting Values 62(1)
  Working with CSV 62(5)
    Bodies and Headers and Columns, Oh My! 62(5)
    Performing SQL Queries on CSV 67(1)
  Working with HTML/XML and JSON 67(5)
  Common Scrub Operations for CSV 72(8)
    Extracting and Reordering Columns 72(1)
    Filtering Lines 73(2)
    Merging Columns 75(2)
    Combining Multiple CSV Files 77(3)
  Further Reading 80(1)
6 Managing Your Data Workflow 81(10)
  Overview 82(1)
  Introducing Drake 82(1)
  Installing Drake 82(2)
  Obtain Top Ebooks from Project Gutenberg 84(1)
  Every Workflow Starts with a Single Step 85(2)
  Well, That Depends 87(2)
  Rebuilding Specific Targets 89(1)
  Discussion 90(1)
  Further Reading 90(1)
7 Exploring Data 91(24)
  Overview 92(1)
  Inspecting Data and Its Properties 92(4)
    Header or Not, Here I Come 92(1)
    Inspect All the Data 92(1)
    Feature Names and Data Types 93(2)
    Unique Identifiers, Continuous Variables, and Factors 95(1)
  Computing Descriptive Statistics 96(6)
    Using csvstat 96(3)
    Using R from the Command Line with Rio 99(3)
  Creating Visualizations 102(12)
    Introducing Gnuplot and feedgnuplot 102(2)
    Introducing ggplot2 104(3)
    Histograms 107(1)
    Bar Plots 108(2)
    Density Plots 110(1)
    Box Plots 111(1)
    Scatter Plots 112(1)
    Line Graphs 113(1)
    Summary 114(1)
  Further Reading 114(1)
8 Parallel Pipelines 115(20)
  Overview 116(1)
  Serial Processing 116(3)
    Looping Over Numbers 116(1)
    Looping Over Lines 117(1)
    Looping Over Files 118(1)
  Parallel Processing 119(6)
    Introducing GNU Parallel 121(1)
    Specifying Input 122(1)
    Controlling the Number of Concurrent Jobs 123(1)
    Logging and Output 123(1)
    Creating Parallel Tools 124(1)
  Distributed Processing 125(7)
    Get a List of Running AWS EC2 Instances 126(1)
    Running Commands on Remote Machines 127(1)
    Distributing Local Data Among Remote Machines 128(1)
    Processing Files on Remote Machines 129(3)
  Discussion 132(1)
  Further Reading 133(2)
9 Modeling Data 135(24)
  Overview 136(1)
  More Wine, Please! 136(3)
  Dimensionality Reduction with Tapkee 139(3)
    Introducing Tapkee 140(1)
    Installing Tapkee 140(1)
    Linear and Nonlinear Mappings 141(1)
  Clustering with Weka 142(8)
    Introducing Weka 143(1)
    Taming Weka on the Command Line 143(4)
    Converting Between CSV and ARFF 147(1)
    Comparing Three Clustering Algorithms 147(3)
  Regression with SciKit-Learn Laboratory 150(3)
    Preparing the Data 150(1)
    Running the Experiment 151(1)
    Parsing the Results 151(2)
  Classification with BigML 153(3)
    Creating Balanced Train and Test Data Sets 153(2)
    Calling the API 155(1)
    Inspecting the Results 155(1)
  Conclusion 156(1)
  Further Reading 156(3)
10 Conclusion 159(6)
  Let's Recap 159(1)
  Three Pieces of Advice 160(1)
    Be Patient 160(1)
    Be Creative 161(1)
    Be Practical 161(1)
  Where to Go from Here? 161(1)
    APIs 161(1)
    Shell Programming 162(1)
    Python, R, and SQL 162(1)
    Interpreting Data 162(1)
  Getting in Touch 162(3)
A List of Command-Line Tools 165(18)
B Bibliography 183(4)
Index 187
Jeroen Janssens is a senior data scientist at YPlan in New York City, specializing in machine learning, anomaly detection, and data visualization. He is passionate about building open source tools for doing data science. He obtained a B.Sc. in Life Sciences and an M.Sc. in Artificial Intelligence, both cum laude, from Maastricht University in the Netherlands, and completed his Ph.D. in Machine Learning at the Tilburg center for Cognition and Communication, Tilburg University. Outside of work, you may find him biking across the Brooklyn Bridge, beatboxing, or eating stroopwafels.