Data Science at the Command Line: Obtain, Scrub, Explore, and Model Data with Unix Power Tools, 2nd edition [Paperback]

  • Format: Paperback / softback, 250 pages, height x width: 232x178 mm
  • Publication date: 27-Aug-2021
  • Publisher: O'Reilly Media
  • ISBN-10: 1492087912
  • ISBN-13: 9781492087915
  • Paperback
  • Price: 63,19 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 74,34 €
  • You save 15%
  • Free shipping
  • Delivery from the publisher takes approximately 2-4 weeks

This thoroughly revised guide demonstrates how the flexibility of the command line can help you become a more efficient and productive data scientist. You'll learn how to combine small yet powerful command-line tools to quickly obtain, scrub, explore, and model your data. To get you started, author Jeroen Janssens provides a Docker image packed with over 80 tools--useful whether you work with Windows, macOS, or Linux.

You'll quickly discover why the command line is an agile, scalable, and extensible technology. Even if you're comfortable processing data with Python or R, you'll learn how to greatly improve your data science workflow by leveraging the command line's power. This book is ideal for data scientists, analysts, and engineers; software and machine learning engineers; and system administrators.

  • Obtain data from websites, APIs, databases, and spreadsheets
  • Perform scrub operations on text, CSV, HTML, XML, and JSON files
  • Explore data, compute descriptive statistics, and create visualizations
  • Manage your data science workflow
  • Create reusable command-line tools from one-liners and existing Python or R code
  • Parallelize and distribute data-intensive pipelines
  • Model data with dimensionality reduction, clustering, regression, and classification algorithms
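As a taste of the workflow the book advocates, the obtain-scrub-explore cycle can be sketched as a single Unix pipeline. This is an illustrative sketch using only standard tools (`printf`, `tail`, `sort`, `uniq`), not an excerpt from the book; the sample data is made up:

```shell
# Count how many times each category appears in a small CSV,
# sorted by frequency, entirely on the command line.
printf 'category\nfruit\nfruit\nveg\nfruit\nveg\n' |
  tail -n +2 |   # scrub: drop the header line
  sort |         # group identical values together
  uniq -c |      # explore: count occurrences per value
  sort -rn       # most frequent value first
```

Each tool does one small job, and the pipe composes them; this is the style of working the book builds on.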
Foreword xiii
Preface xv
1 Introduction 1(10)
  Data Science Is OSEMN 2(2)
  Obtaining Data 3(1)
  Scrubbing Data 3(1)
  Exploring Data 3(1)
  Modeling Data 4(1)
  Interpreting Data 4(1)
  Intermezzo Chapters 4(1)
  What Is the Command Line? 5(2)
  Why Data Science at the Command Line? 7(3)
  The Command Line Is Agile 7(1)
  The Command Line Is Augmenting 8(1)
  The Command Line Is Scalable 8(1)
  The Command Line Is Extensible 9(1)
  The Command Line Is Ubiquitous 9(1)
  Summary 10(1)
  For Further Exploration 10(1)
2 Getting Started 11(24)
  Getting the Data 11(1)
  Installing the Docker Image 12(1)
  Essential Unix Concepts 13(20)
  The Environment 14(1)
  Executing a Command-Line Tool 15(1)
  Five Types of Command-Line Tools 16(4)
  Combining Command-Line Tools 20(2)
  Redirecting Input and Output 22(4)
  Working with Files and Directories 26(2)
  Managing Output 28(2)
  Help! 30(3)
  Summary 33(1)
  For Further Exploration 33(2)
3 Obtaining Data 35(18)
  Overview 36(1)
  Copying Local Files to the Docker Container 36(1)
  Downloading from the Internet 37(4)
  Introducing curl 37(1)
  Saving 38(1)
  Other Protocols 39(1)
  Following Redirects 39(2)
  Decompressing Files 41(2)
  Converting Microsoft Excel Spreadsheets to CSV 43(3)
  Querying Relational Databases 46(1)
  Calling Web APIs 47(5)
  Authentication 48(2)
  Streaming APIs 50(2)
  Summary 52(1)
  For Further Exploration 52(1)
4 Creating Command-Line Tools 53(24)
  Overview 54(1)
  Converting One-Liners into Shell Scripts 55(14)
  Step 1: Create a File 58(3)
  Step 2: Give Permission to Execute 61(1)
  Step 3: Define a Shebang 62(3)
  Step 4: Remove the Fixed Input 65(1)
  Step 5: Add Arguments 66(2)
  Step 6: Extend Your PATH 68(1)
  Creating Command-Line Tools with Python and R 69(5)
  Porting the Shell Script 70(2)
  Processing Streaming Data from Standard Input 72(2)
  Summary 74(1)
  For Further Exploration 74(3)
5 Scrubbing Data 77(30)
  Overview 78(1)
  Transformations, Transformations Everywhere 78(3)
  Plain Text 81(9)
  Filtering Lines 81(5)
  Extracting Values 86(2)
  Replacing and Deleting Values 88(2)
  CSV 90(11)
  Bodies and Headers and Columns, Oh My! 90(3)
  Performing SQL Queries on CSV 93(1)
  Extracting and Reordering Columns 94(1)
  Filtering Rows 95(1)
  Merging Columns 96(3)
  Combining Multiple CSV Files 99(2)
  Working with XML/HTML and JSON 101(3)
  Summary 104(1)
  For Further Exploration 105(2)
6 Project Management with Make 107(12)
  Overview 108(1)
  Introducing Make 109(1)
  Running Tasks 109(3)
  Building, for Real 112(1)
  Adding Dependencies 113(5)
  Summary 118(1)
  For Further Exploration 118(1)
7 Exploring Data 119(34)
  Overview 120(1)
  Inspecting Data and Its Properties 120(6)
  Header or Not, Here I Come 120(1)
  Inspect All the Data 121(1)
  Feature Names and Data Types 122(2)
  Unique Identifiers, Continuous Variables, and Factors 124(2)
  Computing Descriptive Statistics 126(7)
  Column Statistics 126(3)
  R One-Liners on the Shell 129(4)
  Creating Visualizations 133(19)
  Displaying Images from the Command Line 133(5)
  Plotting in a Rush 138(2)
  Creating Bar Charts 140(2)
  Creating Histograms 142(1)
  Creating Density Plots 143(1)
  Happy Little Accidents 144(2)
  Creating Scatter Plots 146(1)
  Creating Trend Lines 147(2)
  Creating Box Plots 149(1)
  Adding Labels 150(2)
  Going Beyond Basic Plots 152(1)
  Summary 152(1)
  For Further Exploration 152(1)
8 Parallel Pipelines 153(24)
  Overview 154(1)
  Serial Processing 154(4)
  Looping Over Numbers 155(1)
  Looping Over Lines 156(1)
  Looping Over Files 157(1)
  Parallel Processing 158(9)
  Introducing GNU Parallel 160(2)
  Specifying Input 162(2)
  Controlling the Number of Concurrent Jobs 164(1)
  Logging and Output 164(2)
  Creating Parallel Tools 166(1)
  Distributed Processing 167(7)
  Get List of Running AWS EC2 Instances 167(2)
  Running Commands on Remote Machines 169(1)
  Distributing Local Data Among Remote Machines 170(1)
  Processing Files on Remote Machines 171(3)
  Summary 174(1)
  For Further Exploration 175(2)
9 Modeling Data 177(22)
  Overview 178(1)
  More Wine, Please! 178(4)
  Dimensionality Reduction with Tapkee 182(5)
  Introducing Tapkee 183(1)
  Linear and Nonlinear Mappings 183(4)
  Regression with Vowpal Wabbit 187(6)
  Preparing the Data 187(1)
  Training the Model 188(2)
  Testing the Model 190(3)
  Classification with SciKit-Learn Laboratory 193(4)
  Preparing the Data 193(1)
  Running the Experiment 194(1)
  Parsing the Results 195(2)
  Summary 197(1)
  For Further Exploration 198(1)
10 Polyglot Data Science 199(14)
  Overview 200(1)
  Jupyter 200(3)
  Python 203(2)
  R 205(2)
  RStudio 207(1)
  Apache Spark 208(2)
  Summary 210(1)
  For Further Exploration 211(2)
11 Conclusion 213(6)
  Let's Recap 213(1)
  Three Pieces of Advice 214(1)
  Be Patient 214(1)
  Be Creative 215(1)
  Be Practical 215(1)
  Where to Go from Here 215(2)
  The Command Line 216(1)
  Shell Programming 216(1)
  Python, R, and SQL 216(1)
  APIs 216(1)
  Machine Learning 217(1)
  Getting in Touch 217(2)
List of Command-Line Tools 219(30)
Index 249
Jeroen Janssens teaches data science: often through training and coaching, occasionally through speaking, and infrequently through writing. His interests include visualizing data, building machine learning models, and automating things using Python, R, or Bash. He is the author of Data Science at the Command Line, published by O'Reilly Media. Jeroen holds a PhD in machine learning from Tilburg University and an MSc in artificial intelligence from Maastricht University. Previously, he was an assistant professor at Jheronimus Academy of Data Science and a data scientist at Elsevier in Amsterdam and at various startups in New York City. Currently, Jeroen is the CEO of Data Science Workshops, which organises open-enrollment workshops, in-company courses, inspiration sessions, hackathons, and meetups, all related to data science, of course.