Data Wrangler's Handbook: Simple Tools for Powerful Results [Paperback]

3.00/5 (8 ratings from Goodreads)

Kyle Banerjee

Format: Paperback / softback, 176 pages, height x width x thickness: 228x152x10 mm, weight: 257 g
Publication date: 30-Aug-2019
Publisher: Association of College & Research Libraries
ISBN-10: 083891909X
ISBN-13: 9780838919095

Other books on the subject:

Computer networking & communications (currently in store: 3 titles)
Storage media & peripherals
Information retrieval

Paperback
Price: 77.05 €
Delivery from the publisher takes approximately 2-4 weeks

Permalink: https://www.kriso.ee/db/9780838919095.html

Data manipulation and analysis are far easier than you might imagine—in fact, using tools that come standard with your desktop computer, you can learn how to extract, manipulate, and analyze data (and metadata) of any size and complexity. In this handbook, data wizard Banerjee will familiarize you with easily digestible but powerful concepts that will enable you to feel confident working with data. With his expert guidance, you'll learn how to

use a single-word command to sort files of any size by any criteria, identify duplicates, and perform numerous other common library tasks;
understand data formats, delimited text and CSV files, XML, JSON, scripting, and other key components of data;
undertake more sophisticated tasks such as comparing files, converting data from one format to another, reformatting values, combining data from multiple files, and communicating with APIs (Application Programming Interfaces);
save time and stress through simple techniques for transforming text, recognizing symbols that perform important tasks, a Regular Expression cheat sheet, a glossary, and other tools.

Library technologists and those involved in maintaining and analyzing data and metadata will find Banerjee’s resource essential.

Banerjee, who has worked with data in academic, government, and nonprofit settings, offers a handbook for librarians that explains simple methods that can be used on any computer for managing, extracting, or analyzing text-based data. Focusing on the most essential information, he describes the computer environment and basic concepts for navigating it, including finding the command line; how to apply command line concepts; formats and performing sophisticated operations on delimited text, XML (eXtensible Markup Language), JSON (JavaScript Object Notation), and other formats; how to simplify complicated data problems; tools and techniques for delimited texts, XML, and JSON; scripting; and solving common problems, such as viewing large files, locating files with particular data or characteristics, working with internal metadata or APIs (Application Programming Interfaces), and combining data from different sources. The book ends with commands and functions useful in a library context. Annotation ©2019 Ringgold, Inc., Portland, OR (protoview.com)

Reviews

"I highly recommend The Data Wrangler's Handbook for anyone who now manipulates data or may need to do so in the future. In Banerjee's words, 'If these tasks [that require data wrangling] sound intimidating, this book is for you. You will understand everything in this book even if you have no special technical knowledge or programming experience.'" (Technicalities)

List of Figures and Tables

Acknowledgments

Introduction

Chapter 1 Getting Started with the Command Line
- Finding the Command Line
  - Mac
  - Windows
- Meet the Command Line

Chapter 2 Command Line Concepts
- Two Powerful Symbols
  - Direct Output to a File (Greater than Symbol)
  - Direct Output to Another Program (Pipe Symbol)
- Command Substitution
- Regular Expressions: The Swiss Army Knife for Data
  - Literal Characters
  - Special Characters
  - Wildcard Characters
  - Logical Operators
  - Grouping
- Scripting

Chapter 3 Understanding Formats
David Forero

Chapter 4 Simplify Complicated Problems
- Isolating Specific Data Elements
- Converting Data into Formats That Are Easier to Work With

Chapter 5 Delimited Text
- CSV (Comma Separated Values)
  - Commas and Quotation Marks in CSV Files
  - Multiline Fields in CSV Files
- Multivalued Fields in Delimited Files

Chapter 6 XML
- So What Is XML, Really?
- What Makes XML So Useful?
- Why Is XML So Easy?
- DOM (Document Object Model)
- XPath
- XSLT (Extensible Stylesheet Language Transformations)
- Working with Large XML Files
- Working with Complex XML Files
- XmlStarlet
- Installing XmlStarlet
- Converting XML Documents

Chapter 7 JSON (JavaScript Object Notation)

Chapter 8 Scripting
- Variables
- Arguments
- Conditional Execution
- Loops

Chapter 9 Solving Common Problems
- Viewing Large Files
- Locating Files That Contain Particular Data
- Finding Files with Specific Characteristics
- Working with Internal Metadata
- Working with APIs
- Combining Data from Different Sources
- Other Tasks

Chapter 10 Conclusions
- One-Line Wonders
  - Locating, Viewing, and Performing Basic File Operations
    - Combine Information from Multiple Files into a Single File
    - Combine Three Files, Each Consisting of a Single Column, into a Three-Column Table
    - Extract 1,000 Random Lines or Records from a File
    - Find Files with Specific Characteristics
    - Find All Lines in All Files in the Current Directory as Well as All Subdirectories Containing a Regular Expression
    - Identify All Files in Current Directories and Subdirectories That Contain a Value
    - List All Files in Current Directory and Subdirectories over a 100 MB in Order of Decreasing Size
    - List the Names, Pixel Dimensions, and File Sizes of All Files in the Current Directory and Subdirectories in Tab Delimited Format
    - Print Line Number of File That Match Occurred On
    - Split Large Files into Smaller Chunks with Each File Breaking on a Line
    - View 200 Characters Starting at Position 385621 in a File
    - View Lines 4369-4374 of a File
  - Retrieving and Sending Information over a Network
    - Retrieve a Document from the Web and Send It to a File
    - Send an XML Document to an API Requiring HTTP Authentication
  - Sorting, Counting, Deduplication, and File Comparison
    - Combine Two Files on a Common Field
    - Compare Two Sorted Files
    - Count Occurrences for Each Entry in a File, Listed in Order of Decreasing Frequency
    - Count Records Containing an Expression
    - Count Words, Lines, and Characters in File
    - Identify All Unique Entries and Supply a Count of How Many Times Each Occurs
    - Sort a File and Remove Duplicates, Show Only Duplicated Entries, or Show Only Unique Entries
  - Useful Scripting Operations
    - Capture Parameters Passed to a Script
    - Divide a Line into Parameters
    - Iterate through Every Item in Parameter List
    - Perform a Loop
    - Perform an Operation Conditionally
    - Run a Script on Every Line of a File
    - Send the Output of a Command as Arguments to Another Command
    - Send the Output of a Command to Another Command
    - Send the Output of a Command to a File
    - Store the Output of a Command in a Variable
    - Use Foreign Character Sets in a Terminal Window
  - Transforming Text
    - Convert File of Dates to YYYY-MM-DD Format
    - Convert to Title Case
    - Convert to Upper Case
    - Convert List of Names from Direct Order to Indirect Order
    - Extract and Manipulate All Lines in a File That Match a Complex Pattern
    - Extract and Manipulate All Entries in All Files in an Entire Directory Hierarchy That Match a Pattern
    - Remove Lines from a File That Match a Pattern
    - Remove Carriage Return Characters Inserted by Windows Programs from a File
    - Remove Newline Characters from a File
    - Replace Newlines in a File with Character 7 (Bell)
    - Replace Search_Expr with Replace_Expr Only on Lines That Contain condition_Expr
    - Replace Search_Expr with Replace_Expr Except on Lines That Contain Condition_Expr
    - Replace Smart Quotes with Straight Quotes
  - Working with Delimited Files
    - Convert Comma Delimited File Where Some Values Are Quoted and Some Values Are Not to Tab Delimited
    - Convert Multiline Records to Table
    - Extract Individual Fields from Files
    - Find the Most Common Values in the Second Field of a File
    - Find All Lines in Tab Delimited File Not Containing Six Fields
    - Fix Delimited File That Contains Line Breaks in Fields
    - Remove Trailing and Leading Whitespace from Tab Delimited Data Fields
    - Reorder Fields in a Tab Delimited File
  - Working with JSON and XML
    - Add an Attribute to an XML Document
    - Add an Element to an XML Document
    - Apply XSLT Stylesheet to XML Document
    - Convert JSON to Tab Delimited Format
    - Delete Elements, Attributes, or Values Based on XPath Expressions
    - Display Structure of XML File
    - Pretty Print JSON Document
    - Pretty Print XML Document

Glossary

Symbols That Perform Important Tasks

Useful Commands

Regular Expression Cheat Sheet

Index

Kyle Banerjee has wrangled data for diverse purposes in academic, government, and nonprofit environments since 1996. A firm believer that understanding people is the key to building services of the future from the systems and data of the past, his professional interests revolve around understanding workflows and identifying opportunities in data previously thought inconsistent or incomplete. He has published several books and numerous articles on a variety of topics related to applying technology in library settings.
