Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems [Paperback]

  • Format: Paperback / softback, 375 pages, height x width: 233x178 mm
  • Publication date: 30-Jun-2020
  • Publisher: O'Reilly Media
  • ISBN-10: 1492054054
  • ISBN-13: 9781492054054
  • Paperback
  • Price: 75,81 €*
  • * the price is final, i.e. no further discounts apply
  • Regular price: 89,19 €
  • You save 15%
  • Delivery of the book from the publisher takes approximately 2-4 weeks
Many books and courses tackle natural language processing (NLP) problems with toy use cases and well-defined datasets. But if you want to build, iterate, and scale NLP systems in a business setting and tailor them for particular industry verticals, this is your guide. Software engineers and data scientists will learn how to navigate the maze of options available at each step of the journey.

Through the course of the book, authors Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana will guide you through the process of building real-world NLP solutions embedded in larger product setups. You'll learn how to adapt your solutions for different industry verticals such as healthcare, social media, and retail.

With this book, you'll:

  • Understand the wide spectrum of problem statements, tasks, and solution approaches within NLP
  • Implement and evaluate different NLP applications using machine learning and deep learning methods
  • Fine-tune your NLP solution based on your business problem and industry vertical
  • Evaluate various algorithms and approaches for NLP product tasks, datasets, and stages
  • Produce software solutions following best practices around release, deployment, and DevOps for NLP systems
  • Understand best practices, opportunities, and the roadmap for NLP from a business and product leader's perspective

Foreword
Preface
Part I Foundations
1 NLP: A Primer
NLP in the Real World
NLP Tasks
What Is Language?
Building Blocks of Language
Why Is NLP Challenging?
Machine Learning, Deep Learning, and NLP: An Overview
Approaches to NLP
Heuristics-Based NLP
Machine Learning for NLP
Deep Learning for NLP
Why Deep Learning Is Not Yet the Silver Bullet for NLP
An NLP Walkthrough: Conversational Agents
Wrapping Up
2 NLP Pipeline
Data Acquisition
Text Extraction and Cleanup
HTML Parsing and Cleanup
Unicode Normalization
Spelling Correction
System-Specific Error Correction
Pre-Processing
Preliminaries
Frequent Steps
Other Pre-Processing Steps
Advanced Processing
Feature Engineering
Classical NLP/ML Pipeline
DL Pipeline
Modeling
Start with Simple Heuristics
Building Your Model
Building THE Model
Evaluation
Intrinsic Evaluation
Extrinsic Evaluation
Post-Modeling Phases
Deployment
Monitoring
Model Updating
Working with Other Languages
Case Study
Wrapping Up
3 Text Representation
Vector Space Models
Basic Vectorization Approaches
One-Hot Encoding
Bag of Words
Bag of N-Grams
TF-IDF
Distributed Representations
Word Embeddings
Going Beyond Words
Distributed Representations Beyond Words and Characters
Universal Text Representations
Visualizing Embeddings
Handcrafted Feature Representations
Wrapping Up
Part II Essentials
4 Text Classification
Applications
A Pipeline for Building Text Classification Systems
A Simple Classifier Without the Text Classification Pipeline
Using Existing Text Classification APIs
One Pipeline, Many Classifiers
Naive Bayes Classifier
Logistic Regression
Support Vector Machine
Using Neural Embeddings in Text Classification
Word Embeddings
Subword Embeddings and fastText
Document Embeddings
Deep Learning for Text Classification
CNNs for Text Classification
LSTMs for Text Classification
Text Classification with Large, Pre-Trained Language Models
Interpreting Text Classification Models
Explaining Classifier Predictions with Lime
Learning with No or Less Data and Adapting to New Domains
No Training Data
Less Training Data: Active Learning and Domain Adaptation
Case Study: Corporate Ticketing
Practical Advice
Wrapping Up
5 Information Extraction
IE Applications
IE Tasks
The General Pipeline for IE
Keyphrase Extraction
Implementing KPE
Practical Advice
Named Entity Recognition
Building an NER System
NER Using an Existing Library
NER Using Active Learning
Practical Advice
Named Entity Disambiguation and Linking
NEL Using Azure API
Relationship Extraction
Approaches to RE
RE with the Watson API
Other Advanced IE Tasks
Temporal Information Extraction
Event Extraction
Template Filling
Case Study
Wrapping Up
6 Chatbots
Applications
A Simple FAQ Bot
A Taxonomy of Chatbots
Goal-Oriented Dialog
Chitchats
A Pipeline for Building Dialog Systems
Dialog Systems in Detail
PizzaStop Chatbot
Deep Dive into Components of a Dialog System
Dialog Act Classification
Identifying Slots
Response Generation
Dialog Examples with Code Walkthrough
Other Dialog Pipelines
End-to-End Approach
Deep Reinforcement Learning for Dialogue Generation
Human-in-the-Loop
Rasa NLU
A Case Study: Recipe Recommendations
Utilizing Existing Frameworks
Open-Ended Generative Chatbots
Wrapping Up
7 Topics in Brief
Search and Information Retrieval
Components of a Search Engine
A Typical Enterprise Search Pipeline
Setting Up a Search Engine: An Example
A Case Study: Book Store Search
Topic Modeling
Training a Topic Model: An Example
What's Next?
Text Summarization
Summarization Use Cases
Setting Up a Summarizer: An Example
Practical Advice
Recommender Systems for Textual Data
Creating a Book Recommender System: An Example
Practical Advice
Machine Translation
Using a Machine Translation API: An Example
Practical Advice
Question-Answering Systems
Developing a Custom Question-Answering System
Looking for Deeper Answers
Wrapping Up
Part III Applied
8 Social Media
Applications
Unique Challenges
NLP for Social Data
Word Cloud
Tokenizer for SMTD
Trending Topics
Understanding Twitter Sentiment
Pre-Processing SMTD
Text Representation for SMTD
Customer Support on Social Channels
Memes and Fake News
Identifying Memes
Fake News
Wrapping Up
9 E-Commerce and Retail
E-Commerce Catalog
Review Analysis
Product Search
Product Recommendations
Search in E-Commerce
Building an E-Commerce Catalog
Attribute Extraction
Product Categorization and Taxonomy
Product Enrichment
Product Deduplication and Matching
Review Analysis
Sentiment Analysis
Aspect-Level Sentiment Analysis
Connecting Overall Ratings to Aspects
Understanding Aspects
Recommendations for E-Commerce
A Case Study: Substitutes and Complements
Wrapping Up
10 Healthcare, Finance, and Law
Healthcare
Health and Medical Records
Patient Prioritization and Billing
Pharmacovigilance
Clinical Decision Support Systems
Health Assistants
Electronic Health Records
Mental Healthcare Monitoring
Medical Information Extraction and Analysis
Finance and Law
NLP Applications in Finance
NLP and the Legal Landscape
Wrapping Up
Part IV Bringing It All Together
11 The End-to-End NLP Process
Revisiting the NLP Pipeline: Deploying NLP Software
An Example Scenario
Building and Maintaining a Mature System
Finding Better Features
Iterating Existing Models
Code and Model Reproducibility
Troubleshooting and Interpretability
Monitoring
Minimizing Technical Debt
Automating Machine Learning
The Data Science Process
The KDD Process
Microsoft Team Data Science Process
Making AI Succeed at Your Organization
Team
Right Problem and Right Expectations
Data and Timing
A Good Process
Other Aspects
Peeking over the Horizon
Final Words
Index

Sowmya Vajjala has a PhD in Computational Linguistics from the University of Tübingen, Germany. She currently works as a research officer at the National Research Council, Canada's largest federal research and development organization. Her past work experience spans both academia, as a faculty member at Iowa State University, USA, and industry, at Microsoft Research and The Globe and Mail.

Bodhisattwa Majumder is a doctoral candidate in NLP and ML at UC San Diego. Earlier, he studied at IIT Kharagpur, where he graduated summa cum laude. Previously, he built large-scale NLP systems at Google AI Research and Microsoft Research, which went into products serving millions of users. Currently, he is also leading his university team in the Amazon Alexa Prize for 2019-2020.

Anuj Gupta has built NLP and ML systems at Fortune 100 companies as well as startups as a senior leader. He has incubated and led multiple ML teams in his career. He studied computer science at IIT Delhi and IIIT Hyderabad. He is currently Head of Machine Learning and Data Science at Vahan Inc. Above all, he is a father and husband.

Harshit Surana is the founder of DeepFlux Inc. He has built and scaled ML systems at several Silicon Valley startups as a founder and an advisor. He studied computer science at Carnegie Mellon University, where he worked with the MIT Media Lab on common sense AI. His research in NLP has received over 200 citations.