Text Analytics with Python: A Practical Real-World Approach to Gaining Actionable Insights from your Data, 1st ed. [Paperback]

  • Format: Paperback / softback, XXI, 385 pages, height x width: 235x155 mm, weight: 6204 g, 54 illustrations (33 color, 21 black and white)
  • Publication date: 01-Dec-2016
  • Publisher: Apress
  • ISBN-10: 148422387X
  • ISBN-13: 9781484223871
  • Paperback
  • Price: 54,54 €*
  • * We will send you an offer for a used copy, whose price may differ from the price shown on the website
  • This book is out of print, but we will send you an offer for a used copy.

Derive useful insights from your data using Python. Learn the techniques related to natural language processing and text analytics, and gain the skills to know which technique is best suited to solve a particular problem.

Text Analytics with Python teaches you both basic and advanced concepts, including text and language syntax, structure, and semantics. You will focus on algorithms and techniques such as text classification, clustering, topic modeling, and text summarization.

The book follows a structured and comprehensive approach so that readers with little or no experience do not find themselves overwhelmed. You will start with the basics of natural language and Python and move on to advanced analytical and machine learning concepts. You will look at each technique and algorithm from a bird's-eye view, to understand how it can be used, and from a microscopic view, to understand the mathematical concepts and implement them to solve your own problems.

This book:

  • Provides complete coverage of the major concepts and techniques of natural language processing (NLP) and text analytics
  • Includes practical, real-world examples such as building a text classification system to categorize news articles, analyzing app or game reviews using topic modeling and text summarization, clustering popular movie synopses, and analyzing the sentiment of movie reviews
  • Shows implementations based on Python and several popular open source libraries for NLP and text analytics, such as the Natural Language Toolkit (NLTK), gensim, scikit-learn, spaCy, and Pattern
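
Two of the techniques named in the book's contents, the TF-IDF model and cosine similarity, can be illustrated with a minimal sketch. It is not taken from the book: it uses only the Python standard library, the function names and sample sentences are hypothetical, and the smoothed IDF formula is one common variant assumed here for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Assumption: weight = term count * (1 + ln((1 + N) / (1 + df))),
    a smoothed IDF variant chosen for this illustration.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: count * (1.0 + math.log((1 + n_docs) / (1 + doc_freq[term])))
            for term, count in counts.items()
        })
    return vectors


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sample documents, not from the book.
    docs = [
        "The team won the football match",
        "The football team lost the match",
        "Stock markets fell sharply today",
    ]
    vectors = tfidf_vectors(docs)
    for i in range(1, len(docs)):
        score = cosine_similarity(vectors[0], vectors[i])
        print(f"doc 0 vs doc {i}: {score:.3f}")
```

The two football sentences share most of their terms and score well above the unrelated finance sentence, which shares none. Libraries such as scikit-learn provide tested implementations of both steps and are the better choice outside a toy example.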


What you will learn:

  • Natural language concepts
  • Analyzing text syntax and structure
  • Text classification
  • Text clustering and similarity analysis
  • Text summarization
  • Semantic and sentiment analysis

Readership:
The book is for IT professionals, analysts, developers, linguistic experts, data scientists, and anyone with a keen interest in linguistics, analytics, and generating insights from textual data.

About the Author
About the Technical Reviewer
Acknowledgments
Introduction
Chapter 1 Natural Language Basics
  Natural Language
    What Is Natural Language?
    The Philosophy of Language
    Language Acquisition and Usage
  Linguistics
  Language Syntax and Structure
    Words
    Phrases
    Clauses
    Grammar
    Word Order Typology
  Language Semantics
    Lexical Semantic Relations
    Semantic Networks and Models
    Representation of Semantics
  Text Corpora
    Corpora Annotation and Utilities
    Popular Corpora
    Accessing Text Corpora
  Natural Language Processing
    Machine Translation
    Speech Recognition Systems
    Question Answering Systems
    Contextual Recognition and Resolution
    Text Summarization
    Text Categorization
  Text Analytics
  Summary
Chapter 2 Python Refresher
  Getting to Know Python
    The Zen of Python
    Applications: When Should You Use Python?
    Drawbacks: When Should You Not Use Python?
    Python Implementations and Versions
  Installation and Setup
    Which Python Version?
    Which Operating System?
    Integrated Development Environments
    Environment Setup
    Virtual Environments
  Python Syntax and Structure
  Data Structures and Types
    Numeric Types
    Strings
    Lists
    Sets
    Dictionaries
    Tuples
    Files
    Miscellaneous
  Controlling Code Flow
    Conditional Constructs
    Looping Constructs
    Handling Exceptions
  Functional Programming
    Functions
    Recursive Functions
    Anonymous Functions
    Iterators
    Comprehensions
    Generators
    The itertools and functools Modules
  Classes
  Working with Text
    String Literals
    String Operations and Methods
  Text Analytics Frameworks
  Summary
Chapter 3 Processing and Understanding Text
  Text Tokenization
    Sentence Tokenization
    Word Tokenization
  Text Normalization
    Cleaning Text
    Tokenizing Text
    Removing Special Characters
    Expanding Contractions
    Case Conversions
    Removing Stopwords
    Correcting Words
    Stemming
    Lemmatization
  Understanding Text Syntax and Structure
    Installing Necessary Dependencies
    Important Machine Learning Concepts
    Parts of Speech (POS) Tagging
    Shallow Parsing
    Dependency-based Parsing
    Constituency-based Parsing
  Summary
Chapter 4 Text Classification
  What Is Text Classification?
  Automated Text Classification
  Text Classification Blueprint
  Text Normalization
  Feature Extraction
    Bag of Words Model
    TF-IDF Model
    Advanced Word Vectorization Models
  Classification Algorithms
    Multinomial Naive Bayes
    Support Vector Machines
  Evaluating Classification Models
  Building a Multi-Class Classification System
  Applications and Uses
  Summary
Chapter 5 Text Summarization
  Text Summarization and Information Extraction
  Important Concepts
    Documents
    Text Normalization
    Feature Extraction
    Feature Matrix
    Singular Value Decomposition
  Text Normalization
  Feature Extraction
  Keyphrase Extraction
    Collocations
    Weighted Tag-Based Phrase Extraction
  Topic Modeling
    Latent Semantic Indexing
    Latent Dirichlet Allocation
    Non-negative Matrix Factorization
    Extracting Topics from Product Reviews
  Automated Document Summarization
    Latent Semantic Analysis
    TextRank
    Summarizing a Product Description
  Summary
Chapter 6 Text Similarity and Clustering
  Important Concepts
    Information Retrieval (IR)
    Feature Engineering
    Similarity Measures
    Unsupervised Machine Learning Algorithms
  Text Normalization
  Feature Extraction
  Text Similarity
  Analyzing Term Similarity
    Hamming Distance
    Manhattan Distance
    Euclidean Distance
    Levenshtein Edit Distance
    Cosine Distance and Similarity
  Analyzing Document Similarity
    Cosine Similarity
    Hellinger-Bhattacharya Distance
    Okapi BM25 Ranking
  Document Clustering
  Clustering Greatest Movies of All Time
    K-means Clustering
    Affinity Propagation
    Ward's Agglomerative Hierarchical Clustering
  Summary
Chapter 7 Semantic and Sentiment Analysis
  Semantic Analysis
  Exploring WordNet
    Understanding Synsets
    Analyzing Lexical Semantic Relations
  Word Sense Disambiguation
  Named Entity Recognition
  Analyzing Semantic Representations
    Propositional Logic
    First Order Logic
  Sentiment Analysis
  Sentiment Analysis of IMDb Movie Reviews
    Setting Up Dependencies
    Preparing Datasets
    Supervised Machine Learning Technique
    Unsupervised Lexicon-based Techniques
    Comparing Model Performances
  Summary
Index

Dipanjan Sarkar is a Data Scientist at Intel, the world's largest silicon company, which is on a mission to make the world more connected and productive. He primarily works on analytics, business intelligence, application development, and building large-scale intelligent systems. He received his master's degree in Information Technology from the International Institute of Information Technology, Bangalore, with a focus on data science and software engineering. He is also an avid supporter of self-learning, especially Massive Open Online Courses, and holds a Data Science Specialization from Johns Hopkins University on Coursera.

He has been an analytics practitioner for over four years, specializing in statistical, predictive, and text analytics. He has also authored a couple of books on R and machine learning, occasionally reviews technical books, and acts as a course beta tester for Coursera. Dipanjan's interests include learning about new technology, financial markets, disruptive start-ups, data science and, more recently, artificial intelligence and deep learning. In his spare time he loves reading, gaming, and watching popular sitcoms and football.