About the Authors | ix
About the Technical Reviewer | xi
Introduction | xiii
|
Part I Web Scraping Basics | 1 | (78)
 | 3 | (22)
1.1 What Is Web Scraping? | 3 | (5)
1.1.1 Why Web Scraping for Data Science? | 4 | (1)
1.1.2 Who Is Using Web Scraping? | 5 | (3)
 | 8 | (17)
 | 8 | (1)
1.2.2 A Quick Python Primer | 9 | (16)
Chapter 2 The Web Speaks HTTP | 25 | (24)
2.1 The Magic of Networking | 25 | (3)
2.2 The Hypertext Transfer Protocol: HTTP | 28 | (6)
2.3 HTTP in Python: The Requests Library | 34 | (5)
2.4 Query Strings: URLs with Parameters | 39 | (10)
Chapter 3 Stirring the HTML and CSS Soup | 49 | (30)
3.1 Hypertext Markup Language: HTML | 49 | (2)
3.2 Using Your Browser as a Development Tool | 51 | (5)
3.3 Cascading Style Sheets: CSS | 56 | (5)
3.4 The Beautiful Soup Library | 61 | (11)
3.5 More on Beautiful Soup | 72 | (7)
|
Part II Advanced Web Scraping | 79 | (94)
Chapter 4 Delving Deeper in HTTP | 81 | (46)
4.1 Working with Forms and POST Requests | 81 | (16)
4.2 Other HTTP Request Methods | 97 | (3)
 | 100 | (8)
 | 108 | (11)
4.5 Using Sessions with Requests | 119 | (2)
4.6 Binary, JSON, and Other Forms of Content | 121 | (6)
Chapter 5 Dealing with JavaScript | 127 | (28)
 | 127 | (1)
 | 128 | (6)
5.3 Scraping with Selenium | 134 | (14)
 | 148 | (7)
Chapter 6 From Web Scraping to Web Crawling | 155 | (18)
6.1 What Is Web Crawling? | 155 | (3)
6.2 Web Crawling in Python | 158 | (3)
6.3 Storing Results in a Database | 161 | (12)
|
Part III Managerial Concerns and Best Practices | 173 | (126)
Chapter 7 Managerial and Legal Concerns | 175 | (12)
7.1 The Data Science Process | 175 | (4)
7.2 Where Does Web Scraping Fit In? | 179 | (2)
 | 181 | (6)
 | 187 | (10)
 | 187 | (6)
8.1.1 Alternative Python Libraries | 187 | (1)
 | 188 | (1)
 | 188 | (1)
 | 189 | (1)
8.1.5 Scraping in Other Programming Languages | 190 | (1)
 | 191 | (1)
8.1.7 Graphical Scraping Tools | 191 | (2)
8.2 Best Practices and Tips | 193 | (4)
 | 197 | (102)
 | 199 | (2)
9.2 Using the Hacker News API | 201 | (1)
 | 202 | (4)
 | 206 | (3)
9.5 Scraping GitHub Stars | 209 | (5)
9.6 Scraping Mortgage Rates | 214 | (6)
9.7 Scraping and Visualizing IMDB Ratings | 220 | (2)
9.8 Scraping IATA Airline Information | 222 | (6)
9.9 Scraping and Analyzing Web Forum Interactions | 228 | (9)
9.10 Collecting and Clustering a Fashion Data Set | 237 | (4)
9.11 Sentiment Analysis of Scraped Amazon Reviews | 241 | (11)
9.12 Scraping and Analyzing News Articles | 252 | (19)
9.13 Scraping and Analyzing a Wikipedia Graph | 271 | (7)
9.14 Scraping and Visualizing a Board Members Graph | 278 | (3)
9.15 Breaking CAPTCHA's Using Deep Learning | 281 | (18)
Index | 299