In this mission, we’ll look at one of the best ways of getting unique data sets: web scraping. In the previous missions, we worked with APIs.
In the Working with APIs mission, you discovered the advantages of using APIs to get data. A lot of data, however, isn't accessible through data sets or APIs; it exists only on a web page or collection of pages. One way to access that data without waiting for the provider to create an API, or copying it manually yourself, is a technique called web scraping.
In this mission, we'll discover how to use web scraping to extract the data we want from a web page using Python and the beautifulsoup library. We'll also look at the structure of a web page and use some basic HTML and CSS skills to aid us in web scraping.
We'll use the requests library heavily as we learn about web scraping. This library enables us to download web pages. The beautifulsoup library will also be very important, as it allows us to more easily extract the relevant parts of each web page we download, so we get just the data we want without all of the superfluous code and other elements that might be present on the page.
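As a rough preview of how the two libraries divide the work, here is a minimal sketch. The helper names `fetch_html` and `extract_paragraphs` and the `SAMPLE_HTML` string are hypothetical and are not part of the mission. The sample string stands in for a downloaded page, so the parsing step can be tried without a network connection.

```python
from bs4 import BeautifulSoup


def fetch_html(url):
    """Download a page with requests and return its HTML as text."""
    import requests  # imported here so the parsing demo below runs offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an exception on 4xx/5xx responses
    return response.text


def extract_paragraphs(html):
    """Return the text of every <p> element found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    return [p.get_text(strip=True) for p in soup.find_all("p")]


# Hypothetical page content standing in for a downloaded page.
SAMPLE_HTML = """
<html>
  <head><title>A simple example page</title></head>
  <body>
    <p>Here is some simple content for this page.</p>
    <p class="footer">Contact us for more.</p>
  </body>
</html>
"""

paragraphs = extract_paragraphs(SAMPLE_HTML)
print(paragraphs)
```

With a real page, the same parsing step would be applied to the string returned by `fetch_html(url)` for whatever URL you are scraping.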
You’ll be able to do this web scraping and make use of both libraries from right within our browser-based platform, which means there’s no download or setup time required. Dive in, and you’ll be learning web scraping in Python in less than a minute!
2. Web Page Structure
3. Retrieving Elements from a Page
4. Using Find All
5. Element IDs
6. Element Classes
7. CSS Selectors
8. Using CSS Selectors
9. Nesting CSS Selectors
10. Using Nested CSS Selectors
11. Beyond the Basics