Python Web Scraping Tutorial

Hey Friends, hope you are doing well. In today’s article, I am going to share a Python web scraping tutorial for absolute beginners. Before starting, let us first understand what web scraping means.

Web Scraping

Web scraping, also called web data mining or web harvesting, is the process of automatically downloading, parsing, extracting and organizing useful information from the web. It is very useful for data analysis. We can also use this technique to collect emails or any other valuable data from a website.

Uses Of Web Scraping

Web scraping has many uses. Some of the most common are:

  • Price Comparison: Tools such as ParseHub can be used to collect data from online shopping websites and compare the prices of products.
  • Email address gathering: Many companies that use email as a marketing medium use web scraping to collect email IDs and then send bulk emails.
  • Social Media Data Collection: Web scraping is used to collect data from social media websites.
  • Research and Development: Web scraping is used to collect large data sets (statistics, general information, temperature, etc.) from websites, which are then analyzed and used in surveys or R&D.
  • Job listings: Details regarding job openings and interviews are collected from different websites and listed in one place so that they are easily accessible to the user.

What is Web Scraping? Is Web Scraping legal?

Web scraping is an automated method used to extract large amounts of data from websites. The data on websites is mostly unstructured. Web scraping helps collect this unstructured data and store it in a structured form. There are different ways to scrape websites, such as online services, APIs or writing your own code. In this article, we’ll see how to implement web scraping with Python.


As for whether web scraping is legal, some websites allow it and some don’t. To see what a website permits automated clients to access, you can look at the website’s “robots.txt” file. You can find this file by appending “/robots.txt” to the root URL of the site that you want to scrape. For example, for the Flipkart website, the “robots.txt” file is at www.flipkart.com/robots.txt.
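Reading a “robots.txt” file by hand works, but the check can also be done in code. Here is a minimal sketch using Python’s built-in urllib.robotparser module. The SAMPLE_ROBOTS text, the example.com URLs and the is_allowed helper are made up for illustration; a real site’s rules will look different.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file content as a list of lines, so no network is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content, used only for this example
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /checkout
Allow: /
"""

print(is_allowed(SAMPLE_ROBOTS, "https://www.example.com/products/1"))  # True
print(is_allowed(SAMPLE_ROBOTS, "https://www.example.com/checkout"))    # False
```

To check a live site instead, RobotFileParser also offers set_url() and read(), which download the file before you call can_fetch().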


Why Python for Web Scraping?

  1. Ease of Use
  2. Large Collection of Libraries
  3. Easily Understandable Syntax
  4. Small code, a large task
  5. Large Community

How does Web Scraping work?

When you run code for web scraping, a request is sent to the URL that you have specified. In response to the request, the server sends back the data and allows you to read the HTML or XML page. The code then parses the HTML or XML page, finds the data and extracts it.
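To make the “parse, find, extract” part concrete, here is a small sketch that uses only Python’s standard library, so it runs without installing anything. The SAMPLE_HTML string stands in for the page a server would send back, and TextByIdParser and extract_text_by_id are hypothetical names chosen for this example. Libraries like Beautiful Soup do the same job with far less code.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not change the nesting depth
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextByIdParser(HTMLParser):
    """Collects the text inside the first element that has the given id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0       # greater than 0 while we are inside the target element
        self.found = False   # becomes True once the target element has been seen
        self.chunks = []     # pieces of text found inside the target element

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_text_by_id(html, target_id):
    """Parse html and return the stripped text of the element with target_id, or None."""
    parser = TextByIdParser(target_id)
    parser.feed(html)
    parser.close()
    if not parser.found:
        return None
    return "".join(parser.chunks).strip()


# Stand-in for the HTML that the server would return
SAMPLE_HTML = """
<html><body>
  <span id="productTitle">  Sample <b>Cotton</b> T-Shirt </span>
  <span id="price">499</span>
</body></html>
"""

print(extract_text_by_id(SAMPLE_HTML, "productTitle"))  # Sample Cotton T-Shirt
print(extract_text_by_id(SAMPLE_HTML, "price"))         # 499
print(extract_text_by_id(SAMPLE_HTML, "rating"))        # None
```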

To extract data using web scraping with python, you need to follow these basic steps:

  1. Find the URL that you want to scrape
  2. Inspect the page
  3. Find the data you want to extract
  4. Write the code
  5. Run the code and extract the data
  6. Store the data in the required format 
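For the last step, CSV is a common choice of format. Below is a rough sketch that uses Python’s built-in csv module. The products list, the column names and the write_products_csv helper are invented for this example; in a real scraper the rows would come from the extraction step.

```python
import csv
import io


def write_products_csv(products, file_obj):
    """Write a list of {"title": ..., "price": ...} dicts to file_obj as CSV."""
    writer = csv.DictWriter(file_obj, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(products)


# Made-up rows standing in for scraped data
products = [
    {"title": "Sample Cotton T-Shirt", "price": "499"},
    {"title": "Sample Denim Jacket", "price": "1,299"},
]

# An in-memory buffer keeps the example self-contained
buffer = io.StringIO()
write_products_csv(products, buffer)
print(buffer.getvalue())
```

To save to disk instead, pass a real file, for example open("products.csv", "w", newline="", encoding="utf-8"). Note that the csv module automatically quotes the price "1,299" because it contains a comma.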

Important Python libraries that we can use in web scraping:

Selenium: Selenium is a web testing library that automates browser activities.

BeautifulSoup: Beautiful Soup is a Python library for parsing HTML and XML documents. It creates a parse tree that makes it easy to extract the data.

Pandas: Pandas is another Python library, used for data manipulation and data analysis.

Demo: Python Web Scraping Tutorial

Here is the demo code for scraping product details from the Amazon website:


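# Requires the third-party packages requests and beautifulsoup4
import requests
from bs4 import BeautifulSoup

URL = 'https://www.amazon.in/Veirdo-Mens-Cotton-T-shirt-G4_BSR_BLACK_L_Black_Large/dp/B01MSTXWBS/ref=bbp_bb_d33a38_st_pbyk_w_0?psc=1&smid=APT08BGG6TKYO'

# Send a browser-like User-Agent header along with the request
headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
page = requests.get(URL, headers=headers)

# Parse the returned HTML into a tree we can search
soup = BeautifulSoup(page.content, 'html.parser')

# Look up the title and price elements by their id attributes.
# find() returns None when no element has that id, so these lines
# fail with an AttributeError if the page layout is different.
title = soup.find(id="productTitle").get_text()
price = soup.find(id="priceblock_saleprice").get_text()

print(title.strip())
print(price.strip())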

This scraper code prints the product title and price from the Amazon product page.