Scrape All URLs Of Websites Using Python | Web Scraping Python | Python Projects
What is Web Scraping in Python?

Building a web scraper is one of the most useful Python projects you can take on.

Web scraping is an automated process for extracting data from websites. It can be done in many languages, such as Python, PHP, and Java, but Python is one of the most popular and widely used languages for the job.

Put simply, web scraping is the process of collecting data from websites, or, more precisely, of extracting data from web pages through code.

Web scraping can be used for many purposes, such as powering search engines, analyzing market trends, and extracting information from web pages. It has been around for decades and remains one of the most popular ways to gather information, with clear advantages over manual collection: it is fast, cheap, and scalable.

Python's ecosystem offers a dedicated scraping framework called Scrapy. Scraping is a relatively easy process with it: install the library, point it at a site, and use it in your Python code to collect all the URLs of that website. The data gathered this way can serve many purposes, such as data mining, statistics, market research, and business intelligence. Under the hood, scraping works by parsing the HTML (and sometimes XML) behind a page.
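
For comparison, here is a minimal sketch of that idea in Scrapy. It is not the script built later in this article; the file name and spider name are illustrative assumptions, and it only shows how Scrapy would follow a site's internal links and print every URL it finds.

# minimal_spider.py -- a sketch, assuming Scrapy is installed (pip install scrapy)
import scrapy

class UrlSpider(scrapy.Spider):
    name = "url_spider"
    # same example site used later in this article
    start_urls = ["http://example.webscraping.com"]

    def parse(self, response):
        # look at every anchor tag on the page
        for href in response.css("a::attr(href)").getall():
            # print the absolute form of the link
            print(response.urljoin(href))
            # follow internal (root-relative) links and parse them the same way
            if href.startswith("/"):
                yield response.follow(href, callback=self.parse)

Running scrapy runspider minimal_spider.py would start the crawl; Scrapy schedules requests and filters duplicates on its own, which is exactly the bookkeeping the hand-written BeautifulSoup version below has to handle itself.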

Python is a versatile and powerful language that can be used for many different purposes. Web scraping with Python can be accomplished with the help of libraries like BeautifulSoup or Scrapy.

In this article, we’ve created a very simple web scraper in Python that collects all the URLs of a website.

Requirements

  • Any code editor or IDE (PyCharm or VS Code).
  • Python interpreter.
  • pip install beautifulsoup4
  • pip install requests
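
With both packages installed, an optional sanity check (not part of the original project) confirms that they import correctly:

# optional: verify that beautifulsoup4 and requests are importable
import bs4
import requests

print("beautifulsoup4 version:", bs4.__version__)
print("requests version:", requests.__version__)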

Source Code

from urllib.parse import urljoin

from bs4 import BeautifulSoup
import requests

# base URL of the site to crawl
BASE = "http://example.webscraping.com"

# list of URLs that have already been visited
urls = []

# fetch a page and recursively follow its internal links
def scrape(site):
    # request the page
    r = requests.get(site)

    # parse the HTML
    s = BeautifulSoup(r.text, "html.parser")
    for i in s.find_all("a"):
        href = i.get("href")
        # keep only root-relative (internal) links, skipping anchors without href
        if href and href.startswith("/"):
            # build the absolute URL in a new variable so `site` itself
            # is never modified inside the loop
            full_url = urljoin(BASE, href)
            if full_url not in urls:
                urls.append(full_url)
                print(full_url)
                # calling the scrape function itself
                # generally called recursion
                scrape(full_url)

# main function
if __name__ == "__main__":
    scrape(BASE)
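
When the crawl finishes, the urls list holds every internal link that was found. As a small optional extension (not part of the original script), the URLs could be written to a text file once scrape() returns; the hypothetical snippet below, placed right after the call to scrape(BASE), saves them to urls.txt.

    # optional: write every collected URL to a text file after the crawl
    # ("urls.txt" is an assumed filename, not from the original article)
    with open("urls.txt", "w", encoding="utf-8") as f:
        for url in urls:
            f.write(url + "\n")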

Demo Video

If you liked this, click the 💚 below so other people will see this here on Xalgord. Please let me know if you have any comments! Feel free to connect on Instagram.

WRITTEN BY

xalgord

Highly motivated, hard-working, and resourceful programmer. I love to explain and teach technology, solve tech problems, and learn something new every day.


One thought on “Scrape All URLs Of Websites Using Python | Web Scraping Python | Python Projects”

  1. seraph

    There is a problem with this code. Calling the function inside the loop means it only appends the first link that starts with ‘/’ before calling itself over and over again, as evident in the video.