Simple Web Scraping Using Python 3 (Part 1)

Requirements

Installation

easy_install pip
pip install beautifulsoup4
python -m pip install beautifulsoup4

Version Check
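
One way to confirm the setup is to print the interpreter version and the installed beautifulsoup4 version. The sketch below is only an assumption about what such a check might look like. It relies on the standard library's importlib.metadata (Python 3.8 or later), so it reports a missing package instead of crashing. The helper name installed_version is hypothetical.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Python itself: this tutorial assumes a 3.x interpreter
print("Python:", ".".join(str(part) for part in sys.version_info[:3]))

# beautifulsoup4 is the name used with pip; the import name is bs4
bs4_version = installed_version("beautifulsoup4")
print("beautifulsoup4:", bs4_version or "not installed")
```

If the second line prints "not installed", rerunning the pip command from the Installation section should fix it.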

Basic HTML

Web Scraping Rules & Regulations
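
A common courtesy before scraping is to check what a site's robots.txt allows. As a minimal sketch, the standard library's urllib.robotparser can evaluate such rules. The rules and the example.com URLs below are made up for illustration and do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from memory so no request is sent
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(sample_rules)

# can_fetch(useragent, url) reports whether the rules permit fetching the URL
allowed = parser.can_fetch("*", "http://example.com/quotes")
blocked = parser.can_fetch("*", "http://example.com/private/notes")

print(allowed)
print(blocked)
```

For a live site, calling set_url() with the address of its robots.txt and then read() would download the real rules instead of using a hard-coded list.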

Acknowledgment

Inspecting the Page

Step 1

<div class="quotes">
<p class="aquote">
I hear and i forget.<br> I see and i remember.<br> I do and i understand.
</p>
<p class="author">
Confucious
</p>
</div>
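
To see how this markup maps onto Beautiful Soup lookups, the snippet can be parsed straight from a string, without touching the network. This is a rough sketch, and the variable names (sample_html, sample_soup, block, quote_text, author_text) are assumptions made for the example.

```python
from bs4 import BeautifulSoup

# The markup from Step 1, copied as a string so no network access is needed
sample_html = """
<div class="quotes">
<p class="aquote">
I hear and i forget.<br> I see and i remember.<br> I do and i understand.
</p>
<p class="author">
Confucious
</p>
</div>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Each quote lives in a <div class="quotes"> with two <p> children
block = sample_soup.find("div", {"class": "quotes"})
quote_text = block.find("p", {"class": "aquote"}).text.strip()
author_text = block.find("p", {"class": "author"}).text.strip()

print(quote_text)
print(author_text)
```

The class names are the only hooks the scraper needs: "quotes" selects the block, while "aquote" and "author" select the two paragraphs inside it.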

Code Begins

Step 1

from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq

Step 2

page='http://adilshehzad.me/my-fav-quotes-for-webcrawling-test'

Step 3

client=uReq(page) 

Step 4

page_html=client.read()
client.close()  # release the connection once the HTML has been read
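
Opening and reading a page can fail, for example when the server returns an error status or cannot be reached. One possible way to handle that is sketched below. The function name fetch_html and the 10-second timeout are assumptions chosen for illustration and are not part of the original steps.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def fetch_html(url, timeout=10):
    """Return the response body as bytes, or None if the request fails."""
    try:
        # The with-block closes the connection even if read() raises
        with urlopen(url, timeout=timeout) as response:
            return response.read()
    except HTTPError as err:
        # The server answered, but with an error status such as 404
        print("HTTP error:", err.code)
    except URLError as err:
        # The server could not be reached (DNS failure, refused connection, ...)
        print("Connection problem:", err.reason)
    return None
```

A caller would check the result for None before handing it to the parser, so a failed download does not surface later as a confusing parsing error.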

Step 5

page_soup=soup(page_html,"html.parser")

Step 6

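quotes = page_soup.findAll("div", {"class": "quotes"})

# Open the output file once; the with-block closes it when the loop ends
with open('output.txt', 'a') as file1:
    for quote in quotes:
        # The quote text is the first <p class="aquote"> in this block
        fav_quote = quote.findAll("p", {"class": "aquote"})
        aquote = fav_quote[0].text.strip()

        # The author is the first <p class="author"> in this block
        fav_author = quote.findAll("p", {"class": "author"})
        author = fav_author[0].text.strip()

        # Show the result on screen and append it to output.txt
        print(aquote)
        print(author)
        print(aquote, file=file1)
        print(author, file=file1)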
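
The loop above indexes the first match with [0], which raises an IndexError if a block is missing its quote or author paragraph. A more defensive variant might look like the sketch below. The function name extract_quotes and the demo markup are hypothetical, and find_all is the current spelling of findAll in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup


def extract_quotes(html):
    """Return (quote, author) pairs, skipping blocks that lack either part."""
    parsed = BeautifulSoup(html, "html.parser")
    pairs = []
    for block in parsed.find_all("div", class_="quotes"):
        quote_tag = block.find("p", class_="aquote")
        author_tag = block.find("p", class_="author")
        if quote_tag is None or author_tag is None:
            continue  # find() returns None when nothing matches
        pairs.append(
            (quote_tag.get_text(strip=True), author_tag.get_text(strip=True))
        )
    return pairs


# Hypothetical markup: the second block has no author and is skipped
demo_html = (
    '<div class="quotes"><p class="aquote">First quote</p>'
    '<p class="author">Someone</p></div>'
    '<div class="quotes"><p class="aquote">Orphan quote</p></div>'
)
print(extract_quotes(demo_html))
```

Returning a list of pairs keeps the parsing separate from the printing and file writing, so the same function can feed either one.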

Wrapping Up

DevOps Engineer - Author