Web Scraping using IPython

Introduction

IPython is a command shell for interactive computing in multiple programming languages, originally developed for the Python programming language, that offers introspection, rich media, shell syntax, tab completion, and history.

IPython provides a great interactive shell for users to play with code. When you need to try out some sample Python code or test a script, IPython is the “Thing” for you. In this post I will use IPython to do some web scraping.

NOTE: This was tested on Ubuntu 14.04.

Installation

There are several ways to install IPython. I am going to install it via pip. To install pip, run:
$ sudo apt-get install python-pip

This installs pip (a tool for installing Python packages). To install IPython via pip, run:
$ pip install ipython

This installs IPython on your machine. IPython can also be installed with apt-get, but we need pip anyway to install the packages used for web scraping, which is why I installed IPython with pip as well.

Python Packages Required

For this post, we need two Python packages:

The requests module, to send HTTP GET requests and read the responses. To install it, run:

$ pip install requests

The Beautiful Soup module, to extract information from an HTML page. To install it, run:

$ pip install bs4
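Before moving on, it can help to confirm that both packages actually import. Here is a minimal sketch of such a check. The helper name missing_packages is my own and not part of either library; it relies only on the standard importlib module.

```python
import importlib


def missing_packages(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# An empty list means both packages are available.
print(missing_packages(["requests", "bs4"]))
```

If a name shows up in the printed list, run the matching pip command again.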

With this much done, let's start scraping.

Let the Fun Begin

For this example, I am going to get the “Latest News” in Student Corner from "http://gndec.ac.in/".

First of all, start IPython using the command:

$ ipython

This opens an IPython interpreter (you can also save the code in a script and run it). Then enter the code below, one line at a time, pressing Enter after each line:

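# requests fetches the page; bs4 (Beautiful Soup) parses it
import requests, bs4

url = "http://gndec.ac.in"

# Download the home page
req = requests.get(url)

# Parse the HTML with Python's built-in parser
soup = bs4.BeautifulSoup(req.text, 'html.parser')

# First <span> of the first <p> inside the .content of #block-block-15
output = soup.select("#block-block-15")[0].select(".content")[0].find_all("p")[0].find_all("span")[0].getText()

print(output)

You will have all the latest news in your Terminal!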
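The long selector chain above raises an IndexError as soon as one of the elements is missing, for example if the site changes its layout. A more defensive variant could look like the sketch below. The function name extract_latest_news and the SAMPLE_HTML snippet are made up for illustration; the sample only mimics the nesting that the selectors imply and is not the real markup of the page.

```python
import bs4


def extract_latest_news(html):
    """Return the text of the first <span> in the first <p> inside
    the .content of #block-block-15, or None if any step is missing."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    block = soup.select_one("#block-block-15")
    if block is None:
        return None
    content = block.select_one(".content")
    if content is None:
        return None
    paragraph = content.find("p")
    if paragraph is None:
        return None
    span = paragraph.find("span")
    if span is None:
        return None
    return span.get_text(strip=True)


# Hypothetical markup, used only to exercise the function offline.
SAMPLE_HTML = """
<div id="block-block-15">
  <div class="content">
    <p><span>Exam schedule released</span></p>
  </div>
</div>
"""

print(extract_latest_news(SAMPLE_HTML))             # Exam schedule released
print(extract_latest_news("<p>no such block</p>"))  # None
```

In the IPython session you would pass req.text to the function instead of the sample string, and treat a None result as a sign that the page structure has changed.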

 
