Python and the Web

The urllib2 module contains a number of utilities for simple access to data on the web.

For instance, the example below lets the user enter a series of urls and displays both the metadata (the response headers) and the contents of each document.

import sys
import urllib2

# read a url, stripping the trailing newline left by readline
print "Enter the first url (or just enter to quit):"
chosenurl = sys.stdin.readline().strip()

# keep processing urls as long as the user enters them
while chosenurl:
    urlfile = urllib2.urlopen(chosenurl)
    print "The associated metadata is: ", urlfile.info()
    print "The file content is:"
    for nextline in urlfile:
        # the trailing comma avoids printing a second newline
        print nextline,
    urlfile.close()
    print "Enter the next url (or just enter to quit):"
    chosenurl = sys.stdin.readline().strip()
   

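As a rough sketch of the same idea in reusable form, the fetching step can be wrapped in a small function. The name fetch is hypothetical, the try/except import is an assumption that lets the sketch also run on Python 3 (where urllib2 became urllib.request), and a temporary local file:// url stands in for a real web address so that no network access is needed.

```python
import os
import tempfile

try:
    import urllib2                      # Python 2, as used in this document
except ImportError:
    import urllib.request as urllib2    # assumption: the Python 3 equivalent


def fetch(url):
    """Return (metadata, list of lines) for a url (hypothetical helper)."""
    urlfile = urllib2.urlopen(url)
    try:
        metadata = urlfile.info()
        lines = [line for line in urlfile]
    finally:
        urlfile.close()
    return metadata, lines


# write a small local file so the example works without a network
handle, path = tempfile.mkstemp(suffix=".txt")
os.write(handle, b"first line\nsecond line\n")
os.close(handle)

# assumes a POSIX-style absolute path such as /tmp/example.txt
sample_url = "file://" + path
metadata, lines = fetch(sample_url)
os.remove(path)
```

Here lines holds the two lines written to the temporary file, and metadata is the same kind of object that urlfile.info() returns in the script above.
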
Similarly, here is a simple Python script that allows you to access a url that requires basic authentication (i.e. the user is supposed to provide a username and password when requesting the url).

#! /usr/bin/env python

import urllib2, sys, base64

# get the url desired, as well as the username/password to be used
#    (strip the trailing newline that readline leaves on each value)
print "Enter the url:"
chosenurl = sys.stdin.readline().strip()
print "Enter your username:"
username = sys.stdin.readline().strip()
print "Enter the password (warning: this is not encrypted)"
password = sys.stdin.readline().strip()

# request the url
request = urllib2.Request(chosenurl)

# supply the name/pwd up front in a basic authentication header
base64string = base64.b64encode('%s:%s' % (username, password))
request.add_header("Authorization", "Basic %s" % base64string)

# open the resulting resource and read its contents
htmlFile = urllib2.urlopen(request)
htmlData = htmlFile.read()
print htmlData
htmlFile.close()
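
The header-building step above can be checked in isolation. The following is a minimal sketch, assuming a hypothetical helper named basic_auth_header and made-up credentials; it is written so that it runs on both Python 2 and Python 3.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of a basic Authorization header (hypothetical helper)."""
    credentials = ("%s:%s" % (username, password)).encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# made-up credentials, for illustration only
header = basic_auth_header("alice", "secret")

# a stray newline (e.g. from readline) changes the encoded value,
# which is why the input should be stripped first
header_with_newline = basic_auth_header("alice\n", "secret")
```

Note that base64 is an encoding, not encryption: anyone who sees the header can decode the username and password.
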

Some of the other commonly used routines from urllib include:

Python and email

While the email module contains tools for handling more sophisticated message structures, even the smtplib module contains a number of mail handling utilities, e.g. to send mail:

import smtplib
server = smtplib.SMTP('localhost')
server.sendmail('someonesending@somewhere', 'someonegetting@somewhereelse', 
"""To: someonegetting@somewhereelse
From: someonesending@somewhere

The body of the email.
""")
server.quit()
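
For messages with more structure, the email module mentioned above can build the message text instead of a hand-written string. This is a minimal sketch rather than a complete mailer: the subject line is a hypothetical addition, and the sending step is left commented out because it needs an SMTP server listening on localhost.

```python
from email.mime.text import MIMEText

sender = "someonesending@somewhere"
recipient = "someonegetting@somewhereelse"

# build the message and let the email module format the headers
message = MIMEText("The body of the email.")
message["To"] = recipient
message["From"] = sender
message["Subject"] = "A test message"   # hypothetical subject line

message_text = message.as_string()

# to actually send it (requires an SMTP server on localhost):
# import smtplib
# server = smtplib.SMTP('localhost')
# server.sendmail(sender, [recipient], message_text)
# server.quit()
```
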

Python for CGI

Output is typically generated by a Python CGI script simply using the print statement, e.g.

print "Content-type: text/html"
print    # an empty line marks the end of the headers
print "<html><body>"
print "Hi!"
print "</body></html>"

For obtaining form data passed to the CGI script, life is simplest if we use the Python cgi module:
import cgi

Submitted form data is available via the FieldStorage class, so we can capture that information with statements like:
myformdata = cgi.FieldStorage()

To extract the data from form field "name", we can use statements like:
namevalue = myformdata["name"].value

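Note that sometimes a field contains multiple values, in which case myformdata["name"] is a list of fields and has no value attribute. The getvalue method returns either a single string or a list of strings, so we can check to see if it is a list or not:
namevalue = myformdata.getvalue("name")
if isinstance(namevalue, list):
    # for example, join the submitted values into one string
    namevalue = ", ".join(namevalue)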

There are a huge number of other possibilities, but this should at least give you a start at Python CGI.
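
To see why a field may hold one value or several, it can help to parse a query string by hand. The sketch below is an illustration only: it uses parse_qs rather than cgi.FieldStorage so that it can run outside a web server, and the query string and the helper field_value are made up for the example.

```python
try:
    from urlparse import parse_qs        # Python 2
except ImportError:
    from urllib.parse import parse_qs    # assumption: the Python 3 equivalent

# a made-up query string in which "hobby" was submitted twice
query = "name=Alice&hobby=chess&hobby=tennis"
fields = parse_qs(query)


def field_value(fields, key):
    """Return None, a single string, or a list of strings (hypothetical helper)."""
    values = fields.get(key)
    if not values:
        return None
    if len(values) == 1:
        return values[0]
    return values


name = field_value(fields, "name")
hobbies = field_value(fields, "hobby")

if isinstance(hobbies, list):
    summary = "%s enjoys %s" % (name, " and ".join(hobbies))
else:
    summary = "%s enjoys %s" % (name, hobbies)
```
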

Cookie handling in Python

Naturally, Python also provides support for handling cookies. Here is a very simple example showing how to check for existing cookies, modify them if necessary, and create them if they don't exist.

#! /usr/bin/env python

# grab the modules we'll use
import os, cgi, Cookie, time
from Cookie import SimpleCookie


# Looking up, creating, and adjusting simple cookies
# ==================================================
# a cookie that counts how often the user has visited this page

# check to see if there is already a defined cookie
if 'HTTP_COOKIE' in os.environ:
    countercookie = SimpleCookie(os.environ['HTTP_COOKIE'])

# otherwise we'll define one here
else:
    countercookie = SimpleCookie()

# if a value already exists for a cookie named 'counter' grab that
#    otherwise we want to set one with value 0
initialvalues = {'counter': 0}
for key in initialvalues:
    if key not in countercookie:
        countercookie[key] = initialvalues[key]

# increment the 'counter' component of the cookie by 1
countercookie['counter'] = int(countercookie['counter'].value) + 1

# if we want to set an expiry date as N seconds in the future,
#    simply use the following (here with N = 86400 seconds, or 24 hours)
countercookie['counter']['expires'] = 86400

# print the headers, including the Set-Cookie line that sends the
#    updated cookie back to the browser, then the empty line ending them
print "Content-type: text/html"
print countercookie
print

# print our HTML page, specifying the number of visits
print " Visit number:  "
print countercookie['counter'].value

# Aside: here's how to grab the current time
#      if you're planning on using/manipulating that
currtime = time.gmtime(time.time())
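
The counter logic can also be tried outside a web server by passing in the raw cookie string directly. This is a minimal sketch under a few assumptions: the function name bump_counter and the sample cookie strings are hypothetical, and the try/except import lets it run on Python 3, where the Cookie module became http.cookies.

```python
try:
    from Cookie import SimpleCookie          # Python 2, as in this document
except ImportError:
    from http.cookies import SimpleCookie    # assumption: Python 3 equivalent


def bump_counter(raw_cookie):
    """Return a cookie whose 'counter' is one more than in raw_cookie."""
    cookie = SimpleCookie(raw_cookie)   # an empty string gives an empty cookie
    if 'counter' in cookie:
        visits = int(cookie['counter'].value) + 1
    else:
        visits = 1
    cookie['counter'] = visits
    cookie['counter']['expires'] = 86400    # 24 hours from now
    return cookie


# first visit: the browser sent no cookie at all
first = bump_counter("")

# later visit: the browser sent back the counter plus an unrelated cookie
again = bump_counter("counter=4; theme=dark")

# the header line a CGI script would print for the counter
header_line = again['counter'].output()
```

Cookie values are always stored as strings, which is why the value is converted with int before it is incremented.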