Python

Try to master Python. Don’t merely try to learn enough to complete the problem sets or pass the exams. Your goal should be to become fluent in Python. In that regard, it is like learning French or Chinese. If you want to converse with native speakers, you need to learn more than the dialogs or exams.

Google Python Class

Introduction https://developers.google.com/edu/python/introduction

Python 2 vs 3

A non-exhaustive list of features which are only available in 3.x releases and won't be backported to the 2.x series:

strings are Unicode by default
clean Unicode/bytes separation
exception chaining
function annotations
syntax for keyword-only arguments
extended tuple unpacking
non-local variable declarations

Interactive interpreter (like perl, scheme) - not compiled (like C, C++, Java)
no type declarations
- a = 9
- a = 'a string'
operator overloading
- a = 9 + 3
- a = "hello" + " world"
- b = 3 * 4
- b = 'hello ' * 3
Source code
- .py extension
- .pyc for bytecode
- #! /usr/bin/python (shebang in UNIX)
- # comment character

if __name__ == '__main__':

    main()

import
- import sys
- from sys import *
command line arguments
- import sys
- sys.argv[1]
- len(sys.argv) is like argc in C
user defined functions
- def name(args):
- indentation, not {}
code check at runtime, not compile time
naming conventions and style: Python Enhancement Proposal (PEP) 8
modules and namespaces
- from sys import argv, exit
Standard modules http://docs.python.org/library
Online help
- Google: python string lowercase
- http://docs.python.org/
- StackOverflow: http://stackoverflow.com/questions/tagged/python
help(len)
help(sys)
dir(sys)
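
The pieces above (import, sys.argv, def, indentation, and the __name__ check) fit together roughly as in the sketch below. It assumes Python 3 (print is a function), and the names greeting and main are made up for illustration.

```python
import sys


def greeting(name):
    """Build the greeting text; kept separate from main() so it is easy to test."""
    return 'Hello ' + name


def main(argv):
    # argv[0] is the script name, so real arguments start at index 1.
    if len(argv) >= 2:
        name = argv[1]
    else:
        name = 'World'
    print(greeting(name))


# Runs only when the file is executed directly, not when it is imported.
if __name__ == '__main__':
    main(sys.argv)
```

Keeping the logic in functions and calling main() only under the __name__ check lets the same file be imported as a module without side effects.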

Strings https://developers.google.com/edu/python/strings

str class
zero-based indexing
single, double, or triple quotes
- 'hello world'
- "hello world"
- """
hello
world
"""

backslash to quote
- 'don\'t quote me'
str() to convert number to string
- 'happy new year, ' + str(2016)

string methods:

s.lower(), s.upper() -- returns the lowercase or uppercase version of the string
s.strip() -- returns a string with whitespace removed from the start and end
s.isalpha()/s.isdigit()/s.isspace()... -- tests if all the string chars are in the various character classes
s.startswith('other'), s.endswith('other') -- tests if the string starts or ends with the given other string
s.find('other') -- searches for the given other string (not a regular expression) within s, and returns the first index where it begins or -1 if not found
s.replace('old', 'new') -- returns a string where all occurrences of 'old' have been replaced by 'new'
s.split('delim') -- returns a list of substrings separated by the given delimiter. The delimiter is not a regular expression, it's just text. 'aaa,bbb,ccc'.split(',') -> ['aaa', 'bbb', 'ccc']. As a convenient special case s.split() (with no arguments) splits on all whitespace chars.
s.join(list) -- opposite of split(), joins the elements in the given list together using the string as the delimiter. e.g. '---'.join(['aaa', 'bbb', 'ccc']) -> aaa---bbb---ccc
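
A short sketch of the string methods listed above, assuming Python 3. The sample strings and variable names are illustrative only.

```python
s = '  Hello, World  '

trimmed = s.strip()                    # whitespace removed from both ends
lowered = trimmed.lower()              # new string; the original is unchanged
starts = trimmed.startswith('Hello')   # True
position = trimmed.find('World')       # index where 'World' begins
missing = trimmed.find('xyz')          # -1 because 'xyz' is not present
replaced = trimmed.replace('l', 'L')   # every 'l' replaced

parts = 'aaa,bbb,ccc'.split(',')       # list of substrings
joined = '---'.join(parts)             # inverse of split()
words = ' two   words\n'.split()       # no argument: split on runs of whitespace
```

Strings are immutable, so every method here returns a new string rather than changing s.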

String slices (examples assume s = 'Hello')

s[1:4] is 'ell' -- chars starting at index 1 and extending up to but not including index 4
s[1:] is 'ello' -- omitting either index defaults to the start or end of the string
s[:] is 'Hello' -- omitting both always gives us a copy of the whole thing (this is the pythonic way to copy a sequence like a string or list)
s[1:100] is 'ello' -- an index that is too big is truncated down to the string length
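
The slice rules above can be checked directly. This sketch assumes s = 'Hello' and adds a few negative-index cases, which count from the end of the string.

```python
s = 'Hello'

middle = s[1:4]      # index 1 up to but not including index 4
tail = s[1:]         # omitted end defaults to the end of the string
copy = s[:]          # both omitted: the whole string
clipped = s[1:100]   # an end index that is too big is truncated
last = s[-1]         # negative indexes count back from the end
head = s[:-3]        # everything except the last three chars
end = s[-3:]         # the last three chars
```

For any index n, s[:n] + s[n:] == s, which is a handy way to remember where the boundary falls.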

QUESTION: what is 'hello'[:-1]?

String %
if statement
- if / elif / else
Exercise: string1.py
ord() chr()

Lists https://developers.google.com/edu/python/lists

zero-based
[] indexing
len() works as with strings
for and in

squares = [1, 4, 9, 16]

sum = 0

for num in squares:

    sum += num

print sum ## 30

in to check membership

list = ['larry', 'curly', 'moe']

if 'curly' in list:

    print 'yay'

range
- for i in range(100):
- print i
While loop

i = 0

while i < len(a):

    print a[i]

    i = i + 3

List Methods

Here are some other common list methods.

list.append(elem) -- adds a single element to the end of the list. Common error: does not return the new list, just modifies the original.
list.insert(index, elem) -- inserts the element at the given index, shifting elements to the right.
list.extend(list2) -- adds the elements in list2 to the end of the list. Using + or += on a list is similar to using extend().
list.index(elem) -- searches for the given element from the start of the list and returns its index. Throws a ValueError if the element does not appear (use "in" to check without a ValueError).
list.remove(elem) -- searches for the first instance of the given element and removes it (throws ValueError if not present)
list.sort() -- sorts the list in place (does not return it). (The sorted() function shown below is preferred.)
list.reverse() -- reverses the list in place (does not return it)
list.pop(index) -- removes and returns the element at the given index. Returns the rightmost element if index is omitted (roughly the opposite of append()).
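
The list methods above, run in sequence. This is a sketch assuming Python 3; the list is called names rather than list to avoid shadowing the built-in.

```python
names = ['larry', 'curly']

result = names.append('moe')    # modifies names in place; result is None
names.insert(0, 'shemp')        # shifts the other elements to the right
names.extend(['xx', 'yy'])      # adds each element of the second list
idx = names.index('curly')      # position of the first 'curly'
names.remove('xx')              # removes the first 'xx'
last = names.pop()              # removes and returns the rightmost element
first = names.pop(0)            # removes and returns the element at index 0

names.sort()                    # in place, returns None
after_sort = names[:]           # slice copy taken before reversing
names.reverse()                 # in place, returns None
```

The common mistake mentioned above is writing names = names.append('moe'), which replaces the list with None.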

list build up
list slices
Exercise: list1.py

Sorting https://developers.google.com/edu/python/sorting

list.sort()
sorted(list)
sorted(list, reverse=True)
key

strs = ['ccc', 'aaaa', 'd', 'bb']

print sorted(strs, key=len) ## ['d', 'bb', 'ccc', 'aaaa']

## "key" argument specifying str.lower function to use for sorting

strs = ['aa', 'BB', 'zz', 'CC']

print sorted(strs, key=str.lower) ## ['aa', 'BB', 'CC', 'zz']

def MyFn(s):

    return s[-1]

## Now pass key=MyFn to sorted() to sort by the last letter:

strs = ['xc', 'zb', 'yd', 'wa']

print sorted(strs, key=MyFn) ## ['wa', 'zb', 'xc', 'yd']

Tuples
```
 tuple = (1, 2, 'hi')
 print len(tuple)  ## 3
 print tuple[2]    ## hi
 tuple[2] = 'bye'  ## NO, tuples cannot be changed
 tuple = (1, 2, 'bye')  ## this works
```
To create a size-1 tuple, the lone element must be followed by a comma.
```
  tuple = ('hi',)   ## size-1 tuple
```
It's a funny case in the syntax, but the comma is necessary to distinguish the tuple from the ordinary case of putting an expression in parentheses. In some cases you can omit the parentheses and Python will see from the commas that you intend a tuple.
Assigning a tuple to an identically sized tuple of variable names assigns all the corresponding values. If the tuples are not the same size, it throws an error. This feature works for lists too.
```
  (x, y, z) = (42, 13, "hike")
  print z  ## hike
  (err_string, err_code) = Foo()  ## Foo() returns a length-2 tuple
```
List Comprehensions

nums = [1, 2, 3, 4]

squares = [ n * n for n in nums ] ## [1, 4, 9, 16]

strs = ['hello', 'and', 'goodbye']

shouting = [ s.upper() + '!!!' for s in strs ]

## ['HELLO!!!', 'AND!!!', 'GOODBYE!!!']

## Select values <= 2

nums = [2, 8, 1, 6]

small = [ n for n in nums if n <= 2 ] ## [2, 1]

## Select fruits containing 'a', change to upper case

fruits = ['apple', 'cherry', 'banana', 'lemon']

afruits = [ s.upper() for s in fruits if 'a' in s ]

## ['APPLE', 'BANANA']

Exercise: list1.py

Dicts and Files https://developers.google.com/edu/python/dict-files

Associative arrays, hash table, mapping, dictionary

dict = {}

dict['a'] = 'alpha'

dict['g'] = 'gamma'

dict['o'] = 'omega'

print dict ## {'a': 'alpha', 'o': 'omega', 'g': 'gamma'}

print dict['a'] ## Simple lookup, returns 'alpha'

dict['a'] = 6 ## Put new key/value into dict

'a' in dict ## True

## print dict['z'] ## Throws KeyError

if 'z' in dict: print dict['z'] ## Avoid KeyError

print dict.get('z') ## None (instead of KeyError)

## By default, iterating over a dict iterates over its keys.

## Note that the keys are in an arbitrary order (insertion order as of Python 3.7).

for key in dict: print key

## prints a g o

## Exactly the same as above

for key in dict.keys(): print key

## Get the .keys() list:

print dict.keys() ## ['a', 'o', 'g']

## Likewise, there's a .values() list of values

print dict.values() ## ['alpha', 'omega', 'gamma']

## Common case -- loop over the keys in sorted order,

## accessing each key/value

for key in sorted(dict.keys()):

    print key, dict[key]

## .items() is the dict expressed as (key, value) tuples

print dict.items() ## [('a', 'alpha'), ('o', 'omega'), ('g', 'gamma')]

## This loop syntax accesses the whole dict by looping

## over the .items() tuple list, accessing one (key, value)

## pair on each iteration.

for k, v in dict.items(): print k, '>', v

## a > alpha o > omega g > gamma
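
The dict walkthrough above is written for Python 2. A rough Python 3 equivalent is sketched below; the variable name greek is an assumption, chosen to avoid shadowing the built-in dict. In Python 3, keys(), values() and items() return views rather than lists, so they are wrapped in list() or sorted() when a list is needed.

```python
greek = {}
greek['a'] = 'alpha'
greek['g'] = 'gamma'
greek['o'] = 'omega'

keys = list(greek.keys())             # a view in Python 3, so convert to a list
missing = greek.get('z')              # None instead of KeyError
fallback = greek.get('z', 'unknown')  # optional default value

# Loop over the keys in sorted order, accessing each key/value.
for key in sorted(greek):
    print(key, greek[key])

pairs = sorted(greek.items())         # list of (key, value) tuples
```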

del operator

var = 6

del var # var no more!

list = ['a', 'b', 'c', 'd']

del list[0] ## Delete first element

del list[-2:] ## Delete last two elements

print list ## ['b']

dict = {'a':1, 'b':2, 'c':3}

del dict['b'] ## Delete 'b' entry

print dict ## {'a':1, 'c':3}

files
open - r = read, w = write, a = append

# Echo the contents of a file

f = open('foo.txt', 'rU') ## U = Universal mode, which is smart about end of lines

for line in f: ## iterates over the lines of the file

    print line, ## trailing , so print does not add an end-of-line char

    ## since 'line' already includes the end-of-line.

f.close()
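
In Python 3 the same echo loop is usually written with a with statement, which closes the file even if an error occurs. The sketch below first creates a small throwaway file so that it is self-contained; the file contents are made up.

```python
import os
import tempfile

# Create a small throwaway file so the example does not depend on foo.txt existing.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')
with open(path, 'w') as f:            # 'w' = write
    f.write('first line\nsecond line\n')

lines = []
with open(path, 'r') as f:            # 'r' = read
    for line in f:                    # iterates over the lines of the file
        print(line, end='')           # end='' because line already ends with '\n'
        lines.append(line.rstrip('\n'))
```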

Exercise: wordcount.py

Regular Expressions https://developers.google.com/edu/python/regular-expressions

Regular expressions are for pattern matching, e.g., search, replace
re module

import re

match = re.search(pat, str)

example:

str = 'an example word:cat!!'

match = re.search(r'word:\w\w\w', str)

# If-statement after search() tests if it succeeded

if match:

    print 'found', match.group() ## 'found word:cat'

else:

    print 'did not find'

patterns

The power of regular expressions is that they can specify patterns, not just fixed characters. Here are the most basic patterns which match single chars:

a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)
. (a period) -- matches any single character except newline '\n'
\w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.
\b -- boundary between word and non-word
\s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form feed [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.
\t, \n, \r -- tab, newline, return
\d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)
^ = start, $ = end -- match the start or end of the string
\ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure if a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.

Repetition

Things get more interesting when you use + and * to specify repetition in the pattern

+ -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
* -- 0 or more occurrences of the pattern to its left
? -- match 0 or 1 occurrences of the pattern to its left

Leftmost & Largest

First the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible -- i.e. + and * go as far as possible (the + and * are said to be "greedy").

Examples

## i+ = one or more i's, as many as possible.

match = re.search(r'pi+', 'piiig') => found, match.group() == "piii"

## Finds the first/leftmost solution, and within it drives the +

## as far as possible (aka 'leftmost and largest').

## In this example, note that it does not get to the second set of i's.

match = re.search(r'i+', 'piigiiii') => found, match.group() == "ii"

## \s* = zero or more whitespace chars

## Here look for 3 digits, possibly separated by whitespace.

match = re.search(r'\d\s*\d\s*\d', 'xx1 2 3xx') => found, match.group() == "1 2 3"

match = re.search(r'\d\s*\d\s*\d', 'xx12 3xx') => found, match.group() == "12 3"

match = re.search(r'\d\s*\d\s*\d', 'xx123xx') => found, match.group() == "123"

## ^ = matches the start of string, so this fails:

match = re.search(r'^b\w+', 'foobar') => not found, match == None

## but without the ^ it succeeds:

match = re.search(r'b\w+', 'foobar') => found, match.group() == "bar"
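
The examples above use "=>" as shorthand. A runnable version, assuming Python 3, might look like this; first_match is a hypothetical helper, not part of the re module.

```python
import re


def first_match(pattern, text):
    """Return the matched text, or None when the pattern is not found."""
    match = re.search(pattern, text)
    return match.group() if match else None


greedy_plus = first_match(r'pi+', 'piiig')            # + takes as many i's as possible
leftmost = first_match(r'i+', 'piigiiii')             # stops at the first run of i's
spaced = first_match(r'\d\s*\d\s*\d', 'xx1 2 3xx')    # digits separated by whitespace
packed = first_match(r'\d\s*\d\s*\d', 'xx123xx')      # \s* also matches zero chars
anchored = first_match(r'^b\w+', 'foobar')            # ^ forces a match at the start
unanchored = first_match(r'b\w+', 'foobar')           # without ^ the search can slide
```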

square brackets
group extraction
findall
findall with files
findall and groups
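
The topics listed above (square brackets, group extraction, findall) can be illustrated together with an email-like pattern. This is a sketch assuming Python 3; the sample text and addresses are invented.

```python
import re

text = 'purple alice-b@example.com monkey bob@example.org dishwasher'

# Square brackets define a set of allowed chars: word chars, dot, or dash.
# Parentheses mark groups that can be extracted separately.
match = re.search(r'([\w.-]+)@([\w.-]+)', text)
whole = match.group()     # the entire match
user = match.group(1)     # text matched by the first parenthesis
host = match.group(2)     # text matched by the second parenthesis

# findall() returns every match, not only the first.
emails = re.findall(r'[\w.-]+@[\w.-]+', text)

# With groups in the pattern, findall() returns a list of tuples.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', text)
```

For a file, the same findall() call can be applied to the text returned by f.read().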

Options

The re functions take options to modify the behavior of the pattern match. The option flag is added as an extra argument to the search() or findall() etc., e.g. re.search(pat, str, re.IGNORECASE).

IGNORECASE -- ignore upper/lowercase differences for matching, so 'a' matches both 'a' and 'A'.
DOTALL -- allow dot (.) to match newline -- normally it matches anything but newline. This can trip you up -- you think .* matches everything, but by default it does not go past the end of a line. Note that \s (whitespace) includes newlines, so if you want to match a run of whitespace that may include a newline, you can just use \s*
MULTILINE -- Within a string made of many lines, allow ^ and $ to match the start and end of each line. Normally ^/$ would just match the start and end of the whole string.
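
A small sketch of the three option flags, assuming Python 3. The sample text is made up.

```python
import re

text = 'First line\nsecond LINE\n'

# IGNORECASE: 'line' matches 'LINE'.
ignore = re.search(r'line', 'LINE', re.IGNORECASE).group()

# By default the dot stops at the newline; with DOTALL it runs through it.
one_line = re.search(r'First.*', text).group()
all_lines = re.search(r'First.*', text, re.DOTALL).group()

# By default ^ matches only at the start of the string;
# with MULTILINE it matches at the start of every line.
start_only = re.findall(r'^\w+', text)
each_line = re.findall(r'^\w+', text, re.MULTILINE)
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.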

Other

Greedy vs non-greedy
Substitution
Exercise: baby names https://developers.google.com/edu/python/exercises/baby-names
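
For the greedy versus non-greedy and substitution topics listed under "Other", a minimal sketch assuming Python 3 follows. The sample strings are invented for illustration.

```python
import re

html = '<b>foo</b> and <i>so on</i>'

# Greedy: .* runs as far as it can, so one match spans the whole string.
greedy = re.findall(r'<.*>', html)

# Non-greedy: adding ? after * makes it stop as early as possible.
lazy = re.findall(r'<.*?>', html)

# Substitution: re.sub(pattern, replacement, text).
# \1 in the replacement refers to the text of group 1 from the match.
text = 'alice@example.com, bob@example.net'
swapped = re.sub(r'([\w.-]+)@([\w.-]+)', r'\1@example.org', text)
```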

Utilities https://developers.google.com/edu/python/utilities

File System -- os, os.path, shutil

The *os* and *os.path* modules include many functions to interact with the file system. The *shutil* module can copy files.

os module docs
filenames = os.listdir(dir) -- list of filenames in that directory path (not including . and ..). The filenames are just the names in the directory, not their absolute paths.
os.path.join(dir, filename) -- given a filename from the above list, use this to put the dir and filename together to make a path
os.path.abspath(path) -- given a path, return an absolute form, e.g. /home/nick/foo/bar.html
os.path.dirname(path), os.path.basename(path) -- given dir/foo/bar.html, return the dirname "dir/foo" and basename "bar.html"
os.path.exists(path) -- true if it exists
os.mkdir(dir_path) -- makes one dir, os.makedirs(dir_path) makes all the needed dirs in this path
shutil.copy(source-path, dest-path) -- copy a file (dest path directories should exist)
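
The functions above can be tried safely inside a temporary directory. This sketch assumes Python 3; the directory and file names are placeholders.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                 # scratch directory for the example
nested = os.path.join(base, 'a', 'b')
os.makedirs(nested)                       # creates both 'a' and 'a/b'

src = os.path.join(nested, 'bar.html')
with open(src, 'w') as f:
    f.write('<p>hi</p>')

dest = os.path.join(base, 'copy.html')
shutil.copy(src, dest)                    # destination directory must already exist

names = sorted(os.listdir(base))          # bare names, not full paths
paths = [os.path.abspath(os.path.join(base, name)) for name in names]

folder = os.path.dirname(src)             # everything before the last component
filename = os.path.basename(src)          # the last component only
```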

Running External Processes -- commands

The *commands* module is a simple way to run an external command and capture its output.

commands module docs
(status, output) = commands.getstatusoutput(cmd) -- runs the command, waits for it to exit, and returns its status int and output text as a tuple. The command is run with its standard output and standard error combined into the one output text. The status will be non-zero if the command failed. Since the standard-err of the command is captured, if it fails, we need to print some indication of what happened.
output = commands.getoutput(cmd) -- as above, but without the status int.
There is a commands.getstatus() but it does something else, so don't use it -- dumbest bit of method naming ever!
If you want more control over the running of the sub-process, see the "popen2" module (http://docs.python.org/lib/module-popen2.html)
There is also a simple os.system(cmd) which runs the command and dumps its output onto your output and returns its error code. This works if you want to run the command but do not need to capture its output into your python data structures.
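
The commands and popen2 modules exist only in Python 2. In Python 3 their role is taken by the subprocess module, which also provides subprocess.getstatusoutput(cmd) and subprocess.getoutput(cmd). The sketch below assumes Python 3.7 or later and runs the Python interpreter itself as the external command so that it works on any platform.

```python
import subprocess
import sys

# The command is given as a list of arguments: program first, then its options.
cmd = [sys.executable, '-c', 'print("hello from child")']

completed = subprocess.run(
    cmd,
    stdout=subprocess.PIPE,       # capture standard output
    stderr=subprocess.STDOUT,     # fold standard error into the same text
    text=True,                    # decode bytes to str
)

status = completed.returncode     # non-zero if the command failed
output = completed.stdout

if status != 0:
    sys.stderr.write('command failed: ' + output)
```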

Exceptions

An exception represents a run-time error that halts the normal execution at a particular line and transfers control to error handling code. This section just introduces the most basic uses of exceptions. For example, a run-time error might be that a variable used in the program does not have a value (NameError; you've probably seen that one a few times), or that a file open operation fails because the file does not exist (IOError). (See exception docs: http://docs.python.org/tut/node10.html)

Without any error handling code (as we have done thus far), a run-time exception just halts the program with an error message. That's a good default behavior, and you've seen it many times. You can add a "try/except" structure to your code to handle exceptions, like this:

try:

    ## Either of these two lines could throw an IOError, say

    ## if the file does not exist or the read() encounters a low level error.

    f = open(filename, 'rU')

    text = f.read()

    f.close()

except IOError:

    ## Control jumps directly to here if any of the above lines throws IOError.

    sys.stderr.write('problem reading:' + filename)

## In any case, the code then continues with the line after the try/except

The try: section includes the code which might throw an exception. The except: section holds the code to run if there is an exception. If there is no exception, the except: section is skipped (that is, that code is for error handling only, not the "normal" case for the code). You can get a pointer to the exception object itself with the syntax "except IOError, e:" in Python 2, or "except IOError as e:" in Python 2.6 and later, including Python 3 (e points to the exception object).
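
A Python 3 sketch of the same try/except pattern follows. The function name read_text and the file name are hypothetical; the file is assumed not to exist so that the except branch runs.

```python
import sys


def read_text(filename):
    """Return the file's text, or None if it cannot be read."""
    try:
        with open(filename, 'r') as f:
            return f.read()
    except IOError as e:
        # e is the exception object; str(e) gives its message.
        sys.stderr.write('problem reading: ' + filename + ' (' + str(e) + ')\n')
        return None


text = read_text('no-such-file-for-this-example.txt')
```

In Python 3, IOError is an alias of OSError, and a missing file raises its subclass FileNotFoundError, so this handler still catches it.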

HTTP -- urllib and urlparse

The module *urllib* provides url fetching -- making a url look like a file you can read from. The *urlparse* module can take apart and put together urls.

urllib module docs
ufile = urllib.urlopen(url) -- returns a file like object for that url
text = ufile.read() -- can read from it, like a file (readlines() etc. also work)
info = ufile.info() -- the meta info for that request. info.gettype() is the MIME type, e.g. 'text/html'
baseurl = ufile.geturl() -- gets the "base" url for the request, which may be different from the original because of redirects
urllib.urlretrieve(url, filename) -- downloads the url data to the given file path
urlparse.urljoin(baseurl, url) -- given a url that may or may not be full, and the baseurl of the page it comes from, return a full url. Use geturl() above to provide the base url.

## Given a url, try to retrieve it. If it's text/html,

## print its base url and its text.

def wget(url):

    ufile = urllib.urlopen(url) ## get file-like object for url

    info = ufile.info() ## meta-info about the url content

    if info.gettype() == 'text/html':

        print 'base url:' + ufile.geturl()

        text = ufile.read() ## read all its text

        print text

The above code works fine, but does not include error handling if a url does not work for some reason. Here's a version of the function which adds try/except logic to print an error message if the url operation fails.

## Version that uses try/except to print an error message if the

## urlopen() fails.

def wget2(url):

    try:

        ufile = urllib.urlopen(url)

        if ufile.info().gettype() == 'text/html':

            print ufile.read()

    except IOError:

        print 'problem reading url:', url

Python3: urllib.request module

See f0210.py

  try:
    ufile = urllib.request.urlopen(url)
    print (ufile.read())
  except IOError:
    return ('problem reading url:' + url)
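
In Python 3 the urlparse functions live in urllib.parse. The sketch below shows urljoin() only, since it needs no network access; the URLs are placeholders.

```python
from urllib.parse import urljoin

base = 'http://example.com/a/b.html'

relative = urljoin(base, 'c.html')              # resolved against the base's directory
rooted = urljoin(base, '/img/x.png')            # leading slash: resolved against the host
absolute = urljoin(base, 'http://other.org/')   # already a full url, so it is unchanged
```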

Homework 0

import math (for hypotenuse())
# comment character
pass statement
submit on zoo
http://cacm.acm.org
testing code - modified from Google python exercises
lambda expression: anonymous function
- use in map example
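
A minimal sketch of pass and of a lambda used with map, assuming Python 3, where map() returns an iterator and is wrapped in list() here. The names are illustrative.

```python
def not_written_yet():
    pass    # placeholder body: does nothing, but keeps the def syntactically valid


# A lambda is an anonymous function limited to a single expression.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

# The same result with a list comprehension, for comparison.
same = [x * x for x in [1, 2, 3, 4]]
```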

Homework 1

iteration: for x in sequence
recursion
list comprehension
slices

Homework 2

regular expressions
files
URLs and web programming

Python notes

__pycache__ directory for byte code files in 3.2+, instead of .pyc files next to the source
LP Chap 22 - byte codes
frozen binaries - bundling PVM interpreter with byte code
reload
from imp import reload (v3; from importlib import reload in 3.4+, imp was removed in 3.12)
module attributes
dir(module)
namespace
IDLE / Eclipse

Hierarchy

Programs are composed of modules
Modules contain statements
Statements contain expressions
Expressions create and process objects.

Built in objects = type() strive for polymorphism

numbers
strings
lists
dictionaries
tuples
files
sets - remove duplicates, order neutral equality test, filtering
others: boolean, types, None
program units: functions, modules, classes
implementation related types: compiled code, stack traceback
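
For the sets entry in the list above, here is a short sketch of the three uses it names (removing duplicates, order-neutral equality, filtering), assuming Python 3. The sample values are arbitrary.

```python
items = ['b', 'a', 'b', 'c', 'a']

unique = set(items)                      # duplicates removed
same = set('spam') == set('maps')        # order-neutral equality test
evens = {n for n in range(10) if n % 2 == 0}   # set comprehension as a filter
common = unique & {'a', 'z'}             # intersection keeps only shared members
```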

String encodings

ASCII
UTF-8 - compatible with ASCII, variable length 8 bit bytes
UTF-16
UTF-32
latin-1

Numbers

Octal 0o20 oct(64)
Hex 0x20 hex(64)
Binary 0b10 bin(64)
'{0:o} {1:x} {2:b}'.format(64, 64, 64)
'%o %x %x %X' % (64, 64, 255, 255)
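
The literals and conversions above can be checked as follows. This is a sketch assuming Python 3, where the 0o prefix is the octal syntax.

```python
octal_literal = 0o20          # octal 20
hex_literal = 0x20            # hex 20
binary_literal = 0b10         # binary 10

as_octal = oct(64)            # string with the 0o prefix
as_hex = hex(64)              # string with the 0x prefix
as_binary = bin(64)           # string with the 0b prefix

formatted = '{0:o} {1:x} {2:b}'.format(64, 64, 64)    # digits only, no prefixes
percent = '%o %x %x %X' % (64, 64, 255, 255)          # %X gives upper-case hex

back = int('100', 8)          # parse a string of digits in a given base
```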

Garbage collection

== vs is
import sys
sys.getrefcount(obj)
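
A small sketch of == versus is, assuming Python 3: == compares values, while is tests whether two names refer to the same object. The exact number returned by sys.getrefcount() depends on the interpreter, so it is not shown.

```python
import sys

a = [1, 2, 3]
b = [1, 2, 3]    # a separate list with the same contents
c = a            # a second name for the same list object

equal_values = (a == b)    # True: same contents
same_object = (a is b)     # False: two distinct objects
alias = (c is a)           # True: both names point to one object

refs = sys.getrefcount(a)  # count of references to the object (implementation detail)
```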

Files

import pickle
object serialization
import json - for JSON formats , e.g., dicts
import csv for spreadsheets
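
A round-trip sketch for the three modules above, assuming Python 3. The record is made up, and an in-memory buffer stands in for a real file.

```python
import csv
import io
import json
import pickle

record = {'name': 'alice', 'scores': [90, 85]}

# pickle: serialize almost any Python object to bytes and back.
restored = pickle.loads(pickle.dumps(record))

# json: text format that maps naturally onto dicts and lists.
as_text = json.dumps(record, sort_keys=True)
decoded = json.loads(as_text)

# csv: rows of values; everything read back is a string.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(['name', 'score'])
writer.writerow(['alice', 90])
buffer.seek(0)
rows = list(csv.reader(buffer))
```

Pickle data should only be loaded from trusted sources, since unpickling can execute arbitrary code.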

recursion