CS 200: Regular Expressions in Python

This notebook mirrors the Google Python Course: Regular Expressions.

Regular expressions comprise a pattern-matching language. They are also a formal grammar whose languages form a proper subset of the context-free languages. In addition, regular expressions are provably equivalent to deterministic finite-state automata, also known as deterministic finite-state acceptors or DFAs.

The functions defined in this notebook are found in retest.py.

Python implements regular expression pattern matching in the re module.

A pattern is a string made up of ordinary characters, which match themselves, and meta-characters, which have a special meaning.

The re function search(pattern, string) performs a pattern match. It returns a match object if the pattern is found anywhere in the string, and None otherwise.

The power of regular expressions is that they can specify patterns, not just fixed characters. Here are the most basic patterns which match single characters:
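A few of the standard single-character patterns from the re module can be tried out directly. This is only a sketch: the sample strings and variable names are assumptions chosen for illustration.

```python
import re

# . matches any single character except a newline
m_dot = re.search(r'..g', 'piiig')

# \d matches a decimal digit
m_digits = re.search(r'\d\d\d', 'p123g')

# \w matches a "word" character: a letter, a digit, or an underscore
m_word = re.search(r'\w\w\w', '@@abcd!!')

# \s matches a single whitespace character
m_space = re.search(r'a\sb', 'xa by')

# \. matches a literal period (the backslash removes the special meaning of .)
m_period = re.search(r'\.', 'a.b')

for m in (m_dot, m_digits, m_word, m_space, m_period):
    print(repr(m.group()))
```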

Examples

Repetition

Things get more interesting when you use + and * to specify repetition in the pattern: + means one or more occurrences of the item to its left, and * means zero or more.
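As a small sketch of the difference between the two repetition operators (the input strings are illustrative assumptions):

```python
import re

# i+ requires at least one i after the p, and takes as many as it can.
m_plus = re.search(r'pi+', 'piiig')

# i* allows zero i's, so a bare p still matches.
m_star = re.search(r'pi*', 'pg')

# i+ fails here because no i follows the p.
m_none = re.search(r'pi+', 'pg')

print(m_plus.group(), m_star.group(), m_none)
```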

Leftmost & Largest

First the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible, i.e. + and * go as far as possible (the + and * are said to be "greedy").

Examples

Finds the first/leftmost solution, and within it drives the + as far as possible (aka 'leftmost and largest').

In this example, note that it does not get to the second set of i's.
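A sketch of the behaviour described above, using an illustrative string with two separate runs of i's:

```python
import re

# The string has a short run of i's (positions 1-2) and a longer run later.
match = re.search(r'i+', 'piigiiii')

# The leftmost run wins, even though the later run is longer.
print(match.group())
print(match.span())
```

The search stops at the first position where a match is possible and only then extends it greedily, so the longer run after the g is never considered.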

\s* = zero or more whitespace chars

Here we look for 3 digits, possibly separated by whitespace.
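One way to write this, as a sketch with illustrative inputs, is to place \s* between the digits so that any amount of whitespace (including none) is allowed:

```python
import re

pattern = r'\d\s*\d\s*\d'  # digit, optional whitespace, digit, optional whitespace, digit

samples = ['xx1 2   3xx', 'xx12  3xx', 'xx123xx']  # illustrative inputs
results = [re.search(pattern, s).group() for s in samples]

for s, r in zip(samples, results):
    print(repr(s), '->', repr(r))
```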

^ = matches the start of string, so the first case fails:
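A short sketch with an illustrative string, contrasting an anchored pattern with the same pattern without the anchor:

```python
import re

# Anchored: the b must be the first character of the string, so this fails.
anchored = re.search(r'^b\w+', 'foobar')

# Unanchored: the search may start anywhere, so it finds 'bar'.
unanchored = re.search(r'b\w+', 'foobar')

print(anchored)
print(unanchored.group())
```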

Square brackets indicate a character class, e.g. [aeiou] matches any single lowercase vowel.
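The sketch below shows a plain character class, a negated class (a leading ^ inside the brackets), and a range. The sample strings are assumptions chosen for illustration.

```python
import re

# [aeiou] matches one character that is a lowercase vowel.
m_vowel = re.search(r'[aeiou]', 'rhythm and blues')

# [^aeiou] matches one character that is NOT a lowercase vowel.
m_not_vowel = re.search(r'[^aeiou]', 'audio')

# [a-c] is a range; combined with + it matches a run of a, b, or c.
m_range = re.search(r'[a-c]+', 'xxabcabz')

print(m_vowel.group(), m_not_vowel.group(), m_range.group())
```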

Testing regular expressions
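When experimenting with patterns, a small wrapper around re.search saves repeating the None check. The helper below is a hypothetical sketch: the name show_match is an assumption and is not necessarily what retest.py defines.

```python
import re

def show_match(pattern, text):
    """Hypothetical helper: print and return the first match, or None."""
    match = re.search(pattern, text)
    if match:
        print('found:', match.group())
        return match.group()
    print('not found')
    return None

hit = show_match(r'\d+', 'room 101')    # digits are present
miss = show_match(r'\d+', 'no digits')  # no digits to find
```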

Group Matching
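Parentheses in a pattern mark groups, and the match object exposes each group's text through group(n). A minimal sketch, assuming an illustrative string containing an email-like token:

```python
import re

text = 'purple alice-b@example.com monkey dishwasher'  # illustrative input

# Group 1 captures the user name, group 2 the host.
match = re.search(r'([\w.-]+)@([\w.-]+)', text)

if match:
    print(match.group())   # the whole match
    print(match.group(1))  # first parenthesized group
    print(match.group(2))  # second parenthesized group
```

Inside square brackets the dot is literal, and the hyphen is placed last so it is not read as a range.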

Findall
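Whereas re.search stops at the first match, re.findall returns a list of all non-overlapping matches. The sketch below uses an illustrative string; with groups in the pattern, findall returns a list of tuples instead of whole-match strings.

```python
import re

text = 'purple alice@example.com, blah monkey bob@abc.com blah dishwasher'

# Without groups: a list of the matched strings.
emails = re.findall(r'[\w.-]+@[\w.-]+', text)

# With groups: a list of (user, host) tuples.
pairs = re.findall(r'([\w.-]+)@([\w.-]+)', text)

print(emails)
print(pairs)
```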

End of regular expressions notebook.