| 11/30/07 | Lecture 34. Sorting, Analysis of Algorithms, Recursion and Memoization.
For more on memoization (and dynamic programming), see the lecture notes memoization.
Last lecture we saw that any algorithm that uses pairwise comparisons to sort n elements must, in the worst case, use at least Omega(n log n) comparisons. Insertion sort uses O(n^2) comparisons in the worst case, leaving a gap between the upper bound and the lower bound. In fact, there are several sorts that use O(n log n) pairwise comparisons; we will look at one of them: merge sort.
The fundamental operation used by merge sort is to merge two already-sorted lists into a single sorted list. Thus, for example, we'd like (merge '(1 3 16 25) '(2 4 5 15)) => (1 2 3 4 5 15 16 25). The idea is to repeatedly compare the first elements of the two lists, moving the smaller one to the end of the output list. When one of the two lists empties, the other can just be copied to the end of the output list.
For the example, we compare 1 and 2, and move 1 to the output list: (1). Then we compare 3 and 2 and move 2 to the output list: (1 2). Then we compare 3 and 4 and move 3: (1 2 3). Then we compare 16 and 4 and move 4: (1 2 3 4). Then we compare 16 and 5 and move 5: (1 2 3 4 5). Then we compare 16 and 15 and move 15: (1 2 3 4 5 15). At this point the second list is empty, and we just move the remaining elements, 16 and 25, to the output list: (1 2 3 4 5 15 16 25).
A Scheme implementation of merge:
(define merge
  (lambda (ls1 ls2)
    (cond
      ((null? ls1) ls2)
      ((null? ls2) ls1)
      ((<= (car ls1) (car ls2)) (cons (car ls1) (merge (cdr ls1) ls2)))
      (else (cons (car ls2) (merge ls1 (cdr ls2)))))))
To understand how many compares (applications of <=) this procedure
might use in the worst case, note that each compare results in
an element being moved from an input list to the output list,
where it is not considered again.
Also, when either input list is empty, no compares are required: the
remaining elements of the other input list are just moved to the output
list.
Thus, for nonempty input lists of lengths m and n, the worst case
number of compares is m+n-1, because the last element can be moved to
the output list without any compares (the other list must be empty at
that point).
Note that the "best case", that is, the number of compares used by
the algorithm for the most favorable choice of inputs (assuming that
they are both nonempty lists), is min(m,n).
(Think about which nonempty input lists ls1 and ls2 would provoke
the worst case number of comparisons, and which the best case.)
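One way to experiment with these counts is a variant of merge that returns the number of compares instead of the merged list. This is only a sketch: the name merge-compares is made up for illustration, and its cond clauses simply mirror those of merge.

```scheme
;; Hypothetical helper: counts the applications of <= that merge
;; would perform on ls1 and ls2, without building the merged list.
(define merge-compares
  (lambda (ls1 ls2)
    (cond
      ((null? ls1) 0)                       ; nothing left to compare
      ((null? ls2) 0)
      ((<= (car ls1) (car ls2))             ; one compare, then recur
       (+ 1 (merge-compares (cdr ls1) ls2)))
      (else
       (+ 1 (merge-compares ls1 (cdr ls2)))))))
```

For the lists in the walk-through, (merge-compares '(1 3 16 25) '(2 4 5 15)) evaluates to 6, one for each compare listed there. Trying other pairs of lists is a quick way to explore the worst and best cases.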
The idea of merge sort is to divide the input list into two lists of equal length (or off by one, if the input list has odd length), call merge sort recursively, and then merge the two resulting sorted lists together. So a Scheme implementation of merge sort is:
(define msort
  (lambda (ls)
    (cond
      ((or (null? ls) (null? (cdr ls))) ls)
      (else (merge (msort (one-half ls)) (msort (other-half ls)))))))
where one-half and other-half are the procedures that divide the
input list ls into two (nearly) equal length sublists.
One way to divide the lists is the first half of the elements
and the last half of the elements, but the method of division
into two lists may be arbitrary, as long as it runs in time O(n).
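The notes do not give definitions of one-half and other-half, so the following is one possible sketch, assuming the "first half, last half" division. The helpers take-first and drop-first are hypothetical names introduced here. Each procedure walks the list a constant number of times, so both run in time O(n).

```scheme
;; Hypothetical helper: the first k elements of ls (assumes k <= length).
(define take-first
  (lambda (ls k)
    (if (= k 0)
        '()
        (cons (car ls) (take-first (cdr ls) (- k 1))))))

;; Hypothetical helper: ls without its first k elements.
(define drop-first
  (lambda (ls k)
    (if (= k 0)
        ls
        (drop-first (cdr ls) (- k 1)))))

;; First half of ls; one element shorter than the rest if the length is odd.
(define one-half
  (lambda (ls)
    (take-first ls (quotient (length ls) 2))))

;; The remaining elements of ls.
(define other-half
  (lambda (ls)
    (drop-first ls (quotient (length ls) 2))))
```

For a list of length at least 2, both halves are nonempty and strictly shorter than the input, which is what the recursion in msort needs in order to terminate.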
Why does msort use only O(n log n) comparisons? The only comparisons are done in the merge procedure -- the rest of msort is data shuffling and recursive calls. Let's look at a particular example, where we assume splitting the list into the first half and last half, and we show the arguments to the recursive calls of msort.
                    (13 2 6 8 3 33 11 9)
                     /                \
          (13 2 6 8)                    (3 33 11 9)
           /      \                      /       \
      (13 2)      (6 8)            (3 33)       (11 9)
       /  \        /  \             /  \         /  \
    (13)  (2)   (6)  (8)         (3)  (33)   (11)  (9)
The leaves of this tree of recursive calls are base cases
(lists of length 1, which simply return the list itself.)
Pairs of these lists of length 1 are merged (using one comparison
each) to get 4 lists of length 2: (2 13) (6 8) (3 33) (9 11).
Pairs of these lists of length 2 are merged (using 3 comparisons
each) to get 2 lists of length 4: (2 6 8 13) (3 9 11 33).
Finally, this pair of lists is merged (using 7 comparisons)
to give the final sorted results: (2 3 6 8 9 11 13 33).
Thus, the total number of comparisons used by msort for this
input is 4+6+7 = 17.
(Think about whether this is a worst case for a list of length 8,
and what a "best case" input would look like.)
In general, why can we say that the total number of comparisons used by msort will be O(n log n)? If we consider the tree of recursive calls, it can be divided into "levels" of nodes at the same distance from the root. In the example, the levels are: one list of length 8, two lists of length 4, 4 lists of length 2, and 8 lists of length 1. The lengths are being halved from one level to the next, meaning that the total number of levels will be about log n, certainly O(log n). The total number of comparisons done by calls to merge at each level is at most n. This is because the sum of the lengths of the lists at each level is n, and the number of comparisons done to merge pairs of them will be at most the sum of their lengths. Thus we get at most O(n log n) comparisons. (We could be more precise with this analysis, but this is sufficient to see the asymptotic behavior.) (Think about why the total running time, assuming one-half and other-half are O(n) operations, will also be O(n log n).)
Another topic: memoization. Consider the Fibonacci numbers, defined by F(0) = 0, F(1) = 1, and F(n) = F(n-1) + F(n-2) for all n greater than 1. This is the celebrated sequence: 0, 1, 1, 2, 3, 5, 8, 13, ... . A straightforward Scheme implementation of the definition is:
(define fib
  (lambda (n)
    (if (<= n 1)
        n
        (+ (fib (- n 1)) (fib (- n 2))))))
Thus, (fib 0) is 0, (fib 1) is 1, (fib 6) is 8, etc.
This is straightforward, but will run very slowly for large n.
The reason is that there will be many, many repeated calls
of fib on the same arguments.
For example, we illustrate the arguments of the recursive calls
for (fib 6):
                          6
                    /           \
              5                       4
           /     \                 /     \
        4           3           3           2
      /   \       /   \       /   \       /   \
     3     2     2     1     2     1     1     0
    / \   / \   / \         / \
   2   1 1   0 1   0       1   0
  / \
 1   0
The number of procedure calls for (fib n) will in fact be proportional
to F(n).
How big is F(n)?
Well, we could prove inductively that F(n) is less than 2^n, because
in general to calculate F(n), we add the two preceding values, F(n-1)
and F(n-2), which are (inductively) less than 2^(n-1) and 2^(n-2),
respectively.
This is only an upper bound, but the growth of F(n) really is
*exponential*; in fact, it grows roughly as (1.618)^n, where 1.618 is
an approximation of (sqrt(5)+1)/2, the Golden ratio.
Thus, the number of recursive calls to calculate (fib n) is
exponential in n.
We can make the number of calls O(n) instead of exponential in n by "remembering" the values of (fib k) that we have already computed by saving them in a table. Thus, instead of recomputing (fib 4) the second (and subsequent) time(s) we encounter it, we simply look up its value in our table and use that instead. With this technique, called "memoization", the tree of recursive calls would be reduced to the following.
          6
         / \
        5   4 : look it up!
       / \
      4   3 : look it up!
     / \
    3   2 : look it up!
   / \
  2   1
 / \
1   0
Thus, the value of (fib k) for k from 2 to n is only computed
once, and the total number of calls (and total time) is reduced
to O(n).
(Of course, we could further reduce the space usage by observing
that we only need the two preceding values.)
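A minimal sketch of the table idea, assuming n is a nonnegative integer and using a Scheme vector as the table. The names memo-fib and fib-helper are made up for illustration, and an entry of #f is taken to mean "not computed yet".

```scheme
;; Memoized Fibonacci: table[k] holds F(k) once it has been computed.
(define memo-fib
  (lambda (n)
    (let ((table (make-vector (+ n 1) #f)))   ; #f marks "not yet computed"
      (define fib-helper
        (lambda (k)
          (cond
            ((<= k 1) k)                                 ; base cases F(0), F(1)
            ((vector-ref table k) (vector-ref table k))  ; look it up!
            (else
              (let ((value (+ (fib-helper (- k 1)) (fib-helper (- k 2)))))
                (vector-set! table k value)              ; remember for next time
                value)))))
      (fib-helper n))))
```

(memo-fib 6) is 8, as before, but each (fib k) for k from 2 to n is now computed only once, so values such as (memo-fib 50) come back immediately.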
In general, the signal for the use of memoization is that there
are lots of calls to a (functional) procedure with the same
arguments -- then it may pay to record the values of the procedure
on these arguments in a table, to avoid recomputation.
If we take this idea further, and decide to build the table
of relevant values from the "bottom up", we get "dynamic programming",
a very general and powerful technique for designing algorithms.
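For Fibonacci, a bottom-up version can be sketched as follows, assuming n is a nonnegative integer. Here the "table" shrinks to just the two most recent values, since those are all that the next entry needs. The name fib-bottom-up is hypothetical.

```scheme
;; Bottom-up Fibonacci: at step k, current = F(k) and next = F(k+1).
(define fib-bottom-up
  (lambda (n)
    (let loop ((k 0) (current 0) (next 1))
      (if (= k n)
          current
          (loop (+ k 1) next (+ current next))))))
```

This uses O(n) additions and a constant number of stored values.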
|
|---|---|
| 11/28/07 | Lecture 33. Sorting, Analysis of Algorithms.
We consider the task of sorting a list of numbers into increasing (actually, nondecreasing) order. This can be generalized to sorting a list of any items for which we can define a total ordering. The first sort we consider is insertion sort, whose basic building block is a procedure to insert a number into the proper place in an already sorted list of numbers. Thus for example, (insert 12 '(1 5 9 14 17)) => (1 5 9 12 14 17). A Scheme program for this is the following.
(define insert
  (lambda (item ls)
    (cond
      ((null? ls) (list item))
      ((<= item (car ls)) (cons item ls))
      (else (cons (car ls) (insert item (cdr ls)))))))
Looking at the tree of recursive calls for the example, we have
the following.
(insert 12 (1 5 9 14 17)) => (1 5 9 12 14 17)
|
(cons 1 (insert 12 (5 9 14 17))) => (1 5 9 12 14 17)
|
(cons 5 (insert 12 (9 14 17))) => (5 9 12 14 17)
|
(cons 9 (insert 12 (14 17))) => (9 12 14 17)
Counting comparisons of list elements, that is, calls
to the built-in procedure <=, we find that this example
uses 4 comparisons.
In general, the number of comparisons needed is the number of
elements of ls up to and including the first one that is greater
than or equal to the item being inserted.
In the worst case, this may be n comparisons for a list ls
of n numbers, because we may have to compare item with
every one of them.
By looking at the program, and noting that car, cdr, cons, null?,
and procedure calls take constant time, we see that the amount
of time used by the insert procedure will be bounded by a constant
times the number of comparisons used.
Thus, the worst case time of insert is Theta(n) for a list of n items.
We may then use the insert procedure to implement insertion sort as follows.
(define isort
  (lambda (ls)
    (cond
      ((null? ls) '())
      (else (insert (car ls) (isort (cdr ls)))))))
To see this procedure in action, we look at the tree of its
recursive calls on a particular input, namely (4 2 3 1).
                                              # of comparisons
(isort (4 2 3 1)) => (1 2 3 4)
|
(insert 4 (isort (2 3 1))) => (1 2 3 4)              3
|
(insert 2 (isort (3 1))) => (1 2 3)                  2
|
(insert 3 (isort (1))) => (1 3)                      1
|
(insert 1 (isort ())) => (1)                         0
Note that all the comparisons are used by insert, and the
number used by each call to insert is shown above.
Thus, for this example, the total number of comparisons used
is 0+1+2+3 = 6. In general, the worst case number
of comparisons for isort to sort a list of n elements
is 1+2+3+...+(n-1) = n(n-1)/2.
This is because we are inserting into a list of length 0,
then length 1, then length 2, etc. up to length (n-1),
and by our analysis of insert, the worst case number
of comparisons to insert into a list of m elements is m comparisons.
This worst case is actually achieved for isort when sorting
a list in reverse order, eg, (5 4 3 2 1).
The best case is achieved for isort when sorting a list
that is already sorted, which takes (n-1) comparisons.
Using asymptotic notation, the number of comparisons (and
running time) of isort is Theta(n^2).
That is, insertion sort is a quadratic sort.
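The counts above can be checked with a small sketch that tallies calls to <= instead of building the sorted list. The names insert-compares and isort-compares are made up for illustration; insert and isort are repeated from above only so that the block runs on its own.

```scheme
;; insert and isort, as defined in the text.
(define insert
  (lambda (item ls)
    (cond
      ((null? ls) (list item))
      ((<= item (car ls)) (cons item ls))
      (else (cons (car ls) (insert item (cdr ls)))))))

(define isort
  (lambda (ls)
    (cond
      ((null? ls) '())
      (else (insert (car ls) (isort (cdr ls)))))))

;; Hypothetical helper: number of compares insert uses on a sorted list ls.
(define insert-compares
  (lambda (item ls)
    (cond
      ((null? ls) 0)
      ((<= item (car ls)) 1)
      (else (+ 1 (insert-compares item (cdr ls)))))))

;; Hypothetical helper: total number of compares isort uses on ls.
(define isort-compares
  (lambda (ls)
    (if (null? ls)
        0
        (+ (insert-compares (car ls) (isort (cdr ls)))
           (isort-compares (cdr ls))))))
```

With these definitions, (isort-compares '(4 2 3 1)) is 6, (isort-compares '(5 4 3 2 1)) is 10 = 5(5-1)/2, and (isort-compares '(1 2 3 4 5)) is 4 = 5-1.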
An information theoretic lower bound on comparison sorts can be obtained by thinking of a comparison sort as a decision tree in which the internal nodes compare two input elements and branch on the results, and the leaf nodes (nodes with no children) represent orderings of all the input elements. For example, a comparison-based decision tree algorithm to sort three inputs: x1, x2, x3, might look as follows.
                           x1 <= x2?
                          /         \
                      Y  /           \  N
                        /             \
                x2 <= x3?             x2 <= x3?
              Y  /    \  N          Y  /    \  N
                /      \              /      \
        x1,x2,x3   x1 <= x3?    x1 <= x3?   x3,x2,x1
                  Y  /  \  N    Y  /  \  N
                    /    \        /    \
            x1,x3,x2  x3,x1,x2  x2,x1,x3  x2,x3,x1
Note that each leaf represents one of the 6 = 3! possible orderings
of x1,x2,x3, indicating a correct nondecreasing ordering of the
input elements, on the basis of pairwise comparisons of them.
Two of the outcomes (x1,x2,x3 and x3,x2,x1) are reached after
making only two comparisons, while the other four are reached
after making 3 comparisons.
Thus, the worst-case number of comparisons for this sorting
algorithm is 3. This is also the depth of the tree, that is,
the maximum number of edges on a path from the root of the tree
to some leaf.
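The decision tree above can be transcribed directly into nested if expressions, one per internal node, with each leaf returning the corresponding ordering. This is just a sketch to make the correspondence concrete; the name sort3 is made up for illustration.

```scheme
;; Sorts three numbers by following the decision tree:
;; each if is an internal node, each list is a leaf.
(define sort3
  (lambda (x1 x2 x3)
    (if (<= x1 x2)
        (if (<= x2 x3)
            (list x1 x2 x3)
            (if (<= x1 x3)
                (list x1 x3 x2)
                (list x3 x1 x2)))
        (if (<= x2 x3)
            (if (<= x1 x3)
                (list x2 x1 x3)
                (list x2 x3 x1))
            (list x3 x2 x1)))))
```

Every call evaluates either two or three comparisons, matching the lengths of the paths from the root to the leaves.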
In general, any algorithm that uses pairwise element comparisons (<=) to sort n numbers can be represented by a binary decision tree of this kind. The number of leaves must be at least the number of different possible permutations of the input elements, that is, at least n! = n(n-1)(n-2)...1. However, a binary tree of depth d has at most 2^d leaves, so we must have 2^d >= n!, or d >= log_2(n!). Next lecture we will see that log_2(n!) is Theta(n log n), so this gives a lower bound of Omega(n log n) on the number of comparisons used in the worst case by ANY algorithm that sorts n elements using pairwise comparisons. This lower bound is smaller than the upper bound of O(n^2) that we achieved with insertion sort. Next lecture we will see a sort, merge sort, that in fact achieves a worst case bound of Theta(n log n) comparisons (and time) to sort n elements.
Running time bounds are typically expressed using big-Oh (eg, O(n^2)), big-Theta (eg, Theta(n^2)) and big-Omega (eg, Omega(n^2)). For a precise definition, if f(n) and g(n) are functions from the natural numbers to the natural numbers, we say that f(n) is O(g(n)) (read: big-Oh of g of n) if there exist positive constants c and N such that for all n >= N, f(n) <= cg(n). For example, if f(n) = 3n+17, then f(n) = O(n), witnessed by c = 20 and N = 1, because for all n >= 1, 3n+17 <= 20n. Thus, a positive constant multiple of g(n) is an UPPER BOUND for f(n) except possibly for finitely many initial values (those less than N).
We say that f(n) is Omega(g(n)) if g(n) = O(f(n)); this is the case when a positive constant multiple of g(n) is a LOWER BOUND for f(n) except possibly for finitely many initial values. Finally, we say that f(n) is Theta(g(n)) if both f(n) is O(g(n)) and f(n) is Omega(g(n)); this is the case when one positive multiple of g(n) is an upper bound for f(n) and another positive multiple of g(n) is a lower bound for f(n), except possibly for finitely many initial values n.
We claimed above that n(n-1)/2 is Theta(n^2). To see this, note that n(n-1)/2 <= n^2 for all natural numbers n, which shows that n(n-1)/2 is O(n^2), the upper bound. We also need to show the lower bound. For this, we observe that some algebra verifies that (1/4)n^2 <= n(n-1)/2 for all n >= 2, so n(n-1)/2 is Omega(n^2), and thus we conclude that n(n-1)/2 is Theta(n^2). The quantities inside the big-Oh, big-Theta and big-Omega expressions are generally taken to be as simple as possible (that is, n^2 instead of 3n^2+4n+13) because we use them to get an understanding of how the operation count or running time scales as n increases. |
| 11/26/07 | Lecture 32. Circular lists; Searching a table.
The mutators set-car! and set-cdr! give us the ability to construct relatively arbitrary data structures with pointers. For example, we may construct a circular list by evaluating the following sequence of expressions in the Scheme interpreter.
> (define anomaly '(a b))
> (set-cdr! (cdr anomaly) anomaly)
There is nothing anomalous about the value of anomaly after the first expression is evaluated: it is simply a list with first element a and second element b. That is, there are two cons cells, the first with car a and cdr pointing to the second, and the second with car b and cdr (). However, after the second expression is evaluated, the cdr portion of the second cons cell points to the first cons cell, creating a circular list. Scheme implementations vary in how they will attempt to print out such a structure. For example, MIT Scheme will happily fill your screen with alternating a's and b's until you type a control C. Other implementations may attempt to detect the circularity and print out something else.
How could we write a program to detect circularity? We'd need to avoid the predicate equal? and instead use eq? or eqv?. For example, (eq? anomaly (cddr anomaly)) => #t, because both anomaly and (cddr anomaly) are pointers to the SAME cons cell. (If you wish to learn about the subtle differences between eqv? and eq? please consult the Revised^5 Report on Scheme, available from the MIT Scheme website.)
We also considered the topic of searching for a value in a table. Assume we have a table containing numbers, and our goal is to determine whether a specific number occurs in the table. You have solved something like this problem when you wrote lookup to determine whether a key occurs in some entry in a table.
The method you used was probably linear search: compare the input number to each specific item in the table until the comparison returns equality (the input number occurs in the table) or until all items have been compared without finding equality (the input number does not occur in the table). In the worst case, the number of comparisons used by this method is n, where n is the number of items in the table. The worst case running time is Theta(n), that is, bounded between cn and dn for some positive constants c and d, for all but finitely many values of n.
If we are willing to put in the time to preprocess the table, in particular, sort it into increasing (or decreasing) order, then searching the table can be made much faster, Theta(log n) in the worst case, where n is the number of items in the table. This would be a good tradeoff if the cost of sorting the table once is less than the sum of the times for all the searches to be performed on it.
The method to achieve time Theta(log n) is binary search: compare the input number with the middle number of the table. If it is equal, stop, because we have found a match. If the input number is smaller, then recursively look for the input number in the portion of the table before the middle element. If the input number is larger, then recursively look for the input number in the portion of the table after the middle element.
To see that this involves about log n comparisons, consider a table that starts out with n = 2^k entries. Then k is the base 2 logarithm of n. Each comparison lets us "throw away" half of the remaining elements, so that we consider 2^k elements, then 2^(k-1) elements, then 2^(k-2) elements, all the way down to 2 elements and then 1 element. Thus, in k+1 comparisons, we can determine whether the input number is in the table.
Note that the time to do this will be proportional to the number of comparisons IF we have something like an array data structure. What is special about an array, as opposed to a list?
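As a minimal sketch of the binary search just described, the following assumes the table is a Scheme vector of numbers sorted into increasing order. The name binary-search is made up for illustration. The constant-time vector-ref is what makes each step cheap; the discussion of arrays that follows explains why.

```scheme
;; Returns #t if item occurs in the sorted vector vec, #f otherwise.
;; low and high bound the portion of the table still under consideration.
(define binary-search
  (lambda (item vec)
    (let loop ((low 0) (high (- (vector-length vec) 1)))
      (if (> low high)
          #f                                          ; nothing left to search
          (let ((mid (quotient (+ low high) 2)))
            (cond
              ((= item (vector-ref vec mid)) #t)      ; found a match
              ((< item (vector-ref vec mid))          ; search before the middle
               (loop low (- mid 1)))
              (else                                   ; search after the middle
               (loop (+ mid 1) high))))))))
```

For example, (binary-search 22 (vector 3 17 20 22 25 32 37 39 40 51)) is #t, and the same call with 38 in place of 22 is #f.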
In an array, if we know the index j, in constant time we can access the j-th element of the array. (Note that this would involve address arithmetic and an indirect reference in the TC-201 -- constant time.) In a simple list, accessing the j-th element of the list involves following about j pointers until we get to the specified element -- not constant-time. Scheme has an array-like data type, namely vectors. There are built-in procedures for dealing with vectors, including make-vector, vector-ref and vector-set!.
Can we do better than time proportional to log n for looking up an item in a table? Yes, hashing is a practical method that achieves expected constant time (Theta(1)) for looking up an element, under certain assumptions about the behavior of the hash function. In this case, we allocate a table rather larger than the maximum number of elements it will hold (eg, twice as large) and depend on a "hash function" -- a mathematical function that takes a value and returns an index in the hash table where that value should be stored. Thus, in the simple case, looking up a value entails applying the hash function to it (constant-time) to get an index into the hash table, and then looking to see if that entry in the hash table is the desired value (successful match) or an "undefined" value (no match).
Of course, things are not quite so simple, because actual hash functions have "collisions" -- two different values mapping to the same index into the hash table. Such collisions require a strategy to deal with them. Two of the most popular strategies are "open addressing" and "chaining". In open addressing, when we add an element to the hash table and there is a collision with a value already there, we simply move sequentially down the table to find the first "undefined" entry in which to put the value we are adding.
This means that on lookup, we need to keep comparing the value being looked up to sequential positions in the hash table until we get a match or find the first "undefined" entry. Thus, we start a small linear search at the index given by the hash function. Under an assumption that the indices produced by the hash function behave "like" random values, the expected number of comparisons remains constant.
In chaining, each defined entry points outside the table to a list of the elements that have hashed to this index, which is then searched linearly.
Another preprocessed representation of a set of numbers is as a binary search tree. This is a binary tree with values from a total ordering (eg, numbers) at each node, arranged so that the values appearing in the left subtree of a node are all less than or equal to the value at the node, which is in turn less than or equal to all the values occurring in the right subtree of the node. This must be true at every node of the tree. As an example, we may store the values 3,17,20,22,25,32,37,39,40,51 in a binary search tree as follows.
              25
            /    \
         17        37
        /  \      /  \
       3    20  32    40
              \      /  \
              22    39   51
Note that the numbers in the left subtree of the root: 3,17,20,22,
are all less than or equal to the value at the root: 25, and the
values in the right subtree of the root: 32,37,39,40,51, are all
greater than or equal to the value at the root.
Moreover, this is true of the left and right subtrees of any
node in the tree.
Recall from our discussion of representing binary trees as lists
that a Scheme list representation of this tree allows constant-time
access to the label at the root of a tree and to the left and right
subtrees of the tree.
To look up a value, say 38, in this tree, we compare this
value to the value at the root, 25.
Because 38 is larger than 25, we know that if 38 is in the
tree at all, it is in the right subtree, so we repeat this
process with the right subtree, and compare 38 with 37.
Because 38 is larger than 37, we again descend into the
right subtree to repeat the process.
Comparing 38 with 40, we see that 38 is less than 40, so we
now descend into the left subtree, and compare 38 with 39.
After this comparison, we'd like to descend into the left
subtree below the node with value 39, but there is no
left subtree, so we know that 38 does not occur in the tree.
The time cost is proportional to the number of comparisons
made, 4 in this case.
Thus, the worst case time to look up a value in a binary
search tree is Theta(d), where d is the number of nodes on
the longest path from the root to a leaf (node with no children).
If the tree is "balanced" (that is, about as many nodes in the
left subtree as in the right subtree at every node), then
d will be proportional to log n, where n is the total number
of values stored, and the lookup cost will
be similar to that for a sorted array: Theta(log n).
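The lookup just described can be sketched in Scheme, assuming the list representation of binary trees from Lecture 31 (label, left subtree, right subtree) and numeric labels. The names leaf, bst-member? and ex-bst are made up for illustration; the tree definitions are repeated only so the block runs on its own.

```scheme
;; Tree representation from Lecture 31, repeated so the block runs on its own.
(define empty-tree '())
(define make-tree
  (lambda (label tree1 tree2)
    (list label tree1 tree2)))
(define empty-tree? null?)
(define root-label car)
(define left-subtree cadr)
(define right-subtree caddr)

;; Hypothetical helper: a tree with one node and no children.
(define leaf
  (lambda (label)
    (make-tree label empty-tree empty-tree)))

;; Returns #t if item occurs in the binary search tree, #f otherwise.
(define bst-member?
  (lambda (item tree)
    (cond
      ((empty-tree? tree) #f)                  ; fell off the tree: not present
      ((= item (root-label tree)) #t)
      ((< item (root-label tree)) (bst-member? item (left-subtree tree)))
      (else (bst-member? item (right-subtree tree))))))

;; The balanced example tree from above.
(define ex-bst
  (make-tree 25
    (make-tree 17
      (leaf 3)
      (make-tree 20 empty-tree (leaf 22)))
    (make-tree 37
      (leaf 32)
      (make-tree 40 (leaf 39) (leaf 51)))))
```

(bst-member? 38 ex-bst) visits the nodes 25, 37, 40 and 39, as in the walk-through, and returns #f; (bst-member? 22 ex-bst) returns #t.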
However, the binary search tree could be very "unbalanced", for example:
3
 \
  17
   \
    20
     \
      22
       \
        25
         \
          32
           \
            37
             \
              39
               \
                40
                 \
                  51
This satisfies the criterion of being a binary
search tree, but the worst-case time to search it will
be Theta(n).
Thus the issues of constructing balanced binary trees and
maintaining their balance under operations of inserting and
deleting elements (and possibly other operations) assume
considerable importance.
|
| 11/16/07 | Lecture 31. Depth-first search and breadth-first search of trees using
stacks and queues.
We considered the task of visiting the nodes of a rooted, ordered binary tree in depth-first order using stacks (both the implicit stack of procedure calls in Scheme and an explicit stack data structure) and in breadth-first order using a queue. We defined a tree data structure to represent rooted ordered binary trees. Constructors:
(define empty-tree '())
(define make-tree
  (lambda (label tree1 tree2)
    (list label tree1 tree2)))
and predicates and selectors:
(define empty-tree? null?)
(define root-label car)
(define left-subtree cadr)
(define right-subtree caddr)
Thus, the tree depicted by
        a
      /   \
    h       n
   / \     /
  d   o   u
         /
        t
could be constructed by the following:
(define ex-tree
  (make-tree 'a
    (make-tree 'h
      (make-tree 'd empty-tree empty-tree)
      (make-tree 'o empty-tree empty-tree))
    (make-tree 'n
      (make-tree 'u
        (make-tree 't empty-tree empty-tree)
        empty-tree)
      empty-tree)))
To visit the nodes of such a tree in depth-first order, we
visit the root node (labelled 'a in ex-tree), and recursively
visit all the nodes of the left subtree (rooted at the node labelled 'h
in ex-tree) and then visit all the nodes of the right subtree
(rooted at the node labelled 'n in ex-tree).
We can do this using a recursive Scheme program that returns the
labels of the nodes in the order visited, as follows.
(define dfs
  (lambda (tree)
    (if (empty-tree? tree)
        '()
        (append (list (root-label tree))
                (dfs (left-subtree tree))
                (dfs (right-subtree tree))))))
When this code is run on ex-tree, it returns the list of labels:
(a h d o n u t).
Scheme is using a stack of procedure calls to keep track of the
work remaining to be done after the current call to dfs returns.
(What would we have to do to make sure that the call on
the left subtree actually precedes the call on the right subtree?)
We could instead use an explicit stack data structure and non-recursive code to implement depth-first search. In "pseudo-code" such a procedure would look like the following.
1. Push the input tree onto the stack.
2. While the stack is non-empty, do
3. {Pop the top element of the stack and call it T.
4. If T is a non-empty tree then
5. {Print the root label of T.
6. Push the right subtree of T onto the stack.
7. Push the left subtree of T onto the stack.
8. }
9. }
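The pseudo-code above can be sketched in Scheme using an ordinary list as the stack (cons to push, car and cdr to pop). This is only one possible rendering: the name dfs-with-stack is made up for illustration, and it collects the labels in a list instead of printing them so that the result can be compared with that of dfs. The tree definitions are repeated so the block runs on its own.

```scheme
;; Tree representation from the text, repeated so the block runs on its own.
(define empty-tree '())
(define make-tree
  (lambda (label tree1 tree2)
    (list label tree1 tree2)))
(define empty-tree? null?)
(define root-label car)
(define left-subtree cadr)
(define right-subtree caddr)

;; The example tree: a has children h and n, h has children d and o,
;; n has left child u, and u has left child t.
(define ex-tree
  (make-tree 'a
    (make-tree 'h
      (make-tree 'd empty-tree empty-tree)
      (make-tree 'o empty-tree empty-tree))
    (make-tree 'n
      (make-tree 'u
        (make-tree 't empty-tree empty-tree)
        empty-tree)
      empty-tree)))

;; Depth-first traversal with an explicit stack (a list of trees).
(define dfs-with-stack
  (lambda (tree)
    (let loop ((stack (list tree)) (visited '()))
      (cond
        ((null? stack) (reverse visited))             ; stack empty: done
        ((empty-tree? (car stack))                    ; pop an empty tree
         (loop (cdr stack) visited))
        (else
          (let ((top (car stack)))
            ;; push right subtree, then left, so left is popped first
            (loop (cons (left-subtree top)
                        (cons (right-subtree top) (cdr stack)))
                  (cons (root-label top) visited))))))))
```

(dfs-with-stack ex-tree) returns (a h d o n u t), the same order as the recursive dfs.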
Note that when we push a tree onto the stack, we are only
moving a pointer to the tree, not the whole tree.
Using the labels of the subtrees to represent these pointers,
we can watch the progress of the stack as follows.
stack: empty
stack: a (Printed: a)
stack: h n (Printed: h)
stack: d o n (Printed: d)
stack: empty-tree empty-tree o n (Printed: nothing)
stack: empty-tree o n (Printed: nothing)
stack: o n (Printed: o)
stack: empty-tree empty-tree n (Printed: nothing)
stack: empty-tree n (Printed: nothing)
stack: n (Printed: n)
stack: u empty-tree (Printed: u)
stack: t empty-tree empty-tree (Printed: t)
stack: empty-tree empty-tree empty-tree empty-tree
Clearly from this last state, the algorithm empties the stack without printing anything further, and then halts.
For breadth-first order, all the children of the root should be visited before any of the grandchildren of the root. To ensure that this happens, we can use a queue data structure instead of a stack. In "pseudo-code" we might have the following.
1. Enqueue the input tree on the queue.
2. While the queue is non-empty, do
3. {Dequeue the first element of the queue and call it T.
4. If T is a non-empty tree then
5. {Print the root label of T.
6. Enqueue the left subtree of T on the queue.
7. Enqueue the right subtree of T on the queue.
8. }
9. }
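A Scheme sketch of this pseudo-code follows, under the simplifying assumption that the queue is an ordinary list with enqueueing done by append. That makes each enqueue take time proportional to the queue length, unlike the constant-time queue from Lecture 30, but it keeps the example short. The name bfs is made up for illustration, labels are collected in a list rather than printed, and the tree definitions are repeated so the block runs on its own.

```scheme
;; Tree representation from the text, repeated so the block runs on its own.
(define empty-tree '())
(define make-tree
  (lambda (label tree1 tree2)
    (list label tree1 tree2)))
(define empty-tree? null?)
(define root-label car)
(define left-subtree cadr)
(define right-subtree caddr)

(define ex-tree
  (make-tree 'a
    (make-tree 'h
      (make-tree 'd empty-tree empty-tree)
      (make-tree 'o empty-tree empty-tree))
    (make-tree 'n
      (make-tree 'u
        (make-tree 't empty-tree empty-tree)
        empty-tree)
      empty-tree)))

;; Breadth-first traversal with a list used as a queue:
;; dequeue from the front, enqueue at the back.
(define bfs
  (lambda (tree)
    (let loop ((queue (list tree)) (visited '()))
      (cond
        ((null? queue) (reverse visited))             ; queue empty: done
        ((empty-tree? (car queue))                    ; dequeue an empty tree
         (loop (cdr queue) visited))
        (else
          (let ((front (car queue)))
            (loop (append (cdr queue)
                          (list (left-subtree front) (right-subtree front)))
                  (cons (root-label front) visited))))))))
```

(bfs ex-tree) returns (a h n d o u t).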
To be sure you understand this distinction, follow the
evolution of the queue as we did above for the stack.
The order in which the labels are printed should be:
a h n d o u t |
| 11/14/07 | Lecture 30. Mutators and queues.
The mutators set-car! and set-cdr! change the car field of a cons cell and the cdr field of a cons cell, respectively. We examined the implementation of a queue data structure given in the *textbook* using set-car! and set-cdr!. It gives constant-time implementations of the queue operations: (1) test for empty queue, (2) return the value of the first queue element, (3) dequeue: remove the first queue element, and (4) enqueue: add a new element to the end of the queue. See the excerpt from the textbook: queue.txt. |
| 11/12/07 | Lecture 29. Mutators, environments, objects, counters, and stacks.
We looked at the mutator set! and how it may be used in conjunction with the behavior of Scheme environments, procedure values and procedure calls to implement "objects" such as counters and stacks. See the two writeups in mutators and stacks. |
| 11/9/07 | Lecture 28. Compilers, continued.
We considered the problem of translating a parse tree into an assembly-language program. The grammar we developed in Lecture 26 was ambiguous, that is, there was a program with two different parse trees. An unambiguous version of the grammar, with an "if" statement added, is given in homework #7 as the grammar for the tiny higher level language THL. (As an aside, the computational problem of deciding whether a given context-free grammar is ambiguous, as useful as it might be, is undecidable -- as hard as the halting problem.) We used the grammar to parse two THL programs and then write equivalent assembly language programs. The first one was:
begin sum = 0 : sum = sum + 1 end
(Note the use of colons instead of semicolons to avoid Scheme's issues with semicolons.) A plausible assembly-language translation of this THL program might be:
        load zero
        store sum
        load sum
        add one
        store sum
        halt
zero:   data 0
sum:    data 0
one:    data 1
The second one was:
begin
if n > 0 then begin abs = n end
else begin abs = 0 - n end
end
A plausible translation of this to assembly language might be:
        load n
        sub zero
        skippos
        jump false
        jump true
true:   load n
        store abs
        jump done
false:  load zero
        sub n
        store abs
done:   halt
n:      data 0
zero:   data 0
abs:    data 0
We saw that generating code like this might be done by recursion
on the structure of the parse trees.
One task is to collect up all the variables and constants from
the program in order to provide the data statements at the end.
(Names for constants might have to be more "generic", eg,
constant3 for 3, rather than three.)
To generate instructions for an assignment statement: generate
the code to calculate the value of the expression in the accumulator,
and then emit a store instruction to the variable on the left-hand
side of the assignment.
Code for while and if statements can use skipzero, skippos and jumps
to effect the correct control flow.
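To make the recursion on parse trees concrete, here is a small sketch for assignment statements only. It rests on several assumptions that are not from the lecture: an expression parse tree is a number, a variable (a symbol), or a list (op left right) with op being + or -; the right operand of an operator is always a number or variable, so no temporary storage is needed; and instructions are represented as Scheme lists such as (load n). All procedure names are hypothetical. Constants get generic names such as constant3, as suggested above.

```scheme
;; Name of the data location for a number or variable:
;; 3 -> constant3, n -> n.
(define operand-name
  (lambda (x)
    (if (number? x)
        (string->symbol (string-append "constant" (number->string x)))
        x)))

;; Instructions that leave the value of expr in the accumulator.
;; Assumes the right operand of + or - is a number or a variable.
(define gen-expression
  (lambda (expr)
    (if (pair? expr)
        (let ((op (car expr)) (left (cadr expr)) (right (caddr expr)))
          (append (gen-expression left)
                  (list (list (if (eq? op '+) 'add 'sub)
                              (operand-name right)))))
        (list (list 'load (operand-name expr))))))

;; Instructions for the assignment  variable = expr.
(define gen-assignment
  (lambda (variable expr)
    (append (gen-expression expr)
            (list (list 'store variable)))))
```

For example, (gen-assignment 'sum '(+ sum 1)) produces ((load sum) (add constant1) (store sum)), which corresponds to the load, add, store sequence in the first translation above.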
Note that the assembly-language programs thus generated might be very inefficient, with redundant loads, stores, and jumps. Techniques for optimizing the resulting assembly language, both locally (detecting and deleting redundant stores and loads, for example) and globally (detecting and deleting unreachable code, for example) are a major part of modern compilers. One crucial task (that doesn't come up in the TC-201 because it has only one data register, the accumulator) is scheduling which variables reside in the registers for which parts of the computation. |
| 11/7/07 | Lecture 27. Compilers.
(See also the extract from the Wikipedia article Compiler.) We consider two computational problems related to context free grammars and languages: the parsing problem and the equivalence problem.
The Parsing Problem. Given a context-free grammar G and a string w, does G derive w, that is, is w an element of L(G)? This problem is solvable, even efficiently solvable, by an algorithm that returns a parse tree for w if w is in L(G). This is the basis for using context free grammars in specifying programming languages.
The Equivalence Problem. Given two context-free grammars G1 and G2, is G1 equivalent to G2, that is, is L(G1) = L(G2)? This problem is undecidable -- as hard as the Halting Problem. Fortunately, compilers don't have to solve this problem.
What does a compiler do? It transforms a program in a higher-level language (Java, C, ML) into an "equivalent" assembly-language program. We use the excerpt from the Wikipedia Compiler entry (see link above) to point out the lexical analysis phase (which breaks up the input string of characters into tokens, for example, numbers, variables, keywords, symbols), the syntactic analysis phase (which parses the stream of tokens into a parse tree), the semantic analysis phase (which "decorates" the parse tree with semantic information, for example, types and values), the optimization phase and the code generation phase.
To illustrate how we might generate assembly language from a parse tree, the class designed a grammar for a tiny higher-level language, given below in extended BNF notation.
<statement> ::= <statement> { ; <statement>} |
<assignment statement> |
<while statement>
<assignment statement> ::= <variable> = <expression>
<while statement> ::= while <boolean expression> do <statement>
<expression> ::= <number> | <variable> |
<expression> <op> <expression>
<op> ::= + | -
<boolean expression> ::= <expression> <relop> <expression>
<relop> ::= == | <> | <= | < | >= | >
Note that this assumes that the lexical analysis (tokenizer) phase
will deliver tokens for variables, numbers, keywords (while, do), and
symbols (;, +, -, =, ==, <>, <=, <, >=, >).
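As a rough illustration of what the tokenizer phase might do for this tiny language, here is a minimal sketch in Scheme. It is not the course's implementation: the procedure names take-run, letter-or-digit? and tokenize are hypothetical, tokens are simply returned as strings, and any character that is not a letter, digit or whitespace is assumed to be (part of) a symbol.

```scheme
; Hypothetical tokenizer sketch for the tiny language.
; Input: a string.  Output: a list of token strings.

; Collect the longest run of leading chars satisfying pred.
; Returns a pair: (run-as-string . remaining-chars).
(define (take-run pred chars)
  (let loop ((cs chars) (acc '()))
    (if (and (pair? cs) (pred (car cs)))
        (loop (cdr cs) (cons (car cs) acc))
        (cons (list->string (reverse acc)) cs))))

(define (letter-or-digit? c)
  (or (char-alphabetic? c) (char-numeric? c)))

(define (tokenize str)
  (let loop ((cs (string->list str)) (tokens '()))
    (cond
      ((null? cs) (reverse tokens))
      ; skip blanks and newlines
      ((char-whitespace? (car cs)) (loop (cdr cs) tokens))
      ; variables and keywords: a letter, then letters or digits
      ((char-alphabetic? (car cs))
       (let ((r (take-run letter-or-digit? cs)))
         (loop (cdr r) (cons (car r) tokens))))
      ; numbers: a run of digits
      ((char-numeric? (car cs))
       (let ((r (take-run char-numeric? cs)))
         (loop (cdr r) (cons (car r) tokens))))
      ; two-character symbols: ==, <>, <=, >=
      ((and (pair? (cdr cs))
            (member (string (car cs) (cadr cs)) '("==" "<>" "<=" ">=")))
       (loop (cddr cs) (cons (string (car cs) (cadr cs)) tokens)))
      ; anything else is assumed to be a one-character symbol
      (else (loop (cdr cs) (cons (string (car cs)) tokens))))))
```

For example, (tokenize "n<=10") returns the list ("n" "<=" "10"). Because whitespace is skipped, two programs that differ only in indentation produce the same token list.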
Then we wrote a program to sum up the numbers from 1 to n (assuming the correct nonnegative integer value of n is already present in the variable n when the program starts).
sum = 0;
while n > 0 do
    sum = sum + n;
    n = n - 1
In trying to figure out a parse tree for this program, we noticed a possible ambiguity in the grammar in terms of how statements are parsed. The indenting above suggests we are thinking about the body of the while being the two following assignments, but it could also be parsed as:
sum = 0;
while n > 0 do
    sum = sum + n;
n = n - 1
which is not what we meant to write. Because the indenting is not treated as significant in our language, these two result in the same string of tokens. Thus, we need to modify the grammar to remove this source of ambiguity. One suggestion is to require that the body of the while be marked with something indicating the beginning and end. Another solution is to require something like Pascal's begin and end keywords delimiting the extent of a compound statement. |
| 11/5/07 | Lecture 26. Context-free grammars and programming languages.
(See also the writeup in cfgs.) We took up the problem of writing a context-free grammar for the set of all strings over the alphabet {a, b} that contain an equal number of a's and b's. This set contains strings like: the empty string, ab, aabb, abab, aabababbab, aaaabbbb, and so on. Two grammars were proposed:
(I) S -> <lambda> | abS | baS | Sab | Sba | aSb | bSa
and
(II) S -> <lambda> | SaSbS | SbSaS
It is clear for both grammars that only strings with an equal number of a's and b's can be derived from S. So for each the problem is to show that every string with an equal number of a's and b's can be derived from S. As an example, we can parse aabababbab using grammar (II). S -> SaSbS -> aSbS -> aSaSbSbS -> aaSbSbS -> aaSbSaSbSbS -> aabaSbSbS -> aababSbS -> aababSaSbSbS -> aababaSbSbS -> aabababSbS -> aabababbS -> aabababbSaSbS -> aabababbaSbS -> aabababbabS -> aabababbab. (In lecture we saw a "parse tree" representing such a parse.) In general, we need a proof of this property. In computer science, one practical use of context free grammars is in specifying the syntax of programming languages. The notation called "Backus Naur Form" (or BNF) is an alternate notation for specifying context free languages. We looked in some detail at the BNF specification of the programming language Pascal from the Pascal User Manual and Report, 2nd edition, by Jensen and Wirth. Pascal was a higher-level programming language based mostly on ideas from Algol; it was designed primarily as a "teaching" language and was frequently used in introductory programming courses in the 80's and 90's until supplanted by Java. In BNF notation, a nonterminal symbol can be denoted by a phrase within angle brackets, and alternative rules for rewriting one nonterminal can be listed, separated by the symbol |. In addition, the syntax we were looking at had the convention of using curly braces for zero or more occurrences of something.
Thus, the following rules specify that an identifier is a letter followed by zero or more letters or digits.
<identifier> ::= <letter> {<letter or digit>}
<letter or digit> ::= <letter> | <digit>
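The identifier rule above translates almost directly into a recognizer. The following is a minimal sketch in Scheme; the name pascal-identifier? is hypothetical, and it assumes that char-alphabetic? and char-numeric? are acceptable stand-ins for the grammar's letter and digit (some Scheme implementations also accept non-ASCII letters).

```scheme
; Hypothetical recognizer for
;   <identifier> ::= <letter> {<letter or digit>}
; Returns #t if str is a letter followed by zero or more
; letters or digits, and #f otherwise.
(define (pascal-identifier? str)
  (let ((cs (string->list str)))
    (and (pair? cs)                        ; at least one character
         (char-alphabetic? (car cs))       ; the leading <letter>
         (let loop ((rest (cdr cs)))       ; {<letter or digit>}
           (or (null? rest)
               (and (or (char-alphabetic? (car rest))
                        (char-numeric? (car rest)))
                    (loop (cdr rest))))))))
```

With this definition, (pascal-identifier? "x1") is #t, while (pascal-identifier? "1x") and (pascal-identifier? "") are both #f. The curly braces of the grammar become the loop, which accepts zero or more further characters.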
The following rules specify that an unsigned integer is a digit
followed by zero or more digits -- so leading zeros are permitted
in a number.
Also, an unsigned number is either an unsigned integer or
an unsigned real.
<unsigned integer> ::= <digit> {<digit>}
<unsigned number> ::= <unsigned integer> | <unsigned real>
The following rules specify that a statement may have a
label or be unlabelled, and that it may be a simple statement
or a structured statement. Possibilities for a structured
statement include a compound statement, a conditional statement,
a repetitive statement, or a with statement.
<statement> ::= <unlabelled statement> |
<label> : <unlabelled statement>
<unlabelled statement> ::= <simple statement> |
<structured statement>
<simple statement> ::= <assignment statement> |
<procedure statement> |
<go to statement> |
<empty statement>
<structured statement> ::= <compound statement> |
<conditional statement> |
<repetitive statement> |
<with statement>
For an assignment statement, we may have a variable, an assignment operator
(:=) and an expression. The possibilities for a variable include
an identifier (as specified above.)
<assignment statement> ::= <variable> := <expression> |
<function identifier> := <expression>
<variable> ::= <entire variable> |
<component variable> |
<referenced variable>
<entire variable> ::= <variable identifier>
<variable identifier> ::= <identifier>
Expressions, on the right hand side of an assignment, can be
such things as integer constants, variables, and sums and products
of simpler expressions and comparisons between expressions.
Note that a "factor" can be formed by putting parentheses around
an expression. Note the separation of adding operators and terms
and multiplying operators and factors in the grammar, to permit
the correct order of operations in an unparenthesized arithmetic
expression.
<expression> ::= <simple expression> |
<simple expression> <relational operator> <simple expression>
<relational operator> ::= = | <> | < | <= | >= | > | in
<simple expression> ::= <term> | <sign> <term> |
<simple expression> <adding operator> <term>
<adding operator> ::= + | - | or
<term> ::= <factor> | <term> <multiplying operator> <factor>
<multiplying operator> ::= * | / | div | mod | and
<factor> ::= <variable> | <unsigned constant> |
( <expression> ) | <function designator> |
<set> | not <factor>
<unsigned constant> ::= <unsigned number> | <string> |
<constant identifier> | nil
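To see how the separation into simple expressions, terms and factors yields the usual order of operations, here is a small recursive-descent parser sketch in Scheme for a cut-down version of these rules (numbers, + - * /, and parentheses only). It is an illustration under stated assumptions, not Pascal's actual parser: tokens are assumed to arrive as a list of numbers and symbols, with the hypothetical symbols lp and rp standing for the parentheses, and all procedure names are made up for this example. The procedure error is assumed to be available, as it is in most Scheme implementations.

```scheme
; Each parse-... procedure takes a token list and returns a pair
; (tree . remaining-tokens).  Trees are in prefix form.

(define (adding-op? tok) (memq tok '(+ -)))
(define (multiplying-op? tok) (memq tok '(* /)))

; <factor> ::= <number> | ( <simple expression> )
(define (parse-factor tokens)
  (cond
    ((null? tokens) (error "unexpected end of input"))
    ((number? (car tokens)) (cons (car tokens) (cdr tokens)))
    ((eq? (car tokens) 'lp)
     (let ((r (parse-simple-expression (cdr tokens))))
       (if (and (pair? (cdr r)) (eq? (cadr r) 'rp))
           (cons (car r) (cddr r))       ; drop the closing rp
           (error "expected rp"))))
    (else (error "unexpected token" (car tokens)))))

; <term> ::= <factor> | <term> <multiplying operator> <factor>
; The left recursion becomes a loop that keeps extending the tree
; on the left, so operators associate left to right.
(define (parse-term tokens)
  (let loop ((r (parse-factor tokens)))
    (let ((tree (car r)) (rest (cdr r)))
      (if (and (pair? rest) (multiplying-op? (car rest)))
          (let ((f (parse-factor (cdr rest))))
            (loop (cons (list (car rest) tree (car f)) (cdr f))))
          r))))

; <simple expression> ::= <term> |
;                         <simple expression> <adding operator> <term>
(define (parse-simple-expression tokens)
  (let loop ((r (parse-term tokens)))
    (let ((tree (car r)) (rest (cdr r)))
      (if (and (pair? rest) (adding-op? (car rest)))
          (let ((tm (parse-term (cdr rest))))
            (loop (cons (list (car rest) tree (car tm)) (cdr tm))))
          r))))

; Parse a whole token list, insisting that nothing is left over.
(define (parse tokens)
  (let ((r (parse-simple-expression tokens)))
    (if (null? (cdr r))
        (car r)
        (error "extra tokens" (cdr r)))))
```

For example, (parse '(2 + 3 * 4)) returns (+ 2 (* 3 4)), because 3 * 4 is grouped as a term before the addition is considered, while (parse '(lp 2 + 3 rp * 4)) returns (* (+ 2 3) 4), since a parenthesized expression counts as a single factor.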
A compound statement consists of the keyword begin, followed
by one or more statements separated by semicolons, followed by the
keyword end.
<compound statement> ::= begin <statement> {; <statement>} end
This permits several statements to be combined together into one "block"
that functions as a statement. Blocks can be nested within other blocks.
A repetitive statement may be a while, a repeat, or a for statement;
the rule for the syntax of a while statement is given as well.
<repetitive statement> ::= <while statement> |
<repeat statement> |
<for statement>
<while statement> ::= while <expression> do <statement>
With this much of the grammar rules, we can check that the following
will parse as a legal Pascal statement:
while (a < 10) do
begin
x1 := y2;
x := y < z;
y := 7;
a := x * z + 2
end
Next time we'll look at how a compiler uses parsing and other methods
to translate a higher-level language program into an equivalent assembly
language program.
|
| 11/2/07 | Lecture 25. Context-free grammars and languages.
(See writeup in cfgs.) |
| 10/31/07 | Lecture 24. Regular expressions and finite state machines.
(See writeup in regexp.) |
| 10/29/07 | Lecture 23. Regular expressions and finite state machines.
(See writeup in regexp.) |
| 10/26/07 | Lecture 22. The TC-201 concluded; start of regular expressions.
Further explanation of loadi (load indirect) and storei (store indirect). Procedures read, display and newline, and their use in implementing the read and write instructions of the TC-201. A sample procedure to read in a number, print it out, and return its value:
(define test
  (lambda ()
    (display "input = ")
    (let ((number (read)))
      (display "output = ")
      (display number)
      (newline)
      number)))
Note that we use a let to name the number read in, and that the
last expression, number, returns its value as the value of the procedure
call (test).
Three possibilities for computer arithmetic when we want positive, zero and negative numbers: (1) sign/magnitude (this is the choice in the TC-201 this year), (2) two's complement (this is typical for real machines), (3) one's complement (less typical, but has been used). In sign/magnitude, we treat the first bit as a sign bit, with 0 for + and 1 for -, and the remaining bits as specifying the unsigned magnitude (absolute value) of the number in binary. In two's complement, the nonnegative numbers are those whose first bit is 0, and the negative of a number n is obtained by subtracting it from 2^b, where b is the number of bits in the representation. In one's complement, the nonnegative numbers are those whose first bit is 0, and the negative of a number n is obtained by complementing each bit individually. Because with b bits we have 2^b bit patterns (an even number), and in the range of integers from -m to +m we have an odd number of numbers, each of these systems has a "glitch". For sign/magnitude and one's complement, the glitch is -0. For two's complement, the glitch is a negative number (a 1 followed by all 0's) whose positive value is not representable. Examples of all three number systems for b = 4 bit numbers:
bits   unsigned integer   sign/magnitude   2's complement   1's complement
0000    0                  0                0                0
0001    1                  1                1                1
0010    2                  2                2                2
0011    3                  3                3                3
0100    4                  4                4                4
0101    5                  5                5                5
0110    6                  6                6                6
0111    7                  7                7                7
1000    8                 -0*              -8*              -7
1001    9                 -1               -7               -6
1010   10                 -2               -6               -5
1011   11                 -3               -5               -4
1100   12                 -4               -4               -3
1101   13                 -5               -3               -2
1110   14                 -6               -2               -1
1111   15                 -7               -1               -0*
The "glitches" are marked with *. Note that the 2's complement numbers are equivalent to arithmetic mod 16: 3+13 = 0 (mod 16), so 13 acts as the additive inverse of 3. This means that addition of positive and negative numbers can be done by the same circuit. Example of the use of the unix utility egrep and regular expressions to search for occurrences of caar, cadr, cdar, cddr, caaar, etc. in a set of files:
> egrep 'c(a|d)(a|d)(a|d)*r' *.tex |
| 10/24/07 | Lecture 21. The TC-201 continued.
Perlis aphorism #27: Once you understand how to write a program get someone else to write it. Perlis aphorism #31: Simplicity does not precede complexity, but follows it. (Explanation of the solution of the last problem on the midterm.) We wrote a self-modifying (!) program to read in a zero-terminated sequence of numbers and store them in consecutive memory locations. We then rewrote the program using the new instructions: loadi (load indirect) and storei (store indirect). |
| 10/22/07 | Lecture 20. The TC-201 continued.
We continued on the architecture and instructions of the TC-201, first writing and simulating a program at the machine language level, and then moving up to the assembly-language level. Description of a simple two-pass assembler: construct the symbol table, giving the translations of symbolic addresses, and then use the symbol table to translate the instructions and data statements into machine language. |
| 10/19/07 | Lecture 19. Architecture of the TC-201.
A summary of the TC-201 architecture and instructions is available as: tc-201.txt. Perlis aphorism #44: Sometimes I think that the only universal in the computing field is the fetch-execute cycle. We looked at the first pages of the Wikipedia entry on Von Neumann architecture in Spanish and English, specializing its explanation to the toy computer architecture (the TC-201) that will be the focus of assignment #5. To understand the fetch-decode-execute cycle, the execution of a program was described at the register-transfer level. In particular: the program counter (PC) is copied to the memory address register (MAR) and a read is done; the contents of the memory data register (MDR) are then copied to the instruction register (IR). The opcode (the first 4 bits in the TC-201) of the instruction in the IR is decoded to determine which instruction should be executed. For example, for a load instruction, the address field (the bottom 12 bits in the TC-201) of the instruction in the IR is copied to the MAR, and a read is issued. The contents of the MDR are then copied to the accumulator (ACC). Then the PC is incremented and the process repeats. The "fetch" is getting the instruction from memory, the "decode" is done by the control unit, generating control signals appropriate for this instruction, and the "execute" is done by the CPU, possibly with further accesses to memory, accomplishing the desired effect of the instruction. |
| 10/17/07 | No Lecture: Exam 1. |
| 10/15/07 | Lecture 18. Memory.
We reviewed the design and behavior of a D-flipflop: it can be used to store one bit of information. An array of four D-flipflops with a common enable signal can be used as a 4-bit register. We then organized four 4-bit registers in a tiny random-access memory. Two bits of address determine which of the four registers should be read from or written to. (See also the writeup memory.) |
| 10/12/07 | Lecture 17. Gates and circuits, continued.
We continued on the topic of circuits with feedback loops, and looked at the design of an RS-latch using two cross-coupled NOR gates. It is capable of storing one bit. We also briefly looked at the design of a D-flipflop, which stores one bit and has an enable signal. (See also the writeup memory.) |
| 10/10/07 | Lecture 16. Gates and circuits, continued.
We continued and completed our design of a 4-bit binary addition circuit taking two 4-bit numbers (x3,x2,x1,x0) and (y3,y2,y1,y0) to their sum (z4,z3,z2,z1,z0). (See the writeup addition-circuit.) We also began looking at non-combinational circuits, that is, circuits that have "loops" of wires. We continued with that topic in Lecture 17. |
| 10/8/07 | Lecture 15. Gates and circuits, continued.
(See also the writeup gates-and-circuits.) We consider designing a simple circuit to sound an alarm (in an auto) if the key is in the ignition and the door is open, or the key is in the ignition and the seat belt is not fastened. The conditions of the key, door and seatbelt can be sensed (for example, by a button switch on the door that closes a circuit when the door is closed), and return the Boolean values: k = 1 if the key is in the ignition, d = 1 if the door is closed, and b = 1 if the seatbelt is fastened. We want a = 1 if and only if the alarm is to sound. Translating the prose description of the conditions for a = 1 directly, we get a = (k * d') + (k * b'). We could simplify this by factoring out the k, to give a = k * (d' + b'), which we can implement using two NOT gates, an OR gate and an AND gate. We could also observe that (d' + b') = (d * b)', and implement the function a = k * (d * b)' using just an AND gate and a NAND gate (or, if we don't have a NAND gate, two AND's and a NOT gate.) Next we considered the task of designing a circuit to compute z, where z = 1 if and only if x1 = y1 and x2 = y2 and x3 = y3 and x4 = y4. We can make use of last lecture's implementation of a function to test whether two bits are equal, namely w = (x * y) + (x' * y'), or, we can observe that this is also w = (x XOR y)', which gives us an implementation with an XOR gate and a NOT gate. We draw a box around this subcircuit and call it EQ(x,y). Then what we want is z = EQ(x1,y1) * EQ(x2,y2) * EQ(x3,y3) * EQ(x4,y4). We considered two alternatives for how to arrange the 3 AND gates to compute this, corresponding to the two expressions (((w1 * w2) * w3) * w4) and ((w1 * w2) *(w3 * w4)). We might prefer the second implementation to the first as being "faster." 
That is, if we imagine that 1 unit of gate delay must pass between the time that the inputs to a (basic) gate are stable and the time that we can assume the output(s) are stable, then the layer of EQ(xi,yi) gates all operate in parallel, so that we can assume that all their outputs are available after 2 gate delays. Then the structure (((w1 * w2) * w3) * w4) incurs 3 more gate delays, one until (w1 * w2) is stable, another until ((w1 * w2) * w3) is stable, and another until (((w1 * w2) *w3) * w4) is stable. The other structure incurs only 2 gate delays, because after the first one, both (w1 * w2) and (w3 * w4) become stable (in parallel), and after the second one, ((w1 * w2) * (w3 * w4)) becomes stable. Three versus two gate delays may not seem like a big deal, but if we were computing the equality of two n-bit quantities x1, x2, ..., xn and y1, y2, ..., yn, then the "sequential" solution incurs n+1 gate delays (2 for the EQ layer and then n-1 more for the successive AND's) while the "parallel" solution incurs 2+log n gate delays (2 for the EQ layer and then log n more parallel layers to reduce n down to 1 by cutting it in half each time, assuming for simplicity that n is a power of 2.) When n becomes large, the difference between n+1 and 2+log n becomes substantial. (The current Zoo computers have 64-bit data paths, that is, n = 64.) As another example of designing a circuit, we considered the problem of binary addition of two n-bit quantities to find an (n+1)-bit sum. (See the writeup addition-circuit.) |
| 10/5/07 | Lecture 14. Boolean functions and Boolean expressions concluded;
gates and circuits begun.
(See also the writeup gates-and-circuits.) Last time we saw that the set {AND, OR, NOT} is a complete basis for the Boolean functions, and asked which subsets of this set might also be complete bases for the Boolean functions. The two sets {AND, NOT} and {OR, NOT} are complete bases. Using DeMorgan's laws and the law of double negation, we observe that (a + b) = (a' * b')' and (a * b) = (a' + b')'. Thus, if we have an expression E that uses AND, OR, and NOT, we can replace every OR using the equivalent expression with just AND and NOT, and obtain an expression equivalent to E that uses just AND and NOT. (And dually for the basis {OR, NOT}.) However, the set {AND, OR} is not a complete basis for the Boolean functions. In particular, every expression that we construct using constants, variables, AND, and OR, represents a monotonic Boolean function. (A Boolean function is monotonic if it is never the case that changing an input from 0 to 1 changes the value of the function from 1 to 0.) It is not hard to see that constants and variables represent monotonic Boolean functions. Then we show that if E1 and E2 represent monotonic Boolean functions, so do (E1 * E2) and (E1 + E2), and conclude by induction on the structure of Boolean expressions not using NOT that they all represent monotonic Boolean functions. Since NOT is not a monotonic Boolean function (changing its argument from 0 to 1 changes its value from 1 to 0), we see that we cannot define NOT from constants, variables, AND and OR. Because {AND, OR} is not a complete basis, neither {AND} nor {OR} can be one either. It is clear that {NOT} cannot be a complete Boolean basis, because expressions not using AND and OR cannot represent Boolean functions of more than one argument. Hence the subsets of {AND, OR, NOT} that are minimal Boolean bases are just {AND, NOT} and {OR, NOT}. 
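The claim that OR can be rebuilt from AND and NOT (and, dually, AND from OR and NOT) can be checked mechanically by comparing truth tables. The following is a minimal sketch in Scheme, assuming Boolean values are represented by the numbers 0 and 1; all procedure names are hypothetical.

```scheme
; Boolean values are the numbers 0 and 1.
(define (b-not a) (- 1 a))
(define (b-and a b) (* a b))
(define (b-or a b) (if (= (+ a b) 0) 0 1))

; OR built from AND and NOT only: (a + b) = (a' * b')'
(define (or-from-and-not a b)
  (b-not (b-and (b-not a) (b-not b))))

; AND built from OR and NOT only: (a * b) = (a' + b')'
(define (and-from-or-not a b)
  (b-not (b-or (b-not a) (b-not b))))

; Do two 2-argument Boolean functions agree on all four inputs,
; that is, do they have the same truth table?
(define (same-function? f g)
  (let loop ((inputs '((0 0) (0 1) (1 0) (1 1))))
    (or (null? inputs)
        (and (= (apply f (car inputs)) (apply g (car inputs)))
             (loop (cdr inputs))))))
```

Both (same-function? b-or or-from-and-not) and (same-function? b-and and-from-or-not) evaluate to #t, while (same-function? b-and b-or) evaluates to #f, as expected.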
However, if we venture outside of this set, we can find complete Boolean bases that consist of a single two-input Boolean function, namely, {NAND} and {NOR}. The NAND function is "not-and", defined as (x NAND y) = (x * y)'. Dually, the NOR function is "not-or", defined as (x NOR y) = (x + y)'. Truth tables for these are as follows.
x y   (x NAND y)   (x NOR y)
0 0       1            1
0 1       1            0
1 0       1            0
1 1       0            0
To see that NAND by itself is a complete Boolean basis, we observe that x' = (x NAND x) and (x * y) = ((x NAND y) NAND (x NAND y)). (Or, assuming constants are available: x' = (x NAND 1) and (x * y) = ((x NAND y) NAND 1).) Thus we can replace NOT and AND with expressions involving only NAND (or only NAND and constants), so because {AND, NOT} is a complete Boolean basis, so is {NAND}. To see that {NOR} is also a complete Boolean basis, we can argue dually. Gates and circuits. We now turn to a model of computation of Boolean functions. A gate has inputs and output(s), where each output is a specified Boolean function of its inputs. We consider the basic gates: AND, OR, NOT, XOR, NAND, NOR, each of which has its characteristic symbol (illustrated on the board but not here.) A circuit consists of gates and wires. Each wire is either an input of the circuit or the output of some gate. Each gate's inputs are wires. Some of the wires of the circuit are designated as its outputs. For now, we consider only "combinational circuits", which have no "loops" of wires. We considered a diagram of a circuit for the Boolean function z = (x * y) + (x' * y'), where the output z is 1 if and only if the inputs x and y are equal to each other. That is, it has the truth table
x y   z = (x * y) + (x' * y')
0 0   1
0 1   0
1 0   0
1 1   1
Given a circuit with one output, we can compute a Boolean expression for the Boolean function it computes recursively, starting with the output wire, as follows. The base case is a constant or a variable: the expression is just that constant or variable.
Otherwise, the output wire is the output of some gate. We recursively compute the Boolean expressions for its inputs, and then substitute them into a Boolean expression for the function computed by the output gate itself. Note that this procedure would run into trouble if we permitted "loops" of wires. |
| 10/3/07 | Lecture 13. More on Boolean functions and Boolean expressions.
(See also the writeup boolean-exps.) Two Boolean expressions are equivalent, denoted E1 = E2, if in every environment they have the same value. This can be verified by looking at their truth tables. For example, computing the truth tables of (x + y')' and (x' * y):
x y   (x + y')'   (x' * y)
0 0       0           0
0 1       1           1
1 0       0           0
1 1       0           0
we see that these two expressions have the same truth value for every environment, so (x + y')' = (x' * y). Other terminology for Boolean expressions: an expression E is satisfiable if there is some environment in which it takes the value 1, i.e., if there is at least one 1 among the values in its truth table; an expression E is a tautology if it takes the value 1 in every environment, that is, if E = 1; an expression E is unsatisfiable, or a contradiction, if it takes the value 0 in every environment, that is, if E = 0; E is contingent if it takes the value 0 in some environments and the value 1 in some environments, that is, is not equivalent to 0 and is not equivalent to 1. Axioms for Boolean algebra. A handout lists 16 axioms for Boolean algebra equivalences. Every axiom states a true equivalence: we can verify each equivalence by making truth tables for the left-hand and right-hand sides of each equivalence and verifying that they are the same. The axioms are used in a proof system in which every substitution instance of an axiom can be asserted, and any subexpression of an expression may be replaced by something equivalent to it, and the result is equivalent to the original expression. As an example, consider the following derivation of axiom (A2) a + a = a, from the other axioms. First, a + (a * a) = a is a substitution instance of (A8) (with a replaced by a and b replaced by a.) Then, by (A1) we have a * a = a, so we may replace (a * a) by a in this expression to get a + a = a, that is (A2). Hence, this set of axioms is not minimal (we could leave out A2 and rederive it, for example.)
It is, however, sound (only true equivalences can be proved) and complete (every true equivalence can be proved.) It also exhibits duality: the dual of a Boolean formula is obtained by replacing 0 by 1, 1 by 0, + by * and * by +. The axioms come in dual pairs, for example, (A1) a * a = a is dual to (A2) a + a = a, and (A11) 0 * a = 0 is dual to (A14) 1 + a = 1. In fact, if we have an equivalence and take the dual of both sides, the resulting pair of formulas is also an equivalence. Some useful equivalences provable from the axioms: DeMorgan's Laws: (1) (a + b)' = (a' * b') and (2) (a * b)' = (a' + b'), and the law of double negation: (a')' = a. Using these, we can prove that (x + y')' = (x' * (y')') = (x' * y) rather than verifying this with truth tables. To see that every Boolean function can be expressed as a Boolean expression (once we assign variables to the arguments of the Boolean function), we considered the Sum-of-products algorithm, which produces a formula in disjunctive normal form from a truth table representation of a Boolean function. (There is also a dual algorithm, the Product-of-sums algorithm.) For example, for the function f(x,y,z) given by the following truth table:
x y z   f(x,y,z)
0 0 0      1
0 0 1      0
0 1 0      1
0 1 1      0
1 0 0      1
1 0 1      0
1 1 0      0
1 1 1      1
we process each 1 in the value column, producing a conjunction of variables and their negations that is 1 on just that one row of the truth table.
x y z   f(x,y,z)   x'y'z'   x'yz'   xy'z'   xyz
0 0 0      1         1        0       0      0
0 0 1      0         0        0       0      0
0 1 0      1         0        1       0      0
0 1 1      0         0        0       0      0
1 0 0      1         0        0       1      0
1 0 1      0         0        0       0      0
1 1 0      0         0        0       0      0
1 1 1      1         0        0       0      1
Then by taking the disjunction (OR) of all these terms, we get an expression that has 1's exactly where f(x,y,z) does. That is,
f(x,y,z) = x'y'z' + x'yz' + xy'z' + xyz
is the Boolean formula produced by the sum-of-products algorithm for this function. Hence {AND, OR, NOT} is a "complete basis" for the Boolean functions. What about {AND, OR} or {AND, NOT} or {OR, NOT}?
Are any of these a complete basis for the Boolean functions? |
| 10/1/07 | Lecture 12. Boolean functions and Boolean expressions.
Boolean values: 0 and 1 may be interpreted as false and true. A Boolean function of n arguments maps each n-tuple of 0's and 1's to a 0 or a 1. Examples of Boolean functions: and, or, not, exclusive or. How many Boolean functions of n arguments are there? 2^(2^n), a very rapidly growing function of n: 2, 4, 16, 256, 65536, ... (Note that each value is the square of the preceding value.) Boolean expressions. The syntactic definition is recursive: (1) The constants 0 and 1 are Boolean expressions, (2) Each variable x, x1, x2, ..., y, y1, y2, ..., z, z1, z2, ... is a Boolean expression. (3) If E1 is a Boolean expression, then so is (E1)', the negation of E1, (4) If E1 and E2 are Boolean expressions, then so are (E1 + E2) (the disjunction of E1 and E2) and (E1 * E2) (the conjunction of E1 and E2.) Examples of Boolean expressions: 0, 1, x, (x)', (x + y), ((x)' + y), ... Note that by making a convention about precedence, we can leave out some of the parentheses and guarantee that the parentheses can be restored to the expression in a unique way. The convention is that negation takes precedence over conjunction, which in turn takes precedence over disjunction. For example, x' + y*z' is interpreted as ((x)' + (y * (z)')). We assign a semantics to Boolean expressions as follows. An "environment" is an assignment of a 0 or 1 to every variable. (In fact we will only ever be interested in a finite set of variables, but it is tidier to imagine every variable receiving an assigned value of 0 or 1.) To find the value of a Boolean expression E in an environment, we proceed according to the recursive definition of the syntax of the expressions. That is, if E is a constant 0, then its value in any environment is 0, and similarly for E being the constant 1. If E is a variable, then the value of E in the environment is just the value (0 or 1) assigned to that variable in the environment.
If E is a negation, say E = (E1)', then we (recursively) find the value of E1 in the environment, and the value of E is the NOT of the value of E1. If E is a disjunction, say E = (E1 + E2), then we find the values of E1 and E2 in the environment, and the value of E is the OR of the values of E1 and E2. Finally, if E is a conjunction, say E = (E1 * E2), then we find the values of E1 and E2 in the environment, and the value of E is the AND of the values of E1 and E2. We can summarize the meaning of a Boolean expression E by means of a truth table giving the values of the expression in all possible environments. Because the value of E in an environment does not depend on the values of variables that do not occur in E, the truth table only needs to show all possible assignments to the variables that occur in E. For example, for the expression (x + y')' we have the following truth table.
x y   (x + y')'
0 0       0
0 1       1
1 0       0
1 1       0 |
| 9/28/07 | Lecture 11. The existence of uncomputable functions and the Halting Problem.
See also the writeup of the Halting Problem for Turing machines: halting-problem. We give two different proofs that uncomputable functions exist. The first shows that there are "more" functions than Turing machines, and since each Turing machine can compute at most one function, there must be functions that are not computed by any Turing machine. More formally, we show that the set of all Turing machines is countable, while the set of all functions from the natural numbers to the natural numbers is uncountable. For concreteness, we consider Turing machines with tape alphabet: blank, 0, and 1; we'll also assume that the states are labelled q1, q2, q3, ... How many 2-state Turing machines of this sort are there? There are two states and three alphabet symbols, so each such machine can have at most 6 instructions, one starting with (q1, b, ...), one starting with (q1, 0, ...), and so on, up to one starting with (q2, 1, ...). How many different ways can the "..." portions be filled in? There are two choices for the next state, three choices for the symbol to write, and two choices for the direction to move. Thus, there are 12 possible instructions that start (q1, b, ...), 12 that start (q1, 1, ...), and so on. In addition to these 12 possibilities, we could have NO instruction starting with (q1, b, ...) or NO instruction starting with (q1, 1, ...), and so on. Thus, there are 13 possibilities for each of the 6 possible instructions, leading to 13^6 = 4,826,809 possible 2-state Turing machines over the tape alphabet b, 0, and 1. This number, though large, is finite. A similar argument would apply for 1-state machines, 3-state machines, and so on. Thus, the number of n-state machines for any positive integer n is finite. This gives us a systematic way of listing every possible Turing machine: first list all the 1-state machines (in some order), then all the 2-state machines, then all the 3-state machines, and so on.
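The count for 2-state machines can be reproduced, and extended to other numbers of states, with a one-line Scheme procedure. This is a sketch under the assumption that the same counting argument carries over unchanged to n states: each of the 3n (state, symbol) pairs has either no instruction or one of n * 3 * 2 instructions. The name count-machines is hypothetical.

```scheme
; Number of n-state Turing machines over the tape alphabet
; {blank, 0, 1}, following the counting argument in the text.
; For each of the 3n (state, symbol) pairs there is either no
; instruction, or one of n * 3 * 2 possible instructions
; (next state, symbol to write, direction to move).
(define (count-machines n)
  (expt (+ 1 (* n 3 2)) (* 3 n)))
```

Here (count-machines 2) evaluates to 4826809, matching 13^6 above, and (count-machines 1) evaluates to 343, that is, 7^3. The value is finite for every n, which is all the listing argument needs.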
Thus, we come up with a list:
M0, M1, M2, M3, ...
of every possible Turing machine. A set is countable if it is either finite or can be put into 1-to-1 correspondence with the natural numbers. This listing shows that the Turing machines can be put into 1-to-1 correspondence with the natural numbers. Thus, the set of all Turing machines is countable. By contrast, the set of all functions from the natural numbers to the natural numbers is uncountable, that is, cannot be put into 1-to-1 correspondence with the natural numbers. We can prove this using Cantor's diagonal argument as follows. Let S be any countable set of functions, say
S = f0, f1, f2, f3, f4, ...
We will show that there is some function g that is not in S, which means that the set of all functions is uncountable. Consider the infinite table whose rows are indexed by m = 0, 1, 2, ... and whose columns are indexed by n = 0, 1, 2, ... and whose (m,n) entry is the value of fm(n):
0 1 2 3 4 5 ...
f0 0 0 1 1 2 2 ...
f1 1 0 1 0 1 0 ...
f2 16 14 39 22 6 11 ...
f3 256 128 64 32 16 8 ...
.
.
.
where for concreteness I have shown particular values for the functions.
(Thus, for example, f0(0) = 0, f0(1) = 0, f0(2) = 1, f0(3) = 1, f0(4) = 2,
and f0(5) = 2, which means that f0 *might* be the function (quotient n 2).)
The way we define a function g not in this table is as follows:
For all n, g(n) = 1 + fn(n).
So in our example,
g(0) = 1 + f0(0) = 1
g(1) = 1 + f1(1) = 1
g(2) = 1 + f2(2) = 40
g(3) = 1 + f3(3) = 33
...
Note that for every function in the table, g(n) differs from it for at least one argument, namely, if the function is fi, then g(i) is one more than fi(i). Thus, g cannot be in the table. Hence, no countable set can contain all the functions from the natural numbers to the natural numbers, so this set is not countable, that is, uncountable. The conclusion of these two arguments is that uncountably many functions from the natural numbers to the natural numbers are uncomputable. This is somewhat unsatisfying: we would like to see a particular example of an uncomputable function, rather than being assured that they are "all around us," as it were. The next argument shows that a particular function, the Halting Problem, is uncomputable. The Halting Problem for Turing machines is: given a Turing machine T and an input string t of 0's and 1's (both properly encoded), determine whether the machine T eventually halts when started in state q1 with its head on the leftmost symbol of t, or not. If so, halt with output 1; otherwise, halt with output 0. We can consider also the Halting Problem for Scheme programs, expressed as follows. Can we write a procedure (halts? proc exp) that takes as input a procedure proc and an expression exp, and returns #t if (proc exp) halts, that is, eventually returns some value, and #f otherwise, that is, (proc exp) would never return a value? We show using a proof by contradiction that it is impossible to write a procedure halts? with this behavior. Assume that there exists a procedure halts? with the specified behavior. Then we can define the following procedure:
(define q
(lambda (proc exp)
(if (halts? proc exp)
(q proc exp)
#f)))
The procedure takes as input a procedure proc and an expression
exp, and applies the halts? procedure to them.
If (halts? proc exp) returns #t, then q makes a tail recursive
call to itself on the same arguments, which means that
it will call itself over and over, and never halt, that is,
never return a value.
If, instead, (halts? proc exp) returns #f, then q returns
the value #f, that is, it halts.
We can summarize this behavior as: (q proc exp) halts
if and only if (proc exp) doesn't halt.
Now we define another procedure (s exp) as follows:
(define s
(lambda (exp)
(q exp exp)))
All that (s exp) does is call (q exp exp).
Now consider the application (s s), which
clearly has the same behavior as (q s s).
So (s s) halts if and only if (q s s) halts,
and this is true (by the argument above) if and only
if (s s) doesn't halt.
This is a CONTRADICTION: (s s) halts if and only if
(s s) doesn't halt.
This contradiction shows that the assumption
that the halts? procedure exists is incorrect,
that is, the behavior required of the halts?
procedure is uncomputable.
Why would we be so perverse as to call a program on its own code? Imagine a style-checker for Java programs, written in Java -- it would be perfectly natural to call it on its own code, to see if it was following the style conventions it was checking for. When a new programming language L is introduced, it is often a significant milestone when it becomes possible to write a compiler for L in L, and then use it to compile its own code. A solution to the halting problem might be a very useful thing, except that it doesn't exist. See also the writeup of the Halting Problem for Turing machines: halting-problem. The Turing machines H, Q, and S correspond to halts?, q, and s in the Scheme treatment above. |
| 9/26/07 | Lecture 10. Computability and uncomputability.
Perlis epigrams: 83. What is the difference between a Turing machine and the modern computer? It's the same as that between Hillary's ascent of Everest and the establishment of a Hilton hotel at its peak. 21. Optimization hinders evolution. The natural numbers are the nonnegative integers: 0, 1, 2, ... We consider functions from the natural numbers to the natural numbers. Three examples of such functions are: f(n) = n^2 + 1, g(n) = 1 if n is odd and 0 if n is even, and h(n) = 1 if n is a prime number and 0 if n is not a prime number. A function f(n) is computable if there is a program P such that for every natural number n, when P is run on input n, it halts and its output is the value f(n). To make this definition mathematical, we need to have definitions for the concepts of a program, the input to a program, the output of a program, and what it means for a program to run and to halt. In the 1930's logicians were attempting to formalize these notions, by specifying what it meant to be an "effective procedure." There were a number of competing definitions, including Church's lambda calculus (the ancestor of LISP and Scheme), Turing machines (see last lecture), Markov algorithms, Kleene's recursive functions, and Post's rewriting systems. Each claimed to be the right way to formalize the notion of "effective procedure" or program. In the excerpt of Turing's paper, you can see his arguments that Turing machines properly capture our intuitive notion of effective procedure. For concreteness, we shall assume that the input n to a Turing machine is presented in unary by a string of n 1's, with the head positioned on the leftmost 1 and the Turing machine started in state q1. Similarly, the output, f(n), should be represented by a string of f(n) 1's on the tape, with no other non-blank symbols and the Turing machine head positioned on the leftmost 1 of the output when it halts. 
Note that 0 will be represented by a completely blank tape, with the Turing machine head positioned on any blank symbol. You will write in the homework a Turing machine that multiplies two unary numbers, so it should be clear (with the copying machine developed in the last lecture) that there is a Turing machine to compute the function f(n) = n^2 + 1. A little more thought should convince you that the functions g(n) and h(n) (defined above) are also computable by a Turing machine. It turned out that each of these quite different formalisms defined exactly the same set of computable functions, giving us considerable confidence that this set is quite natural and is, in fact, a good formal correlate of the intuitive notion of "effective procedure." This idea is captured in the "Church-Turing thesis," which states that the intuitive notion of effective procedure is properly represented by Turing machines (or lambda calculus expressions, or ..) It is not a "theorem" because it relates an intuitive notion to a formal one. There was a brief excursion on the topic of how we might come to reject the Church-Turing thesis, if, for example, some physical phenomenon made possible types of computations that were poorly represented by our current models. However, there were theorems involved in supporting it, namely, the theorems that proved that every function computable by a Turing machine is computable by a lambda expression, and vice versa, and so on, for each pair of formalisms considered. These theorems were proved by the technique of "simulation" -- intuitively, one computer "pretending to be" another one. 
For example, by defining a Turing machine that could take as input a representation of one of Kleene's recursive functions and an argument for the function, and compute the result of applying the recursive function to the argument, it can be shown that any function computable by a recursive function is computable by a Turing machine, and hence, that Turing machines are at least as powerful as Kleene's recursive functions. There is a natural question that arises: Are there functions that are uncomputable? Next lecture we'll see two answers: (1) Yes, because there are "more" functions than Turing machines (that is, the number of functions is uncountable while the number of Turing machines is countable.) (2) Yes, because we'll see a specific uncomputable function (the halting problem.) |
| 9/24/07 | Lecture 9. Turing machines.
We start a new topic: Computability, Unsolvability, and Turing machines. Turing machines are covered in the pdf document: Lecture 9. |
| 9/21/07 | Lecture 8. Scheme: debugging, apply, map, let, let*, and environments.
First, some useful debugging aids. When (display exp) is evaluated, it prints out the value of exp. When (newline) is evaluated, it prints out a newline character. The special form (begin exp1 exp2 ... expk) evaluates the expressions exp1, exp2, ..., expk in left-to-right order, and the value of the last one is the value of the whole expression. These may be helpful for examining intermediate values when you are debugging your code. However, THEY SHOULD NOT APPEAR in your final solutions. The procedure (apply proc lst) applies the procedure proc to the list of arguments lst. For example, (apply + '(2 3 4 5)) => 14 and (apply append '((a) (b c) (d))) => (a b c d) and (apply cons '(1 2)) => (1 . 2). It may be useful if the procedure and/or the list of arguments are the values of other computations. For example, (apply (if (> 2 1) append list) '((a) (b c) (d))) => (a b c d) and (apply (if (> 1 2) append list) '((a) (b c) (d))) => ((a) (b c) (d)). The procedure (map proc lst) applies the procedure proc to each of the top-level elements of lst and returns the list of results. For example, (map car '((a) (b c) (d))) => (a b d) and (map cdr '((a) (b c) (d))) => (() (c) ()). As another example, (map odd? '(2 3 4 6 7 10)) => (#f #t #f #f #t #f). The procedure to be mapped need not be built-in or even named, for example, (map (lambda (x) (* 2 x)) '(13 4 2 5)) => (26 8 4 10). This has the same effect as our earlier procedure double-each. Recall double-each:
(define double-each
(lambda (lst)
(if (null? lst)
'()
(cons (* 2 (car lst)) (double-each (cdr lst))))))
In fact, by generalizing this pattern, we can write our own
version of map as follows.
(define our-map
(lambda (proc lst)
(if (null? lst)
'()
(cons (proc (car lst)) (our-map proc (cdr lst))))))
Note that the generalization consists of making the procedure proc
an additional argument, and cons'ing (proc (car lst)) instead
of (* 2 (car lst)) to the result of the recursive call with (cdr lst).
The built-in procedure map is somewhat more general than this,
for example we have: (map + '(1 2 3) '(4 5 6)) => (5 7 9).
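As a rough sketch of how this more general behavior could be obtained for exactly two lists, the pattern of our-map extends directly. The name our-map2 is hypothetical (it is not a built-in procedure), and the sketch assumes that the two lists have the same length.

```scheme
;; our-map2: hypothetical two-list version of our-map.
;; Assumes lst1 and lst2 have the same number of elements.
(define our-map2
  (lambda (proc lst1 lst2)
    (if (null? lst1)
        '()
        (cons (proc (car lst1) (car lst2))
              (our-map2 proc (cdr lst1) (cdr lst2))))))

(our-map2 + '(1 2 3) '(4 5 6))   ; => (5 7 9)
```

The only changes from our-map are the extra list argument and the fact that proc is applied to the cars of both lists.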
Now we consider the special forms let and let*. The purpose of let is to create a new local environment with some symbols bound to some values for the purpose of evaluating some expression. For example, when (let ((a 1) (b 2)) (list a b)) is evaluated, a new local environment containing symbol a with value 1 and symbol b with value 2 is created. In this new local environment, the expression that is the body of the let, namely, (list a b), is evaluated. When the new local environment is created, it has a "search pointer" back to the environment in which the let expression is being evaluated (which we assume is the top-level, or global, environment in the example.) Step-by-step evaluation of (list a b) in that environment: this is an application, so the symbol list is looked up in the local environment and not found there, so the search pointer is followed to the top-level environment, where list is found to be a built-in procedure. Similarly, a and b are looked up in the local environment, and found to have values 1 and 2 respectively. Then the list procedure is called with the arguments 1 and 2, resulting in the value (1 2), which is returned as the value of the let. That is, (let ((a 1) (b 2)) (list a b)) => (1 2). Thus, in finding the value of a symbol, "the relevant environment" is found by searching for the symbol in the current environment, then following the "search pointer" to another environment, and so on, until the value of the symbol is found in some environment. If it is not found in any environment in this search, the result is the "unbound variable" error message; otherwise, it is the (first) value found for the symbol in the search. What if we try (let ((a 1) (b (+ a 1))) (list a b))? Assuming that a does not have a value in the top-level environment, this will result in an error message, because the expressions giving the values of the symbols are evaluated in the current environment (not the partially-built new local environment.)
This may be avoided by nesting the lets, as (let ((a 1)) (let ((b (+ a 1))) (list a b))) => (1 2). Here the outer let creates a new local environment, E, in which a has value 1. The search pointer for E points to the top-level environment. In the environment E the inner let is evaluated, which creates a new local environment E', in which b has value (+ a 1), or 2, where a is looked up in E. The search pointer for E' points to E. Then finally (list a b) is evaluated in E', where list is looked up (and found in the top-level environment), a is looked up (and found in E) and b is looked up (and found in E'.) Another way to accomplish this same effect is with (let* ((a 1) (b (+ a 1))) (list a b)). The effect of let* is the same as a nested sequence of lets, with one let for each variable, each one nested inside the let for the variable to its left, with the body of the let* nested inside all of the lets. Syntactic sugar reconsidered. The special form let* is clearly syntactic sugar for nested lets. However, let itself is syntactic sugar for a particular use of lambda and application. That is, ((lambda (a b) (list a b)) 1 2) has exactly the same effect as (let ((a 1) (b 2)) (list a b)), and some Scheme implementations may simply translate the latter into the former. To see why this is true, we need to know three more things about lambda expressions and applications: (1) An application of a procedure creates a new local environment containing the symbols that are the formal arguments of the procedure bound to their actual argument values. (2) The "search pointer" for the new local environment created by an application is simply a copy of the "birth pointer" of the procedure being applied; the body of the procedure is evaluated in this local environment. (3) The "birth pointer" of a procedure points to the environment in which the procedure was created, that is, the environment in which the lambda expression that created this procedure was evaluated. 
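The forms discussed so far can be compared side by side; each expression below should evaluate to (1 2). The last expression is one plausible way of writing the nested lets using only lambda and application. It is offered as an illustration, not as a description of what any particular Scheme implementation actually does.

```scheme
;; Nested lets: the expression for b is evaluated in E, where a is bound.
(let ((a 1))
  (let ((b (+ a 1)))
    (list a b)))                  ; => (1 2)

;; let* behaves like the nested lets above.
(let* ((a 1) (b (+ a 1)))
  (list a b))                     ; => (1 2)

;; The same computation with only lambda and application:
;; the outer application binds a, the inner application binds b.
((lambda (a)
   ((lambda (b) (list a b))
    (+ a 1)))
 1)                               ; => (1 2)
```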
To see why ((lambda (a b) (list a b)) 1 2) has the same effect as (let ((a 1) (b 2)) (list a b)), note that it is an application. The expression (lambda (a b) (list a b)) is evaluated in the top-level environment, and creates a procedure with formal arguments (a b), body (list a b) and "birth pointer" pointing to the top-level environment. The arguments 1 and 2 evaluate to the numbers 1 and 2, and the application causes a new local environment to be created with the formal arguments of the procedure, namely, a and b, having values equal to the actual arguments, 1 and 2. Then in this new local environment, the body of the procedure, (list a b), is evaluated, returning the value (1 2). Note that for either the let expression or the application of the lambda expression, the local environment is no longer needed, and will be "garbage collected" (the storage reclaimed), once the Scheme system gets around to it. For most people, the let expression is a lot easier to parse and understand than the application of the lambda expression; syntactic sugar is sometimes quite desirable. We'll revisit environments again when we consider object-oriented methods in Scheme. |
| 9/19/07 | Lecture 7. Scheme: iterative and recursive processes.
We now make a distinction between different uses of recursion depending on whether there is something remaining to be done when a recursive call returns. For example, consider again the procedure our-length:
(define our-length
(lambda (lst)
(if (null? lst)
0
(+ 1 (our-length (cdr lst))))))
There is one recursive call, (our-length (cdr lst)), and something
remains to be done, namely adding 1 to the value returned, when
the recursive call returns its value.
This procedure generates a "recursive process."
By contrast, consider again the procedure first-digit:
(define first-digit
(lambda (n)
(if (< n 10)
n
(first-digit (quotient n 10)))))
In this case there is one recursive call, (first-digit (quotient n 10)),
but nothing remains to be done to the value it returns in order
to return it as the value of the top-level call.
This procedure generates an "iterative process."
In some circumstances we prefer the efficiency of an iterative
process (which doesn't take up stack space to remember what needs
to be done when the recursive call returns) to a recursive process
(which does need such stack space.)
This is not to say that you should worry constantly about writing
procedures that generate iterative rather than recursive processes!
There is a syntactic condition, being "tail recursive", that implies that the procedure will generate an iterative process. There may be several cases in which a procedure calls itself, but if in each case there is just one recursive call, and the value of the recursive call is returned as the value of the top-level procedure, then the procedure is tail recursive. A correct implementation of Scheme must treat a tail recursive procedure properly, that is, by making sure that it generates an iterative process. The procedure first-digit is tail-recursive; the procedure our-length is not. In some cases it is possible to rewrite a non-tail-recursive procedure in a tail-recursive form, thus ensuring that it generates an iterative process. As an example of this, we re-write our-length as it-length, as follows.
(define it-length
(lambda (lst)
(it-length-helper lst 0)))
(define it-length-helper
(lambda (lst acc)
(if (null? lst)
acc
(it-length-helper (cdr lst) (+ 1 acc)))))
The procedure it-length is a wrapper, that simply calls it-length-helper with the original list lst and the initial value 0 of the "accumulator" acc. The role of the argument acc is to accumulate the partial results in the recursive calls to it-length-helper. Note that it-length-helper is tail-recursive; the value of the recursive call is returned unchanged. To see why this works, we consider the invariant equal to the sum of the length of lst and the value of acc. Because the recursive call reduces the length of lst by 1 and increases the value of acc by 1, the invariant is preserved in all the recursive calls. Because the invariant is initially the length of the input list (length of lst plus 0), this means that when the base case is reached, the value of acc is the length of the input list (because the length of lst at this point is 0.) For additional insight, we consider the successive calls and returns if we trace it-length-helper and evaluate (it-length '(a o u)):
(it-length-helper (a o u) 0)
(it-length-helper (o u) 1)
(it-length-helper (u) 2)
(it-length-helper () 3) => 3
(it-length-helper (u) 2) => 3
(it-length-helper (o u) 1) => 3
(it-length-helper (a o u) 0) => 3
Another procedure we wrote that generates a recursive process is factorial. Recall:
(define factorial
(lambda (n)
(if (= n 1)
1
(* n (factorial (- n 1))))))
This generates a recursive process because the result of the
recursive call is multiplied by n before it is returned.
To write a tail recursive version of this procedure,
we proceed as for it-length to add an extra argument to
accumulate the partial results, and a wrapper procedure
to present the correct interface (one positive integer argument):
(define it-factorial
(lambda (n)
(it-factorial-helper n 1)))
(define it-factorial-helper
(lambda (n acc)
(if (= n 1)
acc
(it-factorial-helper (- n 1) (* n acc)))))
Note that it-factorial-helper is tail recursive
and therefore generates an iterative process.
To see why this works, we note that the value of
n! times the value of acc remains invariant.
That is, in the recursive call the product is
(n-1)!*n*acc, which is the same as n!*acc.
At the top level acc = 1, so the product n!*acc = n!.
Thus, when the base case is reached, acc must
have the value n!.
Some intuition may be gained from looking at a trace
of the procedure for (it-factorial 4):
(it-factorial-helper 4 1)
(it-factorial-helper 3 4)
(it-factorial-helper 2 12)
(it-factorial-helper 1 24) => 24
(it-factorial-helper 2 12) => 24
(it-factorial-helper 3 4) => 24
(it-factorial-helper 4 1) => 24
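The invariant can also be checked directly on the states that appear in the trace. The sketch below repeats the recursive factorial recalled earlier; the name fact-invariant is hypothetical and is introduced only for this check.

```scheme
(define factorial
  (lambda (n)
    (if (= n 1)
        1
        (* n (factorial (- n 1))))))

;; fact-invariant (hypothetical name): the quantity n! * acc, which
;; should be the same for every call of it-factorial-helper in one trace.
(define fact-invariant
  (lambda (n acc)
    (* (factorial n) acc)))

(fact-invariant 4 1)    ; => 24
(fact-invariant 3 4)    ; => 24
(fact-invariant 2 12)   ; => 24
(fact-invariant 1 24)   ; => 24
```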
These have been examples where the accumulator is an integer value, but we can also make the accumulator a list or a more complicated data structure. As an example of a list accumulator, consider the procedure to reverse (the top-level) elements of a list: (reverse '(a o u)) => (u o a). We will write our own iterative version of reverse:
(define it-reverse
(lambda (lst)
(it-reverse-helper lst '())))
(define it-reverse-helper
(lambda (lst rlst)
(if (null? lst)
rlst
(it-reverse-helper (cdr lst) (cons (car lst) rlst)))))
It is clear that it-reverse-helper is tail recursive, and generates
an iterative process.
To see that it is correct, consider the invariant equal to
the result of appending the reverse of the list lst to the list rlst.
If lst is nonnull, then the reverse of lst is equal to
the reverse of (cdr lst) with the (car lst) appended to the
end.
Therefore, if we cons (car lst) to rlst, the invariant
is preserved.
Hence, when the base case is reached, rlst is equal to the
reverse of the original list.
Example of a trace:
(it-reverse-helper (a o u) ())
(it-reverse-helper (o u) (a))
(it-reverse-helper (u) (o a))
(it-reverse-helper () (u o a)) => (u o a)
(it-reverse-helper (u) (o a)) => (u o a)
(it-reverse-helper (o u) (a)) => (u o a)
(it-reverse-helper (a o u) ()) => (u o a)
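The invariant, the result of appending the reverse of lst to rlst, can be evaluated at each state in the trace to confirm that it never changes. This is a small sketch using the built-in procedures reverse and append; the name rev-invariant is hypothetical.

```scheme
;; rev-invariant (hypothetical name): the list obtained by appending
;; the reverse of lst to rlst.
(define rev-invariant
  (lambda (lst rlst)
    (append (reverse lst) rlst)))

(rev-invariant '(a o u) '())   ; => (u o a)
(rev-invariant '(o u) '(a))    ; => (u o a)
(rev-invariant '(u) '(o a))    ; => (u o a)
(rev-invariant '() '(u o a))   ; => (u o a)
```

In the last state lst is empty, so the invariant is just rlst, which is why the base case can return rlst unchanged.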
The last part of the lecture was devoted to demonstrating the game "Shut the Box", which is the subject of the second half of homework #2. |
| 9/17/07 | Lecture 6. Scheme: paradigms for recursion.
Perlis epigrams: 40. There are two ways to write error-free programs; only the third one works. 1. One man's constant is another man's variable. Last lecture we wrote our own version of the procedure to find the length of a list; this had a base case of 0 for the empty list, and added 1 for every element in the list. Another pattern is to "do something" to every element of a list and make a list of the results. For example, we write (double-each lst) to double each element of a list of numbers. The base case is the empty list, which should return the empty list. The recursive case is a non-empty list, in which case we double the first element (the car of the list) and cons it into the list obtained by a recursive call on the rest of the input elements (the cdr of the list.)
(define double-each
(lambda (lst)
(if (null? lst)
'()
(cons (* 2 (car lst)) (double-each (cdr lst))))))
Note that instead of '(), we could return lst itself (because in that
case it is empty.) However, this form makes it obvious to the reader
what value is being returned.
Another paradigm for recursion on lists is a search through the elements of the list until some condition is satisfied. For example, we write the procedure (member? item lst) to return #t if item is a top-level element of lst, and #f otherwise. The base case is when lst is the empty list. Since the empty list has no elements, the result should be #f no matter what item is. Otherwise, the list lst is not empty, and we compare the first element of the list with item. If they are equal, then the value returned should be #t (because we have found at least one top-level element equal to item -- we do not need to look any further.) Finally, if they are unequal, we should continue the search in the rest of the list; this is a recursive call to member? with item and the rest of the list (the cdr of the list.)
(define member?
(lambda (item lst)
(cond
((null? lst) #f)
((equal? item (car lst)) #t)
(else (member? item (cdr lst))))))
Note that there is a built-in procedure member (no question mark),
which returns #f if item is not a top-level element of the list
lst, and returns the rest of the list from the first occurrence
of item if it is.
This is not strictly a predicate (which would return only #f or #t),
hence no question mark in its name.
The difference can be illustrated:
(member? 'b '(a b c)) => #t
(member 'b '(a b c)) => (b c)
Another paradigm for recursion on lists is to build up a list element by element. For example, we write the procedure (countdown n) which is exemplified by (countdown 3) => (3 2 1 blast-off). The base case is when n is 0, the result should be the list (blast-off). The recursive case is when n > 0, in which case we take an "executive approach" to the recursion: the call (countdown (- n 1)) will return the list starting with n-1 and ending with blast-off, into which we merely have to cons the value of n to get the final result for n.
(define countdown
(lambda (n)
(if (= n 0)
'(blast-off)
(cons n (countdown (- n 1))))))
Here, as for the factorial function, the progress towards
the base case is by subtracting 1 from n.
Another useful procedure combines two lists into one longer list. For the built-in procedure append, we have (append '(a o u) '(e i)) => (a o u e i). That is, given two lists, append makes a new list in which the elements of the first list are followed by the elements of the second list. Note the difference from cons: (cons '(a o u) '(e i)) => ((a o u) e i). We write our own version of append: (our-append lst1 lst2). There are different possibilities for base cases. When lst1 is the empty list, the returned value should be just lst2. Similarly, when lst2 is the empty list, the returned value should be just lst1. In fact, we can use just the first of these as our base case. Then the recursive case is when lst1 is not empty. If the recursive call is with the cdr of lst1 and lst2, then the returned value is the result of appending those two, and all we need to do is to cons the first element of lst1 into the result.
(define our-append
(lambda (lst1 lst2)
(if (null? lst1)
lst2
(cons (car lst1) (our-append (cdr lst1) lst2)))))
Note that progress towards the base case of lst1 being empty
is made by taking the cdr of lst1, and lst2 is unchanged in the
call.
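Unfolding the recursion by hand for the earlier example shows how the result is assembled. The definition is repeated here only so that the sketch is self-contained.

```scheme
(define our-append
  (lambda (lst1 lst2)
    (if (null? lst1)
        lst2
        (cons (car lst1) (our-append (cdr lst1) lst2)))))

;; (our-append '(a o u) '(e i))
;; = (cons 'a (our-append '(o u) '(e i)))
;; = (cons 'a (cons 'o (our-append '(u) '(e i))))
;; = (cons 'a (cons 'o (cons 'u (our-append '() '(e i)))))
;; = (cons 'a (cons 'o (cons 'u '(e i))))
(our-append '(a o u) '(e i))   ; => (a o u e i)
(our-append '() '(e i))        ; => (e i)
(our-append '(a o u) '())      ; => (a o u)
```

The last call shows why a separate base case for an empty lst2 is not needed: the recursion simply rebuilds lst1 on top of the empty list.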
The examples above have been examples of "flat" recursion -- doing something for some or all of the top-level elements of a list. We can also consider "deep" or "tree" recursion, which typically involves recursively treating elements of the list that are themselves lists. As an example, we write a procedure (count-symbols lst) which takes a list and counts the number of occurrences of symbols (as opposed to numbers, booleans, or other values) in lst, at whatever depth they occur. For example, (count-symbols '(13 the (best 4 5) deal is ((ours!)))) => 5, because there are 5 occurrences of symbols: the, best, deal, is, ours!, even though some of them do not occur as top-level elements of the list lst. There is an easy base case: when lst is the empty list, there are 0 occurrences of symbols in it. When lst is not empty, there are some additional cases we must distinguish. For example, if the first element of the list is a symbol (tested with built-in predicate symbol?), then we can add 1 to the result of a recursive call of count-symbols on the rest of the list. If the first element of the list is itself a list (tested with predicate list?), then we need to do *two* recursive calls of count-symbols, one on the first element, and one on the rest of the elements, and add the results together. If none of these cases applies, then the first element of the list is neither a symbol nor a list, so we can safely ignore it and just return the value of a recursive call of count-symbols on the rest of the list.
(define count-symbols
(lambda (lst)
(cond
((null? lst) 0)
((symbol? (car lst)) (+ 1 (count-symbols (cdr lst))))
((list? (car lst)) (+ (count-symbols (car lst))
(count-symbols (cdr lst))))
(else (count-symbols (cdr lst))))))
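A few sample calls, including the example from the discussion above, exercise each branch of the cond. The definition is repeated so that the sketch can be run on its own; the second and third inputs are made up for illustration.

```scheme
(define count-symbols
  (lambda (lst)
    (cond
      ((null? lst) 0)
      ((symbol? (car lst)) (+ 1 (count-symbols (cdr lst))))
      ((list? (car lst)) (+ (count-symbols (car lst))
                            (count-symbols (cdr lst))))
      (else (count-symbols (cdr lst))))))

(count-symbols '(13 the (best 4 5) deal is ((ours!))))   ; => 5
(count-symbols '())                                       ; => 0
(count-symbols '((a (b)) 1 #t))                           ; => 2
```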
|
| 9/14/07 | Lecture 5. Scheme: lists.
Perlis epigrams: 3. Syntactic sugar causes cancer of the semicolon. 15. Everything should be built top-down, except the first time. A list is a finite sequence of elements, where each element is a Scheme value. We describe lists, operations on them, and how to represent them with box-and-pointer diagrams. (Please see the text for box-and-pointer diagrams, to avoid overtaxing my ASCII art skills.) The empty list (with no elements) is printed out as (), and may be quoted as '(). A list with one element 19 is printed out as (19) and may be quoted as '(19). A list with three elements, the symbol a, the symbol o, and the symbol u is printed out as (a o u), and quoted as '(a o u). Lists may have lists as elements. Thus a list with two elements, the first of which is the list (a o u) and the second of which is the list (e i) is printed out as ((a o u) (e i)) and quoted as '((a o u) (e i)). Lists may also be created using the built-in procedure list. The expression (list e1 e2 ... ek) evaluates all of its arguments, and makes a list of them in order. Thus (list (+ 1 1) 'a (* 2 3)) => (2 a 6). Contrast this behavior with the following: '((+ 1 1) 'a (* 2 3)) => ((+ 1 1) (quote a) (* 2 3)). The basic selectors for lists are car and cdr. The expression (car lst) returns the first element of the list lst. The expression (cdr lst) returns a list equal to lst without its first element. Thus
(car '(a o u)) => a
(cdr '(a o u)) => (o u)
(car '((a o u) (e i))) => (a o u)
(cdr '((a o u) (e i))) => ((e i))
The basic constructor for lists is cons. Thus (cons item lst) returns the list that would be obtained from inserting item as the first element of lst. For example:
(cons 'a '()) => (a)
(cons 'a '(o u)) => (a o u)
(cons '(a o u) '((e i))) => ((a o u) (e i))
In terms of box-and-pointer representation, what cons does is create a *new* box (also called a cons cell), and put its first argument into the left half of the box, and its second argument into the right half of the box.
Examples of lists containing the null list: '(1 () 3) is a list with three elements, the first being 1, the second being the null list, and the third being 3. '(()) is a list containing one element, namely the null list -- its box-and-pointer representation consists of one box with () in both left and right halves. Dotted pairs. It is not an error to evaluate (cons 1 2). The result is the "improper list" consisting of a single box (cons cell) with 1 in the left half and 2 in the right half. Scheme prints out this value with a dot, as (1 . 2), hence the name "dotted pair." In this course, seeing this in your output will usually be a signal that you've made an error with list operations, because we will mostly be dealing with "proper" lists. Predicates to test the type of values. The predicate (null? exp) tests whether exp is the null list, (), and returns #t if so and #f otherwise. The predicates number?, symbol?, boolean? test whether the value of their argument is a number, or a symbol or a boolean. The predicate list? tests whether its argument is a proper list. The predicate pair? tests whether its argument "starts with" a cons cell. Thus, (pair? '(1 2)) => #t and (pair? '(1 . 2)) => #t, while (list? '(1 2)) => #t and (list? '(1 . 2)) => #f. The most basic kind of recursive procedure on a list is to do something for each element of the list. For example, there is a built-in procedure length, where (length lst) returns the number of top-level elements of the list lst. For example,
(length '(a o u)) => 3
(length '((a o u) (e i))) => 2
(length '()) => 0
We can write our own version of the length procedure using recursion on lists as follows. The base case is the empty list -- in that case, the length procedure should return 0. The recursive case is when the list is non-empty. In that case, we call the procedure recursively on the cdr of the list, which will return a value one less than the length of the argument. We then add 1 and return the resulting value.
The procedure:
(define our-length
(lambda (lst)
(if (null? lst)
0
(+ 1 (our-length (cdr lst))))))
If we were to trace the procedure our-length
and then make the call (our-length '(a o u))
we would see something like:
(our-length (a o u))
(our-length (o u))
(our-length (u))
(our-length ())
(our-length ()) => 0
(our-length (u)) => 1
(our-length (o u)) => 2
(our-length (a o u)) => 3
showing the successive recursive calls
of our-length and what they finally return.
|
| 9/12/07 | Lecture 4. Scheme, continued.
Perlis epigram: 12. Recursion is the root of computation since it trades description for time. (What does this one mean?) In using recursion to accomplish (terminating) computation, we want to have: one or more base cases, that don't involve a recursive call of the function, and recursive calls that make "progress" towards the base cases. Our factorial procedure had one base case (n = 1) and the recursive case makes "progress" towards it by reducing n by 1 at each call. Two useful built-in procedures for dealing with numbers: (quotient n d) returns the integer quotient of n divided by d and (remainder n d) returns the remainder when n is divided by d. For example, (quotient 11 4) => 2 and (remainder 11 4) => 3 because 11 = 2*4 + 3. We may use these to write procedures to find the first digit and the last digit of a non-negative number in decimal (base 10). For example, (last-digit 437) => 7 and (first-digit 437) => 4. The last decimal digit of a number is just the remainder when it is divided by 10, so
(define last-digit
(lambda (n)
(remainder n 10)))
gives us a procedure for the last digit of n.
(Note that because 8 = 0*10 + 8 we have (quotient 8 10) => 0 and
(remainder 8 10) => 8.)
For the first digit, we use the idea that if n is greater than or
equal to 10, then the first digit of n is the *same* as the first
digit of (quotient n 10).
Our base cases will be the inputs n that are less than 10.
This gives us the following procedure:
(define first-digit
(lambda (n)
(if (< n 10)
n
(first-digit (quotient n 10)))))
Consider the calls and returns if we trace first-digit and then
evaluate (first-digit 437):
(first-digit 437)
(first-digit 43)
(first-digit 4)
(first-digit 4) => 4
(first-digit 43) => 4
(first-digit 437) => 4
Note that once the first digit is determined in the base case,
each of the other calls just returns that value back up to the
top level.
This is an example of an "iterative process" -- nothing to be done
to the returned value of the recursive call -- in contrast to
a "recursive process" -- something remains to be done to the
returned value of a recursive call.
(We'll see this concept in more detail later.)
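For contrast, here is a sketch of a procedure that generates a "recursive process" from the same two built-ins, quotient and remainder. The name digit-sum is made up for this illustration and is not from the lecture; like first-digit, it assumes a non-negative integer input.

```scheme
;; digit-sum is a hypothetical name chosen for this sketch.
;; Assumes n is a non-negative integer.
(define digit-sum
  (lambda (n)
    (if (< n 10)
        n                                  ; base case: a single digit is its own sum
        (+ (remainder n 10)                ; last digit ...
           (digit-sum (quotient n 10)))))) ; ... plus the sum of the remaining digits

(digit-sum 437)   ; => 14
```

Here each call still has an addition to perform after its recursive call returns, so the returned values change on the way back up: 4, then 7, then 14.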
More useful special forms (especially for doing the homework.) To prevent an expression from being evaluated, use the special form quote. For example, (quote apple) => apple. That is, the value of (quote apple) is just the symbol apple; no attempt is made to look up the symbol in any environment. An abbreviation for (quote apple) is 'apple. (Yes, just one single quote, on the left; no matching quote.) You can quote more complicated expressions, like lists (to be covered soon.) For example, '(1 2 3) => (1 2 3). As an example of using quote, we write a procedure (compare x y) that takes two integers and returns the symbol greater if x > y, the symbol equal if x = y and the symbol less if x < y.
(define compare
(lambda (x y)
(if (> x y)
'greater
(if (= x y)
'equal
'less))))
Note that nested ifs are hard on the human brain.
We may instead use the special form cond.
The syntax of cond is (cond (c1 e1) (c2 e2) ... (ck ek))
or (cond (c1 e1) (c2 e2) ... (else ek)).
Each of ci and ei is an arbitrary Scheme expression.
The rule of evaluation is that c1 is evaluated,
and if it is not #f, then e1 is evaluated and its
value returned as the value of the cond.
(In this case, no other expressions are evaluated.)
If c1 is #f then e1 is not evaluated, but c2 is evaluated.
If c2 is not #f, then e2 is evaluated and its value returned
as the value of the cond.
This continues until some ci evaluates to not #f, when
ei is evaluated and returned as the value of the cond.
If no ci evaluates to not #f, then the cond has an
"unspecified value."
The "else" syntax gives a condition that evaluates to
not #f -- if all the other conditions evaluate to #f,
then the expression ek is evaluated and returned as the
value of the cond.
(You could just substitute #t for else if you don't like the
special syntax.)
Rewriting the compare procedure (above) using cond instead
of nested ifs, we get the following somewhat more readable code:
(define compare
(lambda (x y)
(cond
((> x y) 'greater)
((= x y) 'equal)
(else 'less))))
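To illustrate the remark about substituting #t for else, here is a sketch of the same procedure with #t as the last condition. The name compare-t is made up for this illustration so that it does not replace compare.

```scheme
;; compare-t is a hypothetical name chosen for this sketch.
;; It behaves like compare, but uses #t where compare uses else.
(define compare-t
  (lambda (x y)
    (cond
      ((> x y) 'greater)
      ((= x y) 'equal)
      (#t 'less))))        ; #t is never #f, so this clause catches every remaining case

(compare-t 5 3)   ; => greater
(compare-t 3 3)   ; => equal
(compare-t 3 5)   ; => less
```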
Other special forms dealing with boolean values. The special forms and, or allow the logical combination of boolean values. They are special forms for historical reasons, and have a carefully defined order of evaluation. For example, (and e1 e2 ... ek) is defined as follows. Expression e1 is evaluated, and if it is #f, then the value #f is returned as the value of the and expression (without evaluating e2, ..., ek.) If e1 is not #f, then e2 is evaluated, and if the value of e2 is #f, the value #f is returned as the value of the and expression (and e3, ..., ek are not evaluated.) This continues until the first ei that evaluates to #f, which causes #f to be returned as the value of the and expression. If none of the ei's evaluates to #f, then the value of the and expression is the value of the last expression, ek. For example, we have
(and (= 1 2) (> 4 3)) => #f (this does not evaluate (> 4 3))
(and (> 4 3) (= 1 2)) => #f (this does evaluate (> 4 3))
(and (> 4 3) (< 1 2)) => #t (both (> 4 3) and (< 1 2) evaluated)
(and (> 4 3) 16) => 16 (the value of 16 is not #f)
There is an analogous rule for the special form or. |
| 9/10/07 | Lecture 3. Scheme, continued.
Perlis epigram: 55. A LISP programmer knows the value of everything, but the cost of nothing. New procedures are created by evaluating a lambda expression. The most basic form of a lambda expression is: (lambda list-of-arguments body), where the keyword lambda signals a special form, list-of-arguments gives a list of variables that are the formal arguments of the procedure, and body is an expression that indicates what value the procedure should return (as a function of its actual arguments.) For example, evaluating the expression (lambda (n) (* n n)) creates a procedure that takes one formal argument: n, and returns the value of n times n when it is called. Thus, we could write ((lambda (n) (* n n)) 3), which is evaluated as follows: the expression (lambda (n) (* n n)) evaluates to a procedure (as just described) and the expression 3 evaluates to the integer 3, and then (because this is an application), the procedure is called on the argument 3 and returns the value 9, which is returned as the value of the expression. That is, ((lambda (n) (* n n)) 3) => 9. We can combine this with the special form define to give a name to the procedure, thus: (define square (lambda (n) (* n n))). This creates an entry in the top-level environment for the symbol square, and makes its value the procedure that results from evaluating the lambda expression. After the define is evaluated we have (square 3) => 9. In detail: because square is not a keyword, the parentheses signal an application. The symbol square is looked up in the top-level environment, and its value is found to be the procedure, and 3 (being a constant) evaluates to 3, and then the procedure is called on the actual argument 3 and returns 9, which is returned as the value of the application. We can now create a program that does not halt via (define infinite (lambda () (infinite))). 
This defines the symbol infinite to be a procedure that takes no arguments (signified by the empty list of arguments: ()), and calls itself (with no arguments) when it is called. The definition of this procedure proceeds normally, with the symbol infinite being defined in the top-level environment, with the indicated procedure value. When we call infinite, by evaluating (infinite), we find that it DOES NOT HALT, but gets caught in an infinite recursive loop of calling itself. On the Linux systems, Control-C will escape from such a condition. To write recursive programs that halt and return a value, we need boolean values, comparisons, and the special form if. Scheme's boolean values are the constants #t (for true) and #f (for false.) Note that MIT Scheme treats #f as equivalent to the empty list (), and, in particular, prints out () instead of #f. Number comparison operations include = (equal), < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to). Thus (= 1 1) => #t, (> 2 1) => #t, and (<= 2 1) => #f. Finally, the special form if has the following syntax: (if exp1 exp2 exp3), where exp1, exp2 and exp3 are arbitrary Scheme expressions. The evaluation rule is: first exp1 is evaluated, then if it is a non-false value (anything but #f), exp2 is evaluated and its value returned as the value of the expression, and if the value of exp1 is #f, exp3 is evaluated and its value returned as the value of the expression. Thus, depending on the value of exp1 either exp2 or exp3, BUT NOT BOTH, is evaluated. Note how this rule of evaluation differs from that of an application (which would evaluate all three of exp1, exp2, and exp3.) Now we have enough to implement the recursive definition of the factorial function from Algebra II: n! = 1 if n = 1, and n! = n * (n-1)! if n > 1. As a Scheme procedure, this definition becomes: (define factorial (lambda (n) (if (= n 1) 1 (* n (factorial (- n 1)))))). 
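The evaluation rule for if and the factorial procedure can be tried out as in the sketch below. The first expression is an illustration added here, not part of the lecture: its third sub-expression, (car '()), would be an error if it were evaluated, but if never evaluates it because the test is true. The factorial definition is the one given above, laid out on separate lines.

```scheme
;; Only one branch of an if is evaluated. (car '()) would signal an
;; error if it were evaluated, but here it never is.
(if (= 1 1) 'yes (car '()))   ; => yes

;; The factorial procedure from the text, indented for readability.
(define factorial
  (lambda (n)
    (if (= n 1)
        1                             ; base case: 1! = 1
        (* n (factorial (- n 1))))))  ; recursive case: n! = n * (n-1)!

(factorial 4)   ; => 24
```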
Note that we are assuming that the input will be a positive integer, and have not made any provision for other inputs. One useful Scheme utility is trace and untrace. If we evaluate (trace factorial), then every subsequent call of factorial will be "traced" -- that is, each call to and return of the procedure will be printed out as it occurs. In particular, for (factorial 4) we would see the calls to (factorial 4), (factorial 3), (factorial 2), (factorial 1) being made, and the returns (factorial 1) => 1, (factorial 2) => 2, (factorial 3) => 6, and (factorial 4) => 24. To turn off tracing behavior, we call (untrace factorial). |
| 9/7/07 | Lecture 2. Scheme.
Perlis epigram: 17. If a listener nods his head when you're explaining your program, wake him up. Perlis epigram: 23. To understand a program you must become both the machine and the program. The goal of this part of the course is to install a compact and intelligible version of the Scheme interpreter in your heads. You will interact with Scheme via a read-eval-print loop (REPL), in which you type in an expression and Scheme evaluates it and prints out the value. The heart of the process is the rules for evaluation of expressions. The rules we covered in this lecture: (1) a constant evaluates to itself, (2) a symbol is looked up in the relevant environment, and the value found there is its value, (3) an application evaluates its components (in some order); the first component should evaluate to a procedure, which is then called on the values of the rest of the components as its arguments, and the resulting value is returned as the value of the application, (4) these rules are used recursively on sub-expressions, (5) the special form define may be used to add a symbol to the top-level environment with a particular value. For example, to evaluate (+ 3 4), the interpreter uses the rule for applications (because + is a symbol but not a keyword like define), looks up the value of + in the top-level environment (and finds the value which is the built-in procedure to add numbers), evaluates the constants 3 and 4 to the integers 3 and 4 respectively, calls the built-in addition procedure on the arguments 3 and 4, and takes the value, 7, returned by the addition procedure and returns it as the value of the application. Shorthand for "(+ 3 4) evaluates to 7" is (+ 3 4) => 7. Error messages "the value xxx is not applicable" (when the first component of an application does not evaluate to a procedure) and "the variable xxx is unbound" (when the attempt to look up a symbol in the relevant environment does not find it) were exemplified. 
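As a small sketch of these rules in action, the expressions below could be typed at the REPL one at a time. The symbol side and the particular numbers are made up for this illustration.

```scheme
42                  ; rule (1): a constant evaluates to itself => 42
(define side 5)     ; rule (5): adds the symbol side to the top-level environment
side                ; rule (2): the symbol is looked up => 5
(+ 3 4)             ; rule (3): an application => 7
(* side (+ 3 4))    ; rule (4): (+ 3 4) is evaluated first, as a sub-expression => 35
```

Evaluating a symbol that has not been defined would instead produce the "unbound variable" error mentioned above.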
How does the wily Scheme programmer create new procedures? With the special form lambda, to be discussed next. For example, (define add-100 (lambda (n) (+ n 100))) will create a procedure to add 100 to a number and make it the value of the symbol add-100 in the top-level environment. |
| 9/5/07 | Lecture 1. Introduction.
Perlis epigram: 19. A language that doesn't affect the way you think about programming, is not worth knowing. Discussion of introductory computer science courses (paper handout) and syllabus for 201 (paper or website). Distribution and collection of the student questionnaire (paper) and collection of signups for ID validation for after-hours access to AKW and the Zoo (paper, but you can also sign up by going to 009 AKW and signing up there.) Discussion of the overall structure of the course and why it is taught in Scheme. Extras of paper handouts will be available outside DA's office (414 AKW) as well as in subsequent classes. Introduction to Scheme starts in Friday's lecture. |