CS 201 - Fall 2021.



Computability

Computability I.

Summary:

In an early lecture, we presented a Racket procedure for the Collatz conjecture (collatz.rkt), roughly the following:

    (define (f n)
      (cond
        [(= n 1)            ; reached 1: stop and return 1
         1]
        [(even? n)          ; n even: halve it
         (f (quotient n 2))]
        [else               ; n odd: replace n by 3n+1
         (f (+ 1 (* 3 n)))]))
As an example, we have
(f 10) => (f 5) => (f 16) => (f 8) => (f 4) => (f 2) => (f 1) => 1
It is clear that if we manage to get to a power of 2, then the second cond clause will repeatedly apply until n is reduced to 1, and the result will be 1. It is also clear that if the procedure terminates with any value, that value will be 1. However, it is *unknown* whether (f n) terminates for every positive integer n. This question is known as the Collatz Problem (or Collatz conjecture, for the conjecture that it does terminate for every positive integer n) or the 3x+1 problem. Many, many hours have been spent on this problem without resolving it. Termination has been verified by computer for all starting values up to very large bounds (beyond 10^20), but we have no proof or counterexample to establish the truth or falsity of the conjecture. This is not exactly an earth-shattering problem, but it does illustrate how even a very simple computational procedure can elude our complete understanding.

How might (f n) fail to halt? There are two possibilities: (1) the values of n could enter a cycle that never reaches 1, or (2) the values of n could grow ever larger without bound.
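To experiment with the conjecture, it helps to count how many steps (f n) takes to reach 1. Here is a small sketch in that spirit; collatz-steps is our own illustrative name, not part of the lecture code.

    (define (collatz-steps n)
      ;; counts the steps (f n) takes to reach 1 -- assuming it ever does;
      ;; like f itself, this is not known to halt on every positive integer
      (cond
        [(= n 1) 0]
        [(even? n) (+ 1 (collatz-steps (quotient n 2)))]
        [else (+ 1 (collatz-steps (+ 1 (* 3 n))))]))
For example, (collatz-steps 10) => 6, matching the trace above.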

Hilbert's 10th problem.

In 1900 the mathematician David Hilbert published a list of problems meant to stimulate and focus mathematical research for the 20th century. The 10th problem on his list can be formulated as follows.
    Given a Diophantine equation with any number of unknowns and with integer
    coefficients: to devise a process according to which it can be determined
    in a finite number of operations whether the equation is solvable in integers.
A Diophantine equation is an equation containing a finite number of variables that may be raised to positive integer powers and multiplied together, with positive or negative integer coefficients. The question to be answered is whether or not there are integer values of the variables that satisfy the equation. As examples of Diophantine equations, we have
    x^2 + 3xy + y^2 = 11
    x^3 - y^3 = 6
For the first equation, the answer should be "yes" because x = 2 and y = 1 gives a solution (4 + 6 + 1 = 11). For the second equation, the answer is "no": factoring x^3 - y^3 = (x - y)(x^2 + xy + y^2) and checking the finitely many ways to write 6 as such a product shows that no integer solution exists.
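Although no algorithm can decide the question in general, a brute-force search can confirm a "yes" answer when a solution exists within some bound. Here is a hypothetical Racket sketch for the first equation; solution-below? is our own name for illustration.

    (define (solution-below? bound)
      ;; searches all |x|, |y| <= bound for a solution to x^2 + 3xy + y^2 = 11;
      ;; returns (list x y) for the first solution found, or #f if none
      (for*/or ([x (in-range (- bound) (+ bound 1))]
                [y (in-range (- bound) (+ bound 1))])
        (and (= (+ (* x x) (* 3 x y) (* y y)) 11)
             (list x y))))
For example, (solution-below? 5) finds the solution x = -5, y = 1 (25 - 15 + 1 = 11). The catch is the "no" direction: no bounded search can by itself establish that an equation has no solutions at all, so this approach falls short of the algorithm Hilbert asked for.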

What Hilbert was asking for in the 10th problem was, in effect, an *algorithm* that could take in an arbitrary Diophantine equation as input and answer "yes" or "no" according to whether the equation has a solution in integers. It took until 1970, when Yuri Matiyasevich, building on work of Martin Davis, Hilary Putnam, and Julia Robinson, completed the proof that the answer to Hilbert's question is "there is no such algorithm" -- the problem as stated is uncomputable.

How did this answer come about? In the 1930s, logicians and other mathematicians worked hard to give a formal definition of what it means to be an effective process or algorithm, and thereby establish a criterion for when a function is computable or uncomputable. A variety of different formalisms were proposed, among them the following.

    Church's lambda calculus
    Kleene's general recursive functions
    Markov's algorithms
    Post's rewriting systems
    Turing's machines
One of the outgrowths of Church's lambda calculus is the LISP-Scheme-Racket family of programming languages. These formalisms appear very different on the surface, but mathematicians found that they all define the *same* set of computable functions. This was established by means of simulations -- for example, for each Turing machine, a lambda expression can be defined that simulates the computation of that Turing machine. The resulting set of computable functions was then taken to be the definition of what we mean by a computable function.

We will look in detail at one of the formal definitions of computation: Turing machines. These were defined in a 1936 paper by Alan Turing, written when he was 23 years old, titled On Computable Numbers, with an Application to the Entscheidungsproblem. Despite the formidable title of the paper, part of it consists of an appeal to intuition to establish that the abstract machines he proposed (later named "Turing machines") capture the essence of the operations of a human computer following a specific series of steps in a mathematical procedure. He takes as his model of storage a child's exercise book, ruled into squares, meant to keep the columns of an addition or multiplication problem lined up by putting one digit in each square.

Instead of a two-dimensional array of squares, Turing assumes that the machine has a tape, ruled into squares, and each square may contain one symbol from a finite alphabet of symbols. A typical alphabet might be blank, 0 and 1. The length of the tape is indefinite -- we imagine that it extends as far as needed in both directions. In general, there will be a finite number of squares containing non-blank symbols, and all the other squares are assumed to contain the blank symbol. To operate on the tape, the machine has a read/write head located at one particular square of the tape, and the symbol on that square of the tape is the "current symbol." Only the current symbol may be read or written -- to operate on other symbols on the tape, the machine may move the head left or right one square at a time.

In addition to the storage of symbols on the tape, the machine has some "temporary memory", which consists of a state from a finite set of states, which we will denote q1, q2, q3, and so on. At any time, the machine is in one of the possible states, the "current state." As it operates, it may change from one state to another to "remember" some information (for example, what part of the computation it is in). There is a designated "start state" of the machine, typically q1.

The Turing machine has a finite set of instructions, which determine how its computation will proceed. Each instruction consists of five parts:

(current state, current symbol, new state, new symbol, head direction)
where the current state and the new state are states from the finite set of states for the machine, the current symbol and new symbol are symbols from the finite alphabet of the machine, and the head direction is either L (for left) or R (for right). So a typical instruction might be
(q3, 0, q6, 1, R)

The computation of a Turing machine may be viewed as a sequence of steps, each step transforming a configuration of the machine to another configuration of the machine according to one of the instructions of the machine. A configuration of the machine specifies: the symbols contained by the squares on the tape, the position of the read/write head on the tape, and the current state of the machine. For example, we can specify a configuration of a machine as follows.

    -----------------------------------
    .. |   | 1 |   | 0 | 0 | 1 |   | .. 
    -----------------------------------
                     ^
                     q3
The dots (..) are meant to indicate that the tape continues indefinitely to the left and right, and all the squares not pictured contain the blank symbol. Of the squares pictured, we have the sequence of symbols:
     blank, 1, blank, 0, 0, 1, blank
The caret (^) is meant to indicate what square the read/write head of the machine is scanning. In this case, that square contains a 0, so the current symbol is 0 in this configuration. The current state of the machine is indicated below the caret -- the current state of the machine is q3.

In this configuration of the machine, the instruction

(q3, 0, q6, 1, R)
applies, because the current state is q3 and the current symbol is 0. The rest of the instruction indicates how to update the configuration to get the new configuration. The new state should be q6, the current symbol should be replaced by 1, and, finally, the read/write head should be moved one square to the right. These changes yield the following configuration.
    -----------------------------------
    .. |   | 1 |   | 1 | 0 | 1 |   | .. 
    -----------------------------------
                         ^
                         q6
This represents one step of a Turing machine computation -- an instruction applies to a configuration to yield a new configuration.
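To make the step relation concrete, here is a small sketch (our own representation, not from lecture) of one step of a Turing machine in Racket, assuming #lang racket for match and findf. A configuration is a list (left current right state), where left holds the symbols to the left of the head, nearest first, and '_ stands for the blank symbol.

    (define (step instrs config)
      ;; applies the instruction matching (current state, current symbol);
      ;; returns the new configuration, or 'halt if no instruction applies
      (match config
        [(list left cur right state)
         (match (findf (lambda (i) (and (equal? (first i) state)
                                        (equal? (second i) cur)))
                       instrs)
           [(list _ _ new-state new-sym dir)
            (if (eq? dir 'R)
                (list (cons new-sym left)                  ; head moves right:
                      (if (null? right) '_ (first right))  ; next symbol, or blank
                      (if (null? right) '() (rest right))  ; if tape runs out
                      new-state)
                (list (if (null? left) '() (rest left))    ; head moves left
                      (if (null? left) '_ (first left))
                      (cons new-sym right)
                      new-state))]
           [#f 'halt])]))
On the example above, (step '((q3 0 q6 1 R)) '((_ 1 _) 0 (0 1) q3)) yields '((1 _ 1 _) 0 (1) q6), exactly the second configuration pictured.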

Next lecture we'll construct some Turing machines to compute things, and then we'll see the unsolvability of the Halting Problem.

Computability II.

Summary:

Perlis epigram #83: What is the difference between a Turing machine and the modern computer? It's the same as that between Hillary's ascent of Everest and the establishment of a Hilton hotel on its peak.

A formal definition of computability.

A function from strings of 0's and 1's to strings of 0's and 1's is computable if there exists a Turing machine to compute it.

We have seen some examples of computable functions, for example, the function whose input is a binary number and whose output is that binary number plus 1. There are of course many others, including addition, subtraction, multiplication, division, prime testing, reversing the input string, changing every 0 to a 1 and every 1 to a 0, and so on. In fact, any function from strings of 0's and 1's to strings of 0's and 1's that you could compute with a Racket program could also be computed with a Turing machine.
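As a quick illustration of the first example, here is one way to write the plus-1 function on binary strings in Racket; binary-add1 is our own name, and this sketch leans on built-in number conversions rather than the digit-by-digit carrying a Turing machine would do.

    (define (binary-add1 bits)
      ;; bits is a string of 0's and 1's, e.g. "1011" (eleven);
      ;; returns the binary string for its successor, here "1100" (twelve)
      (number->string (+ 1 (string->number bits 2)) 2))
For example, (binary-add1 "1011") => "1100".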

Because we know that a large number of different computational systems could be substituted for Turing machines in the above definition (lambda expressions, general recursive functions, idealized Java programs, idealized Racket programs, ...) without changing the set of computable functions, we have a fair amount of confidence in the "naturalness" of the definition. (The point of "idealized" Java or Racket programs is that we consider programs with no memory limitations, unlike actual Java or Racket programs, which run on actual computers with some finite (though large) amount of memory.)

The Church/Turing thesis claims that the class of functions formally defined as computable captures what we mean by our intuitive notion of "computability." It is a thesis rather than a theorem, because it concerns the relationship between a mathematically defined concept and an intuitive, not formal, notion of ours. It is possible to imagine that we might decide to discard the formal definition of computable functions if we discovered that there were better ways to think about computability, for example, because of new discoveries in physics. However, in the meantime, the formal definition is what we mean by "computable functions."

Are there uncomputable functions?

The answer to this question is YES. One example is a function associated with Hilbert's 10th problem, namely, the function that takes in a binary string representing a Diophantine equation and outputs 1 if the Diophantine equation has a solution in integers, and 0 if it doesn't.

Another approach to seeing that there are uncomputable functions is to consider the sizes, or cardinalities, of the sets involved. The set of all Turing machines is a *countable* set, that is, it can be put into one-to-one correspondence with the positive integers. Thus, the set of all computable functions is also a *countable* set. However, the set of all functions from strings of 0's and 1's to strings of 0's and 1's is an *uncountable* set, that is, it cannot be put into one-to-one correspondence with the positive integers. Thus, there must be uncomputable functions from strings of 0's and 1's to strings of 0's and 1's. (In fact, "almost all" of them are uncomputable.) This is interesting, but not terribly satisfying, because we don't actually get our hands on one of the uncomputable functions. We'll see one important uncomputable function below, the Halting Problem, and a proof that it is uncomputable.

See the diagonalization argument: if f1, f2, f3, ... is any list of functions, and s1, s2, s3, ... is a list of all binary strings, then a function g chosen so that g(si) differs from fi(si) for every i cannot be any of the fi -- so no countable list can include all functions from strings of 0's and 1's to strings of 0's and 1's.

The Halting Problem for Racket programs.

It would be convenient to have a procedure (halts? proc expr) that takes a Racket procedure proc of one argument and an arbitrary Racket expression expr, and returns #t if (proc expr) halts and returns #f if (proc expr) doesn't halt. As an example, suppose we define the following.
    (define (nope n)
      (nope (- n 1)))
Then clearly (nope 10) => (nope 9) => (nope 8) => .. (nope -2014) => .. never halts. Then we should have (halts? sqr 4) => #t, because the built-in Racket procedure sqr halts on the input 4, but we should have (halts? nope 10) => #f, because the procedure nope does not halt on the input 10.

The procedure halts? would facilitate the grading of your procedures -- if we could test to see that some procedure of yours was not going to halt on a test case, we could skip trying to run your procedure on that input, and simply assign 0 points for that test case. (As it is, we use a fixed "time out" to cut off the evaluation of your procedures on test cases if they run too long.) However convenient or useful, *no* such halts? procedure can exist. To see this, we argue by contradiction, as follows.

If we had the procedure (halts? proc expr), we could define the following procedures.

(define (s proc expr)
  (if (halts? proc expr)
      (nope 10)
      "hi!"))
What does the procedure s do? Clearly, using the definition of halts?, we have:
  (s proc expr)
    returns the string "hi!" if (proc expr) doesn't halt
    doesn't halt if (proc expr) halts
We may also define a procedure (q proc) as follows:
(define (q proc)
  (s proc proc))
That is, q takes a procedure proc of one argument and calls (s proc proc). What does the procedure q do?
  (q proc) calls (s proc proc)
      which returns the string "hi!" if (proc proc) doesn't halt
      and doesn't halt if (proc proc) halts
Now we ask: does (q q) halt or not? It is somewhat complicated to follow the logic of the procedure definitions, so we'll consider cases:
1)  if (q q) halts, then when (q q) calls (s q q) we see that
    (s q q) doesn't halt, because (q q) halts.  Thus, if (q q)
    halts, then (q q) doesn't halt.
2)  if (q q) doesn't halt, then when (q q) calls (s q q) we
    see that (s q q) returns the string "hi!", because (q q)
    doesn't halt.  Thus, if (q q) doesn't halt, it halts (and
    returns the string "hi!")
Together these cases imply that (q q) halts if and only if it doesn't halt, a *contradiction*. Thus (assuming, as usual, the consistency of mathematics), there must have been some "flaw" in our reasoning. The "flaw" in this case is the assumption that a procedure (halts? proc expr) exists at all -- what the contradiction proves is that this assumption is false, that is, the function described in the specification of halts? is *uncomputable*.

This is the famous Halting Problem, and it is a problem for every sufficiently powerful programming language, e.g., (idealized) Racket, C, C++, Java, Python, and so on. The theorem statement for Racket is as follows.

There can be no Racket procedure (halts? proc expr) that returns 
  #t if (proc expr) halts, and
  #f if (proc expr) does not halt.

It might strike you as a bit odd to write (proc proc), that is, to call a procedure on itself. You might find it more reasonable to imagine writing a C program that strips the comments out of a C program, and then running it on itself. In fact, in the life of a new programming language it is sometimes a major milestone when a compiler for the language can be re-written in the language and succeeds in compiling itself.

