CS 201: Computability

Computability I.

Summary:

In an early lecture, we presented a Racket procedure for the Collatz conjecture (collatz.rkt), which is roughly the following:

    (define (f n)
      (cond
        [(= n 1)
         1]
        [(even? n)
         (f (quotient n 2))]
        [else
         (f (+ 1 (* 3 n)))]))

As an example, we have

(f 10) => (f 5) => (f 16) => (f 8) => (f 4) => (f 2) => (f 1) => 1

It is clear that if we manage to get to a power of 2, then the second cond clause will repeatedly apply until n is reduced to 1, and the result will be 1. It is also clear that if the procedure terminates with any value, that value will be 1. However, it is unknown whether (f n) will terminate for every positive integer n. This question is known as the Collatz Problem (or Collatz conjecture, for the conjecture that it does terminate for every positive integer n) or 3x+1 problem. Many, many hours have been spent on this problem without resolving it. Termination is known for values of n up through some quite large numbers, but we have no proof or counterexample to establish the truth or falsity of the conjecture. This is not exactly an earth-shattering problem, but it does illustrate how even a very simple computational procedure can elude our complete understanding.

Discussion of how (f n) might fail to halt: (1) looping, (2) reaching ever larger values of n without halting.
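
One way to explore termination empirically is to run the procedure with a step budget, so that a suspiciously long run shows up as a give-up rather than an apparent infinite loop. Here is a minimal sketch (f-limited and the particular budget are my own illustrative choices, not part of the original collatz.rkt):

    ; Like f, but gives up after limit recursive steps.
    ; Returns 1 if n reaches 1 within the budget, or 'gave-up otherwise.
    (define (f-limited n limit)
      (cond
        [(= n 1) 1]
        [(= limit 0) 'gave-up]
        [(even? n) (f-limited (quotient n 2) (- limit 1))]
        [else (f-limited (+ 1 (* 3 n)) (- limit 1))]))

    (f-limited 10 100)   ; => 1 (the trace above uses 6 steps)
    (f-limited 27 10)    ; => 'gave-up (starting from 27 takes over 100 steps)

Of course, a result of 'gave-up tells us nothing about whether the computation would eventually halt with a larger budget; that is exactly the kind of question the Halting Problem (discussed below) shows we cannot decide in general.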

Hilbert's 10th problem.

Starting in 1900, the German mathematician David Hilbert published a list of problems meant to stimulate and focus mathematical research for the 20th century. The 10th problem on his list can be formulated as follows.

Given a Diophantine equation with any number of unknowns and with integer coefficients: to devise a process according to which it can be determined in a finite number of operations whether the equation is solvable in integers.

A Diophantine equation is an equation containing a finite number of variables that may be raised to positive integer powers and multiplied together, with positive or negative integer coefficients. The question to be answered is whether or not there are positive or negative integer values of the variables that satisfy the equation. As examples of Diophantine equations, we have

$$x^2 + 3xy + y^2 = 11$$

$$x^3 - y^3 = 6$$

For the first equation, the answer should be "yes" because $x = 2$ and $y = 1$ give a solution $(4 + 6 + 1 = 11)$. For the second equation, the answer is "no": no integers work, as one can check by observing that every cube is congruent to 0, 1, or 8 modulo 9, so $x^3 - y^3$ can never be congruent to 6 modulo 9.
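
Although no single algorithm decides solvability for arbitrary Diophantine equations, for any particular equation we can always search a bounded range of values. Here is a minimal Racket sketch for the first equation (the procedure name search-solution and the bound are my own illustrative choices):

    ; Brute-force search for integer solutions of x^2 + 3xy + y^2 = 11
    ; with |x| and |y| at most bound.  Returns a list (x y) or #f.
    (define (search-solution bound)
      (for*/first ([x (in-range (- bound) (+ bound 1))]
                   [y (in-range (- bound) (+ bound 1))]
                   #:when (= (+ (* x x) (* 3 x y) (* y y)) 11))
        (list x y)))

    (search-solution 10)   ; => '(-7 2); x = 2, y = 1 from the text is another solution

A failed bounded search proves nothing, of course: the difficulty in Hilbert's problem is deciding, for an arbitrary equation, whether a solution exists among the infinitely many candidate integer values.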

What Hilbert was asking for in the 10th problem was, in effect, an algorithm that could take in an arbitrary Diophantine equation as input, and answer "yes" or "no" according to whether the equation has a solution in integers. It took until 1970, but the answer to Hilbert's question turned out to be "there is no such algorithm" because the problem as stated is uncomputable -- no algorithm can exist to solve it.

In fact, this was not the answer for which Hilbert was asking. He assumed that such an algorithm existed. He wanted someone to find it.

The Search for Computable Solutions: Leibniz and the boys in the band

Gottfried Wilhelm Leibniz (1646-1716) is usually recognized for developing differential and integral calculus, along with Newton. However, Leibniz has also been called a founder of computer science, for building a mechanical calculator and refining the binary number system.

Leibniz believed that mathematics and logic were the key to solving human philosophical and social issues. He thought that you could build a machine to solve philosophy.

Several hundred years later, Bertrand Russell (1872-1970) picked up where Leibniz left off. Together with his former teacher, Alfred North Whitehead, Russell tried to reduce all of mathematics to logic in their Principia Mathematica (pace Newton). PM was a huge undertaking. They tried to derive math from first principles. For example, it took 700 pages to prove that "1 + 1 = 2".

The concept of a number came from Georg Cantor's Set Theory. The number "2" was based on sets with 2 members, or a cardinality of 2. Set theory permeated PM, and led to a fundamental problem, known as Russell's Paradox.

A set can contain anything, including other sets. We would call it recursive. For example, you could have a set of all sets. Let's explore that for a second. Instead of sets, think of Wikipedia pages. You could have a Wikipedia page that had links to every Wikipedia page. No problem.

Now consider a Wikipedia page that is a "List of all Wikipedia pages whose titles start with the letter L." Clearly, that page would contain a link to itself.

Next, consider a Wikipedia page that is a "List of all Wikipedia pages that do not contain links to themselves." Whoa. If this page does not have a link to itself, it needs to be added to the page. If this page does contain a link to itself, the link needs to be removed. It is a contradiction, a paradox.

An informal explanation of Russell's paradox may be given in the following way. A set can be called "normal" if it does not contain itself as a member. For example, take the set of all squares. That set is not itself a square, and therefore is not a member of the set of all squares. So it is "normal." On the other hand, if one takes the complementary set of all non-squares, that set is itself not a square and so should be one of its own members. It is "abnormal."

Now consider the set of all normal sets—give it the name R—and ask the question: Is R a "normal" set? If it is "normal," then it is a member of R, since R contains all "normal" sets. But if that is the case, then R contains itself as a member, and therefore is "abnormal." On the other hand, if R is "abnormal," then it is not a member of R, since R contains only "normal" sets. But if that is the case, then R does not contain itself as a member, and therefore is "normal." Clearly, this is a paradox: if one supposes R is "normal," one can prove it is "abnormal," and if one supposes R is "abnormal," one can prove it is "normal." Hence, R is neither "normal" nor "abnormal," which is a contradiction.

The German logician Gottlob Frege (1848-1925) was also trying to achieve Leibniz's dream of putting philosophy on a logical footing. He was about to publish his life's work on mathematical foundations when Russell informed him of the set paradox.
Frege wrote a gracious addendum acknowledging that his "fundamental assumption was in error" and thanking Russell for his discovery.

Back to Hilbert

Hilbert published his original list of problems in 1900 at the quadrennial International Congress of Mathematicians. There was no meeting during World War I, and for many years, German and Austrian mathematicians were not allowed to participate. They were being punished. Hilbert returned in 1928 and added some new problems.

The Austrian logician Kurt Gödel decided to tackle Hilbert's problems. In 1929, at age 23, in his Ph.D. thesis, Gödel proved the Completeness Theorem, which establishes a correspondence between semantic truth and syntactic provability in first-order logic.

In 1931, at age 25, Gödel proved the Incompleteness Theorems, which showed that Hilbert's program to find a complete and consistent set of axioms for all mathematics is impossible. Not exactly what Hilbert had in mind.

It also pulled the rug out from under Russell, who by this time had found a work-around for the set paradox.

One of the keys to Gödel's proof was his method of converting logical expressions, axioms, and theorems into numbers. The technique is known as Gödel numbering. Here is how it works.

Take a list of tokens used in logical proofs and assign them sequential positive integers, e.g.,

    x  ->  1
    y  ->  2
    =  ->  3

Then, to encode an expression, like "x = y", form the product of the successive prime numbers, each raised to the power of the number assigned to the corresponding token. Thus, "x = y" becomes

$$2^1 \times 3^3 \times 5^2 = 2 \times 27 \times 25 = 1350$$

This encoding is reversible: given the Gödel number 1350, you can recover the expression "x = y" by factoring the number into primes and reading off the exponents.
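
As a small illustration, here is the encoding in Racket (a sketch of my own, using the token numbers x -> 1, y -> 2, = -> 3 from above; godel-encode and prime-exponent are illustrative names):

    ; Token numbers from the table above.
    (define token-number '((x . 1) (y . 2) (= . 3)))

    ; Encode a list of tokens: multiply successive primes, each raised
    ; to the number of the corresponding token.
    (define (godel-encode tokens primes)
      (if (null? tokens)
          1
          (* (expt (car primes) (cdr (assq (car tokens) token-number)))
             (godel-encode (cdr tokens) (cdr primes)))))

    (godel-encode '(x = y) '(2 3 5 7 11))   ; => 1350

    ; Decoding: the exponent of each prime in the factorization gives
    ; back the corresponding token number.
    (define (prime-exponent n p)
      (if (zero? (remainder n p))
          (+ 1 (prime-exponent (quotient n p) p))
          0))

    (map (lambda (p) (prime-exponent 1350 p)) '(2 3 5))   ; => '(1 3 2), i.e., x = y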

What about computation?

Gödel brilliantly showed that you could prove something to be impossible. That was new. At the same time, researchers were trying to derive mathematical definitions of computation.

In the 1930's, logicians and other mathematicians worked hard to give a formal definition of what it means to be an effective process or algorithm, and thereby establish a criterion for when a function is computable or uncomputable. A variety of different formalisms were proposed, among them Turing machines (Alan Turing), the lambda calculus (Alonzo Church), and general recursive functions (Gödel and Kleene).

One of the outgrowths of Church's lambda calculus is the LISP-Scheme-Racket family of programming languages.

These formalisms were apparently very different, but mathematicians found that they all defined the same set of computable functions. This was established by means of simulations -- for example, for each Turing machine, a lambda expression could be defined that would simulate the computation of the Turing machine. The resulting set of computable functions was then taken to be the definition of what we mean by a computable function.

Turing Machines

We will look in detail at one of the formal definitions of computation: Turing machines. These were defined in a 1936 paper by Alan Turing written when he was 23 years old, titled On Computable Numbers, with an Application to the Entscheidungsproblem. Despite the formidable title of the paper, part of it consists of an appeal to intuition to establish that the abstract machines he proposed (later named "Turing machines") captured the essence of the operations of a human computer following a specific series of steps in a mathematical procedure. He takes as his model of storage a child's exercise book, ruled into squares, meant to keep the columns of an addition or multiplication problem lined up by putting one digit in each square.

Instead of a two-dimensional array of squares, Turing assumes that the machine has a tape, ruled into squares, and each square may contain one symbol from a finite alphabet of symbols. A typical alphabet might be blank, 0 and 1. The length of the tape is indefinite -- we imagine that it extends as far as needed in both directions. In general, there will be a finite number of squares containing non-blank symbols, and all the other squares are assumed to contain the blank symbol. To operate on the tape, the machine has a read/write head located at one particular square of the tape, and the symbol on that square of the tape is the "current symbol." Only the current symbol may be read or written -- to operate on other symbols on the tape, the machine may move the head left or right one square at a time.

In addition to the storage of symbols on the tape, the machine has some "temporary memory", which consists of a state from a finite set of states, which we will denote q1, q2, q3, and so on. At any time, the machine is in one of the possible states, which is the "current state." As it operates, it may change from one state to another to "remember" some information (for example, which part of the computation it is in). There is a designated "start state" of the machine, typically q1.

The Turing machine has a finite set of instructions, which determine how its computation will proceed. Each instruction consists of five parts:

(current state, current symbol, new state, new symbol, head direction)

where the current state and the new state are states from the finite set of states for the machine, the current symbol and new symbol are symbols from the finite alphabet of the machine, and the head direction is either L (for left) or R (for right). So a typical instruction might be

(q3, 0, q6, 1, R)

The computation of a Turing machine may be viewed as a sequence of steps, each step transforming a configuration of the machine to another configuration of the machine according to one of the instructions of the machine. A configuration of the machine specifies:

  1. the contents of the tape,
  2. the position of the read/write head, and
  3. the current state of the machine.

For example, we can specify a configuration of a machine as follows.


    -----------------------------------
    ...|   | 1 |   | 0 | 0 | 1 |   |... 
    -----------------------------------
                     ^
                     q3

The dots (...) are meant to indicate that the tape continues indefinitely to the left and right, and all the squares not pictured contain the blank symbol. Of the squares pictured, we have the sequence of symbols:

     blank, 1, blank, 0, 0, 1, blank

The caret (^) is meant to indicate what square the read/write head of the machine is scanning. In this case, that square contains a 0, so the current symbol is 0 in this configuration. The current state of the machine is indicated below the caret -- the current state of the machine is q3.

In this configuration of the machine, the instruction

(q3, 0, q6, 1, R)

applies, because the current state is q3 and the current symbol is 0. The rest of the instruction indicates how to update the configuration to get the new configuration. The new state should be q6, the current symbol should be replaced by 1, and, finally, the position of the read/write head should be moved one square to the right. These changes yield the following configuration.


    -----------------------------------
    ...|   | 1 |   | 1 | 0 | 1 |   |... 
    -----------------------------------
                         ^
                         q6

This represents one step of a Turing machine computation -- an instruction applies to a configuration to yield a new configuration.
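
To make the step relation concrete, here is a small Racket sketch of a Turing machine step (the representation and the names find-instruction, step, and run are my own choices, not part of the standard definition). A configuration is represented by the current state, the list of symbols to the left of the head (nearest square first), and the list consisting of the current symbol followed by the symbols to its right; the symbol _ stands for a blank.

    ; An instruction is a 5-element list: (state symbol new-state new-symbol direction).

    ; Find the instruction matching the current state and current symbol, or #f.
    (define (find-instruction instructions state sym)
      (cond
        [(null? instructions) #f]
        [(and (equal? (car (car instructions)) state)
              (equal? (cadr (car instructions)) sym))
         (car instructions)]
        [else (find-instruction (cdr instructions) state sym)]))

    ; One step: write the new symbol, switch state, move the head.
    ; Returns the new configuration (state left right), or #f if no
    ; instruction applies (i.e., the machine halts).
    (define (step instructions state left right)
      (let* ([sym  (if (null? right) '_ (car right))]
             [inst (find-instruction instructions state sym)])
        (if (not inst)
            #f
            (let ([new-state (caddr inst)]
                  [new-sym   (cadddr inst)]
                  [dir       (car (cddddr inst))]
                  [rest      (if (null? right) '() (cdr right))])
              (if (eq? dir 'R)
                  (list new-state (cons new-sym left) rest)
                  (list new-state
                        (if (null? left) '() (cdr left))
                        (cons (if (null? left) '_ (car left))
                              (cons new-sym rest))))))))

    ; Run until no instruction applies; returns the final configuration.
    (define (run instructions state left right)
      (let ([next (step instructions state left right)])
        (if (not next)
            (list state left right)
            (run instructions (car next) (cadr next) (caddr next)))))

    ; The configuration pictured above: state q3, symbols _ 1 _ to the
    ; left of the head (nearest first), head on the first 0.
    (step '((q3 0 q6 1 R)) 'q3 '(_ 1 _) '(0 0 1))
    ; => '(q6 (1 _ 1 _) (0 1)) -- state q6, a 1 written, head one square right

This is only a sketch; in particular, it does not check that at most one instruction applies in each configuration (i.e., that the machine is deterministic).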

Next lecture we'll construct some Turing machines to compute things, and then we'll see the unsolvability of the Halting Problem.

Computability II.

Summary:

Perlis epigram #83: What is the difference between a Turing machine and the modern computer? It's the same as that between Hillary's ascent of Everest and the establishment of a Hilton hotel on its peak.

A formal definition of computability.

A function from strings of 0's and 1's to strings of 0's and 1's is computable if there exists a Turing machine to compute it.

We have seen some examples of computable functions, for example, the function whose input is a binary number and whose output is that binary number plus 1. (binary increment Turing Machine - needs only three states!)

(I made the state diagram using an FSM designer tool. FSM stands for Finite State Machine.)
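
For concreteness, here is one possible three-state instruction set for binary increment, written as data for the step/run sketch given after the discussion of Turing machine steps above (this is my own sketch; the machine in the diagram may differ in details). The head starts on the leftmost digit in state q1; q1 scans right to the blank past the last digit, q2 moves left adding 1 and propagating the carry, and q3 has no instructions, so reaching it halts the machine.

    ; A possible instruction set for binary increment; _ is the blank symbol.
    (define increment-instructions
      '((q1 0 q1 0 R)     ; scan right over the digits
        (q1 1 q1 1 R)
        (q1 _ q2 _ L)     ; found the blank past the last digit; start adding
        (q2 1 q2 0 L)     ; 1 plus carry is 0, and the carry continues left
        (q2 0 q3 1 L)     ; 0 plus carry is 1, done (q3 has no instructions)
        (q2 _ q3 1 L)))   ; every digit was 1: write a new leading 1

    (run increment-instructions 'q1 '() '(1 0 1 1))
    ; => '(q3 () (1 1 0 0 _)) -- the tape now reads 1100, i.e., 1011 + 1 = 1100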

There are of course many others, including addition, subtraction, multiplication, division, prime testing, reversing the input string, changing every 0 to a 1 and every 1 to a 0, and so on. In fact, any function from strings of 0's and 1's to strings of 0's and 1's that you could compute with a Racket program could also be computed with a Turing machine.

Because we know that a large number of different computational systems could be substituted for Turing machines in the above definition (lambda expressions, general recursive functions, idealized Java programs, idealized Racket programs, ...) without changing the set of computable functions, we have a fair amount of confidence in the "naturalness" of the definition. (The point of "idealized" Java or Racket programs is that we consider programs with no memory limitations, unlike actual Java or Racket programs, which run on actual computers with some finite (though large) amount of memory.)

The Church/Turing thesis claims that the class of functions formally defined as computable captures what we mean by our intuitive notion of "computability." It is a thesis rather than a theorem, because it concerns the relationship between a mathematically defined concept and an intuitive, not formal, notion of ours. It is possible to imagine that we might decide to discard the formal definition of computable functions if we discovered that there were better ways to think about computability, for example, because of new discoveries in physics. However, in the meantime, the formal definition is what we mean by "computable functions."

Are there uncomputable functions?

The answer to this question is YES. One example is a function associated with Hilbert's 10th problem, namely, the function that takes in a binary string representing a Diophantine equation and outputs 1 if the Diophantine equation has a solution in integers, and 0 if it doesn't.

Another approach to seeing that there are uncomputable functions is to consider the sizes, or cardinalities, of the sets involved. The set of all Turing machines is a countable set, that is, it can be put into one-to-one correspondence with the positive integers: each Turing machine can be written down as a finite string over a finite alphabet, and the finite strings can be listed out systematically. Thus, the set of all computable functions is also a countable set. Countability is a notion due to Georg Cantor, who proved that the integers are countable but the real numbers are not.

See countable sets.
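
To see what such a one-to-one correspondence looks like concretely, here is a tiny Racket sketch (the name nth-binary-string is my own) that lists all binary strings in order: the empty string first, then the strings of length 1, then length 2, and so on. The same idea, applied to the finite strings of symbols that describe Turing machines, is what makes the set of Turing machines countable.

    ; Map the positive integer n to the n-th binary string in the ordering
    ; "", "0", "1", "00", "01", "10", "11", "000", ...: write n in binary
    ; and drop the leading 1.
    (define (nth-binary-string n)
      (substring (number->string n 2) 1))

    (map nth-binary-string '(1 2 3 4 5 6 7))
    ; => '("" "0" "1" "00" "01" "10" "11")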

However, the set of all functions from strings of 0's and 1's to strings of 0's and 1's is an uncountable set, that is, it cannot be put into one-to-one correspondence with the positive integers. See uncountable set. Thus, there must be uncomputable functions from strings of 0's and 1's to strings of 0's and 1's. (In fact, "almost all" of them are uncomputable.)

This is interesting, but not terribly satisfying, because we don't actually get our hands on one of the uncomputable functions. We'll see one important uncomputable function below, the Halting Problem, and a proof that it is uncomputable.

See diagonalization argument.

The diagonalization argument is an example of proof by contradiction: to refute a proposition $P$, you assume $P$ is true and then show that this assumption leads to a contradiction, so $P$ must be false. This relies on the law of the excluded middle, which says that either $P$ is true or it is not; there is no other possibility.
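
Here is a sketch of how the diagonal argument runs in our setting (appending a 0 is just one convenient way to force a disagreement). Suppose, for contradiction, that someone could list every function from binary strings to binary strings as $f_1, f_2, f_3, \ldots$. List the binary strings themselves as $s_1, s_2, s_3, \ldots$ (say, by length and then in dictionary order) and define a new function $g$ by

$$g(s_i) = f_i(s_i)\,0 \quad \text{for each } i \geq 1,$$

that is, whatever $f_i$ returns on input $s_i$, with one extra 0 appended. Then $g$ disagrees with $f_i$ on the input $s_i$ for every $i$, so $g$ appears nowhere on the list, contradicting the assumption that the list was complete. The same construction applied to a list of all the computable functions (one for each Turing machine) produces a function $g$ that differs from every computable function, and hence is uncomputable.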

Other examples of proof by contradiction include the proofs that the square root of 2 is irrational and that there is no smallest positive real number, and, of course, the proof of the unsolvability of the Halting Problem below.

The Halting Problem for Racket programs.

It would be convenient to have a procedure (halts? proc expr) that takes a Racket procedure proc of one argument and an arbitrary Racket expression expr, and returns #t if (proc expr) halts and returns #f if (proc expr) doesn't halt. As an example, suppose we define the following.

    (define (nope n)
      (nope (- n 1)))

Then clearly (nope 10) => (nope 9) => (nope 8) => .. (nope -2014) => .. never halts. So we should have (halts? sqr 4) => #t, because the built-in Racket procedure sqr halts on the input 4, but we should have (halts? nope 10) => #f, because the procedure nope does not halt on the input 10.

The procedure halts? would facilitate the grading of your homework assignments -- if we could test to see that some procedure of yours was not going to halt on a test case, we could skip trying to run your procedure on that input, and simply assign 0 points for that test case. (As it is, we use a fixed "time out" to cut off the evaluation of your procedures on test cases if they run too long.) However convenient or useful, no such halts? procedure can exist. To see this, we argue by contradiction, as follows.

If we had the procedure (halts? proc expr), we could define the following other procedures.

    (define (s proc expr)
      (if (halts? proc expr)
          (nope 10)    ; (proc expr) halts, so loop forever
          "hi!"))      ; (proc expr) doesn't halt, so halt and return "hi!"

What does the procedure s do? Clearly, using the definition of halts?, we have:

(s proc expr) runs forever if (proc expr) halts, and returns "hi!" (and therefore halts) if (proc expr) does not halt.

We may also define a procedure (q proc) as follows:

    (define (q proc)
      (s proc proc))

That is, q takes a procedure proc of one argument and calls (s proc proc). What does the procedure q do?

(q proc) calls (s proc proc), which runs forever if (proc proc) halts, and returns "hi!" if (proc proc) does not halt.

Now we ask: does (q q) halt or not? It is somewhat complicated to follow the logic of the procedure definitions, so we'll consider both cases, each of which leads to a contradiction.

  1. if (q q) halts, then when (q q) calls (s q q) we see that (s q q) doesn't halt, because (q q) halts. Thus, if (q q) halts, then (q q) doesn't halt.
  2. if (q q) doesn't halt, then when (q q) calls (s q q) we see that (s q q) returns the string "hi!", because (q q) doesn't halt. Thus, if (q q) doesn't halt, it halts (and returns the string "hi!").

Together these cases imply that (q q) halts if and only if it doesn't halt, a contradiction. Thus (assuming, as usual, the consistency of mathematics), there must have been some "flaw" in our reasoning. The "flaw" in this case is the assumption that there exists a procedure (halts? proc expr) at all -- what the contradiction proves is that the assumption is false, that is, the function described in the specification of halts? is uncomputable.

This is the famous Halting Problem, and it is a problem for every sufficiently powerful programming language, e.g., (idealized) Racket, C, C++, Java, Python, and so on. The theorem statement for Racket is as follows: there can be no Racket procedure (halts? proc expr) that returns

  - #t if (proc expr) halts, and
  - #f if (proc expr) does not halt.

It might strike you as a bit odd to write (proc proc), that is, to call a procedure on itself. You might find it more reasonable to imagine writing a C program that strips the comments out of a C program, and then running it on itself. In fact, in the life of a new programming language it is sometimes a major milestone when a compiler for the language can be re-written in the language and succeeds in compiling itself.