Perlis epigram #44: Sometimes I think the only universal in the computing field is the fetch-execute cycle.
The term "random access memory" conveys that the time to access any particular value in the memory is (essentially) the same, as opposed to a "sequential access memory" like disk or magnetic tape, where accessing "the next" value can be considerably less time-consuming than accessing an arbitrary value (because of the travel time of the physical media, e.g., radial movement of a disk head, the spinning of the disk, or the winding of the tape).
We abstract the interface for a random access memory (RAM) as follows. There are two special registers, the memory address register (MAR) and the memory data register (MDR), and a signal indicating whether to read from the memory or write to the memory. The memory itself consists of a number n = 2^A of memory registers, each of which holds B bits. Then the MDR is also B bits (it can hold the value of one memory register) and the MAR is A bits (it can hold the address -- a number between 0 and 2^A - 1 -- of any one of the memory registers). When the operation is a read, the contents of the memory register addressed by the MAR are copied to the MDR. When the operation is a write, the contents of the MDR are copied to the memory register addressed by the MAR, replacing its previous contents. (In the "tiny RAM" example of the previous lecture, there was no MDR, and we had A = 2 address bits, n = 2^A = 4 memory registers, and B = 4 bits in each memory register.)
To use this interface to read the contents of a register from RAM, we put the address of the register in the MAR and send the "read" signal. This causes the contents of the addressed memory register to be copied to the MDR, where we may access it. To place a particular value in a particular memory register of the RAM, we put the address of the register in the MAR, and put the value we wish to store in the MDR and send the "write" signal. This causes the contents of the MDR to be copied into the addressed memory register. (The previous contents of that memory register are replaced by the new contents.) These operations are at the "micro-instruction" level, not visible to a person programming the computer at the assembly language level.
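The read and write protocol just described can be sketched in a few lines of Python. This is only an illustrative model, not part of the TC-201 design: the class name RAM and the helpers store_value and fetch_value are our own inventions, and real hardware does this with signals rather than method calls.

```python
class RAM:
    """Toy model of a RAM with 2**A registers of B bits each (hypothetical names)."""

    def __init__(self, address_bits: int, word_bits: int) -> None:
        self.address_bits = address_bits
        self.word_bits = word_bits
        self.registers = [0] * (2 ** address_bits)
        self.mar = 0  # memory address register (A bits)
        self.mdr = 0  # memory data register (B bits)

    def read(self) -> None:
        """The 'read' signal: copy the register addressed by the MAR to the MDR."""
        self.mdr = self.registers[self.mar]

    def write(self) -> None:
        """The 'write' signal: copy the MDR to the register addressed by the MAR."""
        # Keep only the low B bits, since a register holds exactly B bits.
        self.registers[self.mar] = self.mdr & (2 ** self.word_bits - 1)


def store_value(ram: RAM, address: int, value: int) -> None:
    """Put the address in the MAR and the value in the MDR, then signal 'write'."""
    ram.mar = address
    ram.mdr = value
    ram.write()


def fetch_value(ram: RAM, address: int) -> int:
    """Put the address in the MAR, signal 'read', then take the result from the MDR."""
    ram.mar = address
    ram.read()
    return ram.mdr


ram = RAM(address_bits=2, word_bits=4)  # the "tiny RAM": 4 registers of 4 bits
store_value(ram, 3, 0b1010)
print(fetch_value(ram, 3))
```

Note that the caller never touches the memory registers directly; everything goes through the MAR and the MDR, which is the point of the abstraction.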
(There was an aside on how "core memory" works, and an example of 1024 bits of it from the Computer Museum in Boston. The 4 Gigabytes of RAM on a modern laptop were calculated to be equivalent to more than 32 million such arrays of tiny ferrite cores.)
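The comparison in the aside can be checked with a short calculation. We assume here that a gigabyte means 2^30 bytes of 8 bits each; with 10^9-byte gigabytes the count would come out slightly under 32 million.

```python
BITS_PER_BYTE = 8
ram_bits = 4 * 2**30 * BITS_PER_BYTE  # 4 gigabytes, assuming 1 GB = 2**30 bytes
core_array_bits = 1024                # one museum core array

arrays_needed = ram_bits // core_array_bits
print(arrays_needed)  # 33554432, i.e. 2**25
```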
We looked at two "block diagrams" of the organization of a Von Neumann computer.
Both of them showed main memory, the arithmetic/logical unit (ALU), the control unit (CU), and input/output functions. The ALU and CU are usually grouped together into the "central processing unit" or CPU. The main memory is the RAM that we discussed above. Input/output covers communication of the central computer with devices such as keyboards, mice, displays, microphones, speakers, DVD players, network interfaces, and secondary storage like disks, and so on. The ALU can be thought of as a collection of combinational circuits to perform operations on data like addition, subtraction, comparison, logical operations, and the like. The CU is a collection of registers and combinational logic that collectively are responsible for fetching and executing instructions, (in some sense the conductor of the orchestra that is the computer.) Both diagrams indicated one or more "central registers", that is, registers local to the CPU that are integral to the execution of programs by the computer.
The "Von Neumann architecture", in which the instructions of a program and the data for the program are both located in the same (main) memory, is generally contrasted with the "Harvard architecture", in which there are separate memories for the instructions of a program and its data.
In the Von Neumann architecture, a program is run as follows. There is a CPU register called the "instruction counter" or "program counter", which holds the address of the next instruction to execute. The instruction is "fetched" (read from main memory into a CPU register called the "instruction register") and "decoded" (which determines what operation is specified by the instruction) and "executed." The execution of one instruction may involve:
Usually after an instruction is executed, the program counter is increased by 1, and the whole "fetch-execute" cycle is repeated to execute the next instruction in memory. This process continues until a halt instruction is executed. Some instructions (e.g., jumps and skips) cause a different update to the program counter, altering the pattern of executing consecutive instructions.
We describe the design of the TC-201 computer ("TC" for "tiny computer" or "toy computer"), to make the ideas of computer architecture, machine language and assembly language concrete. It consists of 3 parts: main memory (or RAM), a central processing unit (or CPU), and an input/output system to communicate with the outside world (that is, you, the user.)
The main memory, or RAM, of the TC-201 consists of 4096 memory registers, also called "words", of 16 bits each, addressed by integers from 0 to 4095. Because 4096 = 2^12, 12 bits are sufficient to represent any memory address. There is a central register: the accumulator (or ACC), which has 16 bits. The other CPU state consists of the program counter (or PC), which has 12 bits and holds the address of the next instruction to be executed, the run-flag, which is one bit and indicates whether the computer is running (1) or halted (0), and the arithmetic error bit (or AEB), which indicates whether there has been an arithmetic overflow error.
The TC-201 instructions mentioned in this lecture:
    opcode   operation
    ------   ---------
    0000     halt  -- stops execution (sets the run flag to 0)
    0001     load  -- copies contents of addressed memory register to accumulator
    0010     store -- copies contents of accumulator to addressed memory register
If we denote the contents of the accumulator by acc, the address field of the instruction by addr, and the contents of the memory register with address addr by Mem[addr], we can abbreviate the function of load as acc := Mem[addr] and the function of store as Mem[addr] := acc, where := is an assignment operator.
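To make the fetch-execute cycle concrete for these three instructions, here is a minimal Python sketch. It assumes the instruction layout used later in these notes (a 4-bit opcode in the high-order bits followed by a 12-bit address field); the function name run, the step limit, and the representation of memory as a Python list of integers are our own choices for illustration.

```python
HALT, LOAD, STORE = 0b0000, 0b0001, 0b0010


def run(mem, max_steps=1000):
    """Run halt/load/store programs; returns (acc, pc, run_flag)."""
    acc, pc, run_flag = 0, 0, 1
    steps = 0
    while run_flag == 1 and steps < max_steps:
        instruction = mem[pc]        # fetch the word addressed by the pc
        opcode = instruction >> 12   # decode: high-order 4 bits
        addr = instruction & 0xFFF   # decode: low-order 12 bits
        if opcode == HALT:
            run_flag = 0             # halt only clears the run flag
        elif opcode == LOAD:
            acc = mem[addr]          # acc := Mem[addr]
            pc += 1
        elif opcode == STORE:
            mem[addr] = acc          # Mem[addr] := acc
            pc += 1
        else:
            raise ValueError(f"opcode {opcode:04b} is not handled in this sketch")
        steps += 1
    return acc, pc, run_flag


# A three-instruction program: copy the word at address 3 to address 4.
mem = [0] * 4096
mem[0] = (LOAD << 12) | 3
mem[1] = (STORE << 12) | 4
mem[2] = HALT << 12
mem[3] = 42
print(run(mem), mem[4])
```

The step limit is only a safety net for the sketch; the TC-201 itself simply runs until the run flag becomes 0.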
In the TC-201 we have 16 bits to represent an integer. There are 2^16 = 65,536 possible patterns of 16 bits, so we could represent that many different integers. If we needed only non-negative integers, a natural choice would be "unsigned binary", which would just be the 16-bit binary representations of the numbers 0 through 65,535 as the patterns 0000000000000000 through 1111111111111111, respectively.
However, we'd like to represent a range of positive and negative integers. The number of integers from -n to n is 2n+1, an odd number, while the number of bit patterns is even. As a result, each of the number representations we'll consider has an "anomaly". The three systems we consider are: sign/magnitude, one's complement, and two's complement. For illustration, we'll consider the systems using just 4 bits (instead of 16) -- the principles apply to any number of bits, for example, the more common 32 or 64 bits of modern computers. With 4 bits we have 2^4 = 16 possible bit patterns. Here is what they represent, in unsigned binary and in each of the three systems we consider.
    bit pattern   unsigned binary   sign/magnitude   one's complement   two's complement
    -----------   ---------------   --------------   ----------------   ----------------
    0000           0                 0                0                  0
    0001           1                 1                1                  1
    0010           2                 2                2                  2
    0011           3                 3                3                  3
    0100           4                 4                4                  4
    0101           5                 5                5                  5
    0110           6                 6                6                  6
    0111           7                 7                7                  7
    1000           8                -0*              -7                 -8*
    1001           9                -1               -6                 -7
    1010          10                -2               -5                 -6
    1011          11                -3               -4                 -5
    1100          12                -4               -3                 -4
    1101          13                -5               -2                 -3
    1110          14                -6               -1                 -2
    1111          15                -7               -0*                -1
(The asterisk (*) indicates the "anomaly" in each system we consider: a second representation of 0 in sign/magnitude and one's complement, and a number, -8, with no corresponding positive number in the representation, in two's complement.) Note that all three systems represent the numbers 0 through 7 in the same way (except for the extra representations of 0) -- as 0 followed by the 3-bit unsigned binary representation of the integer; this is also the unsigned binary representation of these numbers. (In general, for b bits, the numbers 0 through 2^(b-1) - 1 are represented as 0 followed by the (b-1)-bit unsigned binary representation of the number.) The three systems differ in how they represent negative numbers.
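The columns of the table above can be recomputed with a few small decoding functions. This is a sketch for checking the table, with function names of our own choosing; bit patterns are passed as strings such as "1011". Python has no integer -0, so the two "negative zero" patterns simply decode to 0.

```python
def unsigned(bits: str) -> int:
    """Value of the pattern read as plain binary."""
    return int(bits, 2)


def sign_magnitude(bits: str) -> int:
    """High-order bit is the sign; the remaining bits are the magnitude."""
    magnitude = int(bits[1:], 2)
    return -magnitude if bits[0] == "1" else magnitude


def ones_complement(bits: str) -> int:
    """A leading 1 means: complement every bit and negate the result."""
    if bits[0] == "0":
        return int(bits, 2)
    flipped = "".join("1" if b == "0" else "0" for b in bits)
    return -int(flipped, 2)


def twos_complement(bits: str) -> int:
    """A leading 1 means: subtract 2**b from the unsigned value."""
    value = int(bits, 2)
    return value - 2 ** len(bits) if bits[0] == "1" else value


for n in range(16):
    pattern = format(n, "04b")
    print(pattern, unsigned(pattern), sign_magnitude(pattern),
          ones_complement(pattern), twos_complement(pattern))
```

Because each function uses the length of its argument, the same code works for 16-bit patterns as well as the 4-bit ones used for illustration.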
For sign/magnitude, the representation of a negative number is 1 followed by the 3-bit unsigned binary representation of the absolute value of the number. Thus, the high-order bit is a sign bit (0 for + and 1 for -), and the remaining bits give the magnitude (or absolute value) of the number. To convert a positive to a negative number, it suffices to change the high-order bit from 0 to 1, and to convert a negative to a positive number, it suffices to change the high-order bit from 1 to 0. Note that we have two representations of 0, namely 0000 and 1000, the latter being referred to as -0.
For one's complement, the negative of a number is found by complementing each bit individually (changing 0 to 1 and 1 to 0.) Thus, we represent -1 in 4-bit one's complement by complementing each bit of 0001 to get 1110. Of course, since complementation is self-inverse (the complement of the complement is the original argument), the negative of a negative number is also found by complementing each bit. In one's complement (also) the number zero has two representations: 0000 and 1111 -- the latter being referred to as -0.
The rationale for two's complement is that it implements arithmetic modulo 2^b, where b is the number of bits. If we consider the integers 0 through 15 modulo 16, then we have 1+15 = 0 (modulo 16) and 5+11 = 0 (modulo 16). Thus, modulo 16, 15 "behaves like" -1, in the sense that 1 and 15 add to 0 (modulo 16), and 11 "behaves like" -5. If we consider the 4-bit unsigned representation of 15, we have 1111, which is the representation of -1 in 4-bit two's complement arithmetic, and the unsigned 4-bit representation of 11 is 1011, which is the representation of -5 in 4-bit two's complement. The number 8 is a bit anomalous modulo 16, because 8+8 = 0 (modulo 16). The representation 1000 is taken to be -8, which means that the high-order bit signifies whether the number is negative (if the bit is 1) or non-negative (if the bit is 0). Why would two's complement be desirable? If you recall the circuit we designed to add two 4-bit numbers to get a 5-bit sum, if we just ignore the high-order bit of the result, the circuit computes the sum of its two inputs, modulo 16. Two's complement is in fact the most common choice for the representation of integers in modern computers.
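The claim that an unsigned adder with its high-order result bit ignored performs two's complement addition can be tried out directly. The sketch below models the 4-bit adder with ordinary integer addition followed by a mask; the helper names add_4bit, to_pattern, and to_signed are hypothetical.

```python
def add_4bit(x: int, y: int) -> int:
    """Add two 4-bit patterns and drop the fifth (high-order) bit of the sum."""
    return (x + y) & 0b1111


def to_pattern(n: int) -> int:
    """4-bit two's complement pattern of n, for -8 <= n <= 7 (as an unsigned value)."""
    return n % 16


def to_signed(pattern: int) -> int:
    """Interpret a 4-bit pattern as a two's complement number."""
    return pattern - 16 if pattern >= 8 else pattern


# 5 + (-3): patterns 0101 and 1101 sum to 10010; dropping the high bit leaves 0010.
print(to_signed(add_4bit(to_pattern(5), to_pattern(-3))))    # 2
# (-5) + (-2): patterns 1011 and 1110 sum to 11001; dropping the high bit leaves 1001.
print(to_signed(add_4bit(to_pattern(-5), to_pattern(-2))))   # -7
```

The result is correct whenever the true sum lies in the range -8 through 7. Outside that range the answer wraps around modulo 16; for example, 7 + 1 yields the pattern 1000, which reads as -8.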
The rule some have learned for negating a two's complement number is to complement each bit individually and then add 1. For example, to negate 5, we start with 0101 and complement each bit: 1010, and then add 1: 1011, which is the correct representation of -5 in two's complement. Going the other direction, from -5, we start with 1011, complement each bit: 0100, and add 1, to get 0101. Why does this work? Hint: complementing each bit is equivalent to subtracting the unsigned value of the number from 15.
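The complement-and-add-1 rule is easy to experiment with before working out why it holds. The sketch below applies it to 4-bit patterns held as Python integers; the function name and the bits parameter are our own.

```python
def negate_twos_complement(pattern: int, bits: int = 4) -> int:
    """Complement each bit, add 1, and keep only the low-order `bits` bits."""
    mask = 2 ** bits - 1
    return ((pattern ^ mask) + 1) & mask


print(format(negate_twos_complement(0b0101), "04b"))  # 1011, the pattern for -5
print(format(negate_twos_complement(0b1011), "04b"))  # 0101, the pattern for 5

# Every pattern and its negation add to 0 modulo 16.
for x in range(16):
    assert (x + negate_twos_complement(x)) % 16 == 0
```

Two patterns are their own negations under this rule: 0000, as expected, and 1000, which is the anomalous -8.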
In a class vote (years ago), it was decided that the TC-201 computer would have sign/magnitude representation of integers. We also decided that skipzero will skip on both positive zero (0000 0000 0000 0000) and "negative zero" (1000 0000 0000 0000).
Next, we turn to the following task: read in a zero-terminated sequence of numbers, print them out in reverse, and then halt. A sample interaction with the user might look like the following.
    input = 17
    input = -13
    input = 22
    input = 0
    output = 22
    output = -13
    output = 17

We start with the following idea, which is NOT a solution.
    read-num:   input
                skipzero
                jump store-num
                jump produce-output
    store-num:  store num
                jump read-num
    num:        data 0

The reason this doesn't work is that the first number read in will be stored in num, and then the second number read in will be stored in num (overwriting the first number) and then the third number will be stored in num (overwriting the second number), and so on. When this jumps to the code for produce-output, only the latest number read in will be available in num, and we won't have the information we need to print out the reverse of the sequence of numbers read in. (See the next lecture for how we solve this problem.)
In the preceding lecture, we considered the problem of reading in a zero-terminated sequence of numbers, printing out the reverse of the sequence, and halting. Here we temporarily simplify the task to reading in a zero-terminated sequence of numbers, storing them in consecutive memory locations, and then halting. (The task of printing out the reverse of the sequence will be deferred to the homework.) In the last lecture, we considered the following, NOT a solution.
    read-num:   input
                skipzero
                jump store-num
                halt
    store-num:  store num
                jump read-num
    num:        data 0

This is not a solution because each number read in overwrites the previous one, so that at the end only the most recently read number will be available. What we need is a way of storing into num the first time through the loop, storing into the next memory location the next time through the loop, and so on.
We considered two possible solutions to this problem. The first, highly deprecated, solution is to write self-modifying code. That is, we may treat the store instruction as data, load it into the accumulator, add 1 to it, and store it back into its memory register. This would have the effect of changing the instruction from "store 6" (in the original program, because num corresponds to address 6) to "store 7", so that the next number read in from the user would be stored in memory location 7. The code for this kind of solution would be as follows.
    read-num:      input
                   skipzero
                   jump store-num
                   halt
    store-num:     store num
                   load store-num
                   add constant-one
                   store store-num
                   jump read-num
    constant-one:  data 1
    num:           data 0

Once your TC-201 simulator is running, you might want to try running this program to see what it does. Self-modifying programs are very difficult for human beings to understand, debug, and maintain, because of the large amount of "state" the human has to keep track of -- not only the current data values, but also the current version of the program that is running. (See juggling eggs) (When computer memory was scarce, self-modifying code effectively recycled memory addresses -- like a studio apartment where you cook, eat, work, and sleep in the same room. You had to make do with what you had.)
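To see why adding 1 to an instruction changes where it stores, we can treat the instruction word as a number, as the self-modifying program does. The sketch below uses the "store 6" example from the text and assumes that the add instruction behaves like ordinary integer addition on this word, which is plausible here because the store opcode 0010 leaves the high-order (sign) bit at 0. The helper make_instruction is hypothetical.

```python
STORE = 0b0010


def make_instruction(opcode: int, addr: int) -> int:
    """Pack a 4-bit opcode and a 12-bit address into one 16-bit word."""
    return (opcode << 12) | addr


word = make_instruction(STORE, 6)    # "store 6"
bumped = word + 1                    # what load / add constant-one / store achieves

print(format(word, "016b"))          # 0010000000000110
print(format(bumped, "016b"))        # 0010000000000111, i.e. "store 7"
assert bumped >> 12 == STORE         # the opcode is untouched
assert bumped & 0xFFF == 7           # only the address field changed
```

The trick relies on the address field occupying the low-order bits of the word; it would corrupt the opcode if the address field ever overflowed past 4095.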
So instead we introduce the last two instructions for the TC-201 computer, which allow us to implement "pointers" in our programs. [Note: we also have the following instructions: shift, and, and xor, described in hw6.rkt.] A pointer indicates where in memory an operation is to be performed. The two instructions are as follows.
    name     opcode   operation
    ----     ------   ---------
    loadi    1011     acc := Mem[extract-address(Mem[addr])]
    storei   1100     Mem[extract-address(Mem[addr])] := acc

The names stand for "load indirect" and "store indirect". The extract-address() operation indicated above means that we extract the rightmost 12 bits of a 16-bit quantity and treat it as the address of a register in RAM.
Thus, the instruction "loadi addr" is executed as follows: read from memory the contents of the register with address addr, and take the rightmost 12 bits of that as another address, addr'. Copy the contents of the memory register with address addr' to the accumulator. Finally, increment the program counter, as usual. Similarly, the instruction "storei addr" is executed as follows: read from memory the contents of the register with address addr, and take the rightmost 12 bits of that as another address, addr'. Copy the contents of the accumulator to the memory register with address addr'. Again, the program counter is then incremented, as usual.
The following example may help illuminate the function of loadi and storei. We will simulate the TC-201 for three instructions, starting with the following configuration.
    acc:  0000 0000 0000 0000
    pc:   0000 0000 0000
    rf:   1
    aeb:  0

    address   contents of memory register addressed
    -------   -------------------------------------
       0      1011 0000 0000 0101
       1      1100 0000 0000 0110
       2      0000 0000 0000 0000
       .
       .
       5      0000 0000 0011 0100
       6      0000 0000 0011 0111
       .
       .
      52      0000 1111 0000 1111
      53      1010 1010 1010 1010
      54      1111 0000 1111 0000
      55      0101 0101 0101 0101

Because the pc contains address 0, we read the instruction at address 0: 1011 0000 0000 0101, decode the opcode 1011 to find that it is a loadi instruction. We take the address field of the instruction, 0000 0000 0101, which is address 5, and read the contents of memory register 5, which is 0000 0000 0011 0100. We take the rightmost 12 bits of that, 0000 0011 0100, and interpret it as address 52. Then the contents of memory register 52 are copied to the accumulator, and the program counter is incremented by 1, to produce the following configuration after the loadi instruction completes.
    acc:  0000 1111 0000 1111
    pc:   0000 0000 0001
    rf:   1
    aeb:  0

    address   contents of memory register addressed
    -------   -------------------------------------
       0      1011 0000 0000 0101
       1      1100 0000 0000 0110
       2      0000 0000 0000 0000
       .
       .
       5      0000 0000 0011 0100
       6      0000 0000 0011 0111
       .
       .
      52      0000 1111 0000 1111
      53      1010 1010 1010 1010
      54      1111 0000 1111 0000
      55      0101 0101 0101 0101

Now the program counter holds address 1, so we read the instruction at address 1, which is 1100 0000 0000 0110. We decode the opcode, 1100, and find that it is a storei instruction. We take the address field, 0000 0000 0110, which is address 6, and read from address 6, getting the contents 0000 0000 0011 0111. We take the rightmost 12 bits of that as an address, address 55, and copy the contents of the accumulator to memory register 55. The program counter is then incremented, resulting in the following configuration after the storei instruction has been executed.
    acc:  0000 1111 0000 1111
    pc:   0000 0000 0010
    rf:   1
    aeb:  0

    address   contents of memory register addressed
    -------   -------------------------------------
       0      1011 0000 0000 0101
       1      1100 0000 0000 0110
       2      0000 0000 0000 0000
       .
       .
       5      0000 0000 0011 0100
       6      0000 0000 0011 0111
       .
       .
      52      0000 1111 0000 1111
      53      1010 1010 1010 1010
      54      1111 0000 1111 0000
      55      0000 1111 0000 1111

Note that the contents of registers 5 and 6 are unchanged, but we have copied the contents of register 52 (pointed to by register 5) to register 55 (pointed to by register 6). Next, the instruction at address 2 is executed. Because it is a halt, all that happens is that the run flag (rf) is set to 0, and execution halts.
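The three-instruction trace above can be replayed mechanically. The following Python sketch handles only loadi, storei, and halt, and keeps the CPU state in a dictionary; the function names step and extract_address and the dictionary layout are our own choices, not part of the TC-201 specification.

```python
LOADI, STOREI, HALT = 0b1011, 0b1100, 0b0000


def extract_address(word: int) -> int:
    """Rightmost 12 bits of a 16-bit word, treated as an address."""
    return word & 0xFFF


def step(state: dict, mem: list) -> None:
    """Execute one instruction: fetch, decode, execute, update the pc."""
    instruction = mem[state["pc"]]
    opcode, addr = instruction >> 12, instruction & 0xFFF
    if opcode == HALT:
        state["rf"] = 0                      # halt: only the run flag changes
        return
    if opcode == LOADI:
        state["acc"] = mem[extract_address(mem[addr])]
    elif opcode == STOREI:
        mem[extract_address(mem[addr])] = state["acc"]
    else:
        raise ValueError(f"opcode {opcode:04b} is not handled in this sketch")
    state["pc"] += 1


# The starting configuration from the example.
mem = [0] * 4096
mem[0] = 0b1011_0000_0000_0101   # loadi 5
mem[1] = 0b1100_0000_0000_0110   # storei 6
mem[2] = 0b0000_0000_0000_0000   # halt
mem[5] = 0b0000_0000_0011_0100   # points to address 52
mem[6] = 0b0000_0000_0011_0111   # points to address 55
mem[52] = 0b0000_1111_0000_1111
mem[53] = 0b1010_1010_1010_1010
mem[54] = 0b1111_0000_1111_0000
mem[55] = 0b0101_0101_0101_0101

state = {"acc": 0, "pc": 0, "rf": 1, "aeb": 0}
while state["rf"] == 1:
    step(state, mem)

print(format(state["acc"], "016b"), state["pc"], state["rf"])
print(format(mem[55], "016b"))
```

After the loop, register 55 holds the same pattern as register 52, and the pc still holds address 2, since the halt instruction does not advance it.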
We can make use of the idea of a pointer (which we repeatedly increment) and the storei instruction to solve the problem of reading in a zero-terminated sequence of numbers and storing them in consecutive locations in memory, and then halting.
    read-num:      input
                   skipzero
                   jump store-num
                   halt
    store-num:     storei pointer
                   load pointer
                   add constant-one
                   store pointer
                   jump read-num
    pointer:       data table
    constant-one:  data 1
    table:         data 0

The memory location pointer initially contains the address corresponding to table. Thus, the first number read in from the user will be stored in the memory register corresponding to table. Then the sequence of instructions adds one to the value of the pointer, so that it now holds the address of the memory register after table. Thus, the next number read in will be stored in the next memory location, and so on, until a zero is input by the user.
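The effect of the loop can be mirrored at a higher level in Python, with one line per group of TC-201 instructions. This is a sketch of the idea rather than a simulation: memory cells hold ordinary Python integers instead of 16-bit patterns, the user's input is passed in as a list, and the addresses 9 and 11 for pointer and table are the ones obtained by numbering the statements of the program from 0.

```python
def read_and_store(inputs, mem, pointer_addr):
    """Store non-zero inputs at consecutive addresses, starting where the pointer points."""
    for value in inputs:                  # input
        if value == 0:                    # skipzero ... halt
            break
        mem[mem[pointer_addr]] = value    # storei pointer
        mem[pointer_addr] += 1            # load pointer; add constant-one; store pointer
    return mem


POINTER, TABLE = 9, 11
mem = [0] * 16
mem[POINTER] = TABLE                      # pointer: data table

read_and_store([17, -13, 22, 0], mem, POINTER)
print(mem[TABLE:TABLE + 3])               # [17, -13, 22]
print(mem[POINTER])                       # 14, the next free location
```

Note that the pointer cell itself is ordinary data in memory; it is the storei instruction that gives its contents the meaning of an address.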
To see why this solution works in a little more detail, we first consider how the assembler that you will write for the homework will translate the above program into a sequence of 16-bit patterns to store in the RAM starting at address 0. Conceptually, the assembler first needs to determine how to translate all the symbolic addresses in the program into numbers. To do this, it merely numbers the instructions and data statements starting with 0, to determine the addresses that the corresponding instructions or data elements will have. Numbering the instructions and data statements of the above program, we have the following.
     0   read-num:      input
     1                  skipzero
     2                  jump store-num
     3                  halt
     4   store-num:     storei pointer
     5                  load pointer
     6                  add constant-one
     7                  store pointer
     8                  jump read-num
     9   pointer:       data table
    10   constant-one:  data 1
    11   table:         data 0

From this we can extract a "symbol table" allowing us to translate the symbolic labels to numbers:
    label          corresponding address
    -----          ---------------------
    read-num        0
    store-num       4
    pointer         9
    constant-one   10
    table          11

With this table in hand, we can now translate the sequence of instructions and data statements one by one into the corresponding 16-bit patterns. For an instruction, we look up the 4 bit opcode corresponding to the name of the instruction, and, if the instruction has an address field, we convert the numeric address into a 12-bit pattern to combine with the opcode. If there is no address field, we fill the remaining 12 bits with 0's. For a data statement, we take the number after the "data" field and convert it, using 16-bit sign/magnitude representation, into a 16-bit pattern.
For example, the first instruction is "input" (which takes no address field), so the resulting bit pattern is 0101 0000 0000 0000. The second instruction, "skipzero", is translated into 1000 0000 0000 0000, because the skipzero opcode is 1000. The third instruction is "jump store-num". The opcode is 0111, for jump, and looking up store-num in the symbol table, we find it is address 4, which we put in the address field in binary, to get the pattern 0111 0000 0000 0100. For the data statement "data 1" we convert 1 to a 16-bit quantity in sign/magnitude representation to get the pattern 0000 0000 0000 0001. For the data statement "data table", we look up table in the symbol table and find that it is address 11, which we convert to 16-bit sign/magnitude representation, giving the pattern 0000 0000 0000 1011. Putting all this together, the complete translation of the above program would be as follows.
    addr   assembly language              machine language
    ----   -------------------------      --------------------
     0     read-num:      input           0101 0000 0000 0000
     1                    skipzero        1000 0000 0000 0000
     2                    jump store-num  0111 0000 0000 0100
     3                    halt            0000 0000 0000 0000
     4     store-num:     storei pointer  1100 0000 0000 1001
     5                    load pointer    0001 0000 0000 1001
     6                    add constant-one 0011 0000 0000 1010
     7                    store pointer   0010 0000 0000 1001
     8                    jump read-num   0111 0000 0000 0000
     9     pointer:       data table      0000 0000 0000 1011
    10     constant-one:  data 1          0000 0000 0000 0001
    11     table:         data 0          0000 0000 0000 0000
Note that we have arranged for the table to be the last piece of data in the program, because numbers read from the user will be stored in consecutive memory locations starting with table. We simulated this program for one loop to understand a bit better how it works.
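The two-pass translation described above (first build the symbol table, then emit one 16-bit word per statement) can be sketched in Python. This is not the assembler you will write for the homework, which has its own required input and output formats; here a program is simply a list of text lines, the function names are our own, and the opcode table contains only the opcodes that appear in these notes.

```python
OPCODES = {
    "halt": 0b0000, "load": 0b0001, "store": 0b0010, "add": 0b0011,
    "input": 0b0101, "jump": 0b0111, "skipzero": 0b1000,
    "loadi": 0b1011, "storei": 0b1100,
}


def sign_magnitude_16(n: int) -> int:
    """16-bit sign/magnitude pattern of n (sign bit, then 15 magnitude bits)."""
    if abs(n) >= 2 ** 15:
        raise ValueError(f"{n} does not fit in 16-bit sign/magnitude")
    return (0x8000 if n < 0 else 0) | abs(n)


def assemble(lines):
    """Translate assembly lines into a list of 16-bit words, starting at address 0."""
    # Pass 1: number the statements from 0 and record each label's address.
    symbols, statements = {}, []
    for address, line in enumerate(lines):
        tokens = line.split()
        if tokens[0].endswith(":"):
            symbols[tokens[0][:-1]] = address
            tokens = tokens[1:]
        statements.append(tokens)

    # Pass 2: translate each statement using the symbol table.
    words = []
    for tokens in statements:
        name = tokens[0]
        operand = tokens[1] if len(tokens) > 1 else None
        if name == "data":
            value = symbols[operand] if operand in symbols else int(operand)
            words.append(sign_magnitude_16(value))
        else:
            addr = 0 if operand is None else symbols[operand]
            words.append((OPCODES[name] << 12) | addr)
    return words


program = [
    "read-num: input", "skipzero", "jump store-num", "halt",
    "store-num: storei pointer", "load pointer", "add constant-one",
    "store pointer", "jump read-num",
    "pointer: data table", "constant-one: data 1", "table: data 0",
]
for address, word in enumerate(assemble(program)):
    print(address, format(word, "016b"))
```

This sketch only accepts symbolic labels in address fields, which is all the program above needs; accepting numeric addresses as well would be a small extension of pass 2.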
We spent a little while discussing how to extend this program to print in reverse the sequence of numbers read in before halting. The basic idea is to use "loadi pointer" to get the number into the accumulator and "output" to print it. Then the pointer is decreased by 1 and the loop is repeated. To know when to stop, there were two proposals -- one using a sentinel of 0 (which the user cannot cause to be stored in the table) to mark the beginning of the table, and the other to save the original start of the table in a separate variable, so that the pointer could be compared to it. (Note that in the above program, pointer points to the next available location in memory, not to the last stored number.)