P R E L I M I N A R Y S P E C I F I C A T I O N Due 2:00 am, Friday, 08 April 2016 CPSC 223b Homework #5 Hashing and Queue Redux: Generalized Pancake Sort (40) Pancake Sort is an algorithm for rearranging N numbers in sorted order, where the only operation is to reverse some initial segment of length K. When viewing the numbers as a stack of pancakes, this corresponds to inserting a spatula under the top K pancakes and flipping over that part of the stack. The usual object is to minimize the number of flips rather than the number of comparisons as in traditional sorting. See https://en.wikipedia.org/wiki/Pancake_sorting for details. Generalized Pancake Sort extends Pancake Sort from one dimension to two and from sorted order to a goal configuration, while replacing numbers by printing characters. The initial and goal configurations are HEIGHT by WIDTH grids (i.e., each has HEIGHT rows and WIDTH columns) of labelled tiles; for example +---+---+---+ +---+---+---+ | B | E | A | | A | B | C | +---+---+---+ +---+---+---+ | C | H | D | and | D | E | F | +---+---+---+ +---+---+---+ | F | I | G | | G | H | I | +---+---+---+ +---+---+---+ The object is to find a sequence of flips that transform one to the other. Each flip takes some K by L leading grid of tiles (i.e., the first K rows and the first L columns, where 1 <= K <= HEIGHT and 1 <= L <= WIDTH) and reflects it either horizontally or vertically; for example, +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ | A | B | C | | C | B | A | | G | B | C | | B | A | C | | D | E | C | +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ | D | E | F | => | D | E | F |, | D | E | F |, | E | D | F |, | A | B | F |, ... +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ | G | H | I | | G | H | I | | A | H | I | | G | H | I | | G | H | I | +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ 1x3 Horiz 3x1 Vert 2x2 Horiz 2x2 Vert Write a program pancake [HEIGHT WIDTH] MAXLENGTH INITIAL GOAL that prints out some _shortest_ sequence with at most MAXLENGTH flips (if one exists) that transform the INITIAL configuration (BEACHDFIG above) into the GOAL configuration (ABCDEFGHI above). The optional HEIGHT and WIDTH arguments default to 3, but must lie between 1 and 16, inclusive. For example, % pancake 10 BEACHDFIG ABCDEFGHI BEACHDFIG CHABEDFIG FIABEDCHG BIAFEDCHG IBAFEDCHG CBAFEDIHG ABCDEFGHI If no such sequence of flips exists, then nothing is printed; for example, % pancake 3 3 5 BEACHDFIG ABCDEFGHI The labels in each grid square can be any printing characters (see isprint()), and they need not be unique. Use the submit command to turn in the source files for pancake, a Makefile, and your log file (see Homework #1). YOU MUST SUBMIT YOUR FILES (INCLUDING THE LOG FILE) AT THE END OF ANY SESSION WHERE YOU SPEND AT LEAST ONE-HALF HOUR WRITING OR DEBUGGING CODE, AND AT LEAST ONCE EVERY HOUR DURING LONGER SESSIONS. (All submissions are retained.) Notes: 1. pancake should verify that the command-line arguments are valid, e.g., - HEIGHT, WIDTH, and MAXLENGTH are nonempty sequences of decimal digits - HEIGHT and WIDTH are integers between 1 and 16 - MAXLENGTH is a nonnegative integer - INITIAL and GOAL have HEIGHT*WIDTH characters - INITIAL and GOAL have the same characters (counting multiplicities) and write a one-line error message to stderr and immediately exit otherwise. Warning: Roughly 1/4 of the tests on the final test script will involve detecting errors in the command-line arguments. 2. One way to find a shortest sequence of flips is to use (unidirectional) breadth-first search (BFS). add the GOAL configuration to the queue add the triple (GOAL, NULL, 0) to the dictionary until the queue is empty remove the configuration C at the head of the queue let L be the length associated with C in the dictionary if (L < MAXLENGTH) for each position C' reachable from C by one flip if C' is the INITIAL position print the sequence of flips and exit else if C' is not in the dictionary add C' to the queue add the triple (C', C, L+1) to the dictionary endif endfor endif enduntil Note that to reduce the run-time, BFS adds a configuration to the queue only once. Thus it must maintain a dictonary of (CONF,FROM,LEN) triples, one for each configuration CONF that has already been added to the queue. Here FROM is a configuration from which one can reach CONF in one flip, and LEN is the number of flips required to reach CONF from the GOAL configuration. See https://en.wikipedia.org/wiki/Breadth-first_search for details. Points to Ponder: Why start the search at GOAL rather than INITIAL? Is the algorithm complete as stated (i.e., are there any special cases where it will fail)? Is there any case where GOAL is not reachable from INITIAL? 3. The number of configurations within L flips of GOAL can grow very rapidly for small L, especially on large grids. Thus pancake reduces the run-time by using bidirectional BFS, searching from both GOAL and INITIAL until some configuration has been encountered by both searches. This is accomplished by adding both INITIAL and GOAL to the queue at the outset, and remembering from which of these each subsequent configuration was reached. Warning: These are not the only changes to the algorithm above that will be required. See https://en.wikipedia.org/wiki/Bidirectional_search for details. Bidirectional BFS will be needed in at most 4 tests, which understates its complexity versus the rest of the program. 4. pancake should represent a configuration as a null-terminated string in the format used to specify the INITIAL and GOAL configurations on the command line. This is based on the usual mapping of a 2D array into a 1D array. 5. BFS requires a queue of configurations (= strings). pancake uses either your Queue.c from Homework #4 or Hwk5/Queue.o (which is my implementation). 6. As noted above pancake must maintain a dictionary that contains every configuration that has ever been added to the queue. To support fast insertion and lookup, it uses hashing with chaining, where the number of chains is roughly 100,000 to keep the load average low even for large grids. The hash table should be implemented as a separate module (and in a separate file, with a header file specifying the interface) so that the code could be used in another application. You may adapt the code in K&R, K&P, or van Wyk, as long as you cite your source. One way to modularize the hash table is to mimic the Queue module, with functions to create, insert, and search, and possibly a function to destroy. In this case the header file should declare the Hash object as a pointer to some struct whose fields are undeclared (as was done in Hwk4/Queue.h). The names and types of the fields, which hold all information about the hash table, are only declared in the file that implements the module. Another is to create a module that implements one hash table, with functions to insert and search, and possibly functions to initialize and destroy. Static global variables hold all of the information about the hash table. There is no declaration of the Hash object in the header file. Both ensure that the calling program cannot access the contents of the hash table other than through the functions declared in the header file. 7. The hash function should depend on all of the bits in the representation yet be general enough to handle different values of HEIGHT and WIDTH. Reducing the string to an unsigned integer can be done with a combination of shifts/multiplies (to move bits to the left) and adds/xors (to introduce new characters). Good performance dictates that this integer not be invariant under permutations or depend on a subset of the characters (why?). Together with the fact that it should have as many as six digits unless the number of squares in the grid is very small (why?), this constrains the sizes of the shifts/multipliers. Reducing the integer to a number between 0 and the size of the hash table can be done by taking a remainder. One such function is (see /c/cs223/Hwk5/hashFunction.c): // Return hash of string S into [0,SIZE) static long hash (char *s, long size) { unsigned long sum; int shift; const unsigned long prime = 3141592653589793239L; for (sum = 0, shift = 0; *s; s++) { sum ^= *s<= 57) shift -= 57; } return ((prime * sum) % size); } 8. For simplicity pancake may assume that: * malloc() and realloc() will never fail. * The Queue functions will never fail. * If you insert something into the hash table, a later search will find it. (You may not want forego this assumption while developing your code.) and need not fail gracefully if they are violated. 9. All storage should be reachable on exit. A. You may find it easier to debug your code using a 2x3 or 3x2 (or even 2x2) grid since the number of possible states is much smaller. B. Reading: Van Wyk, Chapter 5 (linked lists) Van Wyk, Chapter 8 (hashing) CS-223-03/18/16