P R E L I M I N A R Y S P E C I F I C A T I O N
Due 2:00 am, Friday, 08 April 2016
CPSC 223b Homework #5 Hashing and Queue Redux: Generalized Pancake Sort
(40) Pancake Sort is an algorithm for rearranging N numbers in sorted order,
where the only operation is to reverse some initial segment of length K. When
viewing the numbers as a stack of pancakes, this corresponds to inserting a
spatula under the top K pancakes and flipping over that part of the stack.
The usual object is to minimize the number of flips rather than the number of
comparisons as in traditional sorting.
See https://en.wikipedia.org/wiki/Pancake_sorting for details.
Generalized Pancake Sort extends Pancake Sort from one dimension to two and
from sorted order to a goal configuration, while replacing numbers by printing
characters. The initial and goal configurations are HEIGHT by WIDTH grids
(i.e., each has HEIGHT rows and WIDTH columns) of labelled tiles; for example
+---+---+---+ +---+---+---+
| B | E | A | | A | B | C |
+---+---+---+ +---+---+---+
| C | H | D | and | D | E | F |
+---+---+---+ +---+---+---+
| F | I | G | | G | H | I |
+---+---+---+ +---+---+---+
The object is to find a sequence of flips that transform one to the other.
Each flip takes some K by L leading grid of tiles (i.e., the first K rows and
the first L columns, where 1 <= K <= HEIGHT and 1 <= L <= WIDTH) and reflects
it either horizontally or vertically; for example,
+---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+
| A | B | C | | C | B | A | | G | B | C | | B | A | C | | D | E | C |
+---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+
| D | E | F | => | D | E | F |, | D | E | F |, | E | D | F |, | A | B | F |, ...
+---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+
| G | H | I | | G | H | I | | A | H | I | | G | H | I | | G | H | I |
+---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+ +---+---+---+
1x3 Horiz 3x1 Vert 2x2 Horiz 2x2 Vert
Write a program
pancake [HEIGHT WIDTH] MAXLENGTH INITIAL GOAL
that prints out some _shortest_ sequence with at most MAXLENGTH flips (if one
exists) that transform the INITIAL configuration (BEACHDFIG above) into the
GOAL configuration (ABCDEFGHI above). The optional HEIGHT and WIDTH arguments
default to 3, but must lie between 1 and 16, inclusive. For example,
% pancake 10 BEACHDFIG ABCDEFGHI
BEACHDFIG
CHABEDFIG
FIABEDCHG
BIAFEDCHG
IBAFEDCHG
CBAFEDIHG
ABCDEFGHI
If no such sequence of flips exists, then nothing is printed; for example,
% pancake 3 3 5 BEACHDFIG ABCDEFGHI
The labels in each grid square can be any printing characters (see isprint()),
and they need not be unique.
Use the submit command to turn in the source files for pancake, a Makefile, and
your log file (see Homework #1).
YOU MUST SUBMIT YOUR FILES (INCLUDING THE LOG FILE) AT THE END OF ANY SESSION
WHERE YOU SPEND AT LEAST ONE-HALF HOUR WRITING OR DEBUGGING CODE, AND AT LEAST
ONCE EVERY HOUR DURING LONGER SESSIONS. (All submissions are retained.)
Notes:
1. pancake should verify that the command-line arguments are valid, e.g.,
- HEIGHT, WIDTH, and MAXLENGTH are nonempty sequences of decimal digits
- HEIGHT and WIDTH are integers between 1 and 16
- MAXLENGTH is a nonnegative integer
- INITIAL and GOAL have HEIGHT*WIDTH characters
- INITIAL and GOAL have the same characters (counting multiplicities)
and write a one-line error message to stderr and immediately exit otherwise.
Warning: Roughly 1/4 of the tests on the final test script will involve
detecting errors in the command-line arguments.
2. One way to find a shortest sequence of flips is to use (unidirectional)
breadth-first search (BFS).
add the GOAL configuration to the queue
add the triple (GOAL, NULL, 0) to the dictionary
until the queue is empty
remove the configuration C at the head of the queue
let L be the length associated with C in the dictionary
if (L < MAXLENGTH)
for each position C' reachable from C by one flip
if C' is the INITIAL position
print the sequence of flips and exit
else if C' is not in the dictionary
add C' to the queue
add the triple (C', C, L+1) to the dictionary
endif
endfor
endif
enduntil
Note that to reduce the run-time, BFS adds a configuration to the queue only
once. Thus it must maintain a dictonary of (CONF,FROM,LEN) triples, one for
each configuration CONF that has already been added to the queue. Here FROM
is a configuration from which one can reach CONF in one flip, and LEN is the
number of flips required to reach CONF from the GOAL configuration.
See https://en.wikipedia.org/wiki/Breadth-first_search for details.
Points to Ponder: Why start the search at GOAL rather than INITIAL? Is the
algorithm complete as stated (i.e., are there any special cases where it
will fail)? Is there any case where GOAL is not reachable from INITIAL?
3. The number of configurations within L flips of GOAL can grow very rapidly
for small L, especially on large grids. Thus pancake reduces the run-time
by using bidirectional BFS, searching from both GOAL and INITIAL until some
configuration has been encountered by both searches. This is accomplished
by adding both INITIAL and GOAL to the queue at the outset, and remembering
from which of these each subsequent configuration was reached. Warning:
These are not the only changes to the algorithm above that will be required.
See https://en.wikipedia.org/wiki/Bidirectional_search for details.
Bidirectional BFS will be needed in at most 4 tests, which understates its
complexity versus the rest of the program.
4. pancake should represent a configuration as a null-terminated string in the
format used to specify the INITIAL and GOAL configurations on the command
line. This is based on the usual mapping of a 2D array into a 1D array.
5. BFS requires a queue of configurations (= strings). pancake uses either
your Queue.c from Homework #4 or Hwk5/Queue.o (which is my implementation).
6. As noted above pancake must maintain a dictionary that contains every
configuration that has ever been added to the queue. To support fast
insertion and lookup, it uses hashing with chaining, where the number of
chains is roughly 100,000 to keep the load average low even for large grids.
The hash table should be implemented as a separate module (and in a separate
file, with a header file specifying the interface) so that the code could be
used in another application. You may adapt the code in K&R, K&P, or van
Wyk, as long as you cite your source.
One way to modularize the hash table is to mimic the Queue module, with
functions to create, insert, and search, and possibly a function to destroy.
In this case the header file should declare the Hash object as a pointer to
some struct whose fields are undeclared (as was done in Hwk4/Queue.h). The
names and types of the fields, which hold all information about the hash
table, are only declared in the file that implements the module.
Another is to create a module that implements one hash table, with functions
to insert and search, and possibly functions to initialize and destroy.
Static global variables hold all of the information about the hash table.
There is no declaration of the Hash object in the header file.
Both ensure that the calling program cannot access the contents of the hash
table other than through the functions declared in the header file.
7. The hash function should depend on all of the bits in the representation yet
be general enough to handle different values of HEIGHT and WIDTH.
Reducing the string to an unsigned integer can be done with a combination of
shifts/multiplies (to move bits to the left) and adds/xors (to introduce new
characters). Good performance dictates that this integer not be invariant
under permutations or depend on a subset of the characters (why?). Together
with the fact that it should have as many as six digits unless the number of
squares in the grid is very small (why?), this constrains the sizes of the
shifts/multipliers. Reducing the integer to a number between 0 and the size
of the hash table can be done by taking a remainder.
One such function is (see /c/cs223/Hwk5/hashFunction.c):
// Return hash of string S into [0,SIZE)
static long hash (char *s, long size)
{
unsigned long sum;
int shift;
const unsigned long prime = 3141592653589793239L;
for (sum = 0, shift = 0; *s; s++) {
sum ^= *s<= 57)
shift -= 57;
}
return ((prime * sum) % size);
}
8. For simplicity pancake may assume that:
* malloc() and realloc() will never fail.
* The Queue functions will never fail.
* If you insert something into the hash table, a later search will find it.
(You may not want forego this assumption while developing your code.)
and need not fail gracefully if they are violated.
9. All storage should be reachable on exit.
A. You may find it easier to debug your code using a 2x3 or 3x2 (or even 2x2)
grid since the number of possible states is much smaller.
B. Reading:
Van Wyk, Chapter 5 (linked lists)
Van Wyk, Chapter 8 (hashing) CS-223-03/18/16