P R E L I M I N A R Y S P E C I F I C A T I O N Due 2:00 am, Friday, 27 March 2020 CPSC 223b Homework #4 QuickSorting Deques REMINDER: Do not under any circumstances copy someone else's code for this assignment, give your code to someone else, or make it publicly available. After discussing any aspect of the assignment with anyone other than a member of the teaching staff (such discussions must be noted in your log file), do not keep any written or electronic record and engage in some mind-numbing activity before you work on the assignment again. Sharing ANY related document (e.g., code or test cases) is a violation of this policy. Since code reuse is an important part of programming, you may study published code (e.g., from textbooks or the Net) and/or incorporate it in a program, provided that you give proper attribution in your log file and in your source file(s) (see the syllabus for details) and that the bulk of the code submitted is your own. Note: Removing/rewriting comments, renaming functions/variables, or reformatting statements does not convey ownership. (40) Write a sort program Qsort [-POS[,LEN]] [FILE]* that reads lines from the files specified; removes their trailing newlines if any; sorts them in ascending order using quickSort; and writes the sorted lines to the standard output, each followed by a newline. When more than one file is specified, all of the lines in all of the files are sorted into one output stream. When no files are specified, nothing is written. All of the lines must be sorted before any are written. The sorted order is based on a "key" associated with each line. Normally, this key is the entire line, but if the -POS[,LEN] option is specified, it consists of the substring of length LEN beginning with line[POS]. (Recall that any trailing newlines were removed at an earlier stage.) If line[POS] lies beyond the end of line, the key is the empty string. If there are fewer than LEN characters starting with line[POS], the key consists of only those characters. For example, LINE POS,[LEN] KEY ~~~~ ~~~~~~~~~ ~~~ abcdefgh 2,4 cdef abcdefgh 4 efgh abcdefgh 6,3 gh abcdefgh 8,2 If LEN is not specified, then its default value is INT_MAX, the largest integer representable with an int, which is defined in . If POS is not specified, then its default value is 0 and the name of the first file may not begin with a -. Qsort is STABLE. That is, if two lines have equal keys, they appear in the same relative order in the output as they did in the input. For example, INPUT Qsort -1,2 INPUT | INPUT Qsort -1,2 INPUT ~~~~~ ~~~~~~~~~~~~~~~~ | ~~~~~ ~~~~~~~~~~~~~~~~ baac baac | caab caab caab caab | baac baac (in each case both keys are "aa"). There will be at most 4 tests of this property. Implementing quickSort with arrays is easy. To make the assignment more challenging, Qsort may not use any arrays other than argv[] and the strings returned by getline(). Instead, it is limited to ONE queue and TWO stacks of string pointers; that is, no more than one queue and two stacks may exist at any point during its execution. Rather than implement two separate data structures, one for queues and one for stacks, Qsort uses three deques (*), one used only as a queue (i.e., items can only be added to the tail and removed from the head), the other two used only as stacks (i.e., items can only be added to the head and removed from the head). If the same deque is used as both a queue and a stack (i.e., items are added to both the tail and the head), then it counts as one of each. (*) A deque (or double-ended queue, pronounced "deck") is a data structure that combines the attributes of a queue and a stack: Items can be added to the tail (as in a queue) or the head (as in a stack), but can only be removed from the head. These deques should be implemented in a separate source file as the abstract data type (ADT) Deque defined in the header file Hwk4/Deque.h. The file Hwk4/example.c contains an example of how the functions declared in Deque.h can be used to create, destroy, add items to, and remove items from a Deque. The file Hwk4/protoDeque.c contains my implementation (except that I deleted some unimportant lines of code ;-)). Copies of Deque.h and example.c are appended. To verify that your implementations of Qsort and the Deque ADT adhere to the interface specification BUT NO MORE, the test script has three phases: 1) Link your Qsort.o with your Deque.o and test whether the combination Qsort sorts correctly. 2) Link your Deque.o with the driver Hwk4/testDeque.o and test whether your implementation of the Deque ADT satisfies the interface specification. You may also use testDeque to debug your code. 3) Link your Qsort.o with Hwk4/Deque.o (another implementation of the Deque ADT that does not rely on any special properties of your implementation) and test whether the combination QsortH (H for hybrid) satisfies the overall specification. WARNING: Hwk4/Deque.o will always implement the Deque ADT, but how it implements it may change without notice. In particular, Hwk4/Deque.o has a MOVING_DEQUES mode where it moves the block of storage (if any) to which the Deque variable points (which changes *D) on every call. When this mode is used you cannot assume that the value of *D does not change when calling isEmptyD(), addD(), remD(), headD(), pushD(), popD(), or topD(). Your code MUST #include Hwk4/Deque.h, not a local copy. Similarly, when making testDeque and QsortH, you MUST link with Hwk4/testDeque.o and Hwk4/Deque.o, respectively, not local copies. If you name your source files Qsort.c and Deque.c, Hwk4/Makefile will make Qsort, testDeque, and QsortH, and thus can be used as (or as the basis for) your makefile. Use the submit command to turn in your source files for Qsort and the Deque ADT, a Makefile, and your log file (see the specification for Homework #1). All files must contain your name and Yale netID. YOU MUST SUBMIT YOUR FILES (INCLUDING YOUR LOG FILE) AT THE END OF ANY SESSION WHERE YOU WRITE OR DEBUG CODE, AND AT LEAST ONCE EVERY HOUR DURING LONGER SESSIONS. (All submissions are retained.) Notes ~~~~~ 1. If POS and/or LEN is not a nonempty sequence of digits, or a file does not exist, or one of the Deque functions reports an unexpected error (e.g., createD() returns false), then Qsort writes a one-line error message to the standard error and exits. Qsort may ignore the possibility that POS or LEN has too many digits to be representable using an int. Qsort may NOT assume that malloc() and realloc() will never fail. 2. The Deque data type should be implemented using a pair of stacks (the H stack and the T stack), each implemented using a headless, singly-linked list. To add an item to the tail (head) of the Deque, push it onto the top of the T (H) stack. To remove an item from the head of the Deque, pop it off the top of the H stack (but if the H stack is empty, first move items one at a time from the top of the T stack to the top of the H stack until the T stack is empty). (Why does this work?) This data structure requires one link per item rather than two as in the doubly-linked list implementation given in Aspnes' notes. But while removing an item may not take constant time when the H stack is empty, the average time per operation for a sequence of operations that begins and ends with an empty Deque is constant. (Why?) 3. In the header file Hwk4/Deque.h defining the interface to the Deque ADT, the type of a Deque is defined as typedef struct deque *Deque; which means that a Deque is a pointer to a struct whose fields are not known, ensuring that calling functions cannot access these fields. (For full generality a Deque should be a void*, which allows implementations where a Deque is something other than a struct.) But within the file that implements the Deque, you must give a complete definition of the fields so that the Deque functions may access them. 4. The descriptions of addD(), remD(), and pushD() contain the warning that "The value of *D may change as a result." This allows implementations of the Deque ADT to use something other than a headed data structure. For example, if an empty Deque were represented as the NULL pointer, the value of *D could change as the result of the operation. The purpose of an ADT is to allow such flexibility. (headD(), topD(), and isEmptyD() are also allowed to change *D to make the interfaces more uniform.) 5. Each value stored in a Deque is a char *, but need not point to a block of allocated storage that contains a null-terminated string. Thus when adding/pushing strings returned by getline() to/on a Deque, the strings themselves are NOT duplicated; and if they are not freed outside of the Deque ADT after a pointer is removed or popped and returned, that storage could be lost. 6. The prohibition against arrays means that you may not use malloc() or realloc() or calloc() outside the file that implements the Deque ADT (other than their use by getline() and in the Standard I/O Library). Moreover, you may not define any global, static local, or automatic local array variables. More generally, you may not use ANY global variables. Finally, you may not open any file for writing, since it could be used as an array; nor may you open files on the command line more than once. 7. The ordering of the keys associated with each line is that defined by strcmp() or strncmp() when the environment variable LC_COLLATE is set to the value specified in the individual test scripts (normally "POSIX"). Thus you must compare keys using one of these functions. 8. Your implementation of quickSort may use any pivot as long as it would run in O(n*log2(n)) time under the usual randomness assumptions (e.g., the first or last item is fine, but the minimum or maximum item is not). Moreover, you may NOT use a Deque just to simulate a random-access array (e.g., to read the i-th item, transfer i-1 items from the head of the Deque to the tail; transfer 1 item from head to tail, making a copy to return to quickSort; and then transfer n-i items from head to tail, restoring the status quo). 9. Qsort must free all storage before exiting, unless an error is detected. There will be at most 4 tests of this property. Hint: fopen() allocates storage when you open a file; that storage is freed by fclose(). A. In addition to the 40 regular tests, there will be some diagnostic tests that may lead to deductions: * QsortH will be used to detect the use of arrays, which will incur a 16-point deduction unless you can demonstrate that your program does not use them. Since writing and then reading a file or recursing to depth #lines can also act like arrays, they incur the same penalty. * QsortH will be analyzed for the presence of global variables, which will incur a 4-point deduction unless you can demonstrate that your program does not use them. * QsortH will count the number of calls to addD()/pushD(). An EFFICIENT implementation requires at most 2 * #compares calls to addD()/pushdD() when n > 4. Requiring more (but less than 4 * #compares) will incur a 2-point deduction. Exceeding 4 * #compares will incur an 8-point deduction on the assumption that your program does not run in O(n*log2(n)) time under the usual randomness assumption. But if you can demonstrate that it does, the deduction will be only 4 points. * Implementing a stable and efficient quickSort with an unlimited number of deques is not very difficult. For example, if there are at least two items in a queue, create two new queues, copy items to one queue or the other depending on their relation to the splitter, sort each queue recursively, and finally copy them back to the original queue separated by the splitter. Hint: You may want to start with such an implementation. More generally, any use of more than one queue and/or more than two stacks also simplifies the problem. Thus QsortH will count the maximum number of queues and stacks in use at any time. Each extra queue or stack will incur a 4-point deduction, but these deductions will be capped at 8 points. (Recall that a single deque used as both a queue and a stack counts as one of each.) WARNING: To pass these diagnostic tests QsortH must sort Hwk4/Tests/f8192 in under a second. Should you never reach that point, you can see me after the final script is released to explain why I should void these deductions. B. You may find the function strtol() useful when parsing -POS[,LEN]. C. Much of the complexity in this assignment arises from the constraints: * Sorting in a stable manner. * Using one queue and two stacks but no arrays (other than those in nodes). * Using at most 2 * #compares calls to addD() or pushD(). Meeting any two is straight-forward; meeting all three requires more thought. The ingredients of such an algorithm may include: 1. Moving blocks of lines from a stack or queue to another stack or queue. 2. Using an empty queue to reverse a block of lines at the top of a stack, or an empty stack to reverse all of the lines in a queue. 3. Sorting in decreasing rather than increasing order by reversing the comparison operator. 4. Creating a temporary empty stack T using a nonempty stack S by keeping track of the items pushed on S, popping all such items remaining when T is no longer needed, and never trying to pop an item that was not pushed. 5. Achieving C * #compares calls to addD()/pushD() by calling them at most C*M times at the level of recursion where there are M items to sort. D. Reading: Kernighan & Ritchie, Chapter 6 (structs) Kernighan & Ritchie, Chapter 7 (input/output) Aspnes, Section 4.11 (structured data types) Aspnes, Section 5.2 (linked lists, including deques) Aspnes, Section 5.3 (abstract data types) CS-223-02/27/20 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // Deque.h Stan Eisenstat (02/26/20) // // Define the abstract data type for a Deque (*) of string pointers. The // strings themselves are NOT stored. // // (*) A deque (or double-ended queue, pronounced "deck") is a data structure // that combines the attributes of a queue and a stack: items can be added // at the tail and the head but can be removed only from the head. // Define true and false #include // A variable of type Deque is a pointer to the struct used to implement the // deque operations declared below. Note that the declaration of the fields of // that struct appears only in the file that implements these operations. Thus // the calling program does not know what fields have been defined and cannot // access them; and the variable must be a pointer since the size of the struct // is unknown. typedef struct deque *Deque; // The following functions are the only means of accessing a Deque data // structure. Each requires a pointer to a Deque variable in case it needs // to modify the value of that variable (or more generally for uniformity). // Each returns the status of the operation, either true (= success) or false // (= failure; e.g., an invalid argument, an inconsistent Deque object, or a // NULL return from malloc() / realloc()). // Set *D to a new object of type Deque. // Returns true if successful, false otherwise (e.g., malloc() fails). bool createD (Deque *d); // Return true if the Deque *D is empty, false otherwise. The value of *D may // change as a result. bool isEmptyD (Deque *d); // Add the string pointer S to the tail of the Deque *D; the string itself is // not duplicated. The value of *D may change as a result. // Return true if successful, false otherwise (e.g., malloc() fails). bool addD (Deque *d, char *s); // Remove the string pointer at the head of the Deque *D and store that value // (or NULL if the Deque is empty) in *S. The value of *D may change as a // result. // Return true if successful, false otherwise (e.g., *D is empty). bool remD (Deque *d, char **s); // Store in *S the value of the string pointer at the head of the Deque *D (or // NULL if the Deque is empty). The value of *D may change as a result. // Return true if successful, false otherwise (e.g., *D is empty). bool headD (Deque *d, char **s); // Add the string pointer S to the head of the Deque *D; the string itself is // not duplicated. The value of *D may change as a result. // Return true if successful, false otherwise (e.g., malloc() fails). bool pushD (Deque *d, char *s); // An alternate name for remD() when the Deque is used as a stack. #define popD(d,s) remD(d,s) // An alternate name for headD() when the Deque is used as a stack. #define topD(d,s) headD(d,s) // Destroy the Deque *D by freeing any storage that it uses (but not the blocks // of storage to which the string pointers point) and set the value of *D to // NULL. // Return true if successful, false otherwise (e.g., D is an invalid argument). bool destroyD (Deque *d); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // example.c Stan Eisenstat (02/26/20) // // Demonstrate the use of the Deque ADT #define _GNU_SOURCE #include #include #include #include "/c/cs223/Hwk4/Deque.h" // Print message to stderr and exit. #define DIE(msg) exit (fprintf (stderr, "%s\n", msg)) // Print error message unless COND is true #define ORDIE(cond,msg) if (cond) ; else DIE(msg) int main (int argc, char *argv[]) { FILE *fp = stdin; // Read from standard input Deque D; ORDIE (createD (&D), // Create Deque for stdin "createD() failed"); int nLines = 0; char *line = NULL; size_t length = 0; while (0 <= getline (&line, &length, fp)) { if (nLines % 2 == 0) { // On even iterations ORDIE (pushD (&D, line), // Add line to head of D "pushD() failed"); } else { ORDIE (addD (&D, line), // On odd iterations "addD() failed"); // Add line to tail of D } nLines++; line = NULL; length = 0; } free (line); for (int i = 0; !isEmptyD(&D); i++) { if (i % 2 == 1) { // On even iterations ORDIE (topD (&D, &line), // Output line at head of D "topD() failed"); fputs (line, stdout); ORDIE (remD (&D, &line), // Remove line at head of D "remD() failed"); } else { // On odd iterations ORDIE (headD (&D, &line), // Output line at head of D "headD() failed"); fputs (line, stdout); ORDIE (popD (&D, &line), // Remove line at head of D "popD() failed"); } fputs (line, stdout); // Output line removed from D free (line); // Free storage for removed line } ORDIE (destroyD (&D), // Destroy Deque "destroyD() failed"); return EXIT_SUCCESS; }