P R E L I M I N A R Y S P E C I F I C A T I O N Due 2:00 am, Friday, 11 March 2016 CPSC 223b Homework #4 Merging Queues REMINDER: Do not under any circumstances copy another student's code or give a copy of your code to another student. After discussing the assignment with another student, you may not take any written or electronic record away. Moreover, you must engage in a full hour of mind-numbing activity before you work on it again. Such discussions must be noted in your log file. (40) Write a sort program Merge16 [-POS[,LEN]] [filename]* that reads lines from the files specified, removes the trailing newlines if any, sorts them using a binary merge sort, and writes the sorted lines to the standard output followed by newlines. When more than one file is specified, all of the lines in all of the files are sorted into one output stream. When no files are specified, nothing is output. The sorted order is based on a "key" associated with each line. Normally, this key is the entire line, but if the -POS[,LEN] option is specified, it consists of the substring of length LEN beginning with line[POS]. (Recall that any trailing newlines were removed at an earlier stage.) If POS lies beyond the end of the line, the key is the empty string. If there are fewer than LEN characters starting with line[POS], the key consists of only those characters. For example, LINE POS,LEN KEY 01234567 2,4 2345 01234567 4 4567 01234567 6,3 67 01234567 8,2 If LEN is not specified, its value is INT_MAX, which is defined in . If POS is not specified, the first filename may not begin with a -. Merge16 is a stable sorting algorithm. Thus two lines with equal keys must appear in the same relative order in the output as they did in the input (i.e., in the concatenation of the specified files in left to right order); e.g., INPUT Merge16 -1,3 INPUT Merge16 -0,4 INPUT 1002 2001 1002 2001 1002 2001 Implementing binary merge sort with arrays is easy. To make the assignment more challenging, you may not use any arrays other than argv[] and the strings returned by the function getLine() introduced in Homework #3, which is used to read the lines. Instead, you may use up to two queues of string pointers. The queues should be implemented (in a separate file) as the abstract data type (ADT) Queue defined in the header file Hwk4/Queue.h, a copy of which is appended. The file Hwk4/demo.c (also appended) contains an example of how the functions declared in Queue.h can be used to create, destroy, add elements to, and remove elements from a Queue. The file Hwk4/protoQueue.c contains my implementation (except that I deleted some uninteresting lines of code ;-)). Your source file(s) should #include /c/cs223/Hwk4/Queue.h, not a local copy. To verify that your implementations of Merge16.c and the Queue ADT satisfy the interface requirements, the test script has three phases: 1) Link your Merge16.o with your Queue.o and test whether the combination Merge16 satisfies the overall specification. 2) Link your Queue.o with the driver Hwk4/testQueue.o and test whether your implementation of the Queue ADT satisfies the interface specification. Note that you can use testQueue to debug your Queue.c. 3) Link your Merge16.o with Hwk4/Queue.o (an implementation of the Queue ADT that need not have properties that are true for your implementation but are not required by Queue.h) and test whether the combination Merge16H (H for hybrid) satisfies the overall specification. Hwk4/Makefile will create Merge16, testQueue, and Merge16H and can be used as the basis for your Makefile. Use the submit command to turn in the source files for Merge16, the Queue ADT, a Makefile, and your log file (see Homework #1). YOU MUST SUBMIT YOUR FILES (INCLUDING THE LOG FILE) AT THE END OF ANY SESSION WHERE YOU SPEND AT LEAST ONE-HALF HOUR WRITING OR DEBUGGING CODE, AND AT LEAST ONCE EVERY HOUR DURING LONGER SESSIONS. (All submissions are retained.) Notes ~~~~~ 1. If POS and/or LEN is not a nonempty sequence of decimal digits, or a file does not exist, or one of the Queue functions reports an unexpected error (e.g., createQ() returns false), then Merge16 writes a one-line message to the standard error and exits. 2. Usually we think of mergeSort as a top-down, divide-and-conquer algorithm. However, we can also view it as a bottom-up algorithm. When sorting 2**K (i.e., 2 to the power K) items, merge sort makes K passes over the items to be sorted: pairs of entries are merged into ordered sequences of length two; pairs of ordered sequences of length two are merged into ordered sequences of length four; etc. For example: 31 41 59 26 53 58 97 93 23 84 62 64 33 83 27 95 --> {31 41} {26 59} {53 58} {93 97} {23 84} {62 64} {33 83} {27 95} --> {26 31 41 59} {53 58 93 97} {23 62 64 84} {27 33 83 95} --> {26 31 41 53 58 59 93 97} {23 27 33 62 64 83 84 95} --> {23 26 27 31 33 41 53 58 59 62 64 83 84 93 95 97} When the number of items is not a power of two, the same idea is used, except that the sequences being merged are not always the same length and one sequence may be empty. 14 15 92 65 35 89 79 32 38 46 26 43 38 --> {14 15} {65 92} {35 89} {32 79} {38 46} {26 43} {38} --> {14 15 65 92} {32 35 79 89} {26 38 43 46} {38} --> {14 15 32 35 65 79 89 92} {26 38 38 43 46} --> {14 15 26 32 35 38 38 43 46 65 79 89 92} Since the Queue operations take constant time, the merge of each pair of sequences should run in time proportional to the number of items merged so that Merge16 itself runs in O(N*log(N)) time for N lines. 3. Your implementation of the Queue ADT should be based on a headless, singly- linked, circular list (i.e., each node is a struct containing a value and a link to the next node, the last node links to the first node, and there is no empty header node). The _internal_ Queue data type is a pointer to the last node in the list (i.e., the tail of the queue) or NULL if the queue is empty. Since the first node in the list is the node following the last node, both the head and the tail can be reached in constant time. The values stored in a Queue are the string pointers returned by getLine(). That is, the strings themselves should not be duplicated within the Queue. Your code MUST #include Hwk4/Queue.h, not a modified copy. Similarly, when making testQueue and Merge16H, you MUST link to Hwk4/testQueue.o and Hwk4/Queue.o, not local copies. WARNING: How Hwk4/Queue.o implements the Queue ADT may change without notice. For example, it could use an array instead of a linked list; and it could move the Queue (and thus change *Q) on every call. Thus you may not assume that the value of *Q does not change when you call addQ(), removeQ(), isEmptyQ(), or headQ(). 4. The ordering of keys is that returned by strcmp() / strncmp() when the environment variable LC_COLLATE is set to the value specified in the test scripts (normally "POSIX"). Thus you must compare keys using one of these functions. 5. Merge16 frees all storage before exiting, unless an error is detected. There will be at most 4 tests of this property. Hint: fopen() allocates storage when you open a file; fclose() frees that storage. 6. You may only open files on the command line once and may not rewind them. 7. You may find the function strtol() useful when parsing -POS[,LEN]. Merge16 may ignore the possibility that POS or LEN is not representable by an int. 8. You may NOT assume that malloc() and realloc() never fail. 9. The prohibition against arrays means that you may not use malloc() or realloc() outside the module that implements the Queue ADT. Moreover, you may not define any global, static local, or automatic local array variables. More generally, you may not use ANY global variables. Finally, since writing and then reading a file or recursing to depth O(#lines) also act like arrays, they will incur the same penalty. A. There will be at most 4 tests of stability. B. In addition to the 40 regular tests, there will be some diagnostic tests that may lead to deductions, including: * Using Merge16H to verify that your Merge16 implements merge sort. Failing this test will incur a 10-point deduction unless you can show it does. * Using Merge16H to count the maximum number of queues in use at any time. Using three or more queues will incur a 4-point deduction. * Using Merge16H to count the number of calls to addQ(). An _efficient_ implementation of Merge16 avoids unnecessary movement of lines between queues and needs at most N*log2(N) calls to sort N lines when N > 10. Using more calls but at most N*log2(N) + N will incur a 1-point deduction. Using still more but at most 2*N*log2(N) + 2*N will incur a 2-point deduction. Using even more will incur a 3-point deduction. Hint: Different merged sequences can be added to different queues. * Using Merge16H to detect the use of arrays (automatic local, static local, or global), which will incur a 10-point deduction unless you can show your program does not use them. * Analyzing Merge16H for the presence of global variables, which will incur a 4-point deduction unless you can show your program does not use them. To pass these tests your Merge16 must be able to sort 10000-line files that are initially in random order (e.g., shuffled by the shuf command) in under one second. Should you never reach that point, you can see me after the final script is released to explain why I should void these deductions. C. Reading: Kernighan & Ritchie, Chapter 6 (structures) Summit: https://www.eskimo.com/~scs/cclass/krnotes/sx9.html (K&R 6) Kernighan & Ritchie, Section 7 (input/output) Summit: https://www.eskimo.com/~scs/cclass/krnotes/sx10.html (K&R 7) Kernighan & Pike, Section 2.7 (lists) Kernighan & Pike, Chapter 4 (interfaces) CS-223-02/24/16 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // Queue.h Stan Eisenstat (2016/02/24) // // Define the abstract data type for a Queue. #include // A Queue is a pointer to a struct that is used to implement the queue // operations declared below. By defining its data type incompletely (i.e., // by not defining the fields of the struct node below), that pointer can be // stored within the calling program without knowing what the exact type of // the object is. typedef struct node *Queue; // The following functions are the only means to access the Queue data // structure. They all require a pointer to a Queue variable since they may // need to modify that variable in the course of accessing the data structure. // Most return the status of the operation, either TRUE or FALSE (e.g., invalid // arguments, an inconsistent Queue object, or a NULL return from malloc() / // realloc()). // Set *Q to a new object of type Queue. Return status. int createQ (Queue *q); // Add the string pointer S to the tail of Queue *Q; the string itself is not // copied. *Q may change as a result. Return status. int addQ (Queue *q, char *s); // Return TRUE if the Queue *Q is empty, FALSE otherwise. *Q may change as a // result. int isEmptyQ (Queue *q); // Copy the string pointer at the head of Queue *Q to *S, but do not remove it // from *Q. *Q may change as a result. Return status. (If *Q is empty, then // returns FALSE and leaves *S unchanged.) int headQ (Queue *q, char **s); // Remove the string pointer at the head of the Queue *Q and store that value // in *S. *Q may change as a result. Return status. (If *Q is empty, then // returns FALSE and leaves *S unchanged.) int removeQ (Queue *q, char **s); // Destroy the Queue *Q by freeing any storage it uses (but not that to which // the string pointers point). Set *Q to NULL. Return status. int destroyQ (Queue *q); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // demo.c Stan Eisenstat (2016/02/24) // // Demonstrate the use of getLine() and the Queue ADT #include #include #include "/c/cs223/Hwk3/getLine.h" #include "/c/cs223/Hwk4/Queue.h" // Print message to stderr and exit. #define DIE(msg) exit (fprintf (stderr, "%s\n", msg)) int main (int argc, char *argv[]) { FILE *fp = stdin; // Read from standard input */ Queue Q; char *line; if (!createQ (&Q)) // Create Queue for stdin DIE ("createQ() failed"); while ((line = getLine(fp))) { // Append lines read to Q if (!addQ (&Q, line)) DIE ("addQ() failed"); } while (!isEmptyQ (&Q)) { // Output lines in Q if (!removeQ (&Q, &line)) DIE ("removeQ() failed"); fputs (line, stdout); free (line); // Free storage for line } if (!destroyQ (&Q)) // Destroy Queue DIE ("destroyQ() failed"); return EXIT_SUCCESS; }