P R E L I M I N A R Y S P E C I F I C A T I O N Due 5:00 PM, Monday, 04 May 2009 CPSC 223b Homework #6 WIPL-Balanced String Trees (40) Write a program Count09 [TEXTFILE | -d DICTIONARY | -set VALUE | -print | -dump | -ipl]* to count the number of times that lines appear in certain files. Count09 maintains a (line, count) tree and an improvement factor IF (initially 0) and accepts six types of arguments: Type Action Taken ---- ------------ TEXTFILE Read lines from TEXTFILE, insert into the tree those not already present, and increment the count for the rest -d DICTIONARY Read lines from DICTIONARY and delete from the tree those that are present -set VALUE Set the improvement factor IF to the integer VALUE specified -print Write the lines (and corresponding counts) in the line tree to the standard output in inorder (i.e., alphabetic order), using the format "%d %s" -dump Write the lines in the line tree to the standard output in preorder, using the format "%s" -ipl Write the sum of the counts in the nodes of the line tree and the weighted internal path length of the line tree to the standard output using the format "%d, %d\n" Each command line argument is processed at the time it is encountered. Thus with the command % Count09 Essay -d SysDict -dump -set 2 Appendix -d UsrDict -print the lines in the file Essay will be added to the tree; the lines in the file SysDict deleted from the tree; the tree dumped in preorder; IF changed to 2 from the default of 0; the lines in the file Appendix added to the tree; the lines in the file UsrDict deleted from the tree; and the lines in the tree printed in alphabetic order along with their counts. The tree should be implemented as a "WIPL-balanced" binary search tree. Define the weight W(T) of a subtree T to be the sum of the counts in the nodes of T, including its root; and define the weight W(N) of a node N as the weight of the subtree rooted at N. Similarly, let WIPL(T) denote the weighted internal path length of a subtree T, and let WIPL(N) denote the weighted internal path length of the subtree rooted at N. To improve the average search time the standard algorithms for search/insert and delete are modified as follows: Search/Insert: Assume that the line LINE was found or inserted in the subtree T rooted at N. If the node containing LINE is in the left [right] subtree of T, then rotate N to the right [left] whenever this rotation would reduce WIPL(T) by more than IF; that is, letting T' denote the subtree after the rotation, whenever WIPL(T') < WIPL(T) - IF. Thus a single search/insert can cause a cascade of rotations. Delete: Assume that the line LINE to be deleted is stored in the root N of the subtree T. If N has two children, L and R, and W(L) <= W(R) [W(L) > W(R)], then rotate N to the left [right], so that R [L] is the new root of T, and then delete LINE from the left [right] subtree. Otherwise, if N has fewer than two children, N is deleted in the usual manner. Assume that the line was deleted from the right [left] subtree of the root N of the subtree T (or would have been deleted from that subtree had it been present in T originally). If N has a left [right] child, rotate N to the right [left] whenever this rotation would reduce WIPL(T) by more than IF; that is, letting T' denote the subtree T after the rotation, whenever WIPL(T') < WIPL(T) - IF. Thus a delete can cause a cascade of rotations. searchInsert() and delete() must run in O(tree-height) time. This means that W(T) and WIPL(T) must be stored in the root of each subtree T of the line tree and that these values must be updated in O(1) time per node visited. When IF is 0, the change to search/insert is intended to reduce the weighted internal path length (and thus the average search time) whenever possible. The change to delete is also intended to reduce the IPL. (Whether these intentions are met is unknown.) Examples: % Count09 sectionOne sectionTwo sectionThree -d /usr/dict/words -print % Count09 chapter -d /server/usr/dict/words -print -d myDictionary -print % Count09 wordList -print -ipl % Count09 -set 3 novella -d /server/usr/dict/words -dump Use the submit command to turn in the source files for Count09, a Makefile, and your log file (see Homework #1). YOU MUST SUBMIT YOUR FILES (INCLUDING THE LOG FILE) AT THE END OF ANY SESSION WHERE YOU SPEND AT LEAST ONE HOUR WRITING OR DEBUGGING CODE, AND AT LEAST ONCE EVERY FIVE HOURS DURING LONGER SESSIONS. (ALL SUBMISSIONS ARE RETAINED.) Notes: 1. Count09 should use strcmp() to define the order on lines, and print each line with the format "%s\n". 2. A dictionary can be ANY file (that is, there is no requirement that the lines be in alphabetical order). 3. If TEXTFILE or DICTIONARY is -, then the lines should be read from the standard input (i.e., using the file pointer stdin). Note: None of the test scripts will use this "shorthand" more than once in a single execution of Count09. 4. Count09 should detect illegal command-line arguments and nonexistent or unreadable files, printing one-line error messages to stderr and exiting. Errors may be detected when they are invoked (vs. at the beginning). 5. The getLine(FILE*) function implemented in Hwk4/getLine.h and Hwk4/getLine.o may be used to read lines from a file given an existing FILE pointer. 6. The WIPL-balanced binary search tree should be written as an abstract data type (i.e., with a .h file to define the interface and the implementation in a separate .c file) so that the code can be used in other applications. The implementation may NOT use parent pointers (i.e., there can only be two node pointers per node) or arrays (other than to hold copies of lines). Hwk6/Tree.h defines an interface to an ADT for a standard binary search tree of which Hwk6/Tree.c is a nearly complete implementation. These are based on the handout in class. You may use any or all of this code without attribution (but you are responsible for its style!). 7. Your implementation of Count09 may not use any non-static global variables, should free malloc()-ed storage before overwriting the last pointer to it, and should close each file pointer after reading the file. However, you need not free all storage before exiting. 8. You may find it easier to get Count09 working with standard binary search trees first and then modify the tree module to support the modifications to searchInsert() and delete() needed for WIPL-balanced binary search trees. CS-223-04/19/09