P R E L I M I N A R Y S P E C I F I C A T I O N Due 2:00 am, Friday, 27 February 2009 CPSC 223b Homework #3 String Rewriting Systems (40) Write a "string rewriting" filter Subst09 [-QNCAqnca] [Fi Ti]* filename that reads lines from the specified file one at a time, performs the requested substitutions (if any), and writes the possibly modified lines to the standard output. NOTE: Brackets in the command line are used to delimit optional arguments: [-QNCAqnca] denotes any combination of these flags; and [Fi Ti]* denotes zero or more pairs of arguments that represent string substitution rules. Thus the first argument contains flags only if the total number of arguments (including the zero-th argument Subst09) is odd. Lines are processed one at a time. The UPPER case flags Q, N, C, and A specify the ORDER in which the string substitution rules are applied; the LOWER case flags q, n, c, and a specify HOW each rule is applied. Each string substitution rule is specified by a pair of command line arguments, say Fi and Ti, where Fi denotes the string to replace and Ti the replacement string, which may be longer than, shorter than, or the same length as Fi): When the q flag (Quit) is specified, only the leftmost occurrence (if any) of Fi in the line is replaced by Ti. When the n flag (continue Next) is specified, every occurrence of Fi in the line is replaced by Ti. That is, the leftmost occurrence (if any) of Fi is replaced; and, after each substitution, the modified line is scanned beginning with the first character FOLLOWING the replacement string for the leftmost occurrence of Fi, which is then replaced by Ti. When the c flag (rescan Current) is specified, the leftmost occurrence (if any) of Fi is replaced by Ti; and, after each substitution, the modified line is scanned beginning with the first character IN the replacement string for the leftmost occurrence of Fi, which is then replaced by Ti. When the a flag (rescan All) is specified, the leftmost occurrence (if any) of Fi is replaced by Ti; and, after each substitution, the ENTIRE line is (re)scanned for the leftmost occurrence of Fi, which is then replaced by Ti. The default (when no lower-case flag is specified) is n. Examples: Command Input Intermediate stages Output ~~~~~~~ ~~~~~ ~~~~~~~~~~~~~~~~~~~ ~~~~~~ Subst09 -q ab ba "aabbab" "ababab" ^^ Subst09 -n ab ba "aabbab" "ababab" "ababba" ^^ ^^ Subst09 -c ab ba "aabbab" "ababab", "abbaab" "abbaba" ^^ ^^ ^^ Subst09 -a ab ba "aabbab" "ababab", "baabab", "babaab", "bbbaaa" ^^ ^^ ^^ ^^ "bbaaab", "bbaaba", "bbabaa" ^^ ^^ ^^ Subst09 ab ba "aabbab" "ababab" "ababba" ^^ ^^ The application of a string substitution rule is said to be successful if it causes at least one replacement. Otherwise it is unsuccessful. When more than one string substitution rule is specified, they are applied in in LEFT-TO-RIGHT order, with the following constraints: When the Q flag (Quit) is specified, if the application of a rule is successful, then the remaining rules are ignored. When the N flag (apply Next) is specified, if the application of a rule is successful, then the next rule (in left to right order) is applied. When the C flag (reapply Current) is specified, if the application of a rule is successful, then that rule is reapplied before continuing on to the next rule (in left to right order). When the A flag (reapply All) is specified, if the application of a rule is successful, then all of the rules are (re)applied in left to right order. The default (when no upper-case flag is specified) is N. Examples: Command Input Intermediate stages Output ~~~~~~~ ~~~~~ ~~~~~~~~~~~~~~~~~~~ ~~~~~~ Subst09 -Qn aa a abb aab "aaabbb" "aabbb" ^^ Subst09 -Nn aa a abb aab "aaabbb" "aabbb" "aaabb" ^^ ^^^ Subst09 -Cn aa a abb aab "aaabbb" "aabbb", "abbb", "aabb" "aaab" ^^ ^^ ^^^ ^^^ Subst09 -An aa a abb aab "aaabbb" "aabbb", "abbb", "aabb", "ab" ^^ ^^ ^^^ ^^ "abb", "aab" ^^^ ^^ Subst09 -n aa a abb aab "aaabbb" "aabbb" "aaabb" ^^ ^^^ Subst09 aa a abb aab "aaabbb" "aabbb" "aaabb" ^^ ^^^ If more than one of the upper case flags Q, N, C, and A are specified, then the last flag given takes precedence. Similarly for the lower case flags q, n, c, and a. The flags may be specified in any order. If the filename specified is -, then Subst09 reads from the standard input. Subst09 should be reasonably robust, printing a one-line error message to the standard error and exiting if the command-line arguments are illegal or if FILENAME is not readable. Use the submit command to turn in the source files for Subst09, a Makefile, and your log file (see Homework #1). YOU MUST SUBMIT YOUR FILES (INCLUDING THE LOG FILE) AT THE END OF ANY SESSION WHERE YOU SPEND AT LEAST ONE HOUR WRITING OR DEBUGGING CODE, AND AT LEAST ONCE EVERY FIVE HOURS DURING LONGER SESSIONS. (ALL SUBMISSIONS ARE RETAINED.) Notes: 1. To detect certain errors associated with the use of malloc(), realloc(), and free(), all tests will use valgrind, a system that detects uninitialized variables, malloc() errors, etc. To use it yourself, type % /c/cs223/bin/valgrind -q ./Subst04 ... See http://valgrind.org/ for details. (There is a link to this page on the class web page.) 2. To ensure that Subst09 can handle input lines of arbitrary length, use the function getLine() which is defined in Hwk3/getLine.h and implemented in Hwk3/getLine.c. Your source file(s) should #include /c/cs223/Hwk3/getLine.h rather than the name of a local copy. Similarly your Makefile should specify Hwk3/getLine.o rather than a local copy. 3. Subst09 frees all storage before exiting, unless an error is detected. This will be checked using valgrind (but in at most six tests). 4. Some substitutions (e.g., "Subst09 -a a a" or "Subst09 -A a a" applied to "a") would not normally terminate. Thus Subst09 detects some of these cases and acts as follows: a) If the -c or -a flag is specified but the replacement of some Fi by Ti leaves the line unchanged, then pretend that this rule was applied successfully and do not resume scanning. b) For each rule, keep a copy of the line just before the last time that the rule was applied to it (or NULL if this is the first time that the rule was applied to it) and whether or not the application was successful. If a rule is about to be applied to the same input and was successful previously, then pretend that the substitution was NOT successful and continue. Point to Ponder: Will these measures detect ALL cases where Subst09 would not otherwise terminate? 5. Subst09 does not place any limits on the number of substitution rules nor on the lengths of the Fi and Ti strings. 6. The string functions strlen(), strstr(), strncmp(), strcat(), strcpy(), and strncpy(), may save you a lot of coding in Subst09, but make sure that you understand strncpy() before using it (in particular, note that it does NOT null-terminate the result). 7. While you may assume that there are no null characters in the input (since getLine() has no mechanism for returning them), you may not assume that the other 255 possible character values are not present. 8. Your program may assume that malloc() and realloc() will always return a non-NULL value, which will be true if storage is allocated and freed as required. 9. Hwk3/Subst09 accepts a -v option that prints the rule being applied and the line before/after each substitution. E.g., % echo aaabbaa | Subst09 -Aav aa a bb a - F=aa T=a POS=0 OLD=aaabbaa NEW=aabbaa F=aa T=a POS=0 OLD=aabbaa NEW=abbaa F=aa T=a POS=3 OLD=abbaa NEW=abba F=bb T=a POS=1 OLD=abba NEW=aaa F=aa T=a POS=0 OLD=aaa NEW=aa F=aa T=a POS=0 OLD=aa NEW=a a You do NOT need to implement the -v option; its only purpose is to help you understand how the substitution process works. CS-223-02/09/09