P R E L I M I N A R Y S P E C I F I C A T I O N Due 2:00 AM, Friday, 26 February 2016 REMINDER: Do not under any circumstances copy another student's code or give a copy of your code to another student. After discussing the assignment with another student, you may not take any written or electronic record away. Moreover, you must engage in a full hour of mind-numbing activity before you work on it again. Such discussions must be noted in your log file. CS 223 Homework #3 String Rewriting Systems (40) Write a "string rewriting" filter Subst16 [FROM TO -FLAGS]* that reads lines from the standard input, performs the requested substitutions (if any), and writes the (possibly) modified lines to the standard output. NOTE: Brackets in the command line are used to delimit optional arguments, so [FROM TO -FLAGS]* denotes zero or more triples (FROM, TO, -FLAGS). Lines are processed one at a time. Each triple (FROM, TO, -FLAGS) specifies a string substitution rule. FROM is the string to replace; TO is the replacement string (note that TO may be longer than, shorter than, or exactly the same length as FROM); and FLAGS is a concatenation of one or more strings that specify how the rule is applied (q, g, and r) and the next rule to apply (see the descriptions of Sn and Fm below): When the q flag (Quit) is specified, only the leftmost occurrence (if any) of the string FROM in the line is replaced by the string TO. When the g flag (Global) is specified, every occurrence of FROM in the line is replaced by TO. That is, the leftmost occurrence (if any) of FROM is replaced by TO; and, after each substitution, the modified line is scanned beginning with the first character _following_ the replacement string for the leftmost occurrence of FROM, which is then replaced by TO. When the r flag (Rescan) is specified, the leftmost occurrence (if any) of FROM is replaced by TO; and, after each substitution, the modified line is scanned beginning with the first character _in_the_replacement_string_ for the leftmost occurrence of FROM, which is then replaced by TO. If more than one of q, g, or r is specified, then the last is used. If none is specified, then q is used. Examples: Command Input Intermediate stages Output Subst16 ab ba -q "aabbab" "ababab" ^^ Subst16 ab ba -g "aabbab" "ababab" "ababba" ^^ ^^ Subst16 ab ba -r "aabbab" "ababab", "abbaab" "abbaba" ^^ ^^ ^^ Subst16 ab ba - "aabbab" "ababab" ^^ Subst16 ab ba -rqg "aabbab" "ababab" "ababba" ^^ ^^ Once the application of a single string substitution rule is complete, it is said to be successful if it caused at least one replacement, and unsuccessful otherwise. String substitution rules are normally applied in left to right order, with the following constraints: When the Sn (success) flag is specified (where n denotes a sequence of zero or more digits) and the application of the rule is successful, the next rule applied is the n-th rule (the leftmost rule on the command line is the 0-th). When the Fm (failure) flag is specified (where m denotes a sequence of zero or more digits) and the application of the rule is unsuccessful, the next rule applied is the m-th rule. If n (or m) is omitted, then its value is 0. Thus S is equivalent to S0. If n or m is greater than the index of the rightmost rule, then no further substitutions are applied to the current line. If more than one Sn (Fm) is specified, then the last is used. If none is specified, then the next rule in sequence (if any) is applied. Examples: Command Input Intermediate stages Output Subst16 aa a -S2 abb aab -S0 "aaabbb" "aabbb" ^^ Subst16 aa a -S1 abb aab -S2 "aaabbb" "aabbb" "aaabb" ^^ ^^^ Subst16 aa a -S1 abb aab -S1 "aaabbb" "aabbb", "aaabb" "aaaab" ^^ ^^^ ^^^ Subst16 aa a -S0 abb aab - "aaabbb" "aabbb", "abbb" "aabb" ^^ ^^ ^^^ Subst16 aa a -S0 abb aab -S1 "aaabbb" "aabbb", "abbb", "aabb" "aaab" ^^ ^^ ^^^ ^^^ Subst16 aa a - abb aab -S0 "aaabbb" "aabbb", "aaabb", "aabb", ^^ ^^^ ^^ ^^^ "aaab" "aab" ^^ Subst16 aa a - abb aab - "aaabbb" "aabbb" "aaabb" ^^ ^^^ Subst16 aa a -F2 abb aab -S1 "abbb" "abbb" Subst16 aa a -S2 abb aab -S0 "abbb" "aabb" "abb" ^^^ ^^ Subst16 bd a -S2 aa a - .. "aabbbc" "abbbc", "aabbc", "abbc", abb aab -S1 c d -S ^^ ^^^ ^^ ^^^ "aabc", "abc", "abd" "aa" ^^ ^ ^^ The flags q, g, r, Sn, and Fm may appear in any combination and in any order within a given FLAGS field. Thus -gS2F3, -S2gF3, and -F3S2g are all valid. To make Subst16 more useful, the FROM field can contain the wildcard character . (period), which matches any character other than a newline. To specify a real period, use the escape sequence "@." (an @ can be specified as "@@" when necessary, but an @ does not escape any other characters). Also, the TO field can contain the insert-matched-string character ^ (caret), which is replaced by the substring of the line that matched the FROM field. To specify a real ^, use the escape sequence "@^" (an @ can be specified as "@@" when necessary, but an @ does not escape any other characters). Examples: Command Input Intermediate stages Output Subst16 a. ba -g "aabbab" "babbab" "babbba" ^^ ^^ Subst16 a. ba -r "aabb" "babb", "bbab" "bbba" ^^ ^^ ^^ Subst16 a. ^c -g "aabbab" "aacbbab" "aacbbabc" ^^ ^^ Use the submit command to turn in the source files for Subst16, a Makefile, and your log file (see Homework #1). YOU MUST SUBMIT YOUR FILES (INCLUDING THE LOG FILE) AT THE END OF ANY SESSION WHERE YOU SPEND AT LEAST ONE-HALF HOUR WRITING OR DEBUGGING CODE, AND AT LEAST ONCE EVERY HOUR DURING LONGER SESSIONS. (All submissions are retained.) Notes ~~~~~ 1. Subst16 may assume that the command-line arguments are valid, but should fail gracefully if they are not. 2. To ensure that Subst16 can handle input lines of arbitrary length, Subst16 should use the function getLine() which is defined in Hwk3/getLine.h and implemented in Hwk3/getLine.c. Your source file(s) should #include /c/cs223/Hwk3/getLine.h rather than the name of a local copy. Similarly your Makefile should specify Hwk3/getLine.o rather than a local copy. To read from the standard input, the argument to getLine() should be stdin as defined in . 3. To detect certain errors associated with the use of malloc(), realloc(), and free(), all tests will use valgrind, a system that detects uninitialized variables, malloc() errors, etc. To use it yourself, type % /usr/bin/valgrind -q ./Subst16 ... See Hwk3/Valgrind for a list of common valgrind errors and possible explanations of their origin. See http://valgrind.org/ for more details. (There are links to these pages on the class web page.) 4. Subst16 should free all storage before exiting, unless an error is detected. This will be checked using valgrind. 5. Subst16 does not place any limits on the number of substitution rules nor on the lengths of the FROM and TO strings. However, you may assume that there are no newlines or null characters in the substitution rules. While you may assume that there are no null characters in the input (since getLine() has no mechanism for returning them), any of the other 255 possible character values may be present. Subst16 does not detect infinite loops such as % echo aaabbaa | Subst16 a a -S0 and need not fail gracefully in such cases. 6. The string functions strtol(), strlen(), strcmp(), strncmp(), strstr(), strchr(), strcat(), strcpy(), strncpy(), strdup(), and strndup() may save you some coding in Subst16, but make sure that you understand strncpy() before using it (in particular, note that it does NOT null-terminate the result). If you use strdup() or strndup(), you must add the line #define _GNU_SOURCE before the first #include. Subst16 may NOT use asprintf() or atoi(). 7. Subst16 may use structs, although they are not really necessary. However, it may not use any global variables or open any files. 8. Your program may assume that malloc() and realloc() will always return a non-NULL value, which will be true if storage is allocated and freed as required. 9. Hwk3/Subst16 accepts a -v option (which must be the first argument) that prints the rule being applied and the line before/after each substitution. E.g., % echo aaabbaa | Subst16 -v aa a -S0 bb a -S0 F=aa T=a POS=0 OLD=aaabbaa NEW=aabbaa F=aa T=a POS=0 OLD=aabbaa NEW=abbaa F=aa T=a POS=3 OLD=abbaa NEW=abba F=bb T=a POS=1 OLD=abba NEW=aaa F=aa T=a POS=0 OLD=aaa NEW=aa F=aa T=a POS=0 OLD=aa NEW=a a You do NOT need to implement the -v option; its only purpose is to help you understand how the substitution process works. A. The following will be worth at most five points each: * the wildcard character . (period) in FROM * the insert-matched-string character ^ (caret) in TO * freeing all storage before exiting B. Reading: Kernighan & Ritchie, Chapter 5 (pointers) Summit: https://www.eskimo.com/~scs/cclass/krnotes/sx8.html (K&R 5) Kernighan & Ritchie, Section B3 (string functions) Kernighan & Pike, Section 2.6 (growing arrays) Kernighan & Pike, Chapter 5 (debugging: second reading) CS-223-02/10/16