R E V I S E D S P E C I F I C A T I O N Due 2:00 am, Friday, 28 February 2020 CPSC 223 Homework #3 String Rewriting Systems REMINDER: Do not under any circumstances copy someone else's code for this assignment, give your code to someone else, or make it publicly available. After discussing any aspect of the assignment with anyone other than a member of the teaching staff (such discussions must be noted in your log file), do not keep any written or electronic record and engage in some mind-numbing activity before you work on the assignment again. Sharing ANY related document (e.g., code or test cases) is a violation of this policy. Since code reuse is an important part of programming, you may study published code (e.g., from textbooks or the Net) and/or incorporate it in a program, provided that you give proper attribution in your log file and in your source file(s) (see the syllabus for details) and that the bulk of the code submitted is your own. Note: Removing/rewriting comments, renaming functions/variables, or reformatting statements does not convey ownership. (40) Write a "STring ReWRiting System" strwrs [ -SRNQsrnq ] [ Fi Ti ]* that reads lines from the standard input, applies the string substitution rules specified (if any) in the manner specified, and writes the (possibly) modified lines to the standard output. NOTE: Brackets on the command line are used to delimit optional arguments. -SRNQsrnq means that any combination of S, R, N, Q, s, r, n, and q (including duplicates) may follow the leading -. However, the first argument will only contain flags if the total number of arguments (including strwrs) is even. Lines are processed one at a time. Each string substitution rule is specified by a pair of command line arguments, say Fi and Ti, where Fi denotes the string to replace and Ti the replacement string, which may be longer than, shorter than, or the same length as Fi). the LOWER case flags s, r, n, and q specify HOW each string substitution rule is applied: When the q flag (Quit) is specified, only the leftmost occurrence (if any) of Fi in the line is replaced by Ti. When the n flag (continue Next) is specified, every occurrence of Fi in the line is replaced by Ti; that is, each time that the leftmost occurrence of Fi is replaced by Ti, the line is scanned beginning with the first character following the replacement string for the leftmost occurrence of Fi, which is then replaced. When the r flag (Rescan) has been specified, each time the leftmost occurrence of Fi is replaced by Ti, the line is scanned beginning with the first character in the replacement string for the leftmost occurrence of Fi, which is then replaced. When the s flag (Start anew) has been specified, each time that the leftmost occurrence of Fi is replaced by Ti, the ENTIRE line is (re)scanned for the leftmost occurrence of Fi, which is then replaced. If none of q, n, r, or s is specified, then n is used. Examples: Command Input Intermediate stages Output strwrs -q ab bca "aabbab" "abcabab" ^^ strwrs -n ab bca "aabbab" "abcabab" "abcabbca" ^^ ^^ strwrs -r ab bca "aabbab" "abcabab", "abcbcaab" "abcbcabca" ^^ ^^ ^^ strwrs -s ab bca "aabbab" "abcabab", "bcacabab", "bcacbcbcaca" ^^ ^^ ^^ "bcacbcaab", "bcacbcabca" ^^ ^^ strwrs ab bca "aabbab" "abcabab" "abcabbca" ^^ ^^ The application of a string substitution rule is said to be successful if it causes at least one replacement. Otherwise it is unsuccessful. The UPPER case flags S, R, N, and Q specify the ORDER in which the string substitution rules are applied when more than one string substitution rule is specified: The leftmost rule on the command line is applied first. When the Q flag (Quit) has been specified and the application of a rule is successful, then the remaining rules are ignored and the line is printed. | When the N flag (apply Next) has been specified and the application of a rule is successful, then the next rule (in left to right order) is applied (starting at the beginning of the string). If there is no next rule, the | line is printed. | When the R flag (Reapply current) has been specified and the application of a rule is successful, then that rule is reapplied (starting at the beginning of| the string). | When the S flag (Start anew) has been specified and the application of a rule is successful, then the first rule (in left to right order) is applied next | (starting at the beginning of the string). | When the application of a rule is UNsuccessful, then the next rule (in left- to-right order) is applied (starting at the beginning of the string). If | there is no next rule, the line is printed. | If none of Q, N, R, or S is specified, then N is used. Examples: Command Input Intermediate stages Output strwrs -Qn aa a abb aab "aaabbb" "aabbb" ^^ strwrs -Nn aa a abb aab "aaabbb" "aabbb" "aaabb" ^^ ^^^ strwrs -Rn aa a abb aab "aaabbb" "aabbb", "abbb", "aabb" "aaab" ^^ ^^ ^^^ ^^^ strwrs -Sn aa a abb aab "aaabbb" "aabbb", "abbb", "aabb", "ab" ^^ ^^ ^^^ ^^ "abb", "aab" ^^^ ^^ strwrs -n aa a abb aab "aaabbb" "aabbb" "aaabb" ^^ ^^^ strwrs aa a abb aab "aaabbb" "aabbb" "aaabb" ^^ ^^^ If more than one of the upper case flags Q, N, R, and S are specified, then the last such flag specified takes precedence. Similarly for the lower case flags q, n, r, and s. The flags may be specified in any order. To make strwrs more useful, the first character in the Fi field of a rule can be a + (plus sign), which matches the beginning of the line. Similarly, the last character can be a - (minus sign), which matches the gap following the last character on the line (not including the newline, if any, so that the newline itself is NOT replaced). Examples: Command Input Intermediate stages Output strwrs -n +a b "abab" "bbab" ^^ strwrs -n +a b "baba" "baba" strwrs -n a- b "baba" "babb" ^^ strwrs -n a- b "abab" "abab" To allow a + to be the first character of a match, or a - to be the last character of a match, a / (forward slash) escapes the character immediately following (but not a newline or the terminating null character) just as a \ (backslash) escapes a following \ or " in a C string. Note that this is not required when the + is not the first character or the - is not the last character. Examples: Command Input Intermediate stages Output strwrs -n /++ ab "++++" "ab++" "abab" ^^ ^^ strwrs -n -/- ab "----" "ab--" "abab" ^^ ^^ strwrs -n /a/ bc "a/a/" "bca/" "bcbc" ^^ ^^ strwrs should be reasonably robust, printing a one-line error message (ending in a newline) to the standard error and exiting if the command-line arguments are illegal. Use the submit command to turn in the source files for strwrs, a Makefile, and your log file (see Homework #1). All files must contain your name and Yale netID. YOU MUST SUBMIT YOUR FILES (INCLUDING YOUR LOG FILE) AT THE END OF ANY SESSION WHERE YOU WRITE OR DEBUG CODE, AND AT LEAST ONCE EVERY HOUR DURING LONGER SESSIONS. (All submissions are retained.) Notes: 1. To detect certain errors associated with the use of malloc(), realloc(), and free(), all tests will use valgrind, a system that detects uninitialized variables, malloc() errors, etc. To use it yourself, type % valgrind -q ./strwrs ... See Hwk3/Valgrind for a list of common valgrind errors and possible explanations of their origin. See http://valgrind.org/ for more details. (There are links to these pages on the class web page.) 2. To ensure that strwrs can handle input lines of arbitrary length, strwrs should use the function getline() (see "man getline"), which requires adding the line "#define _GNU_SOURCE" before the first #include. Here is a short code fragment that illustrates its use copying stdin to stdout: size_t n = 0; char *line = NULL; while (getline (&line, &n, stdin) != -1) { fputs (line, stdout); free (line); n = 0; line = NULL; } free (line); 3. strwrs should free all storage allocated by malloc() or realloc() before calling exit() or returning from main(), unless an error is detected. This will be checked using valgrind in at most five tests. 4. Some substitutions (e.g., "strwrs -s a a" applied to "a") would not normally terminate. To make your program more useful, it should detect the following such cases: a) If the r or s flag has been specified and the replacement of some Fi by Ti leaves the line unchanged, then pretend that this rule was not applied successfully and continue to the next rule. b) If the R flag has been specified and a successful substitution leaves the line unchanged, then pretend that the substitution was UNsuccessful and continue to the next rule. There will be at most eight tests of this capability. Point to Ponder: Will these measures detect ALL cases where strwrs would not otherwise terminate? 5. strwrs does not place any limits on the number of substitution rules or on the lengths of the Fi and Ti strings. Since the Fi and Ti fields and the input lines returned by getline() are strings, they cannot contain null characters. For simplicity strwrs may also assume that there are no newlines in the Fi and Ti fields (and need not fail gracefully if they are present). However, you may NOT assume that the other 254 possible character values are not present. 6. Some of the string functions strlen(), strcmp(), strncmp(), strstr(), strchr(), strcat(), strcpy(), strncpy(), strdup(), and strndup() may save you some coding in strwrs, but make sure that you understand strncpy() before using it (in particular, note that it does NOT null-terminate the result). If you use strdup() or strndup(), you must add the line #define _GNU_SOURCE before the first #include. The print functions sprintf() and asprintf() may also be useful. 7. Your program may assume that malloc() and realloc() will always return a non-NULL value, which will be true if storage is allocated and freed as required. 8. Hwk3/strwrs accepts a -v option (which must be the first argument) that prints the rule being applied and the line before/after each substitution. E.g., % echo aaaaaaaaa | strwrs -v -Sn aaa bb bbb aa F=aaa T=bb POS=0 OLD=aaaaaaaaa NEW=bbaaaaaa POS=2 OLD=bbaaaaaa NEW=bbbbaaa POS=4 OLD=bbbbaaa NEW=bbbbbb F=bbb T=aa POS=0 OLD=bbbbbb NEW=aabbb POS=2 OLD=aabbb NEW=aaaa F=aaa T=bb POS=0 OLD=aaaa NEW=bba bba You do NOT need to implement the -v option; its only purpose is to help you understand how the substitution process works. 9. Freeing all storage will be worth at most 5 points. The fences (+, -, and /) will be worth at most 7 points. Detecting infinite loops will be worth at most 6 points. A. Hint: One way to implement strwrs is to write 16 functions, one to handle each combination of an upper and a lower case flag. Another is to write 8 functions, one for each upper and lower case flag. There are approaches that require even less code duplication. (The reference solution is just over 100 lines as measured by /c/cs223/bin/xLines.) B. strwrs may NOT use global variables. C. Reading: Kernighan & Ritchie, Chapter 5 (pointers) Kernighan & Ritchie, Section 7.6 (printing to the standard error) Kernighan & Ritchie, Section 7.8 (string functions and storage management) Kernighan & Ritchie, Section B3 (string functions) Aspnes, Section 3.4.4 (valgrind) Aspnes, Section 4.9 (pointers and malloc()) Aspnes, Section 4.10 (strings) /c/cs223/Hwk3/Valgrind CS-223-02/16/20|