P R E L I M I N A R Y S P E C I F I C A T I O N Due 2:00 AM, Friday, 02 October 2020 CPSC 323 Homework #2 Parsing (Some) Bash Commands REMINDER: Do not under any circumstances copy someone else's code for this assignment, give your code to someone else, or make it publicly available. After discussing any aspect of the assignment with anyone other than a member of the teaching staff (such discussions must be noted in your log file), do not keep any written or electronic record and engage in some mind-numbing activity before you work on the assignment again. Sharing ANY related document (e.g., code or test cases), is a violation of this policy. Since code reuse is an important part of programming, you may study published code (e.g., from textbooks or the Net) and/or incorporate it in a program, provided that you give proper attribution in your log file and in your source files (see the syllabus for details) and that the bulk of the code submitted is your own. Note: Removing/rewriting comments, renaming functions/variables, or reformatting statements does not convey ownership. (50 points) Write a bash command-line parser "parsley" that prompts for and reads lines from stdin, breaks the lines into tokens, parses the tokens into a tree of commands, and writes the tree in in-order to stdout. Here a token is one of the following: (1) a maximal contiguous sequence of non-whitespace printing characters (see "man 3 isgraph") other than the metacharacters <, >, ;, &, |, (, and ) [a TEXT token]; (2) a redirection symbol [<, <<, >, >>, 2>, 2>>, or &>]; (3) a pipeline symbol [|]; (4) a command operator [&& or ||]; (5) a command terminator [; or &]; (6) a left or right parenthesis [used to group commands into subcommands]; Whitespace may be used to separate tokens from each other but is otherwise ignored. Nonprinting characters are treated as spaces. Example: The command line < A B | ( C 2> D & E < F ) > G ; H=I J K results in the following command tree . ; . / \ . PIPE H=I J K . / \ . B G . / . & . / \ . C 2>D E D CMD (Depth = 3): SEP_BG CMD (Depth = 4): SIMPLE, argv[0] = E G CMD (Depth = 0): SEP_END CMD (Depth = 1): SIMPLE, argv[0] = J, argv[1] = K LOCAL: H=I, The file Hwk2/mainParsley.c contains the main program for parsley; your task is to write the function parse(), which tokenizes and parses each line, creating a tree of command structs as specified in Hwk2/parsley.h. Tokenization includes the following bash features: * [Escape Characters] The escape character \ removes any special meaning that is associated with the non-null, non-newline character immediately following. This may be used to include whitespace (but not newlines), metacharacters, and single/double quotes in a TEXT token. The escape character is removed. * [Comments] If a # would begin a TEXT token, it and the remainder of the line (including any trailing \'s) are ignored; otherwise it is just another character. Escaping a leading # allows it to begin a TEXT token. * (Optional) [Single Quoted Strings] All characters appearing between matching single quotes (') are treated as nonwhitespace, non-metacharacters. Such strings may not contain single quotes, even when preceded by a \. The surrounding quotes are removed. If there is no matching quote, parse() writes a message to stderr and returns NULL. * (Optional) [Double Quoted Strings] All characters appearing between matching double quotes (") are treated as nonwhitespace, non-metacharacters, except that a \ followed by a " or \ acts as an escape character and removes any special meeting associated with the " or \. The surrounding quotes and all escape characters are removed. If there is no matching quote, parse() writes a message to stderr and returns NULL (which denotes the empty command). While creating the command tree, parse() must also handle the following bash-like features: * [Local Variables] A TEXT token of the form NAME=VALUE specifies that VALUE should be assigned to the environment variable NAME when a simple command or subcommand is executed. Here NAME is a sequence of one or more alphanumeric or underscore characters that does not begin with a digit, and VALUE may be empty. Such assignments must precede the zero-th argument of the simple command or the opening left parenthesis of the subcommand. * [HERE Documents] The redirection <, >>, 2>, 2>>, &>), and HERE documents. But bear in mind that there are many features that parsley does not implement. Although they are implemented in the reference solution, your parse() NEED NOT handle single or double quoted strings or redirection involving stderr (i.e., 2>, 2>>, and &>). They will not be tested. 3. Your source files must #include Hwk2/parsley.h and your object files must be linked with Hwk2/mainParsley.o rather than local copies. The source code for parse() should be in a different file (or files). To enforce this the test script may delete local files named parsley.h and mainParsley.c. Thus you should use #include "/c/cs323/Hwk2/parsley.h" instead of #include "parsley.h" in your source files; or CFLAGS= -std=c99 -pedantic -Wall -I/c/cs323/Hwk2/ instead of CFLAGS= -std=c99 -pedantic -Wall in your Makefile (the -Idirectory option extends the search for #include-d files to the directory specified). For more information on how to do so, read the two-page "How to Create a Makefile", available at http://zoo.cs.yale.edu/classes/cs323/doc/Makefile You can use % /c/cs323/bin/makeit 2 parsley or % /c/cs323/bin/testit 2 parsley to test whether you have done this correctly. 4. Tokenization is greedy so that >>> is split into >> and >, not > and >> or >, >, and >. 5. parsley should detect errors such as: * unbalanced parentheses; e.g., (ls bar ; ls foo * a missing name following a redirection symbol; e.g., ls > * multiple input, output, or error redirection; e.g., ls >bar | wc * both a command and a subcommand or two subcommands; e.g., ls (cat /etc/motd) * a missing command; e.g., ls | | wc writing a one-line error message to stderr and returning NULL (which denotes the empty command). WARNING: A few of the public tests involve error checking, but parse() will need to do a LOT more to pass all of the final tests. 6. The behavior of parsley does not match bash in all particulars, including: a. parsley does not recognize line splices outside of HERE documents. b. Within a double-quoted string parsley does not treat a \ followed by a $, `, or ! as an escape character. c. parsley allows local variable definitions and input, output, or error redirection _before_ a subcommand. d. parsley treats commands like > FILE (missing command) and command >file1 >>file2 (multiple input, output, or error redirection) as syntactically invalid. e. parsley does not issue a prompt for each line in a HERE document when run interactively. f. parsley does not expand environment variables on the command line. In such instances, the behavior of Hwk2/parsley is what your code should reproduce. Note: A list of other known differences will be maintained at Hwk2/Differences. Please report any others that you discover. 7. To avoid limiting the length of a line, mainParsley.c uses the library function getline(). (See "man 3 getline"; you must #define _GNU_SOURCE.) 8. mainParsley.c contains functions that you may find useful for implementing and debugging parsley: mallocCMD() and freeCMD() to create and free an empty CMD struct; and dumpTree() to print a command tree. 9. For simplicity, you may ignore the possibility of error returns from malloc()/realloc(). However, you must free() all such storage when it is no longer needed, so that all storage has been freed when the program exits. A. The inputs used by the test scripts may not be valid bash commands. B. Hwk2/parsley exists to answer your questions about how parsley is supposed to work. But ask if its behavior is not consistent with how you interpret this specification, since either may have bugs. C. Ignoring code that handles quoted strings and redirection of stderr, the reference solution is roughly 160 lines long (as measured by xLines) and uses the library functions asprintf(), strchr(), strdup(), strncmp(), and strspn() among others. Appendix ~~~~~~~~ The syntax for a command is [local] = NAME=VALUE [red_op] = < / << / > / >> / 2> / 2>> / &> [redirect] = [red_op] FILENAME [prefix] = [local] / [redirect] / [prefix] [local] / [prefix] [redirect] [suffix] = TEXT / [redirect] / [suffix] TEXT / [suffix] [redirect] [redList] = [redirect] / [redList] [redirect] [simple] = TEXT / [prefix] TEXT / TEXT [suffix] / [prefix] TEXT [suffix] [subcmd] = ([command]) / [prefix] ([command]) / ([command]) [redList] / [prefix] ([command]) [redList] [stage] = [simple] / [subcmd] [pipeline] = [stage] / [pipeline] | [stage] [and-or] = [pipeline] / [and-or] && [pipeline] / [and-or] || [pipeline] [sequence] = [and-or] / [sequence] ; [and-or] / [sequence] & [and-or] [command] = [sequence] / [sequence] ; / [sequence] & Note that FILENAME = TEXT. Aside: The grammar is ambiguous in the sense that in the input % A=B parsley could treat the TEXT token A=B as an argument in a valid command or as a local in an invalid one. Similarly, for the input % A=B C=D E it could treat A=B and C=D as locals, or treat A=B as a local and C=D as an argument, or treat A=B and C=D as arguments; all of which result in valid commands. To resolve the ambiguity we will assume that a TEXT token of the form NAME=VALUE is always treated as a [local] if it can be. A command is represented as a tree of CMD structs containing its [simple] commands and the "operators" | (= PIPE), && (= SEP_AND), || (= SEP_OR), ; (= SEP_END), & (= SEP_BG), and SUBCMD. The command tree is determined by (but is not equal to) the parse tree in the above grammar. The tree for a [simple] is a single struct of type SIMPLE that specifies its arguments (argc, argv[]); its local variables (nLocal, locVar[], locVal[]); and whether and where to redirect its standard input (fromType, fromFile), its standard output (toType, toFile), and its standard error (errType, errFile). The left and right children are NULL. The tree for a [stage] is either the tree for a [simple] or a CMD struct of type SUBCMD (which may have local variables and redirection) whose left child is the tree representing a [command] and whose right child is NULL. Note that I/O redirection is associated with a [stage] (i.e., a [simple] or [subcmd]), but not with a [pipeline] (redirection for the first/last stage is associated with the stage, not the pipeline). The tree for a [pipeline] is either the tree for a [stage] or a CMD struct of type PIPE whose right child is a tree representing the last [stage] and whose left child is the tree representing the rest of the [pipeline]. The tree for an [and-or] is either the tree for a [pipeline] or a CMD struct of type && (= SEP_AND) or || (= SEP_OR) whose left child is a tree representing an [and-or] and whose right child is a tree representing a [pipe-line]. The tree for a [sequence] is either the tree for an [and-or] or a CMD struct of type ; (= SEP_END) or & (= SEP_BG) whose left child is a tree representing a [sequence] and whose right child is a tree representing an [and-or]. The tree for a [command] is either the tree for a [sequence] or a CMD struct of type ; (= SEP_END) or & (= SEP_BG) whose left child is the tree representing a [sequence] and whose right child is NULL. Examples (where A, B, C, D, and E are [simple]): Command Tree < A B | C | D | E > F PIPE / \ PIPE E >F / \ PIPE D / \