        P R E L I M I N A R Y    S P E C I F I C A T I O N

                                         Due 2:00 AM, Friday, 11 December 2020

CPSC 323   Homework #5   The Shell Game: Sister Sue Saw B-Shells ...

REMINDER: Do not under any circumstances copy someone else's code for this
assignment, give your code to someone else, or make it publicly available.
After discussing any aspect of the assignment with anyone other than a member
of the teaching staff (such discussions must be noted in your log file), do
not keep any written or electronic record and engage in some mind-numbing
activity before you work on the assignment again. Sharing ANY related
document (e.g., code or test cases), is a violation of this policy.

Since code reuse is an important part of programming, you may study published
code (e.g., from textbooks or the Net) and/or incorporate it in a program,
provided that you give proper attribution in your log file and in your source
files (see the syllabus for details) and that the bulk of the code submitted
is your own. Note: Removing/rewriting comments, renaming functions/variables,
or reformatting statements does not convey ownership.

(60 points) bashLT is a simple shell, a baby brother of the Bourne-again shell
bash, and offers a limited subset of bash's functionality:

- execution of simple commands with zero or more arguments

- definition of local environment variables (NAME=VALUE)

- redirection of the standard input (<, <<)

- redirection of the standard output (>, >>)

- execution of pipelines (sequences of one or more simple commands or
  subcommands separated by the pipeline operator |)

- execution of conditional commands (sequences of one or more pipelines
  separated by the command operators && and ||)

- execution of sequences of one or more conditional commands separated by the
  command terminators ; and & and possibly terminated by ; or &

- execution of commands in the background (&)

- subcommands (commands enclosed in parentheses that act like a single, simple
  command)

- reporting the status of the last simple command, pipeline, conditional
  command, or subcommand executed by setting the environment variable $? to
  its "printed" value (e.g., the string "0" if the value is zero).

- directory manipulation:

    cd -p         Print current working directory to the standard output.

    cd DIRNAME    Change current working directory to DIRNAME.

    cd            Change current working directory to $HOME, where HOME is an
                  environment variable.

- other built-in commands:

    export NAME=VALUE   Set environment variable NAME to VALUE.

    export -n NAME      Delete environment variable NAME.

    wait                Wait until all child processes of the shell process
                        have died. The status is 0.

Once the command line has been parsed, the exact semantics of bashLT are those
of bash, except for the status variable and the items noted below (see Note A).

The assignment is to write the function process() called by Hwk5/mainBashLT.c.
Thus you should use (that is, link with)

* Hwk5/mainBashLT.o, which is the main program (source is Hwk5/mainBashLT.c)

* Hwk5/parsley.o, which implements the function parse() that tokenizes and
  parses commands into syntactically correct trees of CMD structures
  (interface in Hwk2/parsley.h).

DO NOT MODIFY Hwk5/mainBashLT.c or Hwk2/parsley.h---the source code for
process() should be in a different file (or files). To enforce this the test
script may delete/rename local files named mainBashLT.* or parsley.* before
trying to make your program.

Use the submit command to turn in your log file (see Homework #1) and the
source files for bashLT (including a Makefile, but not mainBashLT.* or
parsley.*) as assignment 5.

YOU MUST SUBMIT YOUR FILES (INCLUDING YOUR LOG FILE) AT THE END OF ANY SESSION
WHERE YOU WRITE OR DEBUG CODE, AND AT LEAST ONCE EVERY HOUR DURING LONGER
SESSIONS. (All submissions are retained.)


Notes
~~~~~
1. [Matthew & Stones, Chapter 2] contains a more complete description of bash,
   including environment variables, the various I/O redirection operators,
   pipelines, command operators, command terminators, backgrounded commands,
   and subcommands; and "man bash" and "info bash" contain more information.
   But bear in mind that there are many features that bashLT does not
   implement. Moreover, the behavior of bashLT may not match bash in some
   cases, including:

   a. bash has both shell variables and environment variables. A command like

        % NAME=VALUE

      assigns VALUE to the shell variable NAME, and thereafter sequences like
      $NAME are expanded to VALUE as commands are parsed; while a command like

        % NAME=VALUE printenv NAME

      assigns VALUE to the environment variable NAME in the process that is
      executing printenv (but NAME is not defined in the parent shell). bashLT
      only supports the latter construct (but see the export command).

   b. In bash local variable definitions and redirection to/from a file may
      appear only after a subcommand, not before.

   c.
      bash allows multiple input and output redirections, with the last
      encountered taking precedence. In bashLT the parse() function issues an
      error message instead.

   d. In bash $? is a shell variable rather than an environment variable, and
      its value may differ from the status that is reported by bashLT.

   e. bash and bashLT report the termination of background commands and reaped
      zombies differently.

   f. In bash the status of a pipeline is the status of the last stage unless
      the pipefail option is enabled.

   g. bash expands HERE documents; bashLT does not.

   h. bash does not implement the cd -p command. (The bash dirs command is
      similar, but the pathname is "massaged". For example, if you cd to
      /c/cs323 the directory name is /home/classes/cs323, not /c/cs323.)

   i. In bash there are other forms of the cd, export, and wait commands. For
      example, the wait command takes command-line arguments that specify the
      children for which to wait (vs. all of them).

   A list of all known differences will be maintained at Hwk5/Differences.
   Please report any others that you discover.

2. An EOF (CTRL-D in column 1) makes bashLT exit since getline() returns NULL.

3. While executing a simple command, subcommand, pipeline, or conditional
   command, bashLT waits until it terminates, unless it has been backgrounded.
   bashLT ignores SIGINT interrupts while waiting, but child processes (other
   than subshells) do not, so that they can be killed by a CTRL-C.

   Hint: Do not implement this feature until everything else seems to be
   working.

4. bashLT uses perror() (see "man perror") to report errors from system calls.
   It may ignore error returns from close(), dup(), dup2(), wait(), and
   waitpid(), but not from chdir(), execvp(), fork(), getcwd(), getwd(),
   get_current_dir_name(), open(), pipe(), putenv(), setenv(), and unsetenv().

   bashLT also reports an error if the number of arguments to a built-in
   command is incorrect or if an error is detected during execution of that
   command. Execution of the command is skipped.
   All error messages are written to stderr and are one line long.

   As noted in the man page for perror() ("man 3 perror"):

     Note that errno is undefined after a successful system call or library
     function call: this call may well change this variable, even though it
     succeeds, for example because it internally used some other library
     function that failed. Thus, if a failing call is not immediately
     followed by a call to perror(), the value of errno should be saved.

   For the same reasons, perror() itself may change errno.

5. For simplicity, process() may ignore the possibility of error returns from
   malloc() and realloc(). However, all storage that it allocates must still
   be reachable after it returns to main().

6. The easiest way to implement subcommands (and possibly pipelines as well)
   is to use a subshell (i.e., a child of the shell process that is also
   running bashLT) to execute the subcommand and exit with its status before
   returning to main().

7. To use getenv(), setenv(), or putenv() to get or set environment variables,
   you must first #define _GNU_SOURCE since they are not part of the ANSI
   Standard.

8. When redirecting stdout to a file that does not exist, the new file should
   be readable and writable by the owner and readable and writable by the
   group and others.

9. You may find mkstemp(), tmpfile(), fileno(), rewind(), and/or lseek() useful
   when implementing HERE documents. See their man pages for details.

   Note: Deleting an open file does not close the associated file descriptor
   (see "man 2 unlink").

A. Hwk5/mainBashLT.c contains the function dumpTree() that dumps a parse tree
   of CMD structures. If the environment variable DUMP_TREE exists, then
   bashLT dumps the parse tree using dumpTree().

B. Hwk5/process-stub.h contains the #include statements, the STATUS() macro,
   and the function prototype for process() from my solution.

C. No, you may not use system() or /bin/*sh.


Fine Points
~~~~~~~~~~~
1.
   For a simple command, the status is either the status of the program
   executed (*) or the value of the global variable errno if some system call
   failed while setting up to execute the program.

     (*) This status is normally the value WEXITSTATUS(status), where the
     variable status contains the value returned by the call to waitpid() that
     reported the death of the process. However, for processes that are
     killed (that is, for which WIFEXITED(status) is false), that value may be
     zero. Thus you should use the macro

       #define STATUS(x) (WIFEXITED(x) ? WEXITSTATUS(x) : 128+WTERMSIG(x))

     instead (see Hwk5/process-stub.h).

   For a pipeline, the status is that of the latest (that is, rightmost) stage
   to fail, or 0 if the status of every stage is true. (This is the behavior
   of bash with the pipefail option enabled.)

   For a subcommand, the status is that of the command within the parentheses.

   For a sequence of commands, the status is that of the last command in the
   sequence.

   For a backgrounded command, the status in the invoking shell is 0.

   For a built-in command, the status is 0 if successful, the value of errno if
   a system call failed, and 1 otherwise (e.g., when the number of arguments
   is incorrect).

   Note that this status may differ from that reported by bash.

   The command

     % /c/cs323/Hwk5/Tests/exit N

   will exit with the status N and may be useful for testing purposes.

2. In bash the status $? is an internal shell variable. However, since bashLT
   does not have such variables, it has no means to check its value. Thus in
   bashLT the status is an environment variable, which can be checked using
   /usr/bin/printenv (that is, "printenv ?").

3. The command separators && and || have the same precedence, lower than |, but
   higher than ; or &.

   && causes the simple command, subcommand, or pipeline following to be
   skipped if the current command exits with a nonzero status (= FALSE, the
   opposite of C). The status of the skipped command is that of the current
   command.

   || causes the simple command, subcommand, or pipeline following to be
   skipped if the current command exits with a zero status (= TRUE, the
   opposite of C). The status of the skipped command is that of the current
   command.

   Since && and || have equal precedence, in the command

     (1)$ A || B && C

   (where A, B, and C are simple commands, subcommands, or pipelines) the
   command A is always executed; if the status of A is zero, B is skipped and
   C is executed; and if the status of A is nonzero, B is executed and, if its
   status is zero, C is executed.

4. While the grammar captures the syntax of bash commands, it does not reflect
   the semantics of &, which specify that only the preceding conditional
   command should be executed in the background, not the entire preceding
   sequence of commands. For example the command

     (1)$ A & B

   leads to the command tree

       CMD (Depth = 1):  SIMPLE,  argv[0] = A
     CMD (Depth = 0):  SEP_BG
       CMD (Depth = 1):  SIMPLE,  argv[0] = B

   which might suggest that the left child of a SEP_BG node is always executed
   in the background. However, the command

     (1)$ A ; B &

   leads to the command tree

         CMD (Depth = 2):  SIMPLE,  argv[0] = A
       CMD (Depth = 1):  SEP_END
         CMD (Depth = 2):  SIMPLE,  argv[0] = B
     CMD (Depth = 0):  SEP_BG

   for which backgrounding the left child of the SEP_BG node is incorrect.

5. Anything written to stdout by a built-in command is redirectable.

   When a built-in command fails, bashLT continues to execute commands.

   When a built-in command is invoked within a pipeline, is backgrounded, or
   appears in a subcommand, that command has no effect on the parent shell.
   For example, the commands

     (1)$ cd /c/cs323 | ls

   and

     (2)$ ls & cd .. & ls

   do not work as you might otherwise expect.

5. When a redirection fails, bashLT does not execute the simple command or
   subcommand. Its status is the errno of the system call that failed.

6. When bashLT runs a command in the background, it writes the process id to
   stderr using the format "Backgrounded: %d\n".

   bashLT reaps zombies periodically (that is, at least once during each call
   to process()) to avoid running out of processes. When doing so, it writes
   the process id and status to stderr using the format "Completed: %d (%d)\n".

   The built-in wait command does the same.

7. To make programming bashLT slightly more challenging, it may not use wait()
   or any other system call (e.g., waitpid(-1,...)) that does not specify the
   pid of the process whose death it is awaiting. That is, it may only use
   waitpid(pid,...) with a positive pid.

   Unlike the usual tests, the test of this constraint will deduct 4 points
   from the total score if it detects a violation.

8. gdb can follow child processes. See the gdb manual (link on the class web
   page) for details.


Limitations
~~~~~~~~~~~
The following features will be worth at most the number of points shown:

* (20 points) pipelines

* (12 points) &&, ||, and &

* (12 points) subcommands

* (12 points) the status variable $?

* (12 points) the cd built-ins

* (10 points) the other built-ins

* (10 points) HERE documents

* ( 6 points) Reaping zombies

Here "at most" signals a crude upper bound intended to give more flexibility
while developing the test script and to allow interactions among features.

                                                               CS-323-11/14/20
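
Illustration (status of a simple command)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The rules in Fine Points 1, 2, and 7 can be illustrated with a short C sketch.
It is only one possible arrangement under stated assumptions, not the reference
solution: the helper names run_and_wait() and set_status() are hypothetical,
STATUS() is the macro quoted in Fine Point 1, and SIGINT handling, redirection,
and local variable definitions are omitted. The sketch forks a child, waits for
that specific pid (never -1), converts the wait status with STATUS(), and
stores the printed value in the environment variable "?".

```c
#define _GNU_SOURCE             /* for setenv(), as Note 7 requires */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Macro quoted in Fine Point 1 (see Hwk5/process-stub.h). */
#define STATUS(x) (WIFEXITED(x) ? WEXITSTATUS(x) : 128+WTERMSIG(x))

/* Hypothetical helper: run argv[] in a child process and return its status
 * as Fine Point 1 describes it for a simple command. */
int run_and_wait(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;        /* save errno: perror() may change it */
        perror("fork");
        return err;
    }
    if (pid == 0) {             /* child */
        execvp(argv[0], argv);
        int err = errno;        /* execvp() returns only on failure */
        perror(argv[0]);
        _exit(err);             /* status is the errno of the failed call */
    }

    /* Parent: wait for this child only, using a positive pid (Fine Point 7). */
    assert(pid > 0);
    int status = 0;
    while (waitpid(pid, &status, 0) < 0 && errno == EINTR)
        ;                       /* retry if interrupted by a signal */
    return STATUS(status);
}

/* Hypothetical helper: publish a status as the environment variable "?"
 * so that "printenv ?" can display it (Fine Point 2). */
void set_status(int status)
{
    char buf[16];
    snprintf(buf, sizeof buf, "%d", status);
    if (setenv("?", buf, 1) < 0)
        perror("setenv");
}
```

Under these assumptions, a caller would pass the result of run_and_wait() to
set_status() after each simple command. Returning errno from the child via
_exit() is a design choice of this sketch that makes a failed execvp() report
the same value Fine Point 1 asks for.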