P R E L I M I N A R Y    S P E C I F I C A T I O N

                                          Due 2:00 AM, Friday, 11 December 2020

CPSC 323   Homework #5   The Shell Game: Sister Sue Saw B-Shells ...

REMINDER:  Do not under any circumstances copy someone else's code for this
assignment, give your code to someone else, or make it publicly available.
After discussing any aspect of the assignment with anyone other than a member
of the teaching staff (such discussions must be noted in your log file), do not
keep any written or electronic record and engage in some mind-numbing activity
before you work on the assignment again.  Sharing ANY related document (e.g.,
code or test cases) is a violation of this policy.

Since code reuse is an important part of programming, you may study published
code (e.g., from textbooks or the Net) and/or incorporate it in a program,
provided that you give proper attribution in your log file and in your source
files (see the syllabus for details) and that the bulk of the code submitted is
your own.  Note:  Removing/rewriting comments, renaming functions/variables, or
reformatting statements does not convey ownership.


(60 points) bashLT is a simple shell, a baby brother of the Bourne-again shell
bash, and offers a limited subset of bash's functionality:

- execution of simple commands with zero or more arguments

- definition of local environment variables (NAME=VALUE)

- redirection of the standard input (<, <<)

- redirection of the standard output (>, >>)

- execution of pipelines (sequences of one or more simple commands or
  subcommands separated by the pipeline operator |)

- execution of conditional commands (sequences of one or more pipelines
  separated by the command operators && and ||)

- execution of sequences of one or more conditional commands separated by the
  command terminators ; and & and possibly terminated by ; or &

- execution of commands in the background (&)

- subcommands (commands enclosed in parentheses that act like a single, simple
  command)

- reporting the status of the last simple command, pipeline, conditional
  command, or subcommand executed by setting the environment variable $? to
  its "printed" value (e.g., the string "0" if the value is zero).

- directory manipulation:

    cd -p              Print current working directory to the standard output.
    cd DIRNAME         Change current working directory to DIRNAME.
    cd                 Change current working directory to $HOME, where HOME is
                         an environment variable.

- other built-in commands:

    export NAME=VALUE  Set environment variable NAME to VALUE.
    export -n NAME     Delete environment variable NAME.

    wait               Wait until all child processes of the shell process have
                       died.  The status is 0.
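
As an illustration of the cd built-in described above, here is a minimal
sketch, not a reference solution.  The name builtin_cd(), its argument
convention, and the wording of its messages are assumptions made for this
example; the returned status follows the rule for built-ins under Fine Points
(0 on success, errno if a system call failed, 1 otherwise).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: run the cd built-in with the given argument vector
 * (argv[0] is "cd") and return its status: 0 on success, errno if a system
 * call failed, and 1 for other errors such as a bad argument count. */
int builtin_cd(int argc, char *const argv[])
{
    if (argc > 2) {
        fprintf(stderr, "usage: cd | cd DIRNAME | cd -p\n");
        return 1;
    }
    if (argc == 2 && strcmp(argv[1], "-p") == 0) {
        char cwd[PATH_MAX];
        if (getcwd(cwd, sizeof cwd) == NULL) {
            int err = errno;              /* save errno before perror() */
            perror("cd -p");
            return err;
        }
        printf("%s\n", cwd);
        fflush(stdout);                   /* do not leave output buffered */
        return 0;
    }
    const char *dir = (argc == 2) ? argv[1] : getenv("HOME");
    if (dir == NULL) {
        fprintf(stderr, "cd: HOME is not set\n");
        return 1;
    }
    if (chdir(dir) < 0) {
        int err = errno;
        perror("cd");
        return err;
    }
    return 0;
}
```

Because chdir() changes the directory of the calling process only, a helper
like this has to run in the shell process itself to affect later commands.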

Once the command line has been parsed, the exact semantics of bashLT are those
of bash, except for the status variable and the items noted below (see Note 1).

The assignment is to write the function process() called by Hwk5/mainBashLT.c.
Thus you should use (that is, link with)

* Hwk5/mainBashLT.o, which is the main program (source is Hwk5/mainBashLT.c)

* Hwk5/parsley.o, which implements the function parse() that tokenizes and
  parses commands into syntactically correct trees of CMD structures (interface
  in Hwk2/parsley.h).

DO NOT MODIFY Hwk5/mainBashLT.c or Hwk2/parsley.h---the source code for
process() should be in a different file (or files).  To enforce this the test
script may delete/rename local files named mainBashLT.* or parsley.* before
trying to make your program.

Use the submit command to turn in your log file (see Homework #1) and the
source files for bashLT (including a Makefile, but not mainBashLT.* or
parsley.*) as assignment 5.

YOU MUST SUBMIT YOUR FILES (INCLUDING YOUR LOG FILE) AT THE END OF ANY SESSION
WHERE YOU WRITE OR DEBUG CODE, AND AT LEAST ONCE EVERY HOUR DURING LONGER
SESSIONS.  (All submissions are retained.)


Notes
~~~~~
1. [Matthew & Stones, Chapter 2] contains a more complete description of bash,
   including environment variables, the various I/O redirection operators,
   pipelines, command operators, command terminators, backgrounded commands,
   and subcommands; and "man bash" and "info bash" contain more information.
   But bear in mind that there are many features that bashLT does not
   implement.  Moreover, the behavior of bashLT may not match bash in some
   cases, including:

   a. bash has both shell variables and environment variables.  A command like

        % NAME=VALUE

      assigns VALUE to the shell variable NAME, and thereafter sequences like
      $NAME are expanded to VALUE as commands are parsed; while a command like

        % NAME=VALUE printenv NAME

      assigns VALUE to the environment variable NAME in the process that is
      executing printenv (but NAME is not defined in the parent shell).
      bashLT only supports the latter construct (but see the export command).

   b. In bash local variable definitions and redirection to/from a file may
      appear only after a subcommand, not before.

   c. bash allows multiple input and output redirections, with the last
      encountered taking precedence.  In bashLT the parse() function issues an
      error message instead.

   d. In bash $? is a shell variable rather than an environment variable, and
      its value may differ from the status that is reported by bashLT.

   e. bash and bashLT report the termination of background commands and reaped
      zombies differently.

   f. In bash the status of a pipeline is the status of the last stage unless
      the pipefail option is enabled.

   g. bash expands HERE documents; bashLT does not.

   h. bash does not implement the cd -p command.  (The bash dirs command is
      similar, but the pathname is "massaged".  For example, if you cd to
      /c/cs323 the directory name is /home/classes/cs323, not /c/cs323.)

   i. In bash there are other forms of the cd, export, and wait commands.
      For example, the wait command takes command-line arguments that specify
      the children for which to wait (vs. all of them).

   A list of all known differences will be maintained at Hwk5/Differences.
   Please report any others that you discover.
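
   The NAME=VALUE construct in item a can be sketched as follows.  This is
   one possible approach under stated assumptions, not the required design:
   run_with_locals() and its parallel names[]/values[] arrays are
   hypothetical and do not reflect how the CMD structure stores local
   variables.  The variables are set in the child only, so the parent
   shell's environment is unchanged.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STATUS(x) (WIFEXITED(x) ? WEXITSTATUS(x) : 128+WTERMSIG(x))

/* Hypothetical helper: run argv[] in a child after defining nvars local
 * variables names[i]=values[i] there; the parent's environment is untouched.
 * Returns the child's status, or errno if fork() fails. */
int run_with_locals(char *const argv[], int nvars,
                    char *const names[], char *const values[])
{
    pid_t pid = fork();
    if (pid < 0) {
        int err = errno;
        perror("fork");
        return err;
    }
    if (pid == 0) {                       /* child */
        for (int i = 0; i < nvars; i++) {
            if (setenv(names[i], values[i], 1) < 0) {
                int err = errno;
                perror("setenv");
                _exit(err);
            }
        }
        execvp(argv[0], argv);
        int err = errno;                  /* reached only if execvp() failed */
        perror(argv[0]);
        _exit(err);
    }
    int status;
    waitpid(pid, &status, 0);             /* waits for this specific child */
    return STATUS(status);
}
```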

2. An EOF (CTRL-D in column 1) makes bashLT exit since getline() returns NULL.

3. While executing a simple command, subcommand, pipeline, or conditional
   command, bashLT waits until it terminates, unless it has been backgrounded.
   bashLT ignores SIGINT interrupts while waiting, but child processes (other
   than subshells) do not, so that they can be killed by a CTRL-C.  Hint:  Do
   not implement this feature until everything else seems to be working.
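
   One way to arrange the SIGINT behavior described in this note is sketched
   below.  The helper name run_foreground() is hypothetical, and the sketch
   assumes a plain simple command (no redirection, no subshell): the shell
   ignores SIGINT only while it waits, and the child restores the default
   action before calling execvp().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STATUS(x) (WIFEXITED(x) ? WEXITSTATUS(x) : 128+WTERMSIG(x))

/* Hypothetical helper: run argv[] in the foreground.  The shell ignores
 * SIGINT while it waits; the child restores the default action so that
 * CTRL-C can still kill it. */
int run_foreground(char *const argv[])
{
    struct sigaction ignore, saved;
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    ignore.sa_flags = 0;
    sigaction(SIGINT, &ignore, &saved);   /* shell: ignore CTRL-C */

    int result;
    pid_t pid = fork();
    if (pid < 0) {
        result = errno;                   /* save errno before perror() */
        perror("fork");
    } else if (pid == 0) {
        signal(SIGINT, SIG_DFL);          /* child: CTRL-C kills it again */
        execvp(argv[0], argv);
        int err = errno;                  /* reached only if execvp() failed */
        perror(argv[0]);
        _exit(err);
    } else {
        int status;
        waitpid(pid, &status, 0);
        result = STATUS(status);
    }
    sigaction(SIGINT, &saved, NULL);      /* restore the shell's old action */
    return result;
}
```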

4. bashLT uses perror() (see "man perror") to report errors from system calls.
   It may ignore error returns from close(), dup(), dup2(), wait(), and
   waitpid(), but not from chdir(), execvp(), fork(), getcwd(), getwd(),
   get_current_dir_name(), open(), pipe(), putenv(), setenv(), and unsetenv().

   bashLT also reports an error if the number of arguments to a built-in
   command is incorrect or if an error is detected during execution of that
   command.  Execution of the command is skipped.

   All error messages are written to stderr and are one line long.

   As noted in the man page for perror() ("man 3 perror"):

     Note that errno is undefined after a successful system call or library
     function call:  this call may well change this variable, even though it
     succeeds, for example because it internally used some other library
     function that failed.  Thus, if a failing call is not immediately followed
     by a call to perror(), the value of errno should be saved.

   For the same reasons, perror() itself may change errno.
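
   A small wrapper makes it hard to forget to save errno.  The sketch below
   is only a suggestion; the name report_failure() is hypothetical.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: report a failed system call with perror() and return
 * the errno value it left behind.  errno is copied first because perror()
 * itself may change it. */
int report_failure(const char *what)
{
    int saved = errno;
    perror(what);
    return saved;
}
```

   A caller might then write: if (chdir(dir) < 0) return report_failure("cd");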

5. For simplicity, process() may ignore the possibility of error returns from
   malloc() and realloc().  However, all storage that it allocates must still
   be reachable after it returns to main().

6. The easiest way to implement subcommands (and possibly pipelines as well) is
   to use a subshell (i.e., a child of the shell process that is also running
   bashLT) to execute the subcommand and exit with its status before returning
   to main().

7. To use getenv(), setenv(), or putenv() to get or set environment variables,
   you must first #define _GNU_SOURCE since they are not part of the ANSI
   Standard.

8. When redirecting stdout to a file that does not exist, the new file should
   be readable and writable by the owner, the group, and others (that is, it
   should be created with mode 0666, which the umask may then restrict).
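
   A minimal sketch of output redirection along these lines follows.  The
   helper names open_output() and redirect_stdout() are hypothetical, and
   the sketch assumes that > truncates an existing file while >> appends to
   it, as in bash.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open path for > (append == 0) or >> (append != 0).
 * Returns a file descriptor, or -1 with errno set by open(). */
int open_output(const char *path, int append)
{
    int flags = O_WRONLY | O_CREAT | (append ? O_APPEND : O_TRUNC);
    /* 0666 = rw-rw-rw-; the kernel still clears any bits set in the umask */
    return open(path, flags, 0666);
}

/* Hypothetical helper, meant to be called in the child before execvp():
 * point stdout at path.  Returns 0, or the errno of the call that failed. */
int redirect_stdout(const char *path, int append)
{
    int fd = open_output(path, append);
    if (fd < 0) {
        int err = errno;                  /* save errno before perror() */
        perror(path);
        return err;
    }
    dup2(fd, STDOUT_FILENO);              /* stdout now refers to the file */
    close(fd);                            /* the extra descriptor is not needed */
    return 0;
}
```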

9. You may find mkstemp(), tmpfile(), fileno(), rewind(), and/or lseek() useful
   when implementing HERE documents.  See their man pages for details.  Note:
   Deleting an open file does not close the associated file descriptor (see
   "man 2 unlink").

A. Hwk5/mainBashLT.c contains the function dumpTree() that dumps a parse tree
   of CMD structures.  If the environment variable DUMP_TREE exists, then
   bashLT dumps the parse tree using dumpTree().

B. Hwk5/process-stub.h contains the #include statements, the STATUS() macro,
   and the function prototype for process() from my solution.

C. No, you may not use system() or /bin/*sh.
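
For the HERE-document hint in the notes above (mkstemp(), unlink(), lseek()),
one possible arrangement is sketched below.  The helper name here_doc_fd() and
the template path are assumptions made for this example, and the sketch
assumes that one write() call stores the whole text.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: store the text of a HERE document in an unlinked
 * temporary file and return a descriptor positioned at its start, or -1
 * after reporting the failure.  The template path is an assumption. */
int here_doc_fd(const char *text)
{
    char path[] = "/tmp/bashLT-here-XXXXXX";
    int fd = mkstemp(path);               /* creates and opens a unique file */
    if (fd < 0) {
        perror("mkstemp");
        return -1;
    }
    unlink(path);                         /* name goes away; fd stays usable */
    size_t len = strlen(text);
    if (write(fd, text, len) != (ssize_t) len || lseek(fd, 0, SEEK_SET) < 0) {
        perror("here document");
        close(fd);
        return -1;
    }
    return fd;                            /* caller can dup2() this onto stdin */
}
```

Since the file is already unlinked, its storage is released as soon as the
last descriptor referring to it is closed.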


Fine Points
~~~~~~~~~~~
1. For a simple command, the status is either the status of the program
   executed (*) or the value of the global variable errno if some system call
   failed while setting up to execute the program.

     (*) This status is normally the value WEXITSTATUS(status), where the
     variable status contains the value returned by the call to waitpid()
     that reported the death of the process.  However, for processes that are
     killed (that is, for which WIFEXITED(status) is false), that value may be
     zero.  Thus you should use the macro

       #define STATUS(x) (WIFEXITED(x) ? WEXITSTATUS(x) : 128+WTERMSIG(x))

     instead (see Hwk5/process-stub.h).

   For a pipeline, the status is that of the latest (that is, rightmost) stage
   to fail, or 0 if every stage exits with status 0.  (This is the behavior
   of bash with the pipefail option enabled.)

   For a subcommand, the status is that of the command within the parentheses.

   For a sequence of commands, the status is that of the last command in the
   sequence.

   For a backgrounded command, the status in the invoking shell is 0.

   For a built-in command, the status is 0 if successful, the value of errno if
   a system call failed, and 1 otherwise (e.g., when the number of arguments is
   incorrect).

   Note that this status may differ from that reported by bash.  The command

     % /c/cs323/Hwk5/Tests/exit N

   will exit with the status N and may be useful for testing purposes.
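
   The STATUS() macro and the pipeline rule above can be exercised with a
   sketch like the following.  The helper names wait_status() and
   pipeline_status() are hypothetical, as is the choice to return 1 when
   waitpid() itself fails.

```c
#include <signal.h>                /* signal numbers such as SIGKILL */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STATUS(x) (WIFEXITED(x) ? WEXITSTATUS(x) : 128+WTERMSIG(x))

/* Hypothetical helper: wait for one specific child and return its status. */
int wait_status(pid_t pid)
{
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return 1;   /* assumption: treat a failed waitpid() as status 1 */
    return STATUS(status);
}

/* Hypothetical helper: combine the per-stage statuses of a pipeline
 * (stage[0] is the leftmost stage).  The result is the status of the
 * rightmost stage that failed, or 0 if no stage failed. */
int pipeline_status(const int stage[], int nstages)
{
    for (int i = nstages - 1; i >= 0; i--)
        if (stage[i] != 0)
            return stage[i];
    return 0;
}
```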

2. In bash the status $? is an internal shell variable.  However, since bashLT
   does not have such variables, it has no means to check its value.  Thus in
   bashLT the status is an environment variable, which can be checked using
   /usr/bin/printenv (that is, "printenv ?").
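
   Setting the status variable can be as simple as the sketch below; the
   helper name set_status() is hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: publish status as the environment variable "?".
 * Returns 0, or the errno left by setenv() after reporting it. */
int set_status(int status)
{
    char text[32];
    snprintf(text, sizeof text, "%d", status);   /* the "printed" value */
    if (setenv("?", text, 1) < 0) {
        int err = errno;                  /* save errno before perror() */
        perror("setenv");
        return err;
    }
    return 0;
}
```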

3. The command operators && and || have the same precedence, lower than |, but
   higher than ; or &.

   && causes the simple command, subcommand, or pipeline following to be
   skipped if the current command exits with a nonzero status (= FALSE, the
   opposite of C).  The status of the skipped command is that of the current
   command.

   || causes the simple command, subcommand, or pipeline following to be
   skipped if the current command exits with a zero status (= TRUE, the
   opposite of C).  The status of the skipped command is that of the current
   command.

   Since && and || have equal precedence, in the command

     (1)$ A || B && C

   (where A, B, and C are simple commands, subcommands, or pipelines) the
   command A is always executed; if the status of A is zero, B is skipped and C
   is executed; and if the status of A is nonzero, B is executed and, if its
   status is zero, C is executed.
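
   The left-to-right rule can be checked with a small simulation.  This
   sketch does not run any commands: would_return[i] is the status that
   command i would produce if it were executed, and all names are
   hypothetical.

```c
enum and_or_op { OP_AND, OP_OR };   /* hypothetical names for && and || */

/* Simulate "c0 op0 c1 op1 ... c(n-1)" with equal precedence, left to right.
 * ran[i] is set to 1 if command i would be executed and to 0 if it would be
 * skipped.  Returns the final status. */
int eval_and_or(int n, const int would_return[], const enum and_or_op ops[],
                int ran[])
{
    int status = would_return[0];   /* the first command always runs */
    ran[0] = 1;
    for (int i = 1; i < n; i++) {
        /* && skips on a nonzero status; || skips on a zero status */
        int skip = (ops[i-1] == OP_AND) ? (status != 0) : (status == 0);
        ran[i] = !skip;
        if (!skip)
            status = would_return[i];
        /* a skipped command leaves the current status unchanged */
    }
    return status;
}
```

   For A || B && C with statuses 0, 1, 2 this marks A and C as run and B as
   skipped, matching the description above.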

4. While the grammar captures the syntax of bash commands, it does not reflect
   the semantics of &, which specify that only the preceding <and-or> should be
   executed in the background, not the entire preceding <sequence>.  For
   example the command
    
     (1)$ A & B
    
   leads to the command tree
    
     CMD (Depth = 1):  SIMPLE,  argv[0] = A
     CMD (Depth = 0):  SEP_BG
     CMD (Depth = 1):  SIMPLE,  argv[0] = B
    
   which might suggest that the left child of a SEP_BG node is always executed in
   the background.  However, the command
    
     (1)$ A ; B &
    
   leads to the command tree
    
     CMD (Depth = 2):  SIMPLE,  argv[0] = A
     CMD (Depth = 1):  SEP_END
     CMD (Depth = 2):  SIMPLE,  argv[0] = B
     CMD (Depth = 0):  SEP_BG
    
   for which backgrounding the left child of the SEP_BG node is incorrect.
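
   One way to handle both trees is to pass down a flag saying whether the
   last <and-or> of a subtree is followed by &.  The sketch below only
   records the decision for each command instead of executing it.  The
   struct node type is a simplified, hypothetical stand-in for the CMD
   structure, and every node that is not a separator is treated as a leaf.

```c
#include <string.h>

/* Hypothetical, simplified stand-in for the CMD structure in parsley.h. */
enum node_type { SIMPLE, SEP_END, SEP_BG };
struct node {
    enum node_type type;
    const char *name;            /* command name for SIMPLE nodes */
    struct node *left, *right;   /* children of SEP_END and SEP_BG nodes */
};

/* Append "NAME:bg " or "NAME:fg " to plan for each command, in order.
 * background says whether the LAST <and-or> in this subtree is followed
 * by &.  plan must be large enough to hold the result. */
void plan_sequence(const struct node *t, int background, char *plan)
{
    if (t == NULL)
        return;
    if (t->type == SEP_END || t->type == SEP_BG) {
        /* this separator applies only to the end of the left subtree */
        plan_sequence(t->left, t->type == SEP_BG, plan);
        /* the right subtree ends with whatever terminator follows this node */
        plan_sequence(t->right, background, plan);
    } else {
        strcat(plan, t->name);
        strcat(plan, background ? ":bg " : ":fg ");
    }
}
```

   Starting from the root with background == 0, the tree for A & B yields
   "A:bg B:fg " and the tree for A ; B & yields "A:fg B:bg ".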
    
5. Anything written to stdout by a built-in command is redirectable.

   When a built-in command fails, bashLT continues to execute commands.

   When a built-in command is invoked within a pipeline, is backgrounded, or
   appears in a subcommand, that command has no effect on the parent shell.
   For example, the commands

     (1)$ cd /c/cs323 | ls

   and

     (2)$ ls & cd .. & ls

   do not work as you might otherwise expect.

6. When a redirection fails, bashLT does not execute the simple command or
   subcommand.  Its status is the errno of the system call that failed.

7. When bashLT runs a command in the background, it writes the process id to
   stderr using the format "Backgrounded: %d\n".

   bashLT reaps zombies periodically (that is, at least once during each call
   to process()) to avoid running out of processes.  When doing so, it writes
   the process id and status to stderr using the format "Completed: %d (%d)\n".
   The built-in wait command does the same.

8. To make programming bashLT slightly more challenging, it may not use wait()
   or any other system call (e.g., waitpid(-1,...)) that does not specify the
   pid of the process whose death it is awaiting.  That is, it may only use
   waitpid(pid,...) with a positive pid.  Unlike the usual tests, the test of
   this constraint will deduct 4 points from the total score if it detects a
   violation.

9. gdb can follow child processes.  See the gdb manual (link on the class web
   page) for details.
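
The rules above on reporting background commands, reaping zombies, and calling
waitpid() only with a positive pid fit together if the shell remembers the pid
of every backgrounded child.  The following is a sketch under stated
assumptions, not a reference solution: the helper names and the fixed-size
table are hypothetical.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define STATUS(x) (WIFEXITED(x) ? WEXITSTATUS(x) : 128+WTERMSIG(x))

#define MAX_BG 256                 /* assumption: fixed-size table for brevity */
static pid_t bg_pids[MAX_BG];
static int n_bg = 0;

/* Hypothetical helper: remember a backgrounded child and announce it. */
void track_background(pid_t pid)
{
    if (n_bg < MAX_BG)
        bg_pids[n_bg++] = pid;
    fprintf(stderr, "Backgrounded: %d\n", (int) pid);
}

/* Hypothetical helper: poll each remembered child with a positive pid and
 * WNOHANG, report those that have died, and return how many were reaped. */
int reap_zombies(void)
{
    int reaped = 0, kept = 0;
    for (int i = 0; i < n_bg; i++) {
        int status;
        pid_t done = waitpid(bg_pids[i], &status, WNOHANG);
        if (done == bg_pids[i]) {
            fprintf(stderr, "Completed: %d (%d)\n", (int) done, STATUS(status));
            reaped++;
        } else if (done == 0) {
            bg_pids[kept++] = bg_pids[i];     /* still running: keep it */
        }
        /* done < 0: no longer a child of this process; drop it */
    }
    n_bg = kept;
    return reaped;
}
```

A built-in wait could walk the same table and call waitpid(pid, &status, 0)
for each entry instead of polling with WNOHANG.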


Limitations
~~~~~~~~~~~
The following features will be worth at most the number of points shown:
 * (20 points) pipelines
 * (12 points) &&, ||, and &
 * (12 points) subcommands
 * (12 points) the status variable $?
 * (12 points) the cd built-ins
 * (10 points) the other built-ins
 * (10 points) HERE documents
 * ( 6 points) reaping zombies
Here "at most" signals a crude upper bound intended to give more flexibility
while developing the test script and to allow interactions among features.

                                                                CS-323-11/14/20