CS470/570 Problem Set 3: Syntactic Trees Etc.

Revision History

Oct 3 Replaced occurrences of some characters with entities (e.g., "`<det>`" might have been changed to "`<det>`"; most browsers will display the first version incorrectly, taking it for a mistyped HTML tag or perhaps an unknown XML tag.

Oct 10 Fixed dead link to CFG.scala. Got rid of one kind of "zombie"; `CFGempty` no longer appears in expansions; a zombie is just an expansion to `Nil`, which requires no special handling.

Oct 11 Clarified definition of "path." Added several more files, listed at the end of question 2B

Due: October 10, 2016

Revision History
Oct 3	Replaced occurrences of some characters with entities (e.g., "`<det>`" might have been changed to "`<det>`"; most browsers will display the first version incorrectly, taking it for a mistyped HTML tag or perhaps an unknown XML tag.
Oct 10	Fixed dead link to CFG.scala. Got rid of one kind of "zombie"; `CFGempty` no longer appears in expansions; a zombie is just an expansion to `Nil`, which requires no special handling.
Oct 11	Clarified definition of "path." Added several more files, listed at the end of question 2B

How and What to Submit:

Homework for all problem sets will be submitted using the Canvas Assignments tool. There will be several attachments, consisting of Scala files, class files, text files, and runs of your program. For this assignment, attach the files PS3Q1.scala and PS3Q2.scala

Auxiliary files mentioned in the problem sets may be found at Files > Problem Set Software, as well as by following links from the online version of this problem set.

As spelled out at length in the lecture notes for lecture 2 the file Resources > Scripts contain some helpful shell scripts for compiling and running Scala programs. The Scala compiler usually generates more class files than the Java compiler for a program of similar size. To hide this clutter away, create a bin directory below the directory your code appears in, then use the shell scripts scc and sca instead of scalac and scala. (These scripts work in any Unix system, including OS X and Linux. Windows users should choose scc.bat and sca.bat instead.)

You should avoid using vars as much as possible and use recursion and high-order function calls instead of while

And never allow an object to return a mutable object! In fact, don't use any mutable data structures at all, beyond simple vars.

1A Scala does not provide a built-in function for removing the first occurrence of an element from a list. If we write this as a subroutine, its signature would be

  removeOne[A](x: A, l: List[A]): List[A]

Write it using recursion.

1B removeOne doesn't lend itself to being implemented using foldLeft or foldRight, because there's no way to tell the "fold" variants to stop after finding an occurrence of the element to be removed. It's easy to write removeAll(x,l), which removes all occurrences of x from l. (Its header is the same as removeOne's, modulo the name change.) Write it, using either foldLeft or foldRight.

1C There are plenty of ways to write removeOne "functionally" without using recursion. I call your attention to the vast collection of library facilities described in the library catalog for List, and in particular to the following entries (in which tl is "this list," the zero'th argument in any call to one of these methods, and A is the type of the elements of tl). The entries are in alphabetical order:

find(p: (A) => Boolean): Option[A] — Finds and returns the first element x of tl such that x satisfies p, returning Some(x), or None if tl contains no such x.
indexOf(elem: A): Int = tl.indexWhere(_ == elem)
indexWhere(p: (A) => Boolean): Int — Like find, except it returns –1 if no such element exists, else k, the number of elements to the left of the first such element.
prefixLength(p: (A) => Boolean): Int — The length of the longest prefix of tl such that p is true of the each element of the prefix. If tl.forall(p), tl.prefixLength(p) = tl.length.
reverse_:::(l: List[A]): List[A] — tl reverse_::: l = tl.reverse ::: l, but avoids copying tl.reverse, as the separate call to ::: would be obliged to do. (Try defining reverse_::: yourself and you'll see the trick. By the way, the name of this method shows that when underscore is useful inside a name, the sky doesn't fall.)
span(p: (A) => Boolean): (List[A], List[A]) — breaks tl at the first element that doesn't satisfy p, if any, and returns the resulting two pieces. Either piece may be empty if the first element of tl fails to satisfy p, or all elements of tl satisfy it.
splitAt(n: Int): (List[A], List[A]) — breaks tl at position n, an integer, 0 ≤ n ≤ tl.length.
tails: Iterator[List[A]] — If tl = List(x₀, x₁,...,x_n–1) then tl.tails is an iterator that generates the following lists in this order:
```
  
     List(x₀, x₁,...,x_n–1), 
     List(x₁,...,x_n–1), 
       ...,
     List(x_n–1),
     List()
```
An iterator behaves in most ways like a collection; but its primitive interface is through two methods:
1. hasNext: true if the iterator can generate another element.
2. next(): generates the next element and advances the iterator. This side effect is why the empty parens are used to indicate a call to iterator.next(). If there is no next element, next() throws an exception.

Write a "functional" version of removeOne that doesn't call itself recursively or do iteration, just calls various functions and binds various vals. (We'll use the name removeOneF to avoid collision with the name of the function in part 1A.) removeOneF should copy as little list structure as you can manage. See if you can get it to copy as little as removeOne, which presumably copies only up to the first occurrence of x, or copies the entire list if x doesn't occur in it.^No-x

No one solution will use all of the built-in functions listed above, and it's okay to use other things you find in the List catalog or elsewhere in the Scala library . Don't bother looking too hard at collections outside List, though. Most of the entries you will find are identical to entries in the List catalog, being inherited from the same more abstract class or trait (which accounts for the stilted and inappropriate tone of many of the descriptions, which refer to, say, "Traversable-onces" instead of Lists or whatever).

Place your answers (definitions of removeOne, removeAll, and removeOneF) in a file PS3Q1.scala. You may want to include a test App in the file, but I'll rely mainly on whether the code looks right.

2A An interesting exercise given a context-free grammar is to generate a stream, usually infinite, of all the nonterminal strings the grammar licenses. To make sure that every string in the language eventually occurs, it is a reasonable idea to generate in breadth-first order all the trees that the grammar can generate. Saying what "breadth-first" and "generate" mean here requires some technical detail that I don't want to go into.

So let's focus on a likely subproblem (a quite small subproblem), namely, the problem of finding a nonterminal to expand. To avoid redundancy, it's important to expand only the leftmost nonterminal, but also important to expand it all possible ways.

The data structures defining context-free grammars and syntax trees are given in the file CFG.scala . Here are the essentials:

sealed trait CFGsymbol {
  override def toString: String = {
    "<" +
    (this match {
       case nt: CFGnonterminal => "." + nt.name
       case t: CFGterminal => t.name
      case CFGempty => CFG.Epsilon
     }) +
    ">"
  }
}


case class CFGterminal(name: String) extends CFGsymbol {
}

case class CFGnonterminal(name: String) extends CFGsymbol {
}

// The empty string.  Of use in the definition of some grammars.
case object CFGempty extends CFGsymbol {
}

What we have in mind for terminals is, for example, n[oun], not "lamp". That will mean "all possible strings" will have strings like `List(<det>, <n>, <v>, <det>, <noun>)` --- which isn't a string at all, but a list giving a possible sequence of terminals. (The toString method of CFGsymbol turns terminal n into "<n>" and nonterminal NP into "<.NP>".) The terminals will match categories in the lexicon, eventually. (I'm not sure why I'm using lower-case letters for terminals, but it seems appropriate.)

To transform such a list into an actual string, you would have to select from the lexicon items of the appropriate categories; at that point, all the agreement problems we have postponed would have to be solved, so that we could get (e.g.) the string "The girl loved the tree" and avoid "A girls stare a trees" from the sequence above.

Next, we have rules, which require no explanation:

case class CFGrule(lhs: CFGnonterminal, rhs: List[CFGsymbol]) {
  override def toString: String =
    "Rule[" + lhs.toString + " –> " + rhs.map.toString.mkString(", ") + "]"
}

and, finally, the grammar itself, which looks a lot like the formal definition of context-free grammar in SLP, sect. 12.2.1:

case class CFG(terms: List[CFGterminal], 
               nonterms: List[CFGnonterminal], 
               rules: List[CFGrule],
               startSym: CFGnonterminal) {
}

(There is more stuff in the file CFG.scala. For instance, the class CFG contains the definition of lookupLeft: CFGnonterminal => List[CFGrule] and supporting data structures; lookupLeft retrieves the rules that expand a given nonterminal.)

A word on the empty grammatical category, CFGempty: Sometimes it is convenient to have a rule of the form "N→ ε", meaning "nonterminal N may be rewritten as nothing." For example, α* might be defined by introducing a new nonterminal N, defined by the rules

N → ε

N → α N

and replacing α* with N.

and finally the data structures for syntax trees, which includes partially expanded syntax trees.

/** Syntax tree occurring at level `depth` in some bigger tree (unless
  * `depth` = 0, when this is the root).
  * maxDepth is the depth of the deepest node of the tree (wrt the
  * root)
  * If depth = maxDepth, then this tree is a leaf.
  */
abstract class CFGsyntaxTree(val depth: Int, val maxDepth: Int) {
}

/** A tree consisting of a single terminal symbol
  */
case class CFGtreeLeaf(val term: CFGterminal, d: Int) extends
    CFGsyntaxTree(d, d) {
}

/** A tree consisting of a single unexpanded nonterminal
  */
case class CFGtreeTip(val nonterm: CFGnonterminal, d: Int) extends
     CFGsyntaxTree(d, d) {
}

/** A tree consisting of nothing.  Does *not* increase the maxDepth
  * of any tree it occurs in.
  */
case class CFGtreeEmpty(d: Int) extends CFGsyntaxTree(d, d-1)

/** A tree recording the expansion of a nonterminal using a rule.
  * If d = m, then the rule's right-hand side is empty, and this
  * is a "terminal nonterminal" node.
  */
case class CFGexpansion(val nonterm: CFGnonterminal, 
                        val expansion: List[CFGsyntaxTree],
                        val rule: CFGrule,
                        d: Int,
                        m: Int) extends
    CFGsyntaxTree(d, m) {
}

Although there are other possible design decisions, I've chosen to make each tree context-dependent, in the sense that it includes a field depth specifying how far it is from the root (as measured in CFGexpansions). It also includes a field maxDepth giving the maximum depth of any of the nodes beneath it, also with respect to the root.

Finally, the only role for the companion object of CFG is to provide a place to put the constant Epsilon:

object CFG {

  final val Epsilon = "\u03B5" // – Greek letter, lower-case
}

(In the file, the object CFGcontains machinery for checking that the grammar avoids violating certain obvious rules, such as the terminals and nonterminals being disjoint, and for printing out descriptions of the violations to make debugging the grammar easier. )

At last we are in a position to say what the problem you are to solve is. You are to write a function leftmostTip that, given a CFGsyntaxTree, finds its leftmost unexpanded leaf. We have to explain what we mean by "leaf" and "unexpanded," and what the answer should look like.

There are three kinds of leaf in a CFGsyntaxTree:

A Leaf (with a capital "L"): a CFGtreeLeaf (a single terminal symbol),
A tip: CFGtreeTip (containing a nonterminal symbol), and a
A zombie: A CFGexpansion whose expansion is Nil.

Obviously, the unexpanded leaves are the CFGtreeTips. The third category, an expansion to nothing, is somewhat unexpected, but fits the definition of "leaf," because it has no children. (Note that CFGempty is not a terminal symbol, and so cannot be the term of a Leaf.)

We choose the obvious way of representing the answer returned by leftmostTip. A path is a list of nonnegative Ints representing positions in lists in CFGexpansions. List(k₁, k₂, …, k_n) is the path to the k_n'th branch of the k_n–1'st branch of the … k₁'st branch of a tree.

Another way to look at it is this: Nil refers to the root of the tree. If If P is a path from the root of the tree to a CFGexpansion node D₁, with at least k+1 children, let c_k be the k'th child. Suppose Q is the path (possibly empty) from c_k to a node D₂. Then P ::: k :: Q is the path from the root to D₂.

If we define

type Path = List[Int]

Then the header for leftmostTip is

     def leftmostTip(t: CFGsyntaxTree): Option[Path]

The reason to wrap an Option around the Path is because the tree might not include any tips, in which case the terminals at its leaves form a sentence.

2B This is a simple continuation of the previous problem: Write a function expandPath that takes a CFGsyntaxTree and a path to a tip of that tree and returns a list of syntax trees, one for each possible expansion of that tip. (Remember that a tip is a node with no children labeled with a nonterminal symbol; which is distinct from a Leaf, a node labeled with a terminal symbol.) The header of the function is

   def expandPath(tr: CFGsyntaxTree, path: Path, grammar: CFG): 
            List[CFGsyntaxTree]

Put your definitions for leftmostTip and expandPath in the file PS3Q2.scala.

The file TestGrammar mentioned in the original version of this file has been superseded. (Too many global variables leaked out of it.) Please take a look at the following files for better utilities and examples. Each file starts with a comment that explains its contents better than the brief descriptions in the following table.

File	Description
`CFGutil.scala`	Low-level utilities, really independent of the CFG application
`Sexp.scala`	A useful abstraction we'll encounter again: A Lisp-like "S-expression."
`CFGaux.scala`	Facilities for turning strings into CFG data structures and vice versa; and for retrieving information from grammars.
`CFGexample1.scala`	Defines the grammar "S-> aSa, S->, S->b" (`grammar1`)
`PS3Q2Demo.scala`	Calls your functions `leftmostTip` and `expandPath` to generate all the sentences of `CFGexample1.grammar1`.

End Notes

Note No-x
It is possible to avoid copying the list if x doesn't occur in it, but let's assume whoever uses removeOne is pretty sure x is there. [Back]