Revision History | |
---|---|
Oct 3 | Replaced occurrences of some characters with
entities (e.g., "<det> " might have been changed to
"<det> "; most browsers will display
the first version incorrectly, taking it for a mistyped HTML tag or
perhaps an unknown XML tag. |
Oct 10 | Fixed dead link to CFG.scala. Got rid of one
kind of "zombie"; CFGempty no longer appears in
expansions; a zombie is just an expansion to Nil ,
which requires no special handling.
|
Oct 11 | Clarified definition of "path." Added several more files, listed at the end of question 2B |
Due: October 10, 2016
Homework for all problem sets will be submitted using the Canvas Assignments tool. There will be several attachments, consisting of Scala files, class files, text files, and runs of your program. For this assignment, attach the files PS3Q1.scala and PS3Q2.scala
Auxiliary files mentioned in the problem sets may be found at Files > Problem Set Software, as well as by following links from the online version of this problem set.
As spelled out at length in the lecture notes for lecture 2
the file Resources > Scripts contain some helpful shell
scripts for compiling and running Scala programs. The Scala
compiler usually generates
more class files than the Java compiler for a program of similar
size. To hide this clutter away, create a bin directory
below the directory your code appears in, then
use the shell scripts scc
and sca
instead of scalac
and scala
. (These scripts work in any Unix system,
including OS X and Linux.
Windows users should choose scc.bat
and sca.bat
instead.)
You should avoid using var
s as much as
possible
and use recursion and high-order function calls instead
of while
And never allow an object to return a mutable object! In fact,
don't use any mutable data structures at all, beyond
simple var
s.
1A Scala does not provide a built-in function for removing the first occurrence of an element from a list. If we write this as a subroutine, its signature would be
removeOne[A](x: A, l: List[A]): List[A]
Write it using recursion.
1B removeOne
doesn't lend
itself to being implemented using foldLeft
or foldRight
, because there's no way to tell the "fold"
variants to stop after finding an occurrence of the element to be
removed. It's easy to write removeAll(
x,l)
, which
removes all occurrences of x from l. (Its header is
the same as removeOne
's, modulo the name change.)
Write it, using either foldLeft
or foldRight
.
1C There are plenty of ways to
write removeOne
"functionally" without using
recursion.
I call your attention to the vast collection of library
facilities described
in the
library catalog for List
, and in particular to the
following
entries (in which tl
is "this
list," the zero'th argument in
any call to one of these methods, and A
is the type of the
elements of tl
). The entries are in alphabetical order:
find(p: (A) => Boolean): Option[A]
—
Finds and returns the first element x of tl
such
that x satisfies p
,
returning Some(
x)
,
or None
if tl
contains no such x.
indexOf(elem: A): Int
= tl.indexWhere(_ == elem)
indexWhere(p: (A) => Boolean): Int
— Like find
, except it returns –1 if no
such element exists, else k, the number of elements to the left
of the first such element.
prefixLength(p: (A) => Boolean): Int
— The length of the longest prefix of tl
such
that p
is true of the each element of the prefix.
If tl.forall(p)
, tl.prefixLength(p)
= tl.length
.
reverse_:::(l: List[A]): List[A]
— tl reverse_::: l
= tl.reverse ::: l
, but avoids
copying tl.reverse
, as the separate call
to :::
would be obliged to do.
(Try defining reverse_:::
yourself
and you'll see the trick. By the way, the name of this method shows
that when underscore is useful inside a name,
the sky doesn't fall.)
span(p: (A) => Boolean): (List[A], List[A])
—
breaks tl
at the first element that doesn't
satisfy p
, if any, and returns the resulting two pieces.
Either piece may be empty if the first element of
tl
fails to satisfy p
, or all elements of
tl
satisfy it.
splitAt(n: Int): (List[A], List[A])
—
breaks tl
at position n
, an integer, 0
≤ n
≤ tl.length
.
tails: Iterator[List[A]]
— If tl
= List(x0, x1,...,x
n–1)
then tl.tails
is an iterator that generates the
following lists in this order:
List(x0, x1,...,x
n–1),
List(x1,...,xn–1),
...,List(xn–1),
List()
An iterator behaves in most ways like a collection; but its primitive interface is through two methods:
hasNext
: true if the iterator can generate another element.
next()
: generates the next element and advances
the iterator. This side effect is why the empty parens are
used to indicate a call to iterator.next()
.
If there is no next element, next()
throws an exception.
Write a "functional" version of removeOne
that
doesn't call itself recursively or do iteration, just calls various
functions and binds various val
s.
(We'll use the name removeOneF
to avoid
collision with the name of the function in part
1A.) removeOneF
should copy as little list structure
as you can manage. See if you can get it to copy as little as
removeOne
, which presumably copies only up to the first
occurrence of x
, or copies the entire list
if x
doesn't occur in it.No-x
No one solution will use all of the built-in functions listed
above, and it's okay to use
other things you find in the List
catalog or elsewhere
in the Scala library .
Don't bother looking too hard at collections
outside List
, though.
Most of the entries you will find are identical
to entries in the List
catalog, being inherited from
the same more abstract class or trait (which accounts for the stilted
and inappropriate tone of many of the descriptions, which refer to, say,
"Traversable-once
s" instead of List
s or
whatever).
Place your answers (definitions of removeOne
,
removeAll
, and removeOneF
) in a
file PS3Q1.scala. You may want to include a
test App
in the file, but I'll rely mainly on whether the code
looks right.
2A An interesting exercise given a context-free grammar is to generate a stream, usually infinite, of all the nonterminal strings the grammar licenses. To make sure that every string in the language eventually occurs, it is a reasonable idea to generate in breadth-first order all the trees that the grammar can generate. Saying what "breadth-first" and "generate" mean here requires some technical detail that I don't want to go into.
So let's focus on a likely subproblem (a quite small subproblem), namely, the problem of finding a nonterminal to expand. To avoid redundancy, it's important to expand only the leftmost nonterminal, but also important to expand it all possible ways.
The data structures defining context-free grammars and syntax trees are given in the file CFG.scala . Here are the essentials:
sealed trait CFGsymbol { override def toString: String = { "<" + (this match { case nt: CFGnonterminal => "." + nt.name case t: CFGterminal => t.name case CFGempty => CFG.Epsilon }) + ">" } } case class CFGterminal(name: String) extends CFGsymbol { } case class CFGnonterminal(name: String) extends CFGsymbol { } // The empty string. Of use in the definition of some grammars. case object CFGempty extends CFGsymbol { }
What we have in mind for terminals is, for example, n
[oun], not
"lamp".
That will mean "all possible strings" will have strings
like `List(<det>, <n>, <v>, <det>, <noun>)
` --- which isn't a string at
all, but a list giving a possible sequence of terminals.
(The toString
method of CFGsymbol
turns
terminal n
into "<n>"
and
nonterminal NP
into "<.NP>"
.)
The terminals will match categories in the lexicon, eventually.
(I'm not sure why I'm using lower-case letters for terminals, but
it seems appropriate.)
To transform such a list into an actual string, you would have to select from the lexicon items of the appropriate categories; at that point, all the agreement problems we have postponed would have to be solved, so that we could get (e.g.) the string "The girl loved the tree" and avoid "A girls stare a trees" from the sequence above.
Next, we have rules, which require no explanation:
case class CFGrule(lhs: CFGnonterminal, rhs: List[CFGsymbol]) { override def toString: String = "Rule[" + lhs.toString + " –> " + rhs.map.toString.mkString(", ") + "]" }
and, finally, the grammar itself, which looks a lot like the formal definition of context-free grammar in SLP, sect. 12.2.1:
case class CFG(terms: List[CFGterminal], nonterms: List[CFGnonterminal], rules: List[CFGrule], startSym: CFGnonterminal) { }
(There is more stuff
in the file CFG.scala. For instance, the class CFG
contains the definition of lookupLeft: CFGnonterminal =>
List[CFGrule]
and supporting data
structures; lookupLeft
retrieves the rules that expand
a given nonterminal.)
A word on the empty grammatical category, CFGempty
:
Sometimes it is convenient to have a rule of the form "N→
ε", meaning "nonterminal N may be rewritten as
nothing." For example, α* might be defined by introducing a
new nonterminal N, defined by the rules
N → ε | N → α N |
and replacing α* with N.
and finally the data structures for syntax trees, which includes partially expanded syntax trees.
/** Syntax tree occurring at level `depth` in some bigger tree (unless * `depth` = 0, when this is the root). * maxDepth is the depth of the deepest node of the tree (wrt the * root) * If depth = maxDepth, then this tree is a leaf. */ abstract class CFGsyntaxTree(val depth: Int, val maxDepth: Int) { } /** A tree consisting of a single terminal symbol */ case class CFGtreeLeaf(val term: CFGterminal, d: Int) extends CFGsyntaxTree(d, d) { } /** A tree consisting of a single unexpanded nonterminal */ case class CFGtreeTip(val nonterm: CFGnonterminal, d: Int) extends CFGsyntaxTree(d, d) { } /** A tree consisting of nothing. Does *not* increase the maxDepth * of any tree it occurs in. */ case class CFGtreeEmpty(d: Int) extends CFGsyntaxTree(d, d-1) /** A tree recording the expansion of a nonterminal using a rule. * If d = m, then the rule's right-hand side is empty, and this * is a "terminal nonterminal" node. */ case class CFGexpansion(val nonterm: CFGnonterminal, val expansion: List[CFGsyntaxTree], val rule: CFGrule, d: Int, m: Int) extends CFGsyntaxTree(d, m) { }
Although there are other possible design decisions, I've chosen to
make each tree context-dependent, in the sense that it includes a
field depth
specifying how far it is from the root
(as measured in CFGexpansion
s). It also includes a field
maxDepth
giving the maximum depth of any of the nodes
beneath it, also with respect to the root.
Finally, the only role for the companion object of CFG
is to provide a place to put the constant Epsilon
:
object CFG { final val Epsilon = "\u03B5" // – Greek letter, lower-case }
(In the file, the object CFG
contains machinery for
checking that the grammar avoids
violating certain obvious rules, such as the terminals and
nonterminals being disjoint, and for printing out descriptions of
the violations to make debugging the grammar easier. )
At last we are in a position to say what the problem you are to
solve is. You are to write a function leftmostTip
that,
given a CFGsyntaxTree
, finds its leftmost
unexpanded leaf. We have to explain what we mean by "leaf"
and "unexpanded," and what the answer should look like.
There are three kinds of leaf in a CFGsyntaxTree
:
CFGtreeLeaf
(a single terminal symbol),
CFGtreeTip
(containing a nonterminal
symbol), and a
CFGexpansion
whose expansion
is
Nil
.
Obviously,
the unexpanded leaves are
the CFGtreeTip
s. The third category, an expansion to
nothing, is somewhat unexpected, but fits the definition of "leaf,"
because it has no children.
(Note that CFGempty
is not a terminal symbol, and so
cannot be the term
of a Leaf.)
We choose the obvious way of representing the answer returned by
leftmostTip
. A path is a list of
nonnegative Int
s representing positions in lists
in CFGexpansion
s. List(
k1, k2,
…, kn)
is the path to
the kn'th branch of
the kn–1'st branch of the
… k1'st branch of a tree.
Another way to look at it is this:
Nil
refers to the root of the tree. If
If P is a
path from the root of the tree to a CFGexpansion
node D1, with at least k+1
children, let ck be
the k'th child. Suppose
Q is the path (possibly empty) from ck to
a node D2. Then P :::
k ::
Q is the path from the root
to D2.
If we define
type Path = List[Int]
Then the header for leftmostTip
is
def leftmostTip(t: CFGsyntaxTree): Option[Path]
The reason to wrap an Option
around
the Path
is because the tree might not include any
tips, in which case the terminals at its leaves form a sentence.
2B This is a simple continuation of the
previous problem: Write a function expandPath
that
takes a CFGsyntaxTree
and a path to a tip of that tree
and returns a list of syntax trees, one for each possible expansion
of that tip. (Remember that a tip is a node with no
children labeled with a nonterminal symbol; which is distinct from
a Leaf, a
node labeled with a terminal symbol.)
The header of the function is
def expandPath(tr: CFGsyntaxTree, path: Path, grammar: CFG): List[CFGsyntaxTree]
Put your definitions for leftmostTip
and expandPath
in the file PS3Q2.scala.
The file TestGrammar mentioned in the original version of this file has been superseded. (Too many global variables leaked out of it.) Please take a look at the following files for better utilities and examples. Each file starts with a comment that explains its contents better than the brief descriptions in the following table.
File | Description |
---|---|
CFGutil.scala | Low-level utilities, really independent of the CFG application |
Sexp.scala | A useful abstraction we'll encounter again: A Lisp-like "S-expression." |
CFGaux.scala | Facilities for turning strings into CFG data structures and vice versa; and for retrieving information from grammars. |
CFGexample1.scala | Defines the grammar "S-> aSa, S->, S->b" (grammar1 ) |
PS3Q2Demo.scala | Calls your
functions leftmostTip and expandPath to
generate all the sentences
of CFGexample1.grammar1 . |