A rubric in this context is an explanation of how a problem set was graded. The solution may be found on the course website.
Even when a function has type parameters in its header, it is usually not necessary to include values for them in a function call. That is, you can write importantFunc(...) instead of importantFunc[...](...). The compiler will infer the missing types from context. The exceptions are corner cases like List(). Sometimes you have to help the compiler out by writing List[Cookie](), or it will infer that it's a list of Nothing, which results in weird error messages.
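A minimal sketch of that corner case (Cookie is an invented class used only for illustration):

    case class Cookie(flavor: String)

    var jar = List()                    // inferred as List[Nothing]
    // jar = Cookie("ginger") :: jar    // type error: a List[Cookie] is not a List[Nothing]

    var jar2 = List[Cookie]()           // helping the compiler out
    jar2 = Cookie("ginger") :: jar2     // fine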
[Repeated from PS1 rubric (I realize students hadn't seen the solutions to PS1 when they turned in PS2)] — Scala style calls for indentation by 2 spaces; if your editor's default indentation is 4 spaces, or, worse yet, a tab character, change the setting. If indentation by tab is wired in, find a new editor. (Tab stops are set by the reader of the code, not the writer; it's unpredictable what the code will look like to the reader if it contains tabs.)
[Repeated from PS1 rubric] — Avoid using the return statement or any other "nonlocal goto" except throwing exceptions. The correct way (as far as CS470/570 is concerned) to exit a loop "prematurely" is to add a (typically Boolean) variable whose value may be set when the condition requiring premature exit is detected, and read in the while condition.
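For example, a sketch of the style being asked for (the function and its arguments are invented):

    // Exit the while loop via a Boolean flag read in the condition, not via return.
    def containsTarget(nums: List[Int], target: Int): Boolean = {
      var remaining = nums
      var found = false
      while (!found && remaining.nonEmpty) {
        if (remaining.head == target) found = true
        else remaining = remaining.tail
      }
      found
    }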
Make sure that subexpressions are indented so that every line starts on a column to the right of the column the containing expression starts in. Here's what not to do:

    case ... => if (...) {
      ...
    }
    else
      ...

These lines violate the rule: the "}" and the "else" start in the same column as "case". Better:

    case ... => if (...) {
        ...
      }
      else
        ...

It's even clearer with a pair of braces around the if-expression, but they're not necessary.
Exactly how many points were taken off for violating the following "suggestions" was a matter of deciding how obvious and damaging the violation was.
It's useful to daydream about different cases when first approaching a problem, but don't automatically start writing an "if – then – else" or "… match { case … => … }" without being prepared to throw it all away. The problem isn't that you won't be able to finish having started this way; the problem is that you will always be able to finish. That's because if a problem has a solution, then that problem + some assumption A also has a solution (see note "Theorem" below). So if you write

    if (A) CT else CF

then you'll be able to write CT, with the added information that A, and you'll be able to write CF, with the added information that ¬A. But if you start to notice big overlaps between CT and CF, you should stop and consider whether the time to check whether A is true is at the beginning, or at some later point, or never. You don't have to do this stopping and considering, but if you don't you'll have a clumsy, overstuffed program. Especially if you repeat this mistake every chance you get.
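A made-up illustration of the overlap symptom: both branches of the early if repeat the same work, and checking the condition later removes the duplication.

    // Checking A (here, the sign of n) first forces the digit count into both branches.
    def describe(n: Int): String =
      if (n >= 0) {
        val digits = n.toString.length
        s"non-negative, $digits digit(s)"
      } else {
        val digits = (-n).toString.length
        s"negative, $digits digit(s)"
      }

    // Checking it later keeps the shared work in one place.
    def describe2(n: Int): String = {
      val digits = math.abs(n).toString.length
      val sign = if (n >= 0) "non-negative" else "negative"
      s"$sign, $digits digit(s)"
    }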
Feel free to create as many functions as you need.
You may come to the conclusion from early examples that recursive functions never need auxiliary functions. They often do. In fact, it's a healthy design technique, whether the function you're writing is recursive or not, to imagine an auxiliary function you need, and write a quick informal definition of what it's supposed to do (an argument-assumptions/result-requirements spec), not how it does it. Continue writing your original function assuming you have this function, then go back and implement it later. (And please, give functions names that say what they do. Don't name your functions helperFcn1, helperFcn2, etc.)

At some point you may realize that the function you need is one you're already defining. Eureka! Feel free to call it; you've found two ends of a string, and tying them together will … yes, wrap up the solution recursively.
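A small sketch of that workflow, with invented names: the caller is written first against an informal spec, and the helper (which turns out to be recursive) is filled in afterward.

    // meanSquare(nums): assumes nums is nonempty; returns the average of the
    // squares of its elements. Written assuming sumOfSquares already exists.
    def meanSquare(nums: List[Int]): Double =
      sumOfSquares(nums).toDouble / nums.length

    // sumOfSquares(nums): returns the sum of the squares of the elements of nums
    // (0 for the empty list). Implemented later; it happens to call itself.
    def sumOfSquares(nums: List[Int]): Int = nums match {
      case Nil     => 0
      case n :: tl => n * n + sumOfSquares(tl)
    }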
But again, don't forget to revisit guesses you made early on in a program's development. If an auxiliary function F turns out to do almost nothing besides call some other function G, then perhaps the "almost nothing" bit can be moved into G, or moved to the callers of F, and F can be eliminated. Maybe not; maybe even though F does very little, that very little can't be moved to G (because G has other callers, say), and can't be moved to the caller(s) of F (because it would spoil their natural beauty). Leave it alone, then! Just make sure that if it survives it has an informative name! Perhaps you can now see that its original informative name isn't quite right. Change it while you have a chance.
Use immutable data structures.
This is the crucial respect in which a program using immutable data structures differs from a program using mutable ones: If procedure p makes a change to a mutable data structure x and returns n values (i.e., an n-tuple (v1, …, vn); see note "Tiny tuples" below), the equivalent procedure p' making a "change" to the immutable version of x must return an extra value: (v1, …, vn, x'), where x' is the new version of x. (It doesn't have to be the last value returned, of course, but if it isn't returned, then the "change" accomplishes nothing.)
In the simplest case, p might have changed its x and returned no value (in Scala, returned (); the compiler is friendly enough not to require you to ensure that, if p has return type Unit, the last expression in p actually evaluates to (), but any other value is discarded and () will be returned). The new version, p', returns one value, x'. If p changes k data structures, it will have to return k new values besides the previous ones it returned.
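A minimal sketch of the simplest case, with an invented example (account balances kept in a Map): the mutable version changes its argument and returns nothing, while the immutable version's "change" exists only in the extra value it returns.

    import scala.collection.mutable

    // Mutable version: the change happens in place; nothing need be returned.
    def depositMutable(accounts: mutable.Map[String, Int], who: String, amt: Int): Unit =
      accounts(who) = accounts.getOrElse(who, 0) + amt

    // Immutable version: the caller must hold on to the returned map;
    // the original accounts map is untouched.
    def deposit(accounts: Map[String, Int], who: String, amt: Int): Map[String, Int] =
      accounts + (who -> (accounts.getOrElse(who, 0) + amt))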
This may seem like a big hassle. It did to me when I first heard the idea. And it seems inefficient. If you make an actual change to a data structure, you don't have to copy big pieces of it. I have changed my mind. For one thing, the cost of copying depends on how the abstraction is implemented. Immutable hash tables (or "hash maps") are not simple arrays, so you're not copying an array every time you copy one with a small change. They're implemented so they share some structure with the map they're a copy of. (I don't know the details.)
The main advantage of immutable data structures is that you get to think of them as constants. This frees your mind dramatically. For instance, if you want to compare some property of a data structure before and after changing it, with a mutable data structure you have to think, often very carefully, about when to measure and how much to measure. With an immutable data structure, that issue goes away. You have one variable whose value is the earlier version, and one whose value is the later one. They can't possibly "change out from under you." If the measurement is rarely needed, then you don't make it unless you have to. It's just a question of when the garbage collector gets its hands on the earlier version (by which colorful phrase I mean, "when no pointers to it are still accessible by the program").
A recursive function almost always has at least one parameter ("formal argument"). If you are going to do something to a recursively defined data structure, and that something involves looking at every level of the recursively defined structure, then that something is probably a recursive function one of whose parameters is bound to a piece of the recursively defined structure.
However, this must be qualified by one caveat: If the class definition for the subpieces is something you control (see note "Control" below; it might be the class for the object you started with), then another option is to make the recursive function a member of that class definition. Then instead of function(subpiece, ...) you would write subpiece.function(...). Knowing when to use one approach rather than the other is tricky, but if in doubt use the first version.
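An invented sketch of the two options, using a small tree class that you control:

    // A class you control, with both options shown on it.
    sealed trait Tree {
      // Option 2: the recursive function as a member; call it as subpiece.sum.
      def sum: Int = this match {
        case Leaf(v)    => v
        case Node(l, r) => l.sum + r.sum
      }
    }
    case class Leaf(value: Int) extends Tree
    case class Node(left: Tree, right: Tree) extends Tree

    // Option 1: a free-standing recursive function whose parameter is the piece.
    def sumTree(t: Tree): Int = t match {
      case Leaf(v)    => v
      case Node(l, r) => sumTree(l) + sumTree(r)
    }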
In the case of PS2Q1, though, the second option is not possible. An instance of PS2Q1 has no subpieces that are also instances of PS2Q1. Its subpieces are lists of integers. The class List is not under your control. If you want to write a recursive function to do one of the tasks for question 2, define an auxiliary function. You can define it in any convenient place. One such place is the companion object for the class. Another is the body of the function that calls it.
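A hedged sketch of those two placements. The real PS2Q1 class is defined by the problem set; the field and method names here are invented, and only the placement of the recursive helper is the point.

    class PS2Q1(val nums: List[Int]) {
      def total: Int = {
        // One convenient place: nest the recursive helper inside the method that calls it.
        def sum(rest: List[Int]): Int = rest match {
          case Nil     => 0
          case n :: tl => n + sum(tl)
        }
        sum(nums)
      }
    }

    object PS2Q1 {
      // Another convenient place: the companion object.
      def sumAll(rest: List[Int]): Int = rest match {
        case Nil     => 0
        case n :: tl => n + sumAll(tl)
      }
    }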
A good sign of trouble is if you are building a new object just so you can call your recursive function. You're writing something like

    ... + new Foo(...).recursiveFunc(...)

or

    val x1 = ...
    val f1 = new Foo(...)
    x1 + f1.recursiveFunc(...)
Don't casually compute the length of a linked list: Avoid calculating the size of a list unless you really need it, and be suspicious of any code that computes the length of the same list repeatedly. Linked lists are wonderful data structures, but they're not good at accessing elements beyond the first two or three, and computing the length requires iterating down the entire list. (The size of a collection is the same as its length.) Be especially suspicious of repeated calls to length in a loop or recursion. Even if the list is getting smaller with each recursive call, the resulting algorithm will still have time complexity proportional to N².
A test such as l.length > 0 is really a waste. If the length is actually > 0, the entire march down the list is painful to contemplate, because the exact number computed is almost certainly useless. Just say l.nonEmpty or just l != Nil. Better yet, use pattern matching.
Another idea, occasionally useful, is to call the function shorter: shorter(l, k) is true if l's length is ≤ k. (You'll have to write shorter yourself; it's not built in.) Instead of writing l.length > 0, write !shorter(l, 0).
I took no points off for this mistake, because I made it myself! The original solution for 1C was correct, but took the length of every tail of the list nums.
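A sketch of the shorter function suggested above; the exact signature is a guess, but the point is that it stops as soon as the answer is known instead of walking the whole list.

    // shorter(l, k): true if l has at most k elements.
    def shorter[A](l: List[A], k: Int): Boolean =
      if (k < 0) false
      else l match {
        case Nil    => true
        case _ :: t => shorter(t, k - 1)
      }

    // l.length > 0 walks all of l; !shorter(l, 0) looks at one cell at most.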
Don't casually append linked lists (with ::: or ++ or +; see note "Append versions" below):
The Scala collections API, like those of most other functional languages, makes it too easy to append two linked lists. You write l1 ::: l2 and get a nice functional feeling. It doesn't look much worse than x1 :: l2. But it requires copying l1, which takes time proportional to the length of l1. Most collections, not just linked lists, have this problem (queues being an obvious exception). If the call to ":::" is in a loop, the problem is compounded. Suppose you are adding to a list l by collecting items in a loop of some sort, and every time you find an item w that is worth keeping you execute

    l = l ::: List(w)

Then if the length of l is n by the time you're done, the time required to execute the loop is proportional to n². A simple change to

    l = w :: l
reduces the time to O(n). Of course, now the elements will be in the opposite order when you're done collecting them. If that matters, as it often does, just do this after the loop is through collecting:

    l = l.reverse

This operation is again O(n). Instead of taking time proportional to n², the entire thing takes time proportional to n.
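Both collecting styles, written out as a sketch (worthKeeping and the Int element type are invented for illustration):

    def worthKeeping(w: Int): Boolean = w % 2 == 0

    // Quadratic: every ::: copies everything collected so far.
    def collectSlow(items: List[Int]): List[Int] = {
      var l: List[Int] = Nil
      for (w <- items)
        if (worthKeeping(w)) l = l ::: List(w)
      l
    }

    // Linear: prepend as you go, reverse once at the end.
    def collectFast(items: List[Int]): List[Int] = {
      var l: List[Int] = Nil
      for (w <- items)
        if (worthKeeping(w)) l = w :: l
      l.reverse
    }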
Note Theorem
How can it be true that adding some random assumption A
to what you know to hold at a certain point in a programming problem never causes
it to become unsolvable? I had in mind a certain definition of
"solve": a program solves a problem if, whenever the given input assumptions
are true, the program returns a result satisfying the output
requirements. Under this definition, if a problem's input assumptions never
hold, then any program will solve it. Adding an assumption
never makes a problem harder, and may make it easier, perhaps even vacuously easy.
Note Tiny tuples
A 1-tuple specifying a single value (v) is equivalent to v. A 0-tuple () is just the unique entity of type Unit.
Note Control
Scala has tricks that allow you to make it look like you can add stuff to a class you don't control, even one built into the language, like List. These tricks are beyond our scope in this course.
Note Append versions
The difference between x ++ y and x ::: y is that the former has the type of x and the latter has the type of y. ("s + x", where s is a String, is synonymous with "s ++ x.toString", so it behaves like ++.)
Given the rules about Scala operators, the asymmetry between "++" and ":::" is sort of inevitable. Operators are interpreted as member functions of their left argument, unless their names end in ":". So x ::: y means y.:::(x); whereas x ++ y means x.++(y). It's not surprising that a member of a collection class would create another object of the same type. If x and y are the same type, which is the usual case, then the two expressions are synonymous. If they are not the same type, it is usually prudent to convert y to the type of x or vice versa before appending anyway, just so the reader doesn't have to fill in the blanks to figure out all the computations required to evaluate the expressions.
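The desugaring rule, written out on arbitrary small lists:

    val x = List(1, 2)
    val y = List(3, 4)

    assert((x ::: y) == y.:::(x))   // a name ending in ':' is a method of the right operand
    assert((x ++ y)  == x.++(y))    // any other operator is a method of the left operand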
No question about it, at first the news that x ::: y means y.:::(x) is perplexing. It looks wrong, but that's because one doesn't normally think about the fact that the job of ":::" is to copy its argument (x), converting to the type of this (i.e., its zeroth argument, y in the case at hand), and make the last tail of the copy = this, if it has a last tail. If it's of a class for which "last tail" is meaningless, consult your spiritual advisor.
How does that work with other operators? If o is some operator, doesn't x1 o x2 ::: x3 come out meaning x1.o(x3.:::(x2))? Which is not what it looks like.
Yes, but only if o has lower precedence than ":::". If o is ":::", then it associates to the right, so x1 ::: x2 ::: x3 = x1 ::: (x2 ::: x3) = (x3.:::(x2)).:::(x1).
If o is right-associative and has the same precedence as ":::", which, as it happens, "::" does, then something very similar happens: x1 :: x2 ::: x3 = x3.:::(x2).::(x1), which looks odd until you get used to it.
Further, even less important, notes: The precedence of an operator depends on its first character. Operators starting with "arithmetic" characters (+ - % / *) have higher precedence than most (exceptions: ~ @ #). So x1 ++ x2 ::: x3 = (x1 ++ x2) ::: x3 = x3.:::(x1.++(x2)), evaluation of which requires copying the cells of x1 twice, which the earlier versions did not do.
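The three desugarings above, checked on small lists (e is an element, for the :: case):

    val x1 = List(1); val x2 = List(2); val x3 = List(3)
    val e = 0

    assert((x1 ::: x2 ::: x3) == (x3.:::(x2)).:::(x1))   // ::: associates to the right
    assert((e :: x2 ::: x3)   == x3.:::(x2).::(e))       // :: and ::: share precedence, both right-associative
    assert((x1 ++ x2 ::: x3)  == x3.:::(x1.++(x2)))      // ++ binds tighter than :::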