Assignment 2: Dynamic Programming

Objectives

Introduction

Shut the Box is a dice game for two or more players. The game is played with tiles numbered 1 through $n$ that are initially all in an upright, "open" position. Each player takes a turn trying to close the tiles by rolling the dice and then closing an open set of tiles whose sum equals the sum on the dice. For example, if the open tiles are $\{1, 2, 3, 4, 7\}$ and the roll total is 7, then the player has three choices: close the 7; close the 4 and 3; and close the 4, 2, and 1. Each player keeps rolling until they close all the tiles or are unable to make a sum equalling the roll total using the open tiles (for example, if the open tiles are $\{1, 2, 4, 9\}$ and the roll total is 8). If a player's turn ends with all tiles closed ("shutting the box"), then that player wins immediately – the other players do not get to take a turn. If a player's turn ends with some tiles open, then that player's score is the sum of the open tiles and the next player starts their turn with all tiles open. If the last player closes tiles so that the sum of the remaining tiles is less than that of all previous players' scores, then the game ends immediately and the last player wins. If no player shuts the box after all players have taken their turns, then the player with the lowest score wins.

There are many variations of Shut the Box with different $n$, and different rules for determining how many dice a player rolls in each position. We will consider only the two-player game with $n=9$, and with players rolling two fair six-sided dice when the sum of the open tiles is strictly greater than 6 and one die otherwise.

Assignment

Write a program called ShutTheBox that calculates the expected number of wins for a given player, assuming optimal play by each player starting from a given position and counting a tie as half a win for each player, and also determines the optimal move (which numbers to close) given a position and roll.

The parameters for a given run will be given as command-line arguments as follows.

All input will be as specified, the given position will be nonterminal (for --one, there will be tiles remaining; for --two, the sum of the remaining tiles will be at least player 1's score, which will be positive and possible; and for --move in either case, there will be a valid action to take), and the roll will be valid for the given set of remaining tiles (for example, the roll won't be 1 if the sum of the remaining tiles is greater than 6).

Your program should be able to perform any computation in no more than a few seconds.

Output

Expected wins should be printed to standard output with 6 digits after the decimal point, with the last digit rounded to the nearest value (so you can use the %.6f format specifier for functions that take printf-style formats). Optimal moves should be printed to standard output as a comma-separated, strictly increasing list of tiles to shut, delimited by square brackets.

Testing

Although not required by the specification, you may find it easier to test if you generalize your code to allow for versions of the game with different ranges of numbers on the boxes and dice with different numbers of sides, since you can then test on small games that require fewer calculations to solve by hand.

When your program is not working correctly, it is helpful to debug the calculations made for the first state (the one closest to the end of the game) for which you compute an incorrect value. It can be difficult to pinpoint that state if you don't know what the correct values are! To facilitate testing without giving the correct values, there is a verifier in /c/cs474/hw2/stb_verify.jar that will, as you send your computed values to it, output an error message for the first one that is incorrect. The error message will be either "missing successor" if you sent a state but have not previously sent some nonterminal successor of that state, or "incorrect value" if the value you have sent is incorrect for the corresponding state. There are also some error messages if your input is not in the correct format (see below). There is no output if all the values you sent are correct.

To use the verifier, write the states and corresponding values to a file as you compute them. The states and values must be output in an order so that if state $s'$ is reachable from $s$, then $s'$ must be written before $s$. The output for each state should give the open tiles in increasing order with no spaces between, player 1's score (for states where it is player 2's turn), and the corresponding untruncated value, all separated by commas. For example

1,1,0.5833333333333333
2,1,0.16666666666666666
2,2,0.5833333333333333
3,1,0.16666666666666666
3,2,0.16666666666666666
...
23456789,0.3476896905946085
123456789,0.5028103544024047
You need not include any terminal states, but the verifier will check them if you do. You can then check your computed values by running them through the verifier: if the above is saved in a file debug.out then you can run it through the verifier with java -jar /c/cs474/hw2/stb_verify.jar < debug.out. Your final submission should not write the debugging output to standard output, but it is OK if it writes it to standard error or writes it to a file.

Note that the verifier expects "states" to correspond to the input to the ShutTheBox program – states are before rolls, not before actions as in a traditional Markov Decision Process. If you use the traditional formulation, then somewhere you will still need to calculate the values that the verifier expects (the $q(s,a)$ values in the traditional setup), and you should output each one as you compute it.

If you do have an incorrect value, you will still have to compute the correct value by hand and then trace through your program to see where its calculations diverge from your own.

Examples

$ ./ShutTheBox --one --expect 123456789
0.502810
$ ./ShutTheBox --one --expect 146789
0.256254
$ ./ShutTheBox --one --move 146789 9
[9]
$ ./ShutTheBox --two --expect 123456789 8
0.381212
$ ./ShutTheBox --two --expect 12345689 41
1.000000
$ ./ShutTheBox --two --expect 13456789 43
0.986111
$ ./ShutTheBox --two --move 13456789 17 12
[3, 9]