Assignment 6 - Neural Network for Solitaire Yahtzee
Objectives
- to gather good training data for supervised learning
- to choose a good representation of inputs to a neural network
- to train and use a neural network in an agent for a game
Assignment
Create a Python 3 module called nn that creates and trains a neural network for playing solitaire Yahtzee. This module must define
- a function called train that reads your training examples from standard input and returns a trained neural network; and
- a class called NNPolicy that has an initializer that can take the object returned from train as its parameter and has two methods choose_dice and choose_category that meet the requirements of the yahtzee.evaluate_policy function.
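The required interface can be sketched as a minimal module skeleton. This is only an illustration of the shape of nn.py; the training logic and the placeholder return value are stand-ins, not the assignment's solution:

```python
import sys

def train(lines=None):
    """Read training examples (one per line) and return a trained network.

    This stub only collects the lines; a real implementation would parse
    them and fit a neural network here.  `lines` defaults to standard
    input, as the assignment requires.
    """
    if lines is None:
        lines = sys.stdin
    examples = [line.strip() for line in lines if line.strip()]
    return {"examples": examples}  # placeholder for the trained network

class NNPolicy:
    """Policy wrapper around the object returned by train()."""

    def __init__(self, model):
        self.model = model

    def choose_dice(self, scoresheet, roll, rerolls):
        # encode (scoresheet, roll, rerolls), run the network,
        # and return a subroll of `roll` indicating the dice to keep
        raise NotImplementedError

    def choose_category(self, scoresheet, roll):
        # encode (scoresheet, roll), run the network,
        # and return an unused category to score in
        raise NotImplementedError
```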
Design Process
There are four steps to creating and using the neural network.

1. Gather training data
Training examples will be solitaire Yahtzee positions and the optimal choice (dice to keep for reroll positions, category to score in for end-of-turn positions) for those positions. You may use no more than 100,000 examples for training, so you will have to devise a way to sample the approximately 1.5 billion distinct positions in solitaire Yahtzee. The yahtzee module contains a function evaluate_policy and a class RandomPolicy that you can use to generate an initial set of training data. Once you have chosen your training inputs, you can determine the optimal choices with the query_optimal.sh program in /c/cs474/hw6.
That program reads positions from standard input where each position is given as a comma-separated list of the state of the scoresheet (as returned by YahtzeeScoresheet.as_state_string), the current roll, and the number of rerolls, and outputs the positions with either the dice chosen to keep or the string representation of the category to score in.
[jrg94@scorpion Yahtzee]$ /c/cs474/hw6/query_optimal.sh
UP0,11223,2
UP0,11223,2,[22]
2 3 4 5 6 3K 4K FH SS LS C UP50,11166,2
2 3 4 5 6 3K 4K FH SS LS C UP50,11166,2,[111]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,2
2 3 4 5 6 3K 4K FH SS LS UP60,11666,2,[11]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,1
2 3 4 5 6 3K 4K FH SS LS UP60,11666,1,[11]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,0
2 3 4 5 6 3K 4K FH SS LS UP60,11666,0,C
2 3 4 5 6 3K 4K FH SS LS C Y UP60,16666,2
2 3 4 5 6 3K 4K FH SS LS C Y UP60,16666,2,[1]
2 3 4 5 6 3K 4K FH SS LS C Y+ UP60,16666,2
2 3 4 5 6 3K 4K FH SS LS C Y+ UP60,16666,2,[6666]
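A labeled line from query_optimal.sh can be split into its parts with simple string handling. This sketch assumes the scoresheet state string never contains a comma, so each labeled line has exactly four comma-separated fields:

```python
def parse_labeled(line):
    """Split a query_optimal.sh output line into its four fields:
    scoresheet state, roll, rerolls, and the optimal action."""
    state, roll, rerolls, action = line.strip().split(",")
    return state, roll, int(rerolls), action

# e.g. parse_labeled("UP0,11223,2,[22]") -> ("UP0", "11223", 2, "[22]")
```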
2. Transform and label the training data
You will have to devise a way of representing the positions as inputs to your neural network and the optimal actions as its outputs. For the latter, we suggest categorical outputs where each class corresponds to a meta-action as defined by the yahtzee.YahtzeeRoll.select_for_xxx methods. Those meta-actions represent instructions such as "try for 1s", "try for 2s", ..., "try for 6s", "try for Yahtzee", "try for full house", "try for a straight", or "try for chance", where what each meta-action means in each position is defined using the hand-written rules implemented by those functions. To convert the output of query_optimal.sh to labelled output, you will have to write code that determines which meta-action best matches the game action selected by the optimal policy.
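One way to label an optimal keep is to apply each candidate meta-action to the roll and see which one reproduces the optimal keep exactly. The sketch below covers only the "try for n's" meta-actions, using a simplified stand-in for the select_for_xxx methods; real labelling code would also try the straight, full house, Yahtzee, and chance selectors:

```python
def keep_for_number(roll, face):
    """Dice the 'try for face's' meta-action keeps: every die showing face.
    (A simplified stand-in for yahtzee.YahtzeeRoll.select_for_xxx.)"""
    return "".join(d for d in roll if d == face)

def match_number_meta_action(roll, optimal_keep):
    """Return the face whose meta-action keeps exactly optimal_keep,
    or None if no 'try for n's' action matches (e.g. a straight keep)."""
    for face in "123456":
        if keep_for_number(roll, face) == optimal_keep:
            return face
    return None

# e.g. with roll 11223 and optimal keep [22], the best match is "try for 2s"
```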
3. Design and train your neural network
You may use at most 300 neurons total across all your hidden layers (not counting Dropout layers), but the architecture may be whatever you wish. The nn.train function should read your training examples from standard input, train your neural network, and return it.
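The reading and encoding inside train might look like the following sketch, where both the input encoding (normalized face counts plus rerolls) and the example format (a labeled query_optimal.sh line) are illustrative assumptions, not required choices:

```python
def encode_position(roll, rerolls):
    """A sketch of an input encoding: the normalized count of each face
    (how many 1s..6s in the roll) plus the normalized reroll count.
    A real encoding would also cover the scoresheet state."""
    counts = [roll.count(str(face)) / 5.0 for face in range(1, 7)]
    return counts + [rerolls / 2.0]

def read_examples(lines):
    """Parse labeled example lines into parallel input/label lists."""
    xs, ys = [], []
    for line in lines:
        state, roll, rerolls, action = line.strip().split(",")
        xs.append(encode_position(roll, int(rerolls)))
        ys.append(action)
    return xs, ys
```

The returned xs and ys could then be fed to whatever network library you choose, keeping the hidden layers within the 300-neuron budget.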
4. Implement a policy that uses your neural network
Create a class called NNPolicy that defines three functions as described below.
- First, an initializer that can take as its argument the object returned by train.
- Next, a method choose_dice that takes a YahtzeeScoresheet, a YahtzeeRoll, and the number of rerolls remaining, and returns a subroll of the given roll indicating which dice to keep. This method should encode the given game state as input suitable for your neural network, compute the output of your neural network, and interpret that output as a game action – which dice to keep. If your outputs correspond to meta-actions then you can use the yahtzee.YahtzeeRoll.select_for_xxx methods to translate the meta-actions into game actions.
- Finally, a method choose_category that takes a YahtzeeScoresheet and a YahtzeeRoll, and returns the unused category to score the roll in. This function should encode the game state as input to your neural network, compute its output, and interpret that output as an unused category. If your outputs are meta-actions then you will have to write your own code to translate the meta-action suggested by your neural network to which category to choose – there are no pre-defined meta-actions as there are for choosing which dice to keep. Beware that your neural network will sometimes suggest putting a zero in a category (which is the optimal decision in some cases) and may occasionally improperly suggest choosing a category that has already been used.
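Guarding against the network occasionally suggesting an already-used category can be as simple as masking its output. A minimal sketch, assuming the network emits one score per category index:

```python
def choose_best_unused(scores, used):
    """Return the index of the highest-scoring unused category.

    scores: the network's output vector, one entry per category index.
    used: the set of category indices already filled on the scoresheet.
    Masking to the unused indices prevents choosing a used category.
    """
    unused = [i for i in range(len(scores)) if i not in used]
    return max(unused, key=lambda i: scores[i])

# e.g. choose_best_unused([0.9, 0.1, 0.5], {0}) -> 2
```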
Additional Requirements
Your submission must run with the testit script in the allotted time, which is 75 CPU-minutes. Because training and evaluation use multiple CPUs, 75 minutes of CPU time under testit corresponds to about 12 minutes of clock time when running the grading script directly.
Graduate Credit
For graduate credit, once you have optimized your neural network design and training plan for the standard assignment, check its performance when varying the resources allowed: for four different combinations of number of training examples and total hidden layer size, train and evaluate your neural network. Include at least two data points with more or less than 300 total hidden neurons and at least two data points with more or less than 100,000 training examples. Evaluate by averaging the score over at least 100,000 simulated games each. Submit your results in your time log.

Files
In /c/cs474/hw6/Required are three Python 3 modules:
- yahtzee.py, which defines classes that implement the rules of Yahtzee;
- multiset.py, which defines classes required by the yahtzee module; and
- test_yahtzee.py, which implements the test driver.
Additionally, in /c/cs474/hw6/ there is a Java implementation of an optimal solitaire Yahtzee player and a program to determine what choices the optimal policy makes:
- yahtzee.jar, which contains the compiled Java implementation;
- bonusyahtzeestate.dat, a data file required by that implementation; and
- query_optimal.sh, a shell script that reads solitaire Yahtzee game states and outputs those same positions along with the optimal action.
Submissions
Submit just your nn.py module along with any other supporting modules, your training data as training.dat, and your log.