Assignment 6 - Neural Network for Solitaire Yahtzee
Objectives
- to gather good training data for supervised learning
- to choose a good representation of inputs to a neural network
- to train and use a neural network in an agent for a game
Assignment
Create a Python 3 module called nn that creates and trains a neural network for playing solitaire Yahtzee. This module must define
- a function called train that reads your training examples from standard input and returns a trained neural network; and
- a class called NNPolicy that has an initializer that can take the object returned from train as its parameter and has two methods choose_dice and choose_category that meet the requirements of the yahtzee.evaluate_policy function.
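The required interface can be sketched as a minimal module skeleton. This is only an illustration of the shape of nn.py; the training logic and the placeholder return value are stand-ins, not the assignment's solution:

```python
import sys

def train(lines=None):
    """Read training examples (one per line) and return a trained network.

    This stub only collects the lines; a real implementation would parse
    them and fit a neural network here.  `lines` defaults to standard
    input, as the assignment requires.
    """
    if lines is None:
        lines = sys.stdin
    examples = [line.strip() for line in lines if line.strip()]
    return {"examples": examples}  # placeholder for the trained network

class NNPolicy:
    """Policy wrapper around the object returned by train()."""

    def __init__(self, model):
        self.model = model

    def choose_dice(self, scoresheet, roll, rerolls):
        # encode (scoresheet, roll, rerolls), run the network,
        # and return a subroll of `roll` indicating the dice to keep
        raise NotImplementedError

    def choose_category(self, scoresheet, roll):
        # encode (scoresheet, roll), run the network,
        # and return an unused category to score in
        raise NotImplementedError
```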
Design Process
There are four steps to creating and using the neural network.

1. Gather training data
Training examples will be solitaire Yahtzee positions and the optimal choice (dice to keep for reroll positions, category to score in for end-of-turn positions) for those positions. You may use no more than 100,000 examples for training, so you will have to devise a way to sample the approximately 1.5 billion distinct positions in solitaire Yahtzee. The yahtzee module contains a function evaluate_policy and a class RandomPolicy that you can use to generate an initial set of training data. Once you have chosen your training inputs, you can determine the optimal choices with the query_optimal.sh program in /c/cs474/hw6.
That program reads positions from standard input where each position is given as a comma-separated list of the state of the scoresheet (as returned by YahtzeeScoresheet.as_state_string), the current roll, and the number of rerolls, and outputs the positions with either the dice chosen to keep or the string representation of the category to score in.
[jrg94@scorpion Yahtzee]$ /c/cs474/hw6/query_optimal.sh
UP0,11223,2
UP0,11223,2,[22]
2 3 4 5 6 3K 4K FH SS LS C UP50,11166,2
2 3 4 5 6 3K 4K FH SS LS C UP50,11166,2,[111]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,2
2 3 4 5 6 3K 4K FH SS LS UP60,11666,2,[11]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,1
2 3 4 5 6 3K 4K FH SS LS UP60,11666,1,[11]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,0
2 3 4 5 6 3K 4K FH SS LS UP60,11666,0,C
2 3 4 5 6 3K 4K FH SS LS C Y UP60,16666,2
2 3 4 5 6 3K 4K FH SS LS C Y UP60,16666,2,[1]
2 3 4 5 6 3K 4K FH SS LS C Y+ UP60,16666,2
2 3 4 5 6 3K 4K FH SS LS C Y+ UP60,16666,2,[6666]
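A labeled line from query_optimal.sh can be split into its parts with simple string handling. This sketch assumes the scoresheet state string never contains a comma, so each labeled line has exactly four comma-separated fields:

```python
def parse_labeled(line):
    """Split a query_optimal.sh output line into its four fields:
    scoresheet state, roll, rerolls, and the optimal action."""
    state, roll, rerolls, action = line.strip().split(",")
    return state, roll, int(rerolls), action

# e.g. parse_labeled("UP0,11223,2,[22]") -> ("UP0", "11223", 2, "[22]")
```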
2. Transform and label the training data
You will have to devise a way of representing the positions as inputs to your neural network and the optimal actions as its outputs. For the latter, we suggest categorical outputs where each class corresponds to a meta-action as defined by the yahtzee.YahtzeeRoll.select_for_xxx methods. Those meta-actions represent instructions such as "try for 1s", "try for 2s", ..., "try for 6s", "try for Yahtzee", "try for full house", "try for a straight", or "try for chance", where what each meta-action means in each position is defined using the hand-written rules implemented by those functions. To convert the output of query_optimal.sh to labelled output, you will have to write code that determines which meta-action best matches the game action selected by the optimal policy.
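One way to label an optimal keep is to apply each candidate meta-action to the roll and see which one reproduces the optimal keep exactly. The sketch below covers only the "try for n's" meta-actions, using a simplified stand-in for the select_for_xxx methods; real labelling code would also try the straight, full house, Yahtzee, and chance selectors:

```python
def keep_for_number(roll, face):
    """Dice the 'try for face's' meta-action keeps: every die showing face.
    (A simplified stand-in for yahtzee.YahtzeeRoll.select_for_xxx.)"""
    return "".join(d for d in roll if d == face)

def match_number_meta_action(roll, optimal_keep):
    """Return the face whose meta-action keeps exactly optimal_keep,
    or None if no 'try for n's' action matches (e.g. a straight keep)."""
    for face in "123456":
        if keep_for_number(roll, face) == optimal_keep:
            return face
    return None

# e.g. with roll 11223 and optimal keep [22], the best match is "try for 2s"
```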
3. Design and train your neural network
You may use at most 300 neurons total across all your hidden layers (not counting Dropout layers), but the architecture may be whatever you wish. The nn.train function should read your training examples from standard input, train your neural network, and return it.
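The reading and encoding inside train might look like the following sketch, where both the input encoding (normalized face counts plus rerolls) and the example format (a labeled query_optimal.sh line) are illustrative assumptions, not required choices:

```python
def encode_position(roll, rerolls):
    """A sketch of an input encoding: the normalized count of each face
    (how many 1s..6s in the roll) plus the normalized reroll count.
    A real encoding would also cover the scoresheet state."""
    counts = [roll.count(str(face)) / 5.0 for face in range(1, 7)]
    return counts + [rerolls / 2.0]

def read_examples(lines):
    """Parse labeled example lines into parallel input/label lists."""
    xs, ys = [], []
    for line in lines:
        state, roll, rerolls, action = line.strip().split(",")
        xs.append(encode_position(roll, int(rerolls)))
        ys.append(action)
    return xs, ys
```

The returned xs and ys could then be fed to whatever network library you choose, keeping the hidden layers within the 300-neuron budget.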
4. Implement a policy that uses your neural network
Create a class called NNPolicy that defines three functions as described below.
- First, an initializer that can take as its argument the object returned by train.
- Next, a method choose_dice that takes a YahtzeeScoresheet, a YahtzeeRoll, and the number of rerolls remaining, and returns a subroll of the given roll indicating which dice to keep. This method should encode the given game state as input suitable for your neural network, compute the output of your neural network, and interpret that output as a game action – which dice to keep. If your outputs correspond to meta-actions then you can use the yahtzee.YahtzeeRoll.select_for_xxx methods to translate the meta-actions into game actions.
- Finally, a method choose_category that takes a YahtzeeScoresheet and a YahtzeeRoll, and returns the unused category to score the roll in. This function should encode the game state as input to your neural network, compute its output, and interpret that output as an unused category. If your outputs are meta-actions then you will have to write your own code to translate the meta-action suggested by your neural network to which category to choose – there are no pre-defined meta-actions as there are for choosing which dice to keep. Beware that your neural network will sometimes suggest putting a zero in a category (which is the optimal decision in some cases) and may occasionally improperly suggest choosing a category that has already been used.
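Guarding against the network occasionally suggesting an already-used category can be as simple as masking its output. A minimal sketch, assuming the network emits one score per category index:

```python
def choose_best_unused(scores, used):
    """Return the index of the highest-scoring unused category.

    scores: the network's output vector, one entry per category index.
    used: the set of category indices already filled on the scoresheet.
    Masking to the unused indices prevents choosing a used category.
    """
    unused = [i for i in range(len(scores)) if i not in used]
    return max(unused, key=lambda i: scores[i])

# e.g. choose_best_unused([0.9, 0.1, 0.5], {0}) -> 2
```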
Additional Requirements
Your submission must run with the testit script in the allotted time, which is 75 CPU-minutes. Because training and evaluation use multiple CPUs, 75 minutes of CPU time under testit corresponds to about 12 minutes of clock time when running the grading script directly.
Graduate Credit
For graduate credit, once you have optimized your neural network design and training plan for the standard assignment, check its performance when varying the resources allowed: for four different combinations of number of training examples and total hidden layer size, train and evaluate your neural network. Include at least two data points with more or less than 300 total hidden neurons and at least two data points with more or less than 100,000 training examples. Evaluate by averaging the score over at least 100,000 simulated games each. Submit your results in your time log.

Files
In /c/cs474/hw6/Required are three Python 3 modules:
- yahtzee.py, which defines classes that implement the rules of Yahtzee;
- multiset.py, which defines classes required by the yahtzee module; and
- test_yahtzee.py, which implements the test driver.
Additionally, in /c/cs474/hw6/ there is a Java implementation of an optimal solitaire Yahtzee player and a program to determine what choices the optimal policy makes:
- yahtzee.jar, which contains the compiled Java implementation;
- bonusyahtzeestate.dat, a data file required by that implementation; and
- query_optimal.sh, a shell script that reads solitaire Yahtzee game states and outputs those same positions along with the optimal action.
Submissions
Submit just your nn.py module along with any other supporting modules, your training data as training.dat, and your log.