Assignment 6 - Neural Network for Solitaire Yahtzee
Objectives
- to gather good training data for supervised learning
- to choose a good representation of inputs to a neural network
- to train and use a neural network in an agent for a game
Assignment
Create a Python 3 module called nn that creates and trains a neural network for playing solitaire Yahtzee. There is an incomplete version of that module in /c/cs474/hw6/Optional.
The completed module must define a class called ANNPolicy that contains the following methods:
- an initializer that takes an object that encapsulates the rules of Yahtzee;
- a method train, which reads your training examples from a file and trains your neural network;
- methods save and load to save and load trained neural networks to and from files;
- a method choose_category, which takes a position, feeds it through the neural network, and uses the output to select and return an unused category to play the roll in;
- a method choose_dice, which takes a position, feeds it through the neural network, and uses the output to select and return a subset of the roll to keep before the next roll;
- methods start_turn and see_roll, which do nothing but are required for the generic policy interface; and
- a class method add_arguments that takes an argparse object and that you can use to set parameters for your ANN architecture and training (see the skeleton code for instructions and examples).
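Put together, the required interface can be sketched as follows. The signatures below are placeholders inferred from this description; the skeleton in /c/cs474/hw6/Optional is the authoritative starting point.

```python
class ANNPolicy:
    """Sketch of the required interface only; method bodies and
    exact signatures here are placeholders."""

    def __init__(self, yahtzee):
        self.yahtzee = yahtzee   # object encapsulating the rules
        self.model = None        # the trained network, once built

    def train(self, filename): ...
    def save(self, filename): ...
    def load(self, filename): ...
    def choose_category(self, yahtzee, position): ...
    def choose_dice(self, yahtzee, position): ...
    def start_turn(self, position): ...   # required no-op
    def see_roll(self, position): ...     # required no-op

    @classmethod
    def add_arguments(cls, parser): ...
```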
Supporting Code
The modules in /c/cs474/hw6/Required and /c/cs474/hw6/Optional define classes that model solitaire Yahtzee. Of particular use are
- DiceRoll, defined in the roll module, instances of which represent outcomes of rolling dice;
- StandardYahtzee, defined in the yahtzee module, which defines objects that represent the rules of solitaire Yahtzee;
- StandardYahtzee.Anchor, instances of which represent game states at the beginning of turns, before any dice have been rolled;
- StandardYahtzee.Position, instances of which represent other game states; and
- OptimalPolicy, defined in the optimal module, which implements the optimal solitaire policy when used with the data file /c/cs474/hw6/yahtzee_4295404029819880470.dat.
Meta-Actions
There are almost 500 possible game actions, which is too many for the small neural networks we aim to use. So instead of selecting from all the possible game actions, you can design your neural network to select from a set of meta-actions. The meta-actions represent instructions such as "try for 1s", "try for 2s", ..., "try for 6s", "try for n-of-a-kind", "try for full house", "try for a straight", or "try for chance", where what each meta-action means in each position is defined by a set of hand-written rules and will depend on the game state. For example, when the current roll is [2 2 2 5 6], then "try for n-of-a-kind" means "keep [2 2 2]" at the beginning of the game when no categories have been chosen yet, but means "keep [2 2 2 5 6]" when three-of-a-kind is the only category remaining and the Yahtzee category was scored as 0, so there is no possibility of a Yahtzee bonus.
The functions in the meta_actions
module define a good
set of meta-actions, but you may write your own or not use them at
all if you wish.
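For illustration, one such hand-written rule might look like the sketch below. The function try_n_of_a_kind and its keep_extras switch are hypothetical, not part of the provided meta_actions module; in the real rules the decision to keep the extra dice would be derived from the game state rather than passed in.

```python
from collections import Counter

def try_n_of_a_kind(roll, keep_extras=False):
    """Hypothetical 'try for n-of-a-kind' rule: keep every copy of
    the most frequent face in the roll.  With keep_extras=True
    (e.g. three-of-a-kind is the only category left and the extra
    dice add to its score), keep the whole roll instead."""
    if keep_extras:
        return sorted(roll)
    counts = Counter(roll)
    # break count ties in favor of the higher face
    face, _ = max(counts.items(), key=lambda fc: (fc[1], fc[0]))
    return [face] * counts[face]
```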
Design Process
There are five steps to creating and using the neural network.
1. Gather training input
Training examples will be solitaire Yahtzee positions and the optimal choice (dice to keep for reroll positions, category to score in for end-of-turn positions) for those positions. You may use no more than 100,000 examples for training (each position is one example, so one game produces 39 possible examples), so you will have to devise a way to sample the approximately 1.5 billion distinct positions in solitaire Yahtzee. You can use the random_player module as a starting point for generating your initial collection of training inputs.
[jrg94@cicada code]$ python3 random_player.py --count=1 --log --sample=1
UP0,22246,2
UP0,22225,1
UP0,22223,0
2 UP8,26666,2
2 UP8,46666,1
2 UP8,46666,0
2 4K UP8,12556,2
2 4K UP8,14466,1
2 4K UP8,11335,0
1 2 4K UP10,14466,2
1 2 4K UP10,23566,1
1 2 4K UP10,46666,0
1 2 3K 4K UP10,24566,2
1 2 3K 4K UP10,13356,1
1 2 3K 4K UP10,23336,0
1 2 3 3K 4K UP19,11233,2
1 2 3 3K 4K UP19,12346,1
1 2 3 3K 4K UP19,12234,0
1 2 3 3K 4K SS UP19,11226,2
1 2 3 3K 4K SS UP19,22445,1
1 2 3 3K 4K SS UP19,34445,0
1 2 3 4 3K 4K SS UP31,12346,2
1 2 3 4 3K 4K SS UP31,12346,1
1 2 3 4 3K 4K SS UP31,22556,0
1 2 3 4 5 3K 4K SS UP41,24556,2
1 2 3 4 5 3K 4K SS UP41,12456,1
1 2 3 4 5 3K 4K SS UP41,44456,0
1 2 3 4 5 3K 4K SS C UP41,15566,2
1 2 3 4 5 3K 4K SS C UP41,11666,1
1 2 3 4 5 3K 4K SS C UP41,55666,0
1 2 3 4 5 3K 4K FH SS C UP41,12345,2
1 2 3 4 5 3K 4K FH SS C UP41,13356,1
1 2 3 4 5 3K 4K FH SS C UP41,33334,0
1 2 3 4 5 3K 4K FH SS C Y UP41,24444,2
1 2 3 4 5 3K 4K FH SS C Y UP41,24666,1
1 2 3 4 5 3K 4K FH SS C Y UP41,24466,0
1 2 3 4 5 6 3K 4K FH SS C Y UP53,14445,2
1 2 3 4 5 6 3K 4K FH SS C Y UP53,11455,1
1 2 3 4 5 6 3K 4K FH SS C Y UP53,14445,0
187.0
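For illustration, each position line of that log can be split into its parts as below. The format is inferred from the sample output (used-category abbreviations, then UP followed by the upper-section total, the five dice, and the rerolls left); parse_position in StandardYahtzee is the supported way to turn these lines into Position objects.

```python
def parse_log_line(line):
    """Split one random_player log line into (used categories,
    upper-section total, dice, rerolls left); format inferred
    from the sample output above."""
    *used, state = line.split()
    upper, dice, rerolls = state.split(',')
    return used, int(upper[2:]), [int(d) for d in dice], int(rerolls)
```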
2. Label your training inputs
Once you have chosen your training inputs, you can determine the optimal choices with the methods from the OptimalPolicy in the optimal module. In particular, you can use the choose_category and choose_dice methods to determine the optimal action for any position, and you can use the choose_action and value_actions methods to determine the best of a list of meta-actions or the values of all the actions selected by the meta-actions on a list. All of those methods take StandardYahtzee.Position objects, and you can turn the output from the random player into such objects with the parse_position method in StandardYahtzee.
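Labeling can then be a loop over the logged positions, as in this sketch. The method signatures are assumptions based on the description above, and rerolls_left() is a hypothetical accessor; check the Position class for the real way to tell reroll positions from end-of-turn positions.

```python
def label_positions(lines, yahtzee, policy):
    """Hypothetical labeling loop: pair each logged position with
    the optimal action from the given (optimal) policy."""
    examples = []
    for line in lines:
        position = yahtzee.parse_position(line)
        if position.rerolls_left() > 0:
            # mid-turn: the optimal action is a set of dice to keep
            action = policy.choose_dice(yahtzee, position)
        else:
            # end of turn: the optimal action is a category
            action = policy.choose_category(yahtzee, position)
        examples.append((position, action))
    return examples
```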
3. Transform the training data
You will have to devise a way of representing the positions as inputs to your neural network and the optimal actions as its outputs. Keep in mind the general principle that if you expect two positions to have similar outputs, then the inputs corresponding to those positions should be similar to each other, so that the neural network can generalize from one position to the other.
4. Design and train your neural network
You may use at most 300 neurons total across all your hidden layers (not counting Dropout layers), but the architecture may be whatever you wish. The train method in ANNPolicy should read your training examples from the file whose name is passed to the method and then train your neural network.
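The step-3 encoding principle can be sketched as follows. This is one hypothetical encoding, not the required one; the particular features (face counts, scaled upper total, used-category bitmap, rerolls left) are assumptions chosen so that similar positions produce nearby input vectors.

```python
import numpy as np

NUM_CATEGORIES = 13  # standard Yahtzee scoresheet

def encode_position(roll, upper_total, used, rerolls_left):
    """One hypothetical input encoding:
      - count of each face 1-6 in the roll, scaled to [0, 1]
      - upper-section total, capped at and scaled by the 63-point
        bonus threshold
      - one bit per already-used category index
      - rerolls left, scaled to [0, 1]
    """
    counts = np.bincount(roll, minlength=7)[1:7] / 5.0
    upper = np.array([min(upper_total, 63) / 63.0])
    categories = np.zeros(NUM_CATEGORIES)
    categories[list(used)] = 1.0
    rerolls = np.array([rerolls_left / 2.0])
    return np.concatenate([counts, upper, categories, rerolls])
```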
5. Implement a policy that uses your neural network
Create a class called ANNPolicy in the nn module that defines the train, choose_category, and choose_dice methods in addition to the other methods listed above, for which you need not change the implementation given in the skeleton.
- The train method takes a file and should train your neural network using the training data read from that file. There is an additional argparse object passed to train to facilitate experimentation with architecture and parameters; see the skeleton code for how to set up and use that argument.
- The choose_dice method takes a StandardYahtzee object and a StandardYahtzee.Position object as parameters and returns the DiceRoll object representing the dice to keep. This method should encode the given position as input suitable for your neural network, compute the output of your neural network, and interpret that output as a game action -- which dice to keep. If your outputs correspond to meta-actions, then you can use the functions on the list returned by standard_dice_meta_actions to translate the meta-actions into game actions.
- The choose_category method takes a StandardYahtzee object and a StandardYahtzee.Position object as parameters and returns the index of the unused category in which to score the current roll (the roll inside the Position object). This method should encode the game state as input to your neural network, compute its output, and interpret that output as an unused category. If your outputs are meta-actions, then you can use the meta-actions returned from standard_category_meta_actions to translate from meta-actions to categories. Beware that your neural network will sometimes suggest putting a zero in a category (which is the optimal decision in some cases) and may occasionally improperly suggest choosing a category that has already been used -- you can use the is_used method on the position object to detect such cases and then choose another category.
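Because the network may rank a used category highest, a small fallback that walks the network's outputs in descending order works. This sketch assumes the outputs are a numpy array with one entry per category and uses a callable in place of the position's is_used method:

```python
import numpy as np

def pick_unused_category(outputs, is_used):
    """Return the index of the highest-scoring category that is not
    already used; is_used plays the role of Position.is_used."""
    for category in np.argsort(outputs)[::-1]:
        if not is_used(category):
            return int(category)
    raise ValueError("no unused category")
```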
Additional Requirements
The time limit for this assignment is 75 CPU-minutes, which is the total for neural network training and evaluation by simulating 16,384 games. To enforce the time limit consistently, your assignment will be run with the GPU turned off. You can run export CUDA_VISIBLE_DEVICES="" from the command line to turn off the GPU when checking your elapsed time on your own. Because training and evaluation use multiple CPUs, 75 CPU-minutes corresponds to about 20 minutes of real-world clock time.
The bash
built-in
time
command will give a more accurate count
of total CPU time (add the reported user
and sys
times).
Graduate Credit
For graduate credit, once you have optimized your neural network design and training plan for the standard assignment, check its performance when varying the resources allowed: for four different combinations of number of training examples and total hidden layer size, train and evaluate your neural network. Include at least two data points with more or fewer than 300 total hidden neurons and at least two data points with more or fewer than 100,000 training examples. Evaluate by averaging the score over at least 100,000 simulated games each. Submit your results in your time log.
Submissions
Submit just your nn.py module along with any other supporting modules, your training data as training.dat, and your log.