Assignment 6 - Neural Network for Solitaire Yahtzee
Objectives
- to gather good training data for supervised learning
- to choose a good representation of inputs to a neural network
- to train and use a neural network in an agent for a game
Assignment
Create a Python 3 module called nn that creates and trains a neural network for playing solitaire Yahtzee. There is an incomplete version of that module in /c/cs474/hw6/Optional.
The completed module must define a class called ANNPolicy that contains the following methods:
- an initializer that takes an object that encapsulates the rules of Yahtzee;
- a method train, which reads your training examples from a file and trains your neural network;
- methods save and load to save and load trained neural networks to and from files;
- a method choose_category, which takes a position, feeds it through the neural network, and uses the output to select and return an unused category to play the roll in;
- a method choose_dice, which takes a position, feeds it through the neural network, and uses the output to select and return a subset of the roll to keep before the next roll;
- methods start_turn and see_roll, which do nothing but are required for the generic policy interface; and
- a class method add_arguments that takes an argparse object and that you can use to set parameters for your ANN architecture and training (see the skeleton code for instructions and examples).
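Put together, the required interface can be sketched as follows. The signatures below are placeholders inferred from this description; the skeleton in /c/cs474/hw6/Optional is the authoritative starting point.

```python
class ANNPolicy:
    """Sketch of the required interface only; method bodies and
    exact signatures here are placeholders."""

    def __init__(self, yahtzee):
        self.yahtzee = yahtzee   # object encapsulating the rules
        self.model = None        # the trained network, once built

    def train(self, filename): ...
    def save(self, filename): ...
    def load(self, filename): ...
    def choose_category(self, yahtzee, position): ...
    def choose_dice(self, yahtzee, position): ...
    def start_turn(self, position): ...   # required no-op
    def see_roll(self, position): ...     # required no-op

    @classmethod
    def add_arguments(cls, parser): ...
```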
Supporting Code
The modules in /c/cs474/hw6/Required and /c/cs474/hw6/Optional define classes that model solitaire Yahtzee. Of particular use are
- DiceRoll, defined in the roll module, instances of which represent outcomes of rolling dice;
- StandardYahtzee, defined in the yahtzee module, which defines objects that represent the rules of solitaire Yahtzee;
- StandardYahtzee.Anchor, instances of which represent game states at the beginning of turns, before any dice have been rolled;
- StandardYahtzee.Position, instances of which represent other game states; and
- OptimalPolicy, defined in the optimal module, which implements the optimal solitaire policy when used with the data file /c/cs474/hw6/yahtzee_4295404029819880470.dat.
Meta-Actions
There are almost 500 possible game actions, which is too many for the small neural networks we aim to use. So instead of selecting from all the possible game actions, you can design your neural network to select from a set of meta-actions. The meta-actions represent instructions such as "try for 1s", "try for 2s", ..., "try for 6s", "try for n-of-a-kind", "try for full house", "try for a straight", or "try for chance", where what each meta-action means in each position is defined by a set of hand-written rules and will depend on the game state. For example, when the current roll is [2 2 2 5 6], then "try for n-of-a-kind" means "keep [2 2 2]" at the beginning of the game when no categories have been chosen yet, but means "keep [2 2 2 5 6]" when three-of-a-kind is the only category remaining and the Yahtzee category was scored as 0, so there is no possibility of a Yahtzee bonus.
The functions in the meta_actions
module define a good
set of meta-actions, but you may write your own or not use them at
all if you wish.
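For illustration, one such hand-written rule might look like the sketch below. The function try_n_of_a_kind and its keep_extras switch are hypothetical, not part of the provided meta_actions module; in the real rules the decision to keep the extra dice would be derived from the game state rather than passed in.

```python
from collections import Counter

def try_n_of_a_kind(roll, keep_extras=False):
    """Hypothetical 'try for n-of-a-kind' rule: keep every copy of
    the most frequent face in the roll.  With keep_extras=True
    (e.g. three-of-a-kind is the only category left and the extra
    dice add to its score), keep the whole roll instead."""
    if keep_extras:
        return sorted(roll)
    counts = Counter(roll)
    # break count ties in favor of the higher face
    face, _ = max(counts.items(), key=lambda fc: (fc[1], fc[0]))
    return [face] * counts[face]
```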
Design Process
There are five steps to creating and using the neural network.
1. Gather training input
Training examples will be solitaire Yahtzee positions and the optimal choice (dice to keep for reroll positions, category to score in for end-of-turn positions) for those positions. You may use no more than 100,000 examples for training (each position is one example, so one game produces 39 possible examples), so you will have to devise a way to sample the approximately 1.5 billion distinct positions in solitaire Yahtzee. You can use the random_player module as a starting point for generating your initial collection of training inputs.
[jrg94@cicada code]$ python3 random_player.py --count=1 --log --sample=1
UP0,22246,2
UP0,22225,1
UP0,22223,0
2 UP8,26666,2
2 UP8,46666,1
2 UP8,46666,0
2 4K UP8,12556,2
2 4K UP8,14466,1
2 4K UP8,11335,0
1 2 4K UP10,14466,2
1 2 4K UP10,23566,1
1 2 4K UP10,46666,0
1 2 3K 4K UP10,24566,2
1 2 3K 4K UP10,13356,1
1 2 3K 4K UP10,23336,0
1 2 3 3K 4K UP19,11233,2
1 2 3 3K 4K UP19,12346,1
1 2 3 3K 4K UP19,12234,0
1 2 3 3K 4K SS UP19,11226,2
1 2 3 3K 4K SS UP19,22445,1
1 2 3 3K 4K SS UP19,34445,0
1 2 3 4 3K 4K SS UP31,12346,2
1 2 3 4 3K 4K SS UP31,12346,1
1 2 3 4 3K 4K SS UP31,22556,0
1 2 3 4 5 3K 4K SS UP41,24556,2
1 2 3 4 5 3K 4K SS UP41,12456,1
1 2 3 4 5 3K 4K SS UP41,44456,0
1 2 3 4 5 3K 4K SS C UP41,15566,2
1 2 3 4 5 3K 4K SS C UP41,11666,1
1 2 3 4 5 3K 4K SS C UP41,55666,0
1 2 3 4 5 3K 4K FH SS C UP41,12345,2
1 2 3 4 5 3K 4K FH SS C UP41,13356,1
1 2 3 4 5 3K 4K FH SS C UP41,33334,0
1 2 3 4 5 3K 4K FH SS C Y UP41,24444,2
1 2 3 4 5 3K 4K FH SS C Y UP41,24666,1
1 2 3 4 5 3K 4K FH SS C Y UP41,24466,0
1 2 3 4 5 6 3K 4K FH SS C Y UP53,14445,2
1 2 3 4 5 6 3K 4K FH SS C Y UP53,11455,1
1 2 3 4 5 6 3K 4K FH SS C Y UP53,14445,0
187.0
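For illustration, each position line of that log can be split into its parts as below. The format is inferred from the sample output (used-category abbreviations, then UP followed by the upper-section total, the five dice, and the rerolls left); parse_position in StandardYahtzee is the supported way to turn these lines into Position objects.

```python
def parse_log_line(line):
    """Split one random_player log line into (used categories,
    upper-section total, dice, rerolls left); format inferred
    from the sample output above."""
    *used, state = line.split()
    upper, dice, rerolls = state.split(',')
    return used, int(upper[2:]), [int(d) for d in dice], int(rerolls)
```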
2. Label your training inputs
Once you have chosen your training inputs, you can determine the optimal choices with the methods from the OptimalPolicy in the optimal module. In particular, you can use the choose_category and choose_dice methods to determine the optimal action for any position, and you can use the choose_action and value_actions methods to determine the best of a list of meta-actions or the values of all the actions selected by the meta-actions on a list. All of those methods take StandardYahtzee.Position objects, and you can turn the output from the random player into such objects with the parse_position method in StandardYahtzee.
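Labeling can then be a loop over the logged positions, as in this sketch. The method signatures are assumptions based on the description above, and rerolls_left() is a hypothetical accessor; check the Position class for the real way to tell reroll positions from end-of-turn positions.

```python
def label_positions(lines, yahtzee, policy):
    """Hypothetical labeling loop: pair each logged position with
    the optimal action from the given (optimal) policy."""
    examples = []
    for line in lines:
        position = yahtzee.parse_position(line)
        if position.rerolls_left() > 0:
            # mid-turn: the optimal action is a set of dice to keep
            action = policy.choose_dice(yahtzee, position)
        else:
            # end of turn: the optimal action is a category
            action = policy.choose_category(yahtzee, position)
        examples.append((position, action))
    return examples
```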
3. Transform the training data
You will have to devise a way of representing the positions as inputs to your neural network and the optimal actions as its outputs. Keep in mind the general principle that if you expect two positions to have similar outputs, then the inputs corresponding to those positions should be similar to each other, so that the neural network can generalize from one position to the other.
4. Design and train your neural network
You may use at most 300 neurons total across all your hidden layers (not counting Dropout layers), but the architecture may be whatever you wish. The train method in ANNPolicy should read your training examples from the file whose name is passed to the method and then train your neural network.
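The step-3 encoding principle can be sketched as follows. This is one hypothetical encoding, not the required one; the particular features (face counts, scaled upper total, used-category bitmap, rerolls left) are assumptions chosen so that similar positions produce nearby input vectors.

```python
import numpy as np

NUM_CATEGORIES = 13  # standard Yahtzee scoresheet

def encode_position(roll, upper_total, used, rerolls_left):
    """One hypothetical input encoding:
      - count of each face 1-6 in the roll, scaled to [0, 1]
      - upper-section total, capped at and scaled by the 63-point
        bonus threshold
      - one bit per already-used category index
      - rerolls left, scaled to [0, 1]
    """
    counts = np.bincount(roll, minlength=7)[1:7] / 5.0
    upper = np.array([min(upper_total, 63) / 63.0])
    categories = np.zeros(NUM_CATEGORIES)
    categories[list(used)] = 1.0
    rerolls = np.array([rerolls_left / 2.0])
    return np.concatenate([counts, upper, categories, rerolls])
```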
5. Implement a policy that uses your neural network
Create a class called ANNPolicy in the nn module that defines the train, choose_category, and choose_dice methods in addition to the other methods listed above, for which you need not change the implementation given in the skeleton.
- The train method takes a file and should train your neural network using the training data read from that file. There is an additional argparse object passed to train to facilitate experimentation with architecture and parameters; see the skeleton code for how to set up and use that argument.
- The choose_dice method takes a StandardYahtzee object and a StandardYahtzee.Position object as parameters and returns the DiceRoll object representing the dice to keep. This method should encode the given position as input suitable for your neural network, compute the output of your neural network, and interpret that output as a game action -- which dice to keep. If your outputs correspond to meta-actions, then you can use the functions on the list returned by standard_dice_meta_actions to translate the meta-actions into game actions.
- The choose_category method takes a StandardYahtzee object and a StandardYahtzee.Position object as parameters and returns the index of the unused category in which to score the current roll (the roll inside the Position object). This method should encode the game state as input to your neural network, compute its output, and interpret that output as an unused category. If your outputs are meta-actions, then you can use the meta-actions returned from standard_category_meta_actions to translate from meta-actions to categories. Beware that your neural network will sometimes suggest putting a zero in a category (which is the optimal decision in some cases) and may occasionally improperly suggest choosing a category that has already been used -- you can use the is_used method on the position object to detect such cases and then choose another category.
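Because the network may rank a used category highest, a small fallback that walks the network's outputs in descending order works. This sketch assumes the outputs are a numpy array with one entry per category and uses a callable in place of the position's is_used method:

```python
import numpy as np

def pick_unused_category(outputs, is_used):
    """Return the index of the highest-scoring category that is not
    already used; is_used plays the role of Position.is_used."""
    for category in np.argsort(outputs)[::-1]:
        if not is_used(category):
            return int(category)
    raise ValueError("no unused category")
```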
Additional Requirements
The time limit for this assignment is 75 CPU-minutes, which is the total for neural network training and evaluation by simulating 16,384 games. To enforce the time limit consistently, your assignment will be run with the GPU turned off. You can run export CUDA_VISIBLE_DEVICES="" from the command line to turn off the GPU when checking your elapsed time on your own. Because training and evaluation use multiple CPUs, 75 CPU-minutes corresponds to about 20 minutes of real-world clock time.
The bash
built-in
time
command will give a more accurate count
of total CPU time (add the reported user
and sys
times).
Graduate Credit
For graduate credit, once you have optimized your neural network design and training plan for the standard assignment, check its performance when varying the resources allowed: for four different combinations of number of training examples and total hidden layer size, train and evaluate your neural network. Include at least two data points with more or fewer than 300 total hidden neurons and at least two data points with more or fewer than 100,000 training examples. Evaluate by averaging the score over at least 100,000 simulated games each. Submit your results in your time log.
Submissions
Submit just your nn.py module along with any other supporting modules, your training data as training.dat, and your log.