Assignment 6 - Neural Network for Solitaire Yahtzee

Objectives

Assignment

Create a Python 3 module called nn that creates and trains a neural network for playing solitaire Yahtzee. There is an incomplete version of that module in /c/cs474/hw6/Optional. The completed module must define a class called ANNPolicy that contains the methods described below, including train, choose_category, and choose_dice.

Supporting Code

The modules in /c/cs474/hw6/Required and /c/cs474/hw6/Optional define classes that model solitaire Yahtzee. Of particular use are the OptimalPolicy class in the optimal module, the parse_position method and Position class in StandardYahtzee, the functions in the meta_actions module, and the random_player script, all discussed below.

Meta-Actions

There are almost 500 possible game actions, which is too many for the small neural networks we aim to use. So instead of selecting from all the possible game actions, you can design your neural network to select from a set of meta-actions. The meta-actions represent instructions such as "try for 1s", "try for 2s", ..., "try for 6s", "try for n-of-a-kind", "try for a full house", "try for a straight", or "try for chance", where what each meta-action means in a given position is defined by a set of hand-written rules and depends on the game state. For example, when the current roll is [2 2 2 5 6], "try for n-of-a-kind" means "keep [2 2 2]" at the beginning of the game when no categories have been chosen yet, but means "keep [2 2 2 5 6]" when three-of-a-kind is the only category remaining and Yahtzee has been scored as 0, so there is no possibility of a Yahtzee bonus.

The functions in the meta_actions module define a good set of meta-actions, but you may write your own or not use them at all if you wish.
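For illustration only, here is a minimal sketch of what one hand-written meta-action rule might look like. The function name and interface are hypothetical and do not correspond to the actual meta_actions module; a real rule would also consult the game state, not just the roll.

from collections import Counter

def try_n_of_a_kind(roll):
    # Hypothetical rule: keep every copy of the most frequent die
    # value in the roll.  With roll [2, 2, 2, 5, 6] this keeps
    # [2, 2, 2]; a state-aware rule might instead keep the whole roll
    # when three-of-a-kind is the only open category and no Yahtzee
    # bonus is possible.
    value, _ = Counter(roll).most_common(1)[0]
    return sorted(d for d in roll if d == value)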

Design Process

There are five steps to creating and using the neural network.

1. Gather training input

Training examples will be solitaire Yahtzee positions and the optimal choice (dice to keep for reroll positions, category to score in for end-of-turn positions) for those positions. You may use no more than 100,000 examples for training (each position is one example, and a 13-turn game has three decision points per turn, so one game produces 39 possible examples), so you will have to devise a way to sample the approximately 1.5 billion distinct positions in solitaire Yahtzee. You can use the random_player script as a starting point for generating your initial collection of training inputs; a sample run is shown below.
[jrg94@cicada code]$ python3 random_player.py --count=1 --log --sample=1
UP0,22246,2
UP0,22225,1
UP0,22223,0
2 UP8,26666,2
2 UP8,46666,1
2 UP8,46666,0
2 4K UP8,12556,2
2 4K UP8,14466,1
2 4K UP8,11335,0
1 2 4K UP10,14466,2
1 2 4K UP10,23566,1
1 2 4K UP10,46666,0
1 2 3K 4K UP10,24566,2
1 2 3K 4K UP10,13356,1
1 2 3K 4K UP10,23336,0
1 2 3 3K 4K UP19,11233,2
1 2 3 3K 4K UP19,12346,1
1 2 3 3K 4K UP19,12234,0
1 2 3 3K 4K SS UP19,11226,2
1 2 3 3K 4K SS UP19,22445,1
1 2 3 3K 4K SS UP19,34445,0
1 2 3 4 3K 4K SS UP31,12346,2
1 2 3 4 3K 4K SS UP31,12346,1
1 2 3 4 3K 4K SS UP31,22556,0
1 2 3 4 5 3K 4K SS UP41,24556,2
1 2 3 4 5 3K 4K SS UP41,12456,1
1 2 3 4 5 3K 4K SS UP41,44456,0
1 2 3 4 5 3K 4K SS C UP41,15566,2
1 2 3 4 5 3K 4K SS C UP41,11666,1
1 2 3 4 5 3K 4K SS C UP41,55666,0
1 2 3 4 5 3K 4K FH SS C UP41,12345,2
1 2 3 4 5 3K 4K FH SS C UP41,13356,1
1 2 3 4 5 3K 4K FH SS C UP41,33334,0
1 2 3 4 5 3K 4K FH SS C Y UP41,24444,2
1 2 3 4 5 3K 4K FH SS C Y UP41,24666,1
1 2 3 4 5 3K 4K FH SS C Y UP41,24466,0
1 2 3 4 5 6 3K 4K FH SS C Y UP53,14445,2
1 2 3 4 5 6 3K 4K FH SS C Y UP53,11455,1
1 2 3 4 5 6 3K 4K FH SS C Y UP53,14445,0
187.0
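One plausible approach is to pool position lines from many random_player logs and sample from them. The sketch below assumes each position occupies one log line and that the final score line is the only line without a comma (as in the run above); the function name and its defaults are hypothetical.

import random

def sample_positions(log_files, limit=100_000, seed=474):
    # Pool distinct position lines from random_player logs, then
    # sample at most `limit` of them.  Assumes one position per line
    # and that the final score line contains no comma.
    positions = set()
    for name in log_files:
        with open(name) as f:
            for line in f:
                line = line.strip()
                if "," in line:
                    positions.add(line)
    random.seed(seed)
    return random.sample(sorted(positions), min(limit, len(positions)))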

2. Label your training inputs

Once you have chosen your training inputs, you can determine the optimal choices with the methods of the OptimalPolicy class in the optimal module. In particular, you can use the choose_category and choose_dice methods to determine the optimal action for any position, and you can use the choose_action and value_actions methods to determine, given a list of meta-actions, the best one or the values of all the actions they select. All of those methods take StandardYahtzee.Position objects, and you can turn the output from the random player into such objects with the parse_position method in StandardYahtzee.
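A sketch of the labeling step, assuming parse_position accepts one log line and that choose_dice and choose_category each take a Position; the import locations, the rerolls_left accessor, and any constructor arguments are assumptions that you should adjust to the course code.

from optimal import OptimalPolicy
from yahtzee import StandardYahtzee  # module name is a guess

def label_positions(position_strings):
    # Pair each sampled position string with its optimal action.
    game = StandardYahtzee()
    policy = OptimalPolicy()
    examples = []
    for s in position_strings:
        pos = game.parse_position(s)
        if pos.rerolls_left() > 0:                # hypothetical accessor
            action = policy.choose_dice(pos)      # optimal dice to keep
        else:
            action = policy.choose_category(pos)  # optimal category to score
        examples.append((pos, action))
    return examples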

3. Transform the labeled training data

You will have to devise a way of representing the positions as inputs to your neural network and the optimal actions as its outputs. Keep in mind the general principle that if you expect two positions to have similar outputs, then the inputs corresponding to those positions should be similar to each other, so that the neural network can generalize from one position to the other.
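To illustrate that principle, here is a minimal encoding sketch. The choice of features is entirely up to you, and every Position accessor used below (open_categories, upper_total, roll, rerolls_left) is hypothetical.

import numpy as np

NUM_CATEGORIES = 13

def encode_position(pos):
    # Encode a position as a fixed-length vector: which categories
    # are still open, the capped and scaled upper-section total, the
    # scaled count of each die face, and the rerolls remaining.
    open_cats = [1.0 if c in pos.open_categories() else 0.0
                 for c in range(NUM_CATEGORIES)]
    upper = [min(pos.upper_total(), 63) / 63.0]
    counts = [0.0] * 6
    for d in pos.roll():
        counts[d - 1] += 1.0 / 5.0
    rerolls = [pos.rerolls_left() / 2.0]
    return np.array(open_cats + upper + counts + rerolls, dtype=np.float32)

This particular encoding produces 13 + 1 + 6 + 1 = 21 features, and positions that differ by only one die or one scored category map to nearby vectors, which is exactly the similarity the principle asks for.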

4. Design and train your neural network

You may use at most 300 neurons total across all your hidden layers (not counting Dropout layers) but the architecture may be whatever you wish. The train method in ANNPolicy should read your training examples from the file whose name is passed to the method and then train your neural network.
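As one possible starting point, here is a minimal Keras sketch that respects the 300-neuron cap; the framework choice, the layer sizes, and the 21-feature input from the encoding sketch above are all assumptions, not requirements.

import tensorflow as tf

def build_model(input_dim, num_meta_actions):
    # Hidden layers total 160 + 64 = 224 neurons, under the 300 cap;
    # the Dropout layer does not count toward the limit.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_dim,)),
        tf.keras.layers.Dense(160, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_meta_actions, activation="softmax"),
    ])

model = build_model(input_dim=21, num_meta_actions=10)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

The softmax output selects among the meta-actions, so num_meta_actions should match however many meta-actions you decide to use.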

5. Implement a policy that uses your neural network

Create a class called ANNPolicy in the nn module that defines the train, choose_category, and choose_dice methods, in addition to the other methods listed above, whose skeleton implementations you need not change.
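For instance, choose_dice might run the network on the encoded position and hand the winning meta-action to your rules. In the sketch below, everything other than the class and method names required by the assignment is hypothetical, including the encode_position helper from the encoding sketch above.

import numpy as np

class ANNPolicy:
    # train and the skeleton-provided methods omitted here

    def choose_dice(self, position):
        # Pick the meta-action the network scores highest, then let
        # that meta-action's rule translate it into dice to keep.
        x = encode_position(position)  # hypothetical helper from step 3
        scores = self.model.predict(x[np.newaxis, :], verbose=0)[0]
        meta = self.meta_actions[int(np.argmax(scores))]  # hypothetical list
        return meta.keep(position)                        # hypothetical rule API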

Additional Requirements

The time limit for this assignment is 75 CPU-minutes, which is the total for neural network training and evaluation by simulating 16,384 games. To enforce the time limit consistently, your assignment will be run with the GPU turned off. You can run export CUDA_VISIBLE_DEVICES="" from the command line to disable the GPU when checking your elapsed time on your own. Because training and evaluation use multiple CPUs, 75 CPU-minutes corresponds to about 20 minutes of real-world clock time. The bash built-in time command gives a more accurate count of total CPU time (add the reported user and sys times).
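You can also disable the GPU from inside Python, provided the environment variable is set before TensorFlow is first imported; a short sketch, assuming you use TensorFlow:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # must run before importing TensorFlow

import tensorflow as tf
assert not tf.config.list_physical_devices("GPU")  # confirm CPU-only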

Graduate Credit

For graduate credit, once you have optimized your neural network design and training plan for the standard assignment, check its performance as the allowed resources vary: train and evaluate your neural network for four different combinations of number of training examples and total hidden-layer size. Include at least two data points whose total hidden-neuron count differs from 300 and at least two whose number of training examples differs from 100,000. Evaluate each configuration by averaging the score over at least 100,000 simulated games. Submit your results in your time log.

Submissions

Submit your nn.py module, any other supporting modules you wrote, your training data as training.dat, and your log.