Assignment 6 - Neural Network for Solitaire Yahtzee

Objectives

Assignment

Create a Python 3 module called nn that creates and trains a neural network for playing solitaire Yahtzee. This module must define

Design Process

There are four steps to creating and using the neural network.

1. Gather training data

Training examples will be solitaire Yahtzee positions paired with the optimal choice for each: the dice to keep for reroll positions, or the category to score in for end-of-turn positions. You may use no more than 100,000 examples for training, so you will have to devise a way to sample the approximately 1.5 billion distinct positions in solitaire Yahtzee. The yahtzee module contains a function evaluate_policy and a class RandomPolicy that you can use to generate an initial set of training data.

Once you have chosen your training inputs, you can determine the optimal choices with the query_optimal.sh program in /c/cs474/hw6. That program reads positions from standard input, where each position is given as a comma-separated list of the state of the scoresheet (as returned by YahtzeeScoresheet.as_state_string), the current roll, and the number of rerolls remaining. It outputs each position followed by either the dice chosen to keep or the string representation of the category to score in. In the example session below, each input line is followed by the program's output line.
[jrg94@scorpion Yahtzee]$ /c/cs474/hw6/query_optimal.sh
UP0,11223,2
UP0,11223,2,[22]
2 3 4 5 6 3K 4K FH SS LS C UP50,11166,2
2 3 4 5 6 3K 4K FH SS LS C UP50,11166,2,[111]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,2
2 3 4 5 6 3K 4K FH SS LS UP60,11666,2,[11]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,1
2 3 4 5 6 3K 4K FH SS LS UP60,11666,1,[11]
2 3 4 5 6 3K 4K FH SS LS UP60,11666,0
2 3 4 5 6 3K 4K FH SS LS UP60,11666,0,C
2 3 4 5 6 3K 4K FH SS LS C Y UP60,16666,2
2 3 4 5 6 3K 4K FH SS LS C Y UP60,16666,2,[1]
2 3 4 5 6 3K 4K FH SS LS C Y+ UP60,16666,2
2 3 4 5 6 3K 4K FH SS LS C Y+ UP60,16666,2,[6666]
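The output lines in the session above can be split back into fields with a few lines of Python. The following is a minimal sketch, not part of the provided modules: the names LabeledPosition and parse_labeled_line are hypothetical, and it assumes (as the session suggests) that the scoresheet string never contains a comma and that kept dice are always wrapped in square brackets.

```python
from typing import NamedTuple, Optional


class LabeledPosition(NamedTuple):
    """One output line of query_optimal.sh (hypothetical container)."""
    sheet: str               # scoresheet state string, e.g. "2 3 4 5 6 3K UP50"
    roll: str                # the five dice as digits, e.g. "11166"
    rerolls: int             # rerolls remaining: 2, 1, or 0
    keep: Optional[str]      # dice to keep (reroll positions), else None
    category: Optional[str]  # category to score in (end of turn), else None


def parse_labeled_line(line: str) -> LabeledPosition:
    # Split from the right so only the last three commas act as separators.
    sheet, roll, rerolls, action = line.strip().rsplit(",", 3)
    if action.startswith("[") and action.endswith("]"):
        # Bracketed action: the dice to keep, possibly empty ("[]").
        return LabeledPosition(sheet, roll, int(rerolls), action[1:-1], None)
    # Anything else is assumed to be a category label such as "C".
    return LabeledPosition(sheet, roll, int(rerolls), None, action)
```

Keeping the kept dice and the category in separate fields avoids confusing a kept die "1" with a category whose label happens to be "1".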
  

2. Transform and label the training data

You will have to devise a way of representing the positions as inputs to your neural network and the optimal actions as its outputs. For the latter, we suggest categorical outputs where each class corresponds to a meta-action as defined by the yahtzee.YahtzeeRoll.select_for_xxx methods. Those meta-actions represent instructions such as "try for 1s", "try for 2s", ..., "try for 6s", "try for Yahtzee", "try for full house", "try for a straight", or "try for chance", where what each meta-action means in each position is defined using the hand-written rules implemented by the functions. To convert the output of query_optimal.sh to labelled output, you will have to write code that determines which meta-action best matches the game action selected by the optimal policy.
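One possible way to pick the best-matching meta-action is to compare the dice each meta-action would keep against the dice the optimal policy keeps, and choose the closest. The sketch below illustrates that idea under loud assumptions: keep_number and keep_most_common are hypothetical stand-ins written for this example, not the yahtzee.YahtzeeRoll.select_for_xxx methods, and the class names in META_ACTIONS are illustrative only. In your own code you would call the real selection methods instead.

```python
from collections import Counter


def keep_number(roll: str, n: int) -> str:
    """Hypothetical stand-in for a 'try for n's' rule: keep every die showing n."""
    return "".join(d for d in roll if d == str(n))


def keep_most_common(roll: str) -> str:
    """Hypothetical stand-in for 'try for Yahtzee': keep the most frequent value."""
    value, count = Counter(roll).most_common(1)[0]
    return value * count


# Illustrative class labels only; the real set comes from the yahtzee module.
META_ACTIONS = {f"{n}s": (lambda roll, n=n: keep_number(roll, n)) for n in range(1, 7)}
META_ACTIONS["n_kind"] = keep_most_common


def keep_distance(a: str, b: str) -> int:
    """Number of dice that appear in one multiset of kept dice but not the other."""
    ca, cb = Counter(a), Counter(b)
    return sum(((ca - cb) + (cb - ca)).values())


def best_meta_action(roll: str, optimal_keep: str) -> str:
    """Label whose kept dice are closest to the optimal keep (first label wins ties)."""
    return min(
        META_ACTIONS,
        key=lambda name: keep_distance(META_ACTIONS[name](roll), optimal_keep),
    )
```

For the first line of the sample session (roll 11223, optimal keep 22), this sketch selects the "2s" label. Ties are broken by the order of the labels, which is a design choice you may want to revisit.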

3. Design and train your neural network

You may use at most 300 neurons total across all your hidden layers (not counting Dropout layers) but the architecture may be whatever you wish. The nn.train function should read your training examples from standard input, train your neural network, and return it.
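Before building a network it can help to check both resource limits in code. The sketch below is an illustration only: check_hidden_budget and read_examples are hypothetical helper names, and it assumes your training file holds one example per non-blank line (the actual file format is your choice). It does not build or train a network.

```python
import sys

MAX_HIDDEN_UNITS = 300   # limit on neurons across all hidden layers
MAX_EXAMPLES = 100_000   # limit on training examples


def check_hidden_budget(hidden_sizes, limit=MAX_HIDDEN_UNITS):
    """Return the total number of hidden neurons, or raise if over the limit."""
    total = sum(hidden_sizes)
    if total > limit:
        raise ValueError(f"{total} hidden neurons exceeds the limit of {limit}")
    return total


def read_examples(stream=None, limit=MAX_EXAMPLES):
    """Read one example per non-blank line (assumed format) from stream or stdin."""
    if stream is None:
        stream = sys.stdin
    examples = [line.strip() for line in stream if line.strip()]
    if len(examples) > limit:
        raise ValueError(f"{len(examples)} examples exceeds the limit of {limit}")
    return examples
```

For instance, hidden layers of 128, 96, and 64 neurons total 288 and pass the check, while layers of 200 and 150 neurons do not.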

4. Implement a policy that uses your neural network

Create a class called NNPolicy that defines three functions as described below.

Additional Requirements

Your submission must run with the testit script in the allotted time, which is 75 CPU-minutes. Because training and evaluation use multiple CPUs, 75 minutes of CPU time under testit corresponds to about 12 minutes of clock time when running the grading script directly.

Graduate Credit

For graduate credit, once you have optimized your neural network design and training plan for the standard assignment, check its performance when varying the resources allowed: train and evaluate your neural network for four different combinations of number of training examples and total hidden layer size. Include at least two data points whose total number of hidden neurons differs from 300 and at least two data points whose number of training examples differs from 100,000. Evaluate each combination by averaging the score over at least 100,000 simulated games. Submit your results in your time log.

Files

In /c/cs474/hw6/Required are three Python 3 modules:

Additionally, in /c/cs474/hw6/ there is a Java implementation of an optimal solitaire Yahtzee player and a program to determine what choices the optimal policy makes:

Submissions

Submit just your nn.py module along with any other supporting modules, your training data as training.dat, and your log.