Assignment 4 - Monte Carlo Tree Search

Objectives

Introduction

See the Wikipedia article on kalah and recall the rules of the pegging (count) phase of cribbage.

Assignment

Create a Python 3 module called mcts (so this must be in a file called mcts.py) that implements a function called mcts_policy that takes the allowed CPU time in seconds and returns a function that takes a position and returns the move suggested by running MCTS for that amount of time starting with that position.

The positions passed to the search functions will be instances of the abstract State class from the game module. The functions available for State objects are

You may also use the __hash__ and __eq__ methods indirectly by using the states as keys in a dictionary.

You can see the minimax function in the minimax module for examples of using these functions and read their documentation in the game module.

There are implementations of kalah and cribbage in /c/cs474/hw4/Required that track game state using instances of State that will be used to test your code. For cribbage, the test is in a perfect-information version of the pegging (count) phase: players are dealt 4 cards seen by both players, and pegging then continues as usual. Note that Minimax with a depth of 14 can find an optimal solution quickly, so your MCTS implementation will never beat Minimax on average. However, the perfect-information pegging game is useful as an evaluation of your MCTS agent by measuring how close to optimal it performs as it is allowed more time to search the tree. Your code may be tested on other games that conform to the interface specified in the game module.

Additional Requirements

Testing and Examples

The /c/cs474/hw4/Required directory includes the testing module test_mcts.py. Run that module with argument --help to see how to select the game to test with, the amount of time to run MCTS to determine each move, the number of test games to run, and the depth searched by the Minimax agent to compare your MCTS agent to, and the probability that the Minimax agent chooses a random move (useful to introduce some nondeterminism into kalah). The test driver alternates between using your MCTS agent for P0 and P1, but the payoffs reported by the terminal states are always for P0. Your MCTS agent must therefore determine whether it is selecting moves for the max player (P0) or the min player (P1). The first value in the output is average payoff to your MCTS agent (so average score margin for pegging and average game points for kalah). (The second is average number of wins for your MCTS agent vs. the Minimax agent, but this is not used for evaluation).

Grading will be determined by performance against the Minimax agent with various parameters. Submissions must implement Monte Carlo Tree Search, so submissions that use a different search will not be graded even if they meet the performance requirements automatically checked by the grading scripts.

[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=pegging --count=100 --time=0.01 --depth=14
-0.27 0.415
[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=pegging --count=100 --time=0.05 --depth=14
-0.15 0.46
[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=kalah --count=100 --time=0.25 --depth=4 --random=0.1
0.32 0.66
[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=kalah --count=100 --time=1.0 --depth=4 --random=0.1
0.47 0.735

Submissions

Submit just your mcts.py module along with any other supporting modules and your log. There is no need to submit a makefile or executable; the test scripts create an executable called MCTS that is used to run the test drivers using pypy3.