Assignment 4 - Monte Carlo Tree Search

Objectives

to implement Monte Carlo Tree Search

Introduction

See the Wikipedia article on kalah and recall the rules of the pegging (count) phase of cribbage.

Assignment

Create a Python 3 module called mcts (so this must be in a file called mcts.py) that implements a function called mcts_policy that takes the allowed CPU time in seconds and returns a function that takes a position and returns the move suggested by running MCTS for that amount of time starting with that position.

The positions passed to the search functions will be instances of the abstract State class from the game module. The functions available for State objects are

actor to determine which player (0 or 1) is to make the next move;
get_actions to return a list of legal moves at a state (the type of the elements of that list will vary from game to game; for example in kalah they are indices of the pit to sow from, and in cribbage they are the card to play);
is_legal to determine if an action is legal at a given state (all the actions returned by get_actions are legal, so you might not have a use for this function);
successor that takes one of the moves returned by get_actions and returns the state that results from making that move;
is_terminal to determine if a position is terminal; and
payoff to determine the payoff to player 0 at terminal positions (the range of values returned will vary from game to game; for kalah the returned value is 1, 0, or -1 for a P0 win, draw, and loss respectively, and for cribbage the value is the different in points earned for P0 compared to P1; whatever the range, the games are zero-sum, and more positive is always better for P0)

You may also use the __hash__ and __eq__ methods indirectly by using the states as keys in a dictionary.

You can see the minimax function in the minimax module for examples of using these functions and read their documentation in the game module.

There are implementations of kalah and cribbage in /c/cs474/hw4/Required that track game state using instances of State that will be used to test your code. For cribbage, the test is in a perfect-information version of the pegging (count) phase: players are dealt 4 cards seen by both players, and pegging then continues as usual. Note that Minimax with a depth of 14 can find an optimal solution quickly, so your MCTS implementation will never beat Minimax on average. However, the perfect-information pegging game is useful as an evaluation of your MCTS agent by measuring how close to optimal it performs as it is allowed more time to search the tree. Your code may be tested on other games that conform to the interface specified in the game module.

Additional Requirements

You may not use more than six numeric values as global variables or class variables (or any other construct that results in a variable whose lifetime is the entire execution of the program) in any of your modules. (Global constants are allowed and encouraged where appropriate.) You may, however, use variables whose lifetime spans an entire game (which corresponds to a single call to your mcts_policy function).
Tests will be executed with pypy3, so write your code for Python 3.9 and only use modules are available for pypy3 on the Zoo (this excludes numpy).
Observe the given time limit. You'll be given a grace period to complete an iteration and select a move after time expires, but submissions that exceed the time limit by more than the time to do that will incur a penalty.

Testing and Examples

The /c/cs474/hw4/Required directory includes the testing module test_mcts.py. Run that module with argument --help to see how to select the game to test with, the amount of time to run MCTS to determine each move, the number of test games to run, and the depth searched by the Minimax agent to compare your MCTS agent to, and the probability that the Minimax agent chooses a random move (useful to introduce some nondeterminism into kalah). The test driver alternates between using your MCTS agent for P0 and P1, but the payoffs reported by the terminal states are always for P0. Your MCTS agent must therefore determine whether it is selecting moves for the max player (P0) or the min player (P1). The first value in the output is average payoff to your MCTS agent (so average score margin for pegging and average game points for kalah). (The second is average number of wins for your MCTS agent vs. the Minimax agent, but this is not used for evaluation).

Grading will be determined by performance against the Minimax agent with various parameters. Submissions must implement Monte Carlo Tree Search, so submissions that use a different search will not be graded even if they meet the performance requirements automatically checked by the grading scripts.

[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=pegging --count=100 --time=0.01 --depth=14
-0.27 0.415
[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=pegging --count=100 --time=0.05 --depth=14
-0.15 0.46
[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=kalah --count=100 --time=0.25 --depth=4 --random=0.1
0.32 0.66
[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=kalah --count=100 --time=1.0 --depth=4 --random=0.1
0.47 0.735

Submissions

Submit just your mcts.py module along with any other supporting modules and your log. There is no need to submit a makefile or executable; the test scripts create an executable called MCTS that is used to run the test drivers using pypy3.