Assignment 4 - Monte Carlo Tree Search
Objectives
- to implement Monte Carlo Tree Search
Introduction
See the Wikipedia article on kalah and recall the rules of the pegging (count) phase of cribbage.
Assignment
Create a Python 3 module calledmcts
(so this must be in a file called
mcts.py
) that implements a function called
mcts_policy
that takes
the allowed CPU time in seconds and returns a function that takes a position
and returns the move suggested by running MCTS for that amount of time
starting with that position.
The positions passed to the search functions will be instances of
the abstract State
class from the game
module.
The functions available for State
objects are
-
actor
to determine which player (0 or 1) is to make the next move; -
get_actions
to return a list of legal moves at a state (the type of the elements of that list will vary from game to game; for example in kalah they are indices of the pit to sow from, and in cribbage they are the card to play); -
is_legal
to determine if an action is legal at a given state (all the actions returned byget_actions
are legal, so you might not have a use for this function); -
successor
that takes one of the moves returned byget_actions
and returns the state that results from making that move; -
is_terminal
to determine if a position is terminal; and -
payoff
to determine the payoff to player 0 at terminal positions (the range of values returned will vary from game to game; for kalah the returned value is 1, 0, or -1 for a P0 win, draw, and loss respectively, and for cribbage the value is the different in points earned for P0 compared to P1; whatever the range, the games are zero-sum, and more positive is always better for P0)
__hash__
and __eq__
methods
indirectly by using the states as keys in a dictionary.
You can see the minimax
function in the minimax
module for
examples of using these functions and read their documentation in the game
module.
There are implementations
of kalah and cribbage in /c/cs474/hw4/Required
that
track game state using instances of State
that will
be used to test your code. For cribbage, the test is in
a perfect-information version of the pegging (count) phase:
players are dealt 4 cards seen by both players,
and pegging then continues as usual. Note that Minimax
with a depth of 14 can find an optimal solution quickly, so your
MCTS implementation will never beat Minimax on average.
However, the perfect-information pegging game is useful
as an evaluation of your MCTS agent by measuring how close
to optimal it performs as it is allowed more time to search the
tree.
Your code may be tested on other games
that conform to the interface specified in the game
module.
Additional Requirements
- You may not use more than six numeric values as
global variables or class variables (or any other
construct that results in a variable whose lifetime is the entire
execution of the program) in any of your modules. (Global constants
are allowed and encouraged where appropriate.) You may, however,
use variables whose lifetime spans an entire game
(which corresponds to a single call to your
mcts_policy
function). - Tests will be executed with
pypy3
, so write your code for Python 3.9 and only use modules are available forpypy3
on the Zoo (this excludesnumpy
). - Observe the given time limit. You'll be given a grace period to complete an iteration and select a move after time expires, but submissions that exceed the time limit by more than the time to do that will incur a penalty.
Testing and Examples
The/c/cs474/hw4/Required
directory includes
the testing module test_mcts.py
. Run that module with
argument --help
to see how to select the game to test
with, the amount of time to run MCTS to determine each move,
the number of test games to run, and the depth searched by the
Minimax agent to compare your MCTS agent to, and the probability
that the Minimax agent chooses a random move (useful to introduce
some nondeterminism into kalah). The test driver alternates between
using your MCTS agent for P0 and P1, but the payoffs reported by
the terminal states are always for P0. Your MCTS agent must therefore
determine whether it is selecting moves for the max player (P0)
or the min player (P1). The first value in the output
is average payoff to your MCTS agent (so average score margin for pegging and
average game points for kalah). (The second is average number of wins
for your MCTS agent vs. the Minimax agent, but this is not used for
evaluation).
Grading will be determined by performance against the Minimax agent with various parameters. Submissions must implement Monte Carlo Tree Search, so submissions that use a different search will not be graded even if they meet the performance requirements automatically checked by the grading scripts.
[jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=pegging --count=100 --time=0.01 --depth=14 -0.27 0.415 [jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=pegging --count=100 --time=0.05 --depth=14 -0.15 0.46 [jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=kalah --count=100 --time=0.25 --depth=4 --random=0.1 0.32 0.66 [jrg94@chameleon MCTS]$ pypy3 test_mcts.py --game=kalah --count=100 --time=1.0 --depth=4 --random=0.1 0.47 0.735
Submissions
Submit just yourmcts.py
module along with any other supporting modules
and your log.
There is no need to submit a makefile or executable; the test scripts create an executable
called MCTS
that is used to run the test drivers
using pypy3
.