Assignment 4 - Monte Carlo Tree Search

Objectives

to implement Monte Carlo Tree Search

Introduction

Assignment

Create a Python 3 module called mcts (so this must be in a file called mcts.py that implements a function called mcts_strategy that takes a number of iterations and returns a function that takes a position and returns the move suggested by running MCTS for that number of iterations starting with that position.

The positions passed to the search functions will be instances of Kalah.Position from the kalah module. Python does not have a mechanism to prevent access to members, but your search will not be graded if it accesses members whose names begin with an underscore (the Python convention for indicating members not intended for use by client code). The functions you are allowed to use are

legal_moves to return a list of legal moves from a position, where the moves are given as the indices of the pits to sow from, numbered counterclockwise starting with 0 for P1's leftmost pit;
next_player to determine which player is to make the next move;
result to return the position that results from making a move;
game_over to determine if a position is terminal;
winner to determine which player won at terminal positions; and
is_initial to determine if a position is the starting position.

You can see the minimax function in the minimax module for examples of using these functions and read their documentation in the kalah module.

Additional Requirements

You may not use global variables or class variables (or any other construct that results in a variable whose lifetime is the entire execution of the program) in any of your modules. (Global constants are allowed and encouraged where appropriate.)
Tests will be executed with pypy3, so write your code for Python 3.5 and only use modules are available for pypy3 on the Zoo (this excludes numpy).
Observe the given iteration limit. Each playout is considered an iteration (so if you expand and perform a playout from all children at once in a single iterations of your loop, that still counts as multiple iterations).

Graduate Credit

Experiment with an enhancement to the default policy that will be tested separately from your plain MCTS functions. To do so, create a function called mcts_strategy_enhanced in your search module that works like mcts_strategy but returns a function that uses your enhancement. If your experiment is successful (that is, actually improves the results) then the two functions may be the same. Your new function may use the "private" members of the Kalah.Position class using their names as given in the kalah.py module as well as the functions is_move_again and is_capture. Describe your enhancement and your efforts to ensure that you've implemented your enhancement correctly in your log file (statistically significant improved results are sufficient evidence of a correct implementation).

Files

In /c/cs474/hw4/Required are three Python 3 modules:

kalah.py that defines classes that implement the rules of Kalah
minimax.py that defines functions that implement minimax
test_mcts.py the implements the test drivers

Examples

  [jrg94@chameleon Kalah]$ pypy3 test_mcts.py -game 100 4 0.1 1000
  0.53
  [jrg94@chameleon Kalah]$ pypy3 test_mcts.py -game 100 4 0.1 10000
  0.705

Note that the first test takes about four minutes and the second test takes the better part of an hour. This is the cost of using a search technique that doesn't require any domain knowledge.

Submissions

Submit just your mcts.py module along with any other supporting modules and your log. There is no need to submit a makefile or executable; the test scripts create an executable called MCTS that is used to run the test drivers using pypy3.

The time limit for the single public test is set to allow 1000 iterations per move. The private test cases may use more iterations and the time limit will be set proportionally higher. Faster implementations may be rewarded with more iterations.

Submissions must implement Monte Carlo Tree Search, so submissions that use a different search will not be graded even if they meet the performance requirements automatically checked by the grading scripts.