{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Game Tree Search\n", "\n", "We start with defining the abstract class `Game`, for turn-taking *n*-player games. We rely on, but do not define yet, the concept of a `state` of the game; we'll see later how individual games define states. For now, all we require is that a state has a `state.to_move` attribute, which gives the name of the player whose turn it is. (\"Name\" will be something like `'X'` or `'O'` for tic-tac-toe.) \n", "\n", "We also define `play_game`, which takes a game and a dictionary of `{player_name: strategy_function}` pairs, and plays out the game, on each turn checking `state.to_move` to see whose turn it is, and then getting the strategy function for that player and applying it to the game and the state to get a move." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from collections import namedtuple, Counter, defaultdict\n", "import random\n", "import math\n", "import functools \n", "cache = functools.lru_cache(10**6)" ] }, { "cell_type": "code", "execution_count": 73, "metadata": {}, "outputs": [], "source": [ "class Game:\n", " \"\"\"A game is similar to a problem, but it has a terminal test instead of \n", " a goal test, and a utility for each terminal state. To create a game, \n", " subclass this class and implement `actions`, `result`, `is_terminal`, \n", " and `utility`. You will also need to set the .initial attribute to the \n", " initial state; this can be done in the constructor.\"\"\"\n", "\n", " def actions(self, state):\n", " \"\"\"Return a collection of the allowable moves from this state.\"\"\"\n", " raise NotImplementedError\n", "\n", " def result(self, state, move):\n", " \"\"\"Return the state that results from making a move from a state.\"\"\"\n", " raise NotImplementedError\n", "\n", " def is_terminal(self, state):\n", " \"\"\"Return True if this is a final state for the game.\"\"\"\n", " return not self.actions(state)\n", " \n", " def utility(self, state, player):\n", " \"\"\"Return the value of this final state to player.\"\"\"\n", " raise NotImplementedError\n", " \n", "\n", "def play_game(game, strategies: dict, verbose=False):\n", " \"\"\"Play a turn-taking game. `strategies` is a {player_name: function} dict,\n", " where function(state, game) is used to get the player's move.\"\"\"\n", " state = game.initial\n", " while not game.is_terminal(state):\n", " player = state.to_move\n", " move = strategies[player](game, state)\n", " state = game.result(state, move)\n", " if verbose: \n", " print('Player', player, 'move:', move)\n", " print(state)\n", " return state" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Minimax-Based Game Search Algorithms\n", "\n", "We will define several game search algorithms. Each takes two inputs, the game we are playing and the current state of the game, and returns a a `(value, move)` pair, where `value` is the utility that the algorithm computes for the player whose turn it is to move, and `move` is the move itself.\n", "\n", "First we define `minimax_search`, which exhaustively searches the game tree to find an optimal move (assuming both players play optimally), and `alphabeta_search`, which does the same computation, but prunes parts of the tree that could not possibly have an affect on the optimnal move. " ] }, { "cell_type": "code", "execution_count": 66, "metadata": {}, "outputs": [], "source": [ "def minimax_search(game, state):\n", " \"\"\"Search game tree to determine best move; return (value, move) pair.\"\"\"\n", "\n", " player = state.to_move\n", "\n", " def max_value(state):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " v, move = -infinity, None\n", " for a in game.actions(state):\n", " v2, _ = min_value(game.result(state, a))\n", " if v2 > v:\n", " v, move = v2, a\n", " return v, move\n", "\n", " def min_value(state):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " v, move = +infinity, None\n", " for a in game.actions(state):\n", " v2, _ = max_value(game.result(state, a))\n", " if v2 < v:\n", " v, move = v2, a\n", " return v, move\n", "\n", " return max_value(state)\n", "\n", "infinity = math.inf\n", "\n", "def alphabeta_search(game, state):\n", " \"\"\"Search game to determine best action; use alpha-beta pruning.\n", " As in [Figure 5.7], this version searches all the way to the leaves.\"\"\"\n", "\n", " player = state.to_move\n", "\n", " def max_value(state, alpha, beta):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " v, move = -infinity, None\n", " for a in game.actions(state):\n", " v2, _ = min_value(game.result(state, a), alpha, beta)\n", " if v2 > v:\n", " v, move = v2, a\n", " alpha = max(alpha, v)\n", " if v >= beta:\n", " return v, move\n", " return v, move\n", "\n", " def min_value(state, alpha, beta):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " v, move = +infinity, None\n", " for a in game.actions(state):\n", " v2, _ = max_value(game.result(state, a), alpha, beta)\n", " if v2 < v:\n", " v, move = v2, a\n", " beta = min(beta, v)\n", " if v <= alpha:\n", " return v, move\n", " return v, move\n", "\n", " return max_value(state, -infinity, +infinity)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A Simple Game: Tic-Tac-Toe\n", "\n", "We have the notion of an abstract game, we have some search functions; now it is time to define a real game; a simple one, tic-tac-toe. Moves are `(x, y)` pairs denoting squares, where `(0, 0)` is the top left, and `(2, 2)` is the bottom right (on a board of size `height=width=3`)." ] }, { "cell_type": "code", "execution_count": 67, "metadata": {}, "outputs": [], "source": [ "class TicTacToe(Game):\n", " \"\"\"Play TicTacToe on an `height` by `width` board, needing `k` in a row to win.\n", " 'X' plays first against 'O'.\"\"\"\n", "\n", " def __init__(self, height=3, width=3, k=3):\n", " self.k = k # k in a row\n", " self.squares = {(x, y) for x in range(width) for y in range(height)}\n", " self.initial = Board(height=height, width=width, to_move='X', utility=0)\n", "\n", " def actions(self, board):\n", " \"\"\"Legal moves are any square not yet taken.\"\"\"\n", " return self.squares - set(board)\n", "\n", " def result(self, board, square):\n", " \"\"\"Place a marker for current player on square.\"\"\"\n", " player = board.to_move\n", " board = board.new({square: player}, to_move=('O' if player == 'X' else 'X'))\n", " win = k_in_row(board, player, square, self.k)\n", " board.utility = (0 if not win else +1 if player == 'X' else -1)\n", " return board\n", "\n", " def utility(self, board, player):\n", " \"\"\"Return the value to player; 1 for win, -1 for loss, 0 otherwise.\"\"\"\n", " return board.utility if player == 'X' else -board.utility\n", "\n", " def is_terminal(self, board):\n", " \"\"\"A board is a terminal state if it is won or there are no empty squares.\"\"\"\n", " return board.utility != 0 or len(self.squares) == len(board)\n", "\n", " def display(self, board): print(board) \n", "\n", "\n", "def k_in_row(board, player, square, k):\n", " \"\"\"True if player has k pieces in a line through square.\"\"\"\n", " def in_row(x, y, dx, dy): return 0 if board[x, y] != player else 1 + in_row(x + dx, y + dy, dx, dy)\n", " return any(in_row(*square, dx, dy) + in_row(*square, -dx, -dy) - 1 >= k\n", " for (dx, dy) in ((0, 1), (1, 0), (1, 1), (1, -1)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "States in tic-tac-toe (and other games) will be represented as a `Board`, which is a subclass of `defaultdict` that in general will consist of `{(x, y): contents}` pairs, for example `{(0, 0): 'X', (1, 1): 'O'}` might be the state of the board after two moves. Besides the contents of squares, a board also has some attributes: \n", "- `.to_move` to name the player whose move it is; \n", "- `.width` and `.height` to give the size of the board (both 3 in tic-tac-toe, but other numbers in related games);\n", "- possibly other attributes, as specified by keywords. \n", "\n", "As a `defaultdict`, the `Board` class has a `__missing__` method, which returns `empty` for squares that have no been assigned but are within the `width` × `height` boundaries, or `off` otherwise. The class has a `__hash__` method, so instances can be stored in hash tables." ] }, { "cell_type": "code", "execution_count": 68, "metadata": {}, "outputs": [], "source": [ "class Board(defaultdict):\n", " \"\"\"A board has the player to move, a cached utility value, \n", " and a dict of {(x, y): player} entries, where player is 'X' or 'O'.\"\"\"\n", " empty = '.'\n", " off = '#'\n", " \n", " def __init__(self, width=8, height=8, to_move=None, **kwds):\n", " self.__dict__.update(width=width, height=height, to_move=to_move, **kwds)\n", " \n", " def new(self, changes: dict, **kwds) -> 'Board':\n", " \"Given a dict of {(x, y): contents} changes, return a new Board with the changes.\"\n", " board = Board(width=self.width, height=self.height, **kwds)\n", " board.update(self)\n", " board.update(changes)\n", " return board\n", "\n", " def __missing__(self, loc):\n", " x, y = loc\n", " if 0 <= x < self.width and 0 <= y < self.height:\n", " return self.empty\n", " else:\n", " return self.off\n", " \n", " def __hash__(self): \n", " return hash(tuple(sorted(self.items()))) + hash(self.to_move)\n", " \n", " def __repr__(self):\n", " def row(y): return ' '.join(self[x, y] for x in range(self.width))\n", " return '\\n'.join(map(row, range(self.height))) + '\\n'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Players\n", "\n", "We need an interface for players. I'll represent a player as a `callable` that will be passed two arguments: `(game, state)` and will return a `move`.\n", "The function `player` creates a player out of a search algorithm, but you can create your own players as functions, as is done with `random_player` below:" ] }, { "cell_type": "code", "execution_count": 69, "metadata": {}, "outputs": [], "source": [ "def random_player(game, state): return random.choice(list(game.actions(state)))\n", "\n", "def player(search_algorithm):\n", " \"\"\"A game player who uses the specified search algorithm\"\"\"\n", " return lambda game, state: search_algorithm(game, state)[1]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Playing a Game\n", "\n", "We're ready to play a game. I'll set up a match between a `random_player` (who chooses randomly from the legal moves) and a `player(alphabeta_search)` (who makes the optimal alpha-beta move; practical for tic-tac-toe, but not for large games). The `player(alphabeta_search)` will never lose, but if `random_player` is lucky, it will be a tie." ] }, { "cell_type": "code", "execution_count": 74, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Player X move: (0, 0)\n", "X . .\n", ". . .\n", ". . .\n", "\n", "Player O move: (1, 1)\n", "X . .\n", ". O .\n", ". . .\n", "\n", "Player X move: (1, 2)\n", "X . .\n", ". O .\n", ". X .\n", "\n", "Player O move: (0, 1)\n", "X . .\n", "O O .\n", ". X .\n", "\n", "Player X move: (2, 1)\n", "X . .\n", "O O X\n", ". X .\n", "\n", "Player O move: (2, 0)\n", "X . O\n", "O O X\n", ". X .\n", "\n", "Player X move: (2, 2)\n", "X . O\n", "O O X\n", ". X X\n", "\n", "Player O move: (0, 2)\n", "X . O\n", "O O X\n", "O X X\n", "\n" ] }, { "data": { "text/plain": [ "-1" ] }, "execution_count": 74, "metadata": {}, "output_type": "execute_result" } ], "source": [ "play_game(TicTacToe(), dict(X=random_player, O=player(alphabeta_search)), verbose=True).utility" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The alpha-beta player will never lose, but sometimes the random player can stumble into a draw. When two optimal (alpha-beta or minimax) players compete, it will always be a draw:" ] }, { "cell_type": "code", "execution_count": 75, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Player X move: (0, 1)\n", ". . .\n", "X . .\n", ". . .\n", "\n", "Player O move: (0, 0)\n", "O . .\n", "X . .\n", ". . .\n", "\n", "Player X move: (2, 0)\n", "O . X\n", "X . .\n", ". . .\n", "\n", "Player O move: (2, 1)\n", "O . X\n", "X . O\n", ". . .\n", "\n", "Player X move: (1, 2)\n", "O . X\n", "X . O\n", ". X .\n", "\n", "Player O move: (0, 2)\n", "O . X\n", "X . O\n", "O X .\n", "\n", "Player X move: (1, 0)\n", "O X X\n", "X . O\n", "O X .\n", "\n", "Player O move: (1, 1)\n", "O X X\n", "X O O\n", "O X .\n", "\n", "Player X move: (2, 2)\n", "O X X\n", "X O O\n", "O X X\n", "\n" ] }, { "data": { "text/plain": [ "0" ] }, "execution_count": 75, "metadata": {}, "output_type": "execute_result" } ], "source": [ "play_game(TicTacToe(), dict(X=player(alphabeta_search), O=player(minimax_search)), verbose=True).utility" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Connect Four\n", "\n", "Connect Four is a variant of tic-tac-toe, played on a larger (7 x 6) board, and with the restriction that in any column you can only play in the lowest empty square in the column." ] }, { "cell_type": "code", "execution_count": 76, "metadata": {}, "outputs": [], "source": [ "class ConnectFour(TicTacToe):\n", " \n", " def __init__(self): super().__init__(width=7, height=6, k=4)\n", "\n", " def actions(self, board):\n", " \"\"\"In each column you can play only the lowest empty square in the column.\"\"\"\n", " return {(x, y) for (x, y) in self.squares - set(board)\n", " if y == board.height - 1 or (x, y + 1) in board}" ] }, { "cell_type": "code", "execution_count": 77, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Player X move: (2, 5)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . X . . . .\n", "\n", "Player O move: (1, 5)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". O X . . . .\n", "\n", "Player X move: (5, 5)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". O X . . X .\n", "\n", "Player O move: (4, 5)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". O X . O X .\n", "\n", "Player X move: (4, 4)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . X . .\n", ". O X . O X .\n", "\n", "Player O move: (2, 4)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . O . X . .\n", ". O X . O X .\n", "\n", "Player X move: (2, 3)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . X . . . .\n", ". . O . X . .\n", ". O X . O X .\n", "\n", "Player O move: (1, 4)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . X . . . .\n", ". O O . X . .\n", ". O X . O X .\n", "\n", "Player X move: (0, 5)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . X . . . .\n", ". O O . X . .\n", "X O X . O X .\n", "\n", "Player O move: (5, 4)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . X . . . .\n", ". O O . X O .\n", "X O X . O X .\n", "\n", "Player X move: (5, 3)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . X . . X .\n", ". O O . X O .\n", "X O X . O X .\n", "\n", "Player O move: (6, 5)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . X . . X .\n", ". O O . X O .\n", "X O X . O X O\n", "\n", "Player X move: (1, 3)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". X X . . X .\n", ". O O . X O .\n", "X O X . O X O\n", "\n", "Player O move: (6, 4)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". X X . . X .\n", ". O O . X O O\n", "X O X . O X O\n", "\n", "Player X move: (5, 2)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . X .\n", ". X X . . X .\n", ". O O . X O O\n", "X O X . O X O\n", "\n", "Player O move: (0, 4)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . X .\n", ". X X . . X .\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player X move: (0, 3)\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . X .\n", "X X X . . X .\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player O move: (0, 2)\n", ". . . . . . .\n", ". . . . . . .\n", "O . . . . X .\n", "X X X . . X .\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player X move: (0, 1)\n", ". . . . . . .\n", "X . . . . . .\n", "O . . . . X .\n", "X X X . . X .\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player O move: (0, 0)\n", "O . . . . . .\n", "X . . . . . .\n", "O . . . . X .\n", "X X X . . X .\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player X move: (5, 1)\n", "O . . . . . .\n", "X . . . . X .\n", "O . . . . X .\n", "X X X . . X .\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player O move: (4, 3)\n", "O . . . . . .\n", "X . . . . X .\n", "O . . . . X .\n", "X X X . O X .\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player X move: (6, 3)\n", "O . . . . . .\n", "X . . . . X .\n", "O . . . . X .\n", "X X X . O X X\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player O move: (5, 0)\n", "O . . . . O .\n", "X . . . . X .\n", "O . . . . X .\n", "X X X . O X X\n", "O O O . X O O\n", "X O X . O X O\n", "\n", "Player X move: (3, 5)\n", "O . . . . O .\n", "X . . . . X .\n", "O . . . . X .\n", "X X X . O X X\n", "O O O . X O O\n", "X O X X O X O\n", "\n", "Player O move: (1, 2)\n", "O . . . . O .\n", "X . . . . X .\n", "O O . . . X .\n", "X X X . O X X\n", "O O O . X O O\n", "X O X X O X O\n", "\n", "Player X move: (1, 1)\n", "O . . . . O .\n", "X X . . . X .\n", "O O . . . X .\n", "X X X . O X X\n", "O O O . X O O\n", "X O X X O X O\n", "\n", "Player O move: (3, 4)\n", "O . . . . O .\n", "X X . . . X .\n", "O O . . . X .\n", "X X X . O X X\n", "O O O O X O O\n", "X O X X O X O\n", "\n" ] }, { "data": { "text/plain": [ "-1" ] }, "execution_count": 77, "metadata": {}, "output_type": "execute_result" } ], "source": [ "play_game(ConnectFour(), dict(X=random_player, O=random_player), verbose=True).utility" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Transposition Tables\n", "\n", "By treating the game tree as a tree, we can arrive at the same state through different paths, and end up duplicating effort. In state-space search, we kept a table of `reached` states to prevent this. For game-tree search, we can achieve the same effect by applying the `@cache` decorator to the `min_value` and `max_value` functions. We'll use the suffix `_tt` to indicate a function that uses these transisiton tables." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "def minimax_search_tt(game, state):\n", " \"\"\"Search game to determine best move; return (value, move) pair.\"\"\"\n", "\n", " player = state.to_move\n", "\n", " @cache\n", " def max_value(state):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " v, move = -infinity, None\n", " for a in game.actions(state):\n", " v2, _ = min_value(game.result(state, a))\n", " if v2 > v:\n", " v, move = v2, a\n", " return v, move\n", "\n", " @cache\n", " def min_value(state):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " v, move = +infinity, None\n", " for a in game.actions(state):\n", " v2, _ = max_value(game.result(state, a))\n", " if v2 < v:\n", " v, move = v2, a\n", " return v, move\n", "\n", " return max_value(state)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For alpha-beta search, we can still use a cache, but it should be based just on the state, not on whatever values alpha and beta have." ] }, { "cell_type": "code", "execution_count": 79, "metadata": {}, "outputs": [], "source": [ "def cache1(function):\n", " \"Like lru_cache(None), but only considers the first argument of function.\"\n", " cache = {}\n", " def wrapped(x, *args):\n", " if x not in cache:\n", " cache[x] = function(x, *args)\n", " return cache[x]\n", " return wrapped\n", "\n", "def alphabeta_search_tt(game, state):\n", " \"\"\"Search game to determine best action; use alpha-beta pruning.\n", " As in [Figure 5.7], this version searches all the way to the leaves.\"\"\"\n", "\n", " player = state.to_move\n", "\n", " @cache1\n", " def max_value(state, alpha, beta):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " v, move = -infinity, None\n", " for a in game.actions(state):\n", " v2, _ = min_value(game.result(state, a), alpha, beta)\n", " if v2 > v:\n", " v, move = v2, a\n", " alpha = max(alpha, v)\n", " if v >= beta:\n", " return v, move\n", " return v, move\n", "\n", " @cache1\n", " def min_value(state, alpha, beta):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " v, move = +infinity, None\n", " for a in game.actions(state):\n", " v2, _ = max_value(game.result(state, a), alpha, beta)\n", " if v2 < v:\n", " v, move = v2, a\n", " beta = min(beta, v)\n", " if v <= alpha:\n", " return v, move\n", " return v, move\n", "\n", " return max_value(state, -infinity, +infinity)" ] }, { "cell_type": "code", "execution_count": 81, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 593 ms, sys: 52 ms, total: 645 ms\n", "Wall time: 655 ms\n" ] }, { "data": { "text/plain": [ "O X X\n", "X O O\n", "O X X" ] }, "execution_count": 81, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time play_game(TicTacToe(), {'X':player(alphabeta_search_tt), 'O':player(minimax_search_tt)})" ] }, { "cell_type": "code", "execution_count": 82, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 3.07 s, sys: 30.7 ms, total: 3.1 s\n", "Wall time: 3.15 s\n" ] }, { "data": { "text/plain": [ "O X X\n", "X O O\n", "O X X" ] }, "execution_count": 82, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time play_game(TicTacToe(), {'X':player(alphabeta_search), 'O':player(minimax_search)})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Heuristic Cutoffs" ] }, { "cell_type": "code", "execution_count": 63, "metadata": {}, "outputs": [], "source": [ "def cutoff_depth(d):\n", " \"\"\"A cutoff function that searches to depth d.\"\"\"\n", " return lambda game, state, depth: depth > d\n", "\n", "def h_alphabeta_search(game, state, cutoff=cutoff_depth(6), h=lambda s, p: 0):\n", " \"\"\"Search game to determine best action; use alpha-beta pruning.\n", " As in [Figure 5.7], this version searches all the way to the leaves.\"\"\"\n", "\n", " player = state.to_move\n", "\n", " @cache1\n", " def max_value(state, alpha, beta, depth):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " if cutoff(game, state, depth):\n", " return h(state, player), None\n", " v, move = -infinity, None\n", " for a in game.actions(state):\n", " v2, _ = min_value(game.result(state, a), alpha, beta, depth+1)\n", " if v2 > v:\n", " v, move = v2, a\n", " alpha = max(alpha, v)\n", " if v >= beta:\n", " return v, move\n", " return v, move\n", "\n", " @cache1\n", " def min_value(state, alpha, beta, depth):\n", " if game.is_terminal(state):\n", " return game.utility(state, player), None\n", " if cutoff(game, state, depth):\n", " return h(state, player), None\n", " v, move = +infinity, None\n", " for a in game.actions(state):\n", " v2, _ = max_value(game.result(state, a), alpha, beta, depth + 1)\n", " if v2 < v:\n", " v, move = v2, a\n", " beta = min(beta, v)\n", " if v <= alpha:\n", " return v, move\n", " return v, move\n", "\n", " return max_value(state, -infinity, +infinity, 0)" ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "CPU times: user 367 ms, sys: 7.9 ms, total: 375 ms\n", "Wall time: 375 ms\n" ] }, { "data": { "text/plain": [ "O X X\n", "X O O\n", "O X X" ] }, "execution_count": 54, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time play_game(TicTacToe(), {'X':player(h_alphabeta_search), 'O':player(h_alphabeta_search)})" ] }, { "cell_type": "code", "execution_count": 60, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . X\n", ". . . . . X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O X\n", ". . . . . X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O X\n", ". . . . X X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O .\n", ". . . . . O X\n", ". . . . X X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O .\n", ". . . . X O X\n", ". . . . X X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O .\n", ". . . . X O X\n", ". O . . X X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O .\n", ". X . . X O X\n", ". O . . X X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O O\n", ". X . . X O X\n", ". O . . X X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . X O O\n", ". X . . X O X\n", ". O . . X X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . X O O\n", ". X . . X O X\n", ". O . O X X O\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . X .\n", ". . . . X O O\n", ". X . . X O X\n", ". O . O X X O\n", "\n", ". . . . . . .\n", ". . . . . O .\n", ". . . . . X .\n", ". . . . X O O\n", ". X . . X O X\n", ". O . O X X O\n", "\n", ". . . . . . .\n", ". . . . . O .\n", ". . . . . X .\n", ". X . . X O O\n", ". X . . X O X\n", ". O . O X X O\n", "\n", ". . . . . . .\n", ". . . . . O .\n", ". . . . . X O\n", ". X . . X O O\n", ". X . . X O X\n", ". O . O X X O\n", "\n", ". . . . . . .\n", ". . . . . O .\n", ". X . . . X O\n", ". X . . X O O\n", ". X . . X O X\n", ". O . O X X O\n", "\n", ". . . . . . .\n", ". . . . . O .\n", ". X . . . X O\n", ". X . . X O O\n", ". X . . X O X\n", ". O O O X X O\n", "\n", ". . . . . . .\n", ". . . . . O .\n", ". X . . . X O\n", ". X . . X O O\n", ". X . . X O X\n", "X O O O X X O\n", "\n", ". . . . . . .\n", ". . . . . O O\n", ". X . . . X O\n", ". X . . X O O\n", ". X . . X O X\n", "X O O O X X O\n", "\n", ". . . . . . X\n", ". . . . . O O\n", ". X . . . X O\n", ". X . . X O O\n", ". X . . X O X\n", "X O O O X X O\n", "\n", ". . . . . . X\n", ". . . . . O O\n", ". X . . . X O\n", ". X . . X O O\n", "O X . . X O X\n", "X O O O X X O\n", "\n", ". . . . . X X\n", ". . . . . O O\n", ". X . . . X O\n", ". X . . X O O\n", "O X . . X O X\n", "X O O O X X O\n", "\n", ". . . . . X X\n", ". . . . . O O\n", ". X . . . X O\n", ". X . . X O O\n", "O X . O X O X\n", "X O O O X X O\n", "\n", ". . . . . X X\n", ". . . . . O O\n", ". X . . . X O\n", ". X . X X O O\n", "O X . O X O X\n", "X O O O X X O\n", "\n", ". . . . . X X\n", ". . . . . O O\n", ". X . . O X O\n", ". X . X X O O\n", "O X . O X O X\n", "X O O O X X O\n", "\n", ". . . . . X X\n", ". . . . . O O\n", ". X . X O X O\n", ". X . X X O O\n", "O X . O X O X\n", "X O O O X X O\n", "\n", ". . . . . X X\n", ". . . . O O O\n", ". X . X O X O\n", ". X . X X O O\n", "O X . O X O X\n", "X O O O X X O\n", "\n", ". . . . . X X\n", ". . . X O O O\n", ". X . X O X O\n", ". X . X X O O\n", "O X . O X O X\n", "X O O O X X O\n", "\n", ". . . . O X X\n", ". . . X O O O\n", ". X . X O X O\n", ". X . X X O O\n", "O X . O X O X\n", "X O O O X X O\n", "\n", ". . . X O X X\n", ". . . X O O O\n", ". X . X O X O\n", ". X . X X O O\n", "O X . O X O X\n", "X O O O X X O\n", "\n", "CPU times: user 8.82 s, sys: 146 ms, total: 8.96 s\n", "Wall time: 9.19 s\n" ] }, { "data": { "text/plain": [ "1" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time play_game(ConnectFour(), {'X':player(h_alphabeta_search), 'O':random_player}, verbose=True).utility" ] }, { "cell_type": "code", "execution_count": 61, "metadata": { "scrolled": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O .\n", ". . . . . X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O .\n", ". . . . X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . O .\n", ". . O . X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . X O .\n", ". . O . X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . X O .\n", ". . O O X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . X O .\n", ". X O O X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". O . . X O .\n", ". X O O X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". . . . . . .\n", ". X . . . . .\n", ". O . . X O .\n", ". X O O X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". O . . . . .\n", ". X . . . . .\n", ". O . . X O .\n", ". X O O X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". O . . . . .\n", ". X . . . . .\n", ". O . . X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". O . . . . .\n", ". X . . . . .\n", "O O . . X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". O . . . . .\n", ". X . . X . .\n", "O O . . X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . . . .\n", ". O . . O . .\n", ". X . . X . .\n", "O O . . X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . X . .\n", ". O . . O . .\n", ". X . . X . .\n", "O O . . X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . X . .\n", ". O . . O . .\n", ". X . . X O .\n", "O O . . X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . X . .\n", ". O . . O X .\n", ". X . . X O .\n", "O O . . X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . X . .\n", ". O . . O X .\n", ". X . . X O .\n", "O O O . X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . X . .\n", ". O . . O X .\n", ". X . . X O .\n", "O O O X X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . X O .\n", ". O . . O X .\n", ". X . . X O .\n", "O O O X X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . X O .\n", ". O . . O X .\n", ". X . X X O .\n", "O O O X X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . . X O .\n", ". O . O O X .\n", ". X . X X O .\n", "O O O X X O .\n", "X X O O X X .\n", "\n", ". . . . . . .\n", ". . . X X O .\n", ". O . O O X .\n", ". X . X X O .\n", "O O O X X O .\n", "X X O O X X .\n", "\n", ". . . O . . .\n", ". . . X X O .\n", ". O . O O X .\n", ". X . X X O .\n", "O O O X X O .\n", "X X O O X X .\n", "\n", ". . . O . . .\n", ". . . X X O .\n", ". O . O O X .\n", ". X X X X O .\n", "O O O X X O .\n", "X X O O X X .\n", "\n", "CPU times: user 12.1 s, sys: 237 ms, total: 12.4 s\n", "Wall time: 12.9 s\n" ] }, { "data": { "text/plain": [ "1" ] }, "execution_count": 61, "metadata": {}, "output_type": "execute_result" } ], "source": [ "%time play_game(ConnectFour(), {'X':player(h_alphabeta_search), 'O':player(h_alphabeta_search)}, verbose=True).utility" ] }, { "cell_type": "code", "execution_count": 83, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Result states: 6,589; Terminal tests: 3,653; for alphabeta_search_tt\n", "Result states: 25,703; Terminal tests: 25,704; for alphabeta_search\n", "Result states: 4,687; Terminal tests: 2,805; for h_alphabeta_search\n", "Result states: 16,167; Terminal tests: 5,478; for minimax_search_tt\n" ] } ], "source": [ "class CountCalls:\n", " \"\"\"Delegate all attribute gets to the object, and count them in ._counts\"\"\"\n", " def __init__(self, obj):\n", " self._object = obj\n", " self._counts = Counter()\n", " \n", " def __getattr__(self, attr):\n", " \"Delegate to the original object, after incrementing a counter.\"\n", " self._counts[attr] += 1\n", " return getattr(self._object, attr)\n", " \n", "def report(game, searchers):\n", " for searcher in searchers:\n", " game = CountCalls(game)\n", " searcher(game, game.initial)\n", " print('Result states: {:7,d}; Terminal tests: {:7,d}; for {}'.format(\n", " game._counts['result'], game._counts['is_terminal'], searcher.__name__))\n", " \n", "report(TicTacToe(), (alphabeta_search_tt, alphabeta_search, h_alphabeta_search, minimax_search_tt))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Monte Carlo Tree Search" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class Node:\n", " def __init__(self, parent, )\n", "def mcts(state, game, N=1000):" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Heuristic Search Algorithms" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "t = CountCalls(TicTacToe())\n", " \n", "play_game(t, dict(X=minimax_player, O=minimax_player), verbose=True)\n", "t._counts" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "for tactic in (three, fork, center, opposite_corner, corner, any):\n", " for s in squares:\n", " if tactic(board, s,player): return s\n", " for s ins quares:\n", " if tactic(board, s, opponent): return s" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "\n", "\n", "def ucb(U, N, C=2**0.5, parentN=100):\n", " return round(U/N + C * math.sqrt(math.log(parentN)/N), 2)\n", "\n", "{C: (ucb(60, 79, C), ucb(1, 10, C), ucb(2, 11, C)) \n", " for C in (1.4, 1.5)}\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def ucb(U, N, parentN=100, C=2):\n", " return U/N + C * math.sqrt(math.log(parentN)/N)\n", "\n", "\n", "C = 1.4 \n", "\n", "class Node:\n", " def __init__(self, name, children=(), U=0, N=0, parent=None, p=0.5):\n", " self.__dict__.update(name=name, U=U, N=N, parent=parent, children=children, p=p)\n", " for c in children:\n", " c.parent = self\n", " \n", " def __repr__(self):\n", " return '{}:{}/{}={:.0%}{}'.format(self.name, self.U, self.N, self.U/self.N, self.children)\n", " \n", "def select(n):\n", " if n.children:\n", " return select(max(n.children, key=ucb))\n", " else:\n", " return n\n", " \n", "def back(n, amount):\n", " if n:\n", " n.N += 1\n", " n.U += amount\n", " back(n.parent, 1 - amount)\n", " \n", " \n", "def one(root): \n", " n = select(root)\n", " amount = int(random.uniform(0, 1) < n.p)\n", " back(n, amount)\n", " \n", "def ucb(n): \n", " return (float('inf') if n.N == 0 else\n", " n.U / n.N + C * math.sqrt(math.log(n.parent.N)/n.N))\n", "\n", "\n", "tree = Node('root', [Node('a', p=.8, children=[Node('a1', p=.05), \n", " Node('a2', p=.25,\n", " children=[Node('a2a', p=.7), Node('a2b')])]),\n", " Node('b', p=.5, children=[Node('b1', p=.6,\n", " children=[Node('b1a', p=.3), Node('b1b')]), \n", " Node('b2', p=.4)]),\n", " Node('c', p=.1)])\n", "\n", "for i in range(100):\n", " one(tree); \n", "for c in tree.children: print(c)\n", "'select', select(tree), 'tree', tree\n", "\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "us = (100, 50, 25, 10, 5, 1)\n", "infinity = float('inf')\n", "\n", "@lru_cache(None)\n", "def f1(n, denom):\n", " return (0 if n == 0 else\n", " infinity if n < 0 or not denom else\n", " min(1 + f1(n - denom[0], denom),\n", " f1(n, denom[1:])))\n", " \n", "@lru_cache(None)\n", "def f2(n, denom):\n", " @lru_cache(None)\n", " def f(n):\n", " return (0 if n == 0 else\n", " infinity if n < 0 else\n", " 1 + min(f(n - d) for d in denom))\n", " return f(n)\n", "\n", "@lru_cache(None)\n", "def f3(n, denom):\n", " return (0 if n == 0 else\n", " infinity if n < 0 or not denom else\n", " min(k + f2(n - k * denom[0], denom[1:]) \n", " for k in range(1 + n // denom[0])))\n", " \n", "\n", "def g(n, d=us): return f1(n, d), f2(n, d), f3(n, d)\n", " \n", "n = 12345\n", "%time f1(n, us)\n", "%time f2(n, us)\n", "%time f3(n, us)\n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.2" } }, "nbformat": 4, "nbformat_minor": 2 }