Assignment 5 - Q-Learning for NFL Strategy

Objectives

to implement a model-free, bootstrapping learning algorithm

Introduction

Recall Assignment #2, for which you computed optimal and equilibrium strategies for the North American gridiron football simulation game. Those calculations required knowledge of the model of the game – the possible outcomes of each position given the actions (plays) selected, the probability of those outcomes, and the conditions for reward at the end of the game.

For this assignment, we will use reinforcement learning, specifically Q-learning, to learn an effective policy (strategy) for selecting plays. Q-learning is model-free, so it can be applied even when you don't know the probability distribution of the outcomes, as long as you can observe the outcomes. (Q-learning also works when you can observe rewards but don't have a model of when they occur; we will not use this property, as we still know that the objective is to score before time runs out.)
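To make "model-free" concrete, here is a minimal sketch of the standard tabular Q-learning update applied to one observed transition. The function name q_update, its parameters, and the default step size and discount are illustrative choices, not part of the assignment. Note that the update uses only what was observed (position, play, reward, next position) and never an outcome probability.

```python
from collections import defaultdict


def q_update(q, pos, action, reward, next_pos, n_actions, terminal,
             alpha=0.1, gamma=1.0):
    """Apply one tabular Q-learning update from a single observed transition.

    q maps (position, action) pairs to estimated values; alpha is the step
    size and gamma the discount factor (both illustrative defaults).
    """
    # Bootstrapping: the target uses the current estimate for the next position.
    if terminal:
        best_next = 0.0
    else:
        best_next = max(q[(next_pos, a)] for a in range(n_actions))
    q[(pos, action)] += alpha * (reward + gamma * best_next - q[(pos, action)])
    return q[(pos, action)]


# Example table: unseen (position, action) pairs default to 0.0.
q_table = defaultdict(float)
```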

Assignment

Write a function called q_learn in a module called qfl that returns a function that takes a non-terminal position and returns the selected play. Your q_learn function can take up to 10 seconds to execute (measured on the Zoo). The function that it returns should execute in less than 10 microseconds for any position.
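One way to respect both time limits is to train against a deadline and then return a closure that does as little work as possible per call. The sketch below is only an illustration: train_with_budget, make_policy, and the 9-second default are hypothetical names and values, and precomputing a play for every position is just one possible design.

```python
import time


def train_with_budget(train_one_episode, budget_seconds=9.0):
    """Call train_one_episode() repeatedly until the time budget is used up.

    budget_seconds is a hypothetical safety margin below the 10-second limit.
    """
    deadline = time.monotonic() + budget_seconds
    episodes = 0
    while time.monotonic() < deadline:
        train_one_episode()
        episodes += 1
    return episodes


def make_policy(best_play_by_position, default_play=0):
    """Return a policy function that performs one dictionary lookup per call."""
    lookup = best_play_by_position.get
    return lambda pos: lookup(pos, default_play)
```

A q_learn built this way would finish training, fill in best_play_by_position from its learned Q-values, and return make_policy(best_play_by_position).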

Your q_learn function can observe outcomes from a particular position given a choice of action (offensive play) by calling the outcome method in the game module. The outcome method takes a position and action and returns the outcome as a triple specifying the number of yards gained, the time elapsed (in 5-second ticks), and a Boolean flag indicating whether the action resulted in a turnover (in which case the game is over and the offense loses). Although the current implementation of the game module uses the same model as Assignment #2 and chooses defensive plays uniformly randomly, your q_learn function will be tested on other models, and the only parts of the game module you may access are those whose names do not start with an underscore:

the outcome function;
the initial variable, which defines the initial position of the game as a 4-tuple containing the yards required to score, the downs left, the yards required to get a new set of downs, and the number of timer ticks left; and
the playbook_size variable, which defines the number of offensive plays to choose from (where the plays are then numbered 0, 1, ..., playbook_size - 1).
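As a rough sketch of how these pieces fit together, the loop below plays one game and records the observed transitions. The helper names sample_episode and uniform_policy are hypothetical. The outcome function and initial position are passed in as arguments so the sketch stands alone; in qfl.py they would presumably be game.outcome and game.initial, with result and game_over being the functions you write yourself.

```python
import random


def uniform_policy(playbook_size, rng=random):
    """Return a policy that picks a play uniformly from 0..playbook_size - 1."""
    return lambda pos: rng.randrange(playbook_size)


def sample_episode(outcome, initial, result, game_over, choose_play):
    """Play one game and return its (position, play, next_position) transitions.

    outcome(pos, play) is assumed to return (yards, ticks, turnover), and
    result(pos, outcome) to return the next position as a 4-tuple.
    """
    transitions = []
    pos = initial
    while not game_over(pos):
        play = choose_play(pos)
        next_pos = result(pos, outcome(pos, play))
        transitions.append((pos, play, next_pos))
        pos = next_pos
    return transitions
```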

The game module also contains a simulate function that can be used to test the function returned by your q_learn function. To use simulate and to be graded by the final test scripts, you must write three additional functions in your qfl module:

game_over, which takes a position as a 4-tuple as described above and returns True if that is a terminal position and False otherwise;
win, which takes a terminal position as a 4-tuple as described above and returns True if the offense won and False otherwise; and
result, which takes a non-terminal position as described above and the outcome of an action as returned by outcome and returns the resulting next position as a 4-tuple in the form described above.
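The three helpers might be sketched as follows. This handout does not restate the game rules, so every rule below is an assumption to check against Assignment #2: a new set of downs gives 4 downs and up to 10 yards to go, the offense wins exactly when the yards required to score reach zero, the game is lost when downs or time run out, and a turnover is encoded as a position with zero downs left. The constants DOWNS_PER_SET and YARDS_PER_SET are hypothetical.

```python
# Hypothetical constants; the real values come from the Assignment #2 rules.
DOWNS_PER_SET = 4
YARDS_PER_SET = 10


def game_over(pos):
    """Assumed rule: the game ends on a score, or when downs or time run out."""
    yards_to_score, downs_left, _, ticks_left = pos
    return yards_to_score <= 0 or downs_left <= 0 or ticks_left <= 0


def win(pos):
    """Assumed rule: the offense wins exactly when it has scored."""
    return pos[0] <= 0


def result(pos, outcome):
    """Return the next position after an outcome (yards, ticks, turnover)."""
    yards_to_score, downs_left, yards_to_first, ticks_left = pos
    gained, elapsed, turnover = outcome
    ticks_left = max(ticks_left - elapsed, 0)
    if turnover:
        # Assumed encoding: a turnover becomes a losing position with no downs.
        return (yards_to_score, 0, yards_to_first, ticks_left)
    yards_to_score -= gained
    if yards_to_score <= 0:
        return (0, downs_left, 0, ticks_left)
    yards_to_first -= gained
    if yards_to_first <= 0:
        # New set of downs; never require more yards than remain to score.
        return (yards_to_score, DOWNS_PER_SET,
                min(YARDS_PER_SET, yards_to_score), ticks_left)
    return (yards_to_score, downs_left - 1, yards_to_first, ticks_left)
```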

Because there are too many positions to compute a Q-value for each individually within the time bound, you should use a function approximator such as a linear approximator. You should be able to achieve around 50% wins against a defense that selects plays uniformly randomly using the probability distribution of play outcomes from Assignment #2. By comparison, the optimal strategy against the random defense achieves around 57%, always selecting play 0, 1, or 2 achieves around 29%, 23%, and 0.2% respectively, and selecting an offensive play uniformly randomly achieves around 12%.
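A linear approximator can be sketched as one weight vector per play, with Q(position, play) computed as a dot product of those weights with a small feature vector. The feature choice below (a bias term plus each position component scaled by the initial position), the function names, and the step size are all illustrative assumptions rather than a recommended design. The sketch also assumes every component of the initial position is nonzero.

```python
def features(pos, initial):
    """Bias term plus each position component scaled by its initial value."""
    return [1.0] + [p / i for p, i in zip(pos, initial)]


def q_value(weights, feats, action):
    """Linear estimate of Q: dot product of the play's weights and the features."""
    return sum(w * f for w, f in zip(weights[action], feats))


def greedy(weights, feats):
    """Index of the play with the highest estimated value."""
    return max(range(len(weights)), key=lambda a: q_value(weights, feats, a))


def td_update(weights, feats, action, reward, next_feats, alpha=0.05, gamma=1.0):
    """One semi-gradient Q-learning step; next_feats is None at a terminal position."""
    target = reward
    if next_feats is not None:
        target += gamma * max(q_value(weights, next_feats, a)
                              for a in range(len(weights)))
    error = target - q_value(weights, feats, action)
    row = weights[action]
    for i, f in enumerate(feats):
        # For a linear approximator the gradient of Q is the feature vector.
        row[i] += alpha * error * f
    return error
```

Here weights would be a list of playbook_size lists, each with one entry per feature, initialized to zero. Richer features (for example, products of components) may be needed to approach the win rates quoted above.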

Additional Requirements

Tests will be executed with pypy3, so write your code for Python 3.5 and use only modules that are available for pypy3 on the Zoo (this excludes numpy).

Examples

[jrg94@perch QFL]$ pypy3 game.py 1000
0.507

Submissions

Submit just your qfl.py module along with any other supporting modules and your log. There is no need to submit a makefile or executable; the test scripts create an executable called QFL that is used to run the test drivers.