Assignment 5 - Q-Learning for NFL Strategy
Objectives
- to implement a model-free, bootstrapping learning algorithm
Introduction
Recall Assignment #2, for which you computed optimal and equilibrium strategies for the North American gridiron football simulation game. Those calculations required knowledge of the model of the game – the possible outcomes of each position given the actions (plays) selected, the probability of those outcomes, and the conditions for reward at the end of the game.
For this assignment, we will use reinforcement learning, specifically Q-learning, to learn an effective policy (strategy) for selecting plays. Q-learning is model-free, so it can be applied even when you don't know the probability distribution of the outcomes, as long as you can observe the outcomes. (Q-learning also works when you can observe rewards but don't have a model of when they occur; we will not use this property, as we still know that the objective is to score before time runs out.)
Assignment
Write a function called q_learn in a module called qfl that returns a function that takes a non-terminal position and returns the selected play. Your q_learn function can take up to 10 seconds to execute (measured on the Zoo). The function that it returns should execute in less than 10 microseconds for any position.
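The shape described above (train under a time budget, then hand back a fast policy function) can be sketched as follows. This is only an illustration: train_with_deadline, train_one_episode, choose_play, and the 9-second default are hypothetical names and values, not part of the assignment, and the arguments that the test scripts pass to q_learn are not shown here.

```python
import time

def train_with_deadline(train_one_episode, choose_play, budget_seconds=9.0):
    """Run training episodes until the budget is spent, then return a policy.

    train_one_episode: hypothetical zero-argument callable doing one episode.
    choose_play: hypothetical callable mapping a position to a play number.
    budget_seconds: assumed safety margin under the 10-second limit.
    """
    deadline = time.monotonic() + budget_seconds
    episodes = 0
    while time.monotonic() < deadline:
        train_one_episode()
        episodes += 1

    def policy(position):
        # Keep this cheap: it is the part with the per-call time limit.
        return choose_play(position)

    return policy
```

Checking the clock once per episode keeps the overhead small, but it assumes a single episode is short compared with the remaining budget.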
Your q_learn function can observe outcomes from a particular position given a choice of action (offensive play) by calling the outcome method in the game module. The outcome method takes a position and action and returns the outcome as a triple specifying the number of yards gained, the time elapsed (in 5-second ticks), and a Boolean flag indicating whether the action resulted in a turnover (in which case the game is over and the offense loses).
Although the current implementation of the game module uses the same model as Assignment #2 and chooses defensive plays uniformly randomly, your q_learn function will be tested on other models, and the only parts of the game module you may access are those whose names do not start with an underscore:
- the outcome function;
- the initial variable, which defines the initial position of the game as a 4-tuple containing the yards required to score, the downs left, the yards required to get a new set of downs, and the number of timer ticks left; and
- the playbook_size variable, which defines the number of offensive plays to choose from (where the plays are then numbered 0, 1, ..., playbook_size - 1).
The game module also contains a simulate function that can be used to test the function returned by your q_learn function.
To use simulate and to be graded by the final test scripts, you must write three additional functions in your qfl module:
- game_over, which takes a position as a 4-tuple as described above and returns True if that is a terminal position and False otherwise;
- win, which takes a terminal position as a 4-tuple as described above and returns True if the offense won and False otherwise; and
- result, which takes a non-terminal position as described above and the outcome of an action as returned by outcome and returns the resulting next position as a 4-tuple in the form described above.
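A minimal sketch of these three functions follows. The function names and tuple layouts come from the descriptions above, but the game rules encoded in the bodies are assumptions, since the rules come from Assignment #2 and are not restated here: a new set of downs is taken to be 4 downs with min(10, yards to score) to go, a turnover is encoded by setting the downs left to 0, and reaching the goal counts as a win even if the clock reaches 0 on the same play. Check each of these against the actual model before relying on them.

```python
def game_over(position):
    """True if the position is terminal (score, out of downs, or out of time)."""
    yards_to_score, downs_left, yards_to_first, ticks_left = position
    return yards_to_score <= 0 or downs_left <= 0 or ticks_left <= 0


def win(position):
    """True if the offense scored. Assumes a score on the last tick still counts."""
    return position[0] <= 0


def result(position, outcome):
    """Apply an outcome triple (yards gained, ticks elapsed, turnover) to a position."""
    yards_to_score, downs_left, yards_to_first, ticks_left = position
    yards_gained, ticks_elapsed, turnover = outcome
    ticks_left = max(ticks_left - ticks_elapsed, 0)
    if turnover:
        # Assumption: represent a turnover as a terminal loss with no downs left.
        return (yards_to_score, 0, yards_to_first, ticks_left)
    yards_to_score -= yards_gained
    if yards_to_score <= 0:
        # Touchdown: clamp to a canonical scoring position.
        return (0, downs_left, 0, ticks_left)
    if yards_gained >= yards_to_first:
        # Assumption: a first down resets to 4 downs and up to 10 yards to go.
        return (yards_to_score, 4, min(10, yards_to_score), ticks_left)
    return (yards_to_score, downs_left - 1, yards_to_first - yards_gained, ticks_left)
```

For example, under these assumptions a 12-yard gain that takes 2 ticks from (80, 4, 10, 24) gives (68, 4, 10, 22), while a 3-yard gain that takes 1 tick gives (77, 3, 7, 23).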
Because there are too many positions to compute a Q-value for each individually within the time bound, you should use a function approximator such as a linear approximator. You should be able to achieve around 50% wins against a defense that selects plays uniformly randomly using the probability distribution of play outcomes from Assignment #2. By comparison, the optimal strategy against the random defense achieves around 57%, always selecting play 0, 1, or 2 achieves around 29%, 23%, and 0.2% respectively, and selecting an offensive play uniformly randomly achieves around 12%.
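One way to set up a linear approximator is to keep one weight vector per play and apply the standard Q-learning update to the weights of the play that was chosen. The sketch below uses only the standard library, since numpy is not available. The feature set, the scaling constants (100 yards, 4 downs, 10 yards, 100 ticks), the learning rate, and the discount factor are all illustrative assumptions, not values prescribed by the assignment, and they will likely need tuning to approach the win rates quoted above.

```python
import random

def features(position):
    """Hypothetical features, scaled to roughly [0, 1]; the first is a bias term."""
    yards_to_score, downs_left, yards_to_first, ticks_left = position
    return [1.0, yards_to_score / 100.0, downs_left / 4.0,
            yards_to_first / 10.0, ticks_left / 100.0]

def q_value(weights, position, action):
    """Linear estimate Q(s, a) = w_a . phi(s)."""
    return sum(w * f for w, f in zip(weights[action], features(position)))

def greedy_action(weights, position):
    """Play with the highest estimated Q-value (ties go to the lowest number)."""
    return max(range(len(weights)), key=lambda a: q_value(weights, position, a))

def epsilon_greedy(weights, position, epsilon):
    """Explore with probability epsilon, otherwise act greedily."""
    if random.random() < epsilon:
        return random.randrange(len(weights))
    return greedy_action(weights, position)

def q_update(weights, position, action, reward, next_position, terminal,
             alpha=0.05, gamma=1.0):
    """One Q-learning step toward reward + gamma * max_a' Q(s', a').

    Updates weights[action] in place and returns the TD error.
    """
    target = reward
    if not terminal:
        target += gamma * max(q_value(weights, next_position, a)
                              for a in range(len(weights)))
    error = target - q_value(weights, position, action)
    for i, f in enumerate(features(position)):
        weights[action][i] += alpha * error * f
    return error
```

A training loop would start from weights such as [[0.0] * 5 for _ in range(playbook_size)], pick plays with epsilon_greedy, observe each outcome, and call q_update with a reward of 1 for a terminal win and 0 otherwise (again, one possible reward scheme rather than a required one). The returned policy can then simply call greedy_action.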
Additional Requirements
- Tests will be executed with pypy3, so write your code for Python 3.5 and only use modules that are available for pypy3 on the Zoo (this excludes numpy).
Examples
[jrg94@perch QFL]$ pypy3 game.py 1000
0.507
Submissions
Submit just your qfl.py module along with any other supporting modules and your log. There is no need to submit a makefile or executable; the test scripts create an executable called QFL that is used to run the test drivers.