Assignment 5 - Q-Learning for NFL Strategy

Objectives

Introduction

Recall Assignment #2, for which you computed optimal and equilibrium strategies for the North American gridiron football simulation game. Those calculations required knowledge of the model of the game – the possible outcomes of each position given the actions (plays) selected, the probabilities of those outcomes, and the conditions for reward at the end of the game.

For this assignment, we will use reinforcement learning, specifically Q-learning, to learn an effective policy (strategy) for selecting plays. Q-learning is model-free, so it can be applied even when you don't know the probability distribution of the outcomes, as long as you can observe the outcomes. (Q-learning also works when you can observe rewards but don't have a model of when they occur; we will not use this property, since we already know that the objective is to score before time runs out.)
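As a reminder, the core of Q-learning is a simple bootstrapped update toward observed rewards. Below is a minimal tabular sketch in Python; the dictionary Q, the learning rate alpha, and the discount gamma are illustrative names, not part of this assignment's interface.

    def q_update(Q, state, action, reward, next_state, next_actions,
                 alpha=0.1, gamma=1.0):
        # Standard Q-learning update: nudge Q(s, a) toward the target
        # r + gamma * max over a' of Q(s', a'). Unseen entries default to 0.
        old = Q.get((state, action), 0.0)
        best_next = max((Q.get((next_state, a), 0.0) for a in next_actions),
                        default=0.0)  # a terminal successor contributes nothing
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)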

Assignment

Write a function called q_learn in a module called qfl. q_learn must return a function that takes a non-terminal position and returns the selected play. Your q_learn function can take up to 10 seconds to execute (measured on the Zoo). The function that it returns should execute in less than 10 microseconds for any position.
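A minimal sketch of the required shape of the module is given below, with the learning itself elided; the point being illustrated is the split between a slow learning phase and a fast returned policy, and everything inside the function bodies is a placeholder.

    import game  # provided game module; use only its public names

    def q_learn():
        # Learning phase: may take up to 10 seconds. Sample outcomes via the
        # game module here and build whatever compact representation the
        # policy will need (e.g., a weight vector for a linear approximator).
        learned = None  # placeholder for learned parameters

        def policy(position):
            # Returned policy: must answer in well under 10 microseconds, so
            # do only cheap lookups and arithmetic here; no sampling, no learning.
            return 0  # placeholder: return the play with the highest Q-value

        return policy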

Your q_learn function can observe outcomes from a particular position, given a choice of action (offensive play), by calling the outcome method in the game module. The outcome method takes a position and an action and returns the outcome as a triple giving the number of yards gained, the time elapsed (in 5-second ticks), and a Boolean flag indicating whether the action resulted in a turnover (in which case the game is over and the offense loses). Although the current implementation of the game module uses the same model as Assignment #2 and chooses defensive plays uniformly at random, your q_learn function will be tested on other models. The only parts of the game module you may access are those whose names do not start with an underscore.
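To illustrate the interface only, here is a hedged sketch of observing a single transition. It assumes outcome is exposed as game.outcome(position, play); if it is instead a method on some game object, adapt the call accordingly. The position value comes from wherever your learner tracks it.

    import game

    def observe(position, play):
        # The triple is (yards gained, elapsed time in 5-second ticks, turnover?).
        yards, ticks, turnover = game.outcome(position, play)
        if turnover:
            # The game ends immediately and the offense loses.
            return None
        return yards, ticks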

The game module also contains a simulate function that can be used to test the function returned by your q_learn function. To use simulate and to be graded by the final test scripts, you must also write three additional functions in your qfl module.
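For local testing, something along the following lines may work, but note that the name of the call is taken from the description above while its argument order and return value are guesses; check game.py (or simply run it from the command line as in the example below) for the real interface.

    import game
    import qfl

    policy = qfl.q_learn()
    # WARNING: hypothetical signature; consult game.simulate before relying on it.
    win_rate = game.simulate(1000, policy)
    print(win_rate)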

Because there are too many positions to compute a Q-value for each one individually within the time bound, you should use a function approximator, such as a linear approximator. You should be able to achieve around 50% wins against a defense that selects plays uniformly at random using the probability distribution of play outcomes from Assignment #2. By comparison, the optimal strategy against that random defense wins around 57% of the time; always selecting play 0, 1, or 2 wins around 29%, 23%, and 0.2% respectively; and selecting an offensive play uniformly at random wins around 12%.
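One possible shape for a linear approximator with semi-gradient Q-learning updates is sketched below. The feature function is a deliberately crude placeholder (a real one should encode field position, down, distance, and time in whatever form the game module's positions expose), and the number of offensive plays is assumed to be 3 based on the plays 0-2 mentioned above.

    import random

    NUM_PLAYS = 3    # assumed; plays 0, 1, and 2 are mentioned above
    ALPHA = 0.01     # learning rate
    EPSILON = 0.1    # exploration rate for epsilon-greedy selection

    def features(position, play):
        # Placeholder feature vector for a (position, play) pair; a real
        # implementation should derive informative, roughly unit-scaled
        # numbers from the position's fields.
        return [1.0, float(play)]

    def q_value(w, position, play):
        # Linear approximation: Q(s, a) is the dot product of weights and features.
        return sum(wi * fi for wi, fi in zip(w, features(position, play)))

    def choose_play(w, position):
        # Epsilon-greedy over the current estimates.
        if random.random() < EPSILON:
            return random.randrange(NUM_PLAYS)
        return max(range(NUM_PLAYS), key=lambda a: q_value(w, position, a))

    def update(w, position, play, reward, next_position, terminal):
        # Semi-gradient Q-learning step for the linear approximator (gamma = 1).
        best_next = 0.0 if terminal else max(
            q_value(w, next_position, a) for a in range(NUM_PLAYS))
        error = reward + best_next - q_value(w, position, play)
        for i, fi in enumerate(features(position, play)):
            w[i] += ALPHA * error * fi

Whatever representation you settle on, once the learned weights are frozen into the returned policy, exploration should be switched off so the policy acts greedily.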

Additional Requirements

Examples

[jrg94@perch QFL]$ pypy3 game.py 1000
0.507

Submissions

Submit just your qfl.py module along with any other supporting modules and your log. There is no need to submit a makefile or executable; the test scripts create an executable called QFL that is used to run the test drivers.