hw7 - reinforcement learning

CS470/CS570

  • Assigned: Monday April 6th
  • Due: Thursday April 16th, 11:59pm
  • Extension: Sunday April 19th, 11:59pm EDT

Reading:

Edit this file and submit it on the zoo.

  • Name: [enter]
  • Email address: [enter]
  • Hours: [enter]

The RL Deal

Reinforcement learning (RL) is an area of machine learning concerned with how software agents ought to take actions in an environment in order to maximize the notion of cumulative reward. Reinforcement learning is one of three basic machine learning paradigms, alongside supervised learning and unsupervised learning.

Reinforcement learning differs from supervised learning in not needing labelled input/output pairs to be presented, and in not needing sub-optimal actions to be explicitly corrected. Instead the focus is on finding a balance between exploration (of uncharted territory) and exploitation (of current knowledge). The environment is typically stated in the form of a Markov decision process (MDP), because many reinforcement learning algorithms for this context utilize dynamic programming techniques. The main difference between the classical dynamic programming methods and reinforcement learning algorithms is that the latter do not assume knowledge of an exact mathematical model of the MDP and they target large MDPs where exact methods become infeasible. (https://en.wikipedia.org/wiki/Reinforcement_learning)
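
The exploration/exploitation balance mentioned above is often handled with an epsilon-greedy rule. The following is a minimal sketch, not a required design: the function name epsilon_greedy and its parameters are illustrative choices, and other exploration schemes would work as well.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from a list of estimated action values.

    With probability epsilon, explore (uniform random action);
    otherwise exploit (an action with the highest current estimate).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties randomly so no action is favored by its position.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])
```

With epsilon = 0 the agent always exploits; with epsilon = 1 it always explores. Values around 0.1 are a common starting point, though the best setting depends on the problem.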

In this assignment, you will implement RL programs in two domains: blackjack, like in hw3, and another domain of your choice.

The OpenAI project (https://openai.com/)

OpenAI is an independent research organization consisting of the for-profit corporation OpenAI LP and its parent organization, the non-profit OpenAI Inc. The corporation conducts research in the field of artificial intelligence (AI) with the stated aim to promote and develop friendly AI in such a way as to benefit humanity as a whole. (https://en.wikipedia.org/wiki/OpenAI)

Within OpenAI, there is the gym toolkit (https://gym.openai.com/) for developing and comparing reinforcement learning algorithms. It supports teaching agents everything from walking to playing games like Pong or Pinball.

Part 1: Learning Blackjack

For this assignment, you will first use the gym toolkit for blackjack. In gym terminology, blackjack is an environment (https://gym.openai.com/envs/Blackjack-v0/) - an easy one at that.

Your job is to develop a reinforcement learning agent for blackjack. We recommend a Q-learning agent. You will then conduct experiments to see how its performance varies when you modify various parameters. The primary parameter is the number of training trials. You should plot the results using matplotlib inside this Jupyter notebook. Show the error rate as a function of the number of training trials or of the other parameters.
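
As a rough illustration of the recommended approach, here is a sketch of tabular Q-learning, which repeatedly applies the update Q(s,a) ← Q(s,a) + alpha * (r + gamma * max Q(s',·) − Q(s,a)). Several things here are assumptions, not part of the assignment: ToyCardEnv is a hypothetical stand-in game (not blackjack and not part of gym) that copies the older gym interface where reset() returns an observation and step() returns (observation, reward, done, info); the hyperparameter defaults are arbitrary; and "error rate" is interpreted as the fraction of evaluation episodes that end in a loss.

```python
import random
from collections import defaultdict

STICK, HIT = 0, 1  # same action encoding as gym's blackjack environment


class ToyCardEnv:
    """Hypothetical stand-in for a gym environment (not part of gym).

    Hit adds a card worth 1-5 to the total; going over 10 busts (reward -1).
    Stick ends the episode with reward +1 if the total is at least 8, else -1.
    """

    def __init__(self, seed=None):
        self.rng = random.Random(seed)
        self.total = 0

    def _card(self):
        return self.rng.randint(1, 5)

    def reset(self):
        self.total = self._card() + self._card()
        return self.total

    def step(self, action):
        if action == HIT:
            self.total += self._card()
            if self.total > 10:
                return self.total, -1.0, True, {}
            return self.total, 0.0, False, {}
        reward = 1.0 if self.total >= 8 else -1.0
        return self.total, reward, True, {}


def greedy_action(q, state):
    """Action with the highest current Q estimate (ties go to STICK)."""
    return max((STICK, HIT), key=lambda a: q[state][a])


def train_q_learning(env, episodes, alpha=0.1, gamma=1.0, epsilon=0.1, seed=0):
    """Run epsilon-greedy Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = defaultdict(lambda: [0.0, 0.0])  # state -> [Q(stick), Q(hit)]
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(2)          # explore
            else:
                action = greedy_action(q, state)   # exploit
            next_state, reward, done, _ = env.step(action)
            # Terminal transitions have no future value to bootstrap from.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def loss_rate(env, q, episodes):
    """Fraction of greedy-policy episodes that end with a negative reward."""
    losses = 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            state, reward, done, _ = env.step(greedy_action(q, state))
        losses += reward < 0
    return losses / episodes


if __name__ == "__main__":
    env = ToyCardEnv(seed=0)
    for n in (10, 100, 1000, 5000):
        q = train_q_learning(env, episodes=n)
        print(n, loss_rate(env, q, episodes=2000))
```

The (trial count, loss rate) pairs printed by the loop are the kind of data that can be passed to matplotlib's plt.plot to produce the requested graph. For the real blackjack environment the state would be the observation tuple rather than a single integer, which works unchanged as a dictionary key.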

Your notebook should include a discussion of your choices and results. Also, compare the RL blackjack performance to your Monte Carlo program from hw3.

Installing gym

See (https://reinforcementlearning4.fun/2019/05/24/how-to-install-openai-gym/)

On the zoo, you can install your own version of gym with the bash command:

pip3 install gym --user

You can then access gym inside Jupyter notebooks, as seen below. It is important to use pip3 instead of pip so that gym is installed for Python 3.

Note: not all environments may be included in the default installation. For example, the Atari games are absent. Too bad.

In [1]:
## Note gym is not fully installed on the zoo yet.  You can try installing it locally.
import gym
env = gym.make('Blackjack-v0')
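
Once an environment has been created, an episode is played by calling reset() and then step(action) until the episode ends. The sketch below is one possible way to wrap that loop; run_episode, normalize_reset, normalize_step, random_policy, and StubEnv are hypothetical names introduced here, not gym functions. It assumes the older gym interface used by Blackjack-v0, and also accepts the return shapes of newer gym releases, where reset() returns (observation, info) and step() returns five values. The installed version should be checked before relying on either form.

```python
import random


def normalize_reset(result):
    """Return just the observation from env.reset().

    Older gym returns the observation; newer releases return (observation, info).
    Heuristic: a 2-tuple whose second item is a dict is treated as the new form.
    """
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        return result[0]
    return result


def normalize_step(result):
    """Return (observation, reward, done) from env.step(action)."""
    if len(result) == 5:  # newer form: obs, reward, terminated, truncated, info
        obs, reward, terminated, truncated, _ = result
        return obs, reward, terminated or truncated
    obs, reward, done, _ = result  # older form: obs, reward, done, info
    return obs, reward, done


def run_episode(env, policy):
    """Play one episode; return the list of (observation, action, reward) steps."""
    obs = normalize_reset(env.reset())
    history, done = [], False
    while not done:
        action = policy(obs)
        next_obs, reward, done = normalize_step(env.step(action))
        history.append((obs, action, reward))
        obs = next_obs
    return history


def random_policy(obs):
    """Ignore the observation and pick stick (0) or hit (1) at random."""
    return random.randrange(2)


class StubEnv:
    """Hypothetical two-step environment used only to exercise run_episode."""

    def __init__(self):
        self.t = 0

    def reset(self):
        self.t = 0
        # Shaped like a blackjack observation:
        # (player sum, dealer's showing card, usable ace).
        return (12, 10, False)

    def step(self, action):
        self.t += 1
        done = self.t >= 2
        return (12 + self.t, 10, False), (1.0 if done else 0.0), done, {}


if __name__ == "__main__":
    for step in run_episode(StubEnv(), random_policy):
        print(step)
```

With gym installed, run_episode(env, random_policy) on the blackjack environment gives a baseline to compare a learned policy against.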

Part 2: Learn something else

OpenAI gym has lots of environments. (https://gym.openai.com/envs/#classic_control) For part 2, pick another environment and apply your learning algorithm from part 1. You may need to modify it. Then perform the same experiments as before, varying the parameters and measuring and graphing the performance. Include all the results inside this Jupyter notebook.
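
One modification that is often needed: several classic control environments report continuous observations, while a Q-table needs a finite set of states. A possible approach, sketched below under assumptions, is to bucket each observation component into a fixed number of bins. The helper name make_discretizer and the bounds and bin counts in the example are illustrative only; suitable values depend on the environment you pick.

```python
def make_discretizer(lows, highs, bins):
    """Return a function mapping a continuous observation to a tuple of bin indices.

    lows/highs give the range to cover for each component (values outside
    are clipped); bins gives the number of buckets per component.
    """
    def discretize(obs):
        key = []
        for x, lo, hi, n in zip(obs, lows, highs, bins):
            clipped = min(max(x, lo), hi)
            frac = (clipped - lo) / (hi - lo)
            # frac == 1.0 would give index n, so cap at the last bin.
            key.append(min(int(frac * n), n - 1))
        return tuple(key)
    return discretize


if __name__ == "__main__":
    # Hypothetical two-component observation, four bins per component.
    to_state = make_discretizer([-1.0, -2.0], [1.0, 2.0], [4, 4])
    print(to_state([0.0, 0.0]))
    print(to_state([-5.0, 5.0]))
```

The returned tuple is hashable, so it can be used directly as a Q-table key. More bins give a finer policy but require more training trials to fill in the table, which is itself a parameter worth varying in the experiments.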

In [ ]:
 
In [ ]: