Final Project
Objectives
- to do something interesting with computational intelligence for games
Guidelines
- for 474 students, workload should be in the same range as the first five assignments (something that would take an average 474 student 8-12 hours); for 574 students the expectation is 10-15 hours (and additional comments on other students' work as described below)
- try to find a game that is interesting (takes some time to learn to play well) but simple enough so you don't have to spend too much time writing code to implement the rules of the game (the model)
- you should implement a relevant algorithm (so just developing the model of a game and running it through OpenAI Gym is not enough)
- you are encouraged to work in pairs or even larger groups if each member performs the expected amount of work; a team could, for example, share work on the model of a game and then each implement a different algorithm and compare results
- if sharing work on the model then you should still implement an algorithm on your own, unless you're working on a very complicated algorithm (for example something like AlphaGo with many steps, each interesting on its own)
- you may use existing code for the model of a game (with citation) if you've chosen a sufficiently complicated game (something more complex than Pig) so that you may focus on the algorithm
- you may write in any language as long as your code can be built and run on the Zoo; for Python please make it clear what packages need to be installed in a virtual environment to run your code
Project Ideas
- Cribbage
- Schnell's discard tables (or something similar optimized against the greedy algorithm) lose some information: the opponent's discards are sampled offline and so don't consider your hand or the turn card (for example, the opponent can't discard the 8H 7D if you hold the 8H and the turn card is the 7D). How much better can you do with online sampling? How does the number of samples affect the game results?
- Tune the constants in your heuristics using a genetic algorithm. Compare to the results from tuning with hill climbing.
- Train a neural network to estimate the probability of each possible match result when two reasonably good agents play against each other (your Assignment 1 agent is probably reasonably good), given the score at the beginning of a hand as input. Compare the results of supervised learning vs. unsupervised.
- Train a neural network to estimate the points earned by you and your opponent during the pegging phase, given the cards you have in your hand. Does using that estimate to determine which cards to keep improve performance against the greedy agent?
- What is the equilibrium strategy for discarding when you must choose among a greedy policy that uses Schnell's table to estimate the number of points in the crib, one that uses a similar table optimized against the greedy agent, one that uses a similar table optimized against that agent, etc., and your opponent must do the same? What happens when you optimize a new agent against that equilibrium agent?
- Implement a sampling perfect-information pegging agent and compare to the pegging policy you used for Assignment 1.
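As a sketch of what the online-sampling discard idea might look like, here is a minimal Monte Carlo estimator (my own illustration, not provided course code). The crib scorer is deliberately simplified to fifteens and pairs only (runs, flushes, and nobs omitted), and the `(rank, suit)` card encoding is an assumption:

```python
import itertools
import random

def crib_points(cards):
    # Deliberately simplified crib scorer: fifteens and pairs only
    # (runs, flushes, and nobs are omitted). Cards are (rank, suit)
    # tuples with rank 1-13; face cards count 10 toward fifteens.
    pts = 0
    vals = [min(rank, 10) for rank, _ in cards]
    for n in range(2, len(vals) + 1):
        for combo in itertools.combinations(vals, n):
            if sum(combo) == 15:
                pts += 2
    for a, b in itertools.combinations(cards, 2):
        if a[0] == b[0]:
            pts += 2
    return pts

def estimate_crib(my_discard, seen, n_samples=2000, seed=0):
    # Online sampling: the opponent's discards and the turn card are drawn
    # only from cards we haven't seen, so impossible combinations (cards in
    # our own hand) are never sampled -- unlike an offline table.
    rng = random.Random(seed)
    deck = [(rank, suit) for rank in range(1, 14) for suit in range(4)]
    unseen = [c for c in deck if c not in seen]
    total = 0
    for _ in range(n_samples):
        trio = rng.sample(unseen, 3)  # opponent's two discards + turn card
        total += crib_points(list(my_discard) + trio)
    return total / n_samples
```

Rerunning the estimator with different `n_samples` values is one way to study how the number of samples affects the quality of the discard decision.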
- Backtracking
- Backtracking (DFS) or BFS by themselves are not too interesting, but could be
- enhanced with rules to prune branches early or to select promising branches to explore first
- compared to A* with different heuristics
- Dynamic Programming (Backwards Induction)/Value Iteration
- Hoot Owl Hoot
- Multi-player Pig (>= 3 players)
- Variants of solitaire Yahtzee (Kismet, Yatzy, Super Yatzy, Generala)
- Backgammon endgames
- Small versions of Kalah
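To make backward induction concrete, here is a sketch on a tiny push-your-luck game (my own toy example, not one of the games above): keep rolling a d6, where a 1 busts the turn to zero and otherwise you may stop and bank the running total. Because rolling only increases the total, processing states from high totals down yields the optimal value and policy in one pass; the cap `S_MAX` is an assumption to keep the state space finite:

```python
# Backward induction for a one-turn, Pig-style push-your-luck game:
# at total s, either stop (bank s) or roll a d6, where a 1 scores 0
# and any other roll adds to the total. Totals above S_MAX force a stop.
S_MAX = 100

V = [0.0] * (S_MAX + 7)   # V[s] = optimal expected banked score from total s
policy = {}
for s in range(S_MAX + 6, S_MAX, -1):
    V[s] = s              # forced stop beyond the cap
for s in range(S_MAX, -1, -1):
    # Expected value of rolling: 1 busts to 0, 2-6 continue optimally
    roll = sum(V[s + r] for r in range(2, 7)) / 6.0
    V[s] = max(s, roll)
    policy[s] = "roll" if roll > s else "stop"
```

The resulting policy has the familiar threshold form: roll while the total is below 20, stop at 20 or more. Multi-player Pig and the Yahtzee variants need richer states (scores, categories, rolls remaining), but the same high-to-low sweep applies.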
- Simultaneous Games
- evolutionary game theory – something like Conway's Game of Life but where each cell is inhabited by something that plays the game differently; see how the population evolves over time
- Implement Counterfactual Regret Minimization for some small game
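Regret matching is the update rule at the heart of CFR, so before tackling a sequential game it can help to see it converge on a one-shot game. This sketch (a toy illustration, not course code) runs regret matching in self-play on rock-paper-scissors; the average strategies should approach the uniform equilibrium:

```python
ACTIONS = 3  # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff

def strategy(regrets):
    # Regret matching: play each action in proportion to its positive regret
    pos = [max(r, 0.0) for r in regrets]
    t = sum(pos)
    return [p / t for p in pos] if t > 0 else [1.0 / ACTIONS] * ACTIONS

# Asymmetric start to break the trivial uniform fixed point
regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
strat_sum = [[0.0] * ACTIONS for _ in range(2)]
for _ in range(50000):
    s = [strategy(regrets[0]), strategy(regrets[1])]
    for p in range(2):
        opp = s[1 - p]
        # Expected utility of each pure action against the opponent's mix
        util = [sum(opp[b] * (PAYOFF[a][b] if p == 0 else -PAYOFF[b][a])
                    for b in range(ACTIONS)) for a in range(ACTIONS)]
        node = sum(s[p][a] * util[a] for a in range(ACTIONS))
        for a in range(ACTIONS):
            regrets[p][a] += util[a] - node
            strat_sum[p][a] += s[p][a]

# The *average* strategy (not the current one) converges toward (1/3, 1/3, 1/3)
avg = [[x / sum(row) for x in row] for row in strat_sum]
```

Full CFR extends this by maintaining regrets at every information set of a sequential game and weighting updates by reach probabilities, which is why a small game (Kuhn poker is the usual choice) is a sensible target.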
- Search
- alpha-beta with transposition table, or scout, or MTD-f for some game you can find a reasonable heuristic for
- various enhancements of MCTS for something other than Kalah
- Genetic Algorithms
- tune the parameters of a heuristic used with minimax or alpha-beta for some game
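The tuning loop itself is simple; here is a hedged sketch with a placeholder fitness function (in a real project, `fitness` would play a batch of games with the candidate weights and return a win rate, which is far more expensive):

```python
import random

rng = random.Random(0)
TARGET = [0.5, -1.2, 3.0]  # stand-in optimum so this sketch is self-checking

def fitness(w):
    # Placeholder: negative squared error to a known optimum. In a real
    # project, run matches with heuristic weights w and return the win rate.
    return -sum((a - b) ** 2 for a, b in zip(w, TARGET))

def mutate(w):
    return [x + rng.gauss(0, 0.1) for x in w]

def crossover(a, b):
    # Uniform crossover: each gene from a random parent
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]

pop = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(40)]
for gen in range(100):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # elitist truncation selection
    pop = parents + [mutate(crossover(rng.choice(parents), rng.choice(parents)))
                     for _ in range(30)]
best = max(pop, key=fitness)
```

For the comparison to hill climbing, the same `fitness` can be reused, which keeps the two tuning methods on equal footing.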
- Reinforcement Learning
- implement DQN
- Supervised Learning
- Fall 2019's solitaire Yahtzee assignment (I will copy the supporting code and scripts to /c/cs474/hw6 upon request; it is also available on the Zoo in /c/cs474/2019-fall/hw6)
- Train a neural network to recognize valid moves in Qwixx. Input would be game state (rightmost marked, number marked) for each row, value for each die, phase of turn (white dice, colored dice), and flag for mark in previous phase. Output is union of all possible actions (white in red, white in blue, ..., red+1st white, red+2nd white, ...). Goal is to validate idea that policy gradient methods will learn quickly what actions are valid if given huge penalty for choosing invalid actions.
- Neural Networks
- Train an autoencoder for Battleship and test whether there's a difference in performance between placements generated by the network vs. placements determined by some other procedure.
- Content Generation
- invent a game and tune the parameters to be hard for agents using computational intelligence for some definition of "hard" (for example, "hard to beat a greedy agent using MCTS" or "hard for Q-learning")
- Old competitions
- Gin Rummy EAAI Undergraduate Research Challenge (closed to submissions but the code is still available)
Videos
So that other students may comment on your approach, create a short video (~5 minutes?) describing the game you're developing an agent for and your approach (you are not expected to have completed the code for your approach yet). If you do not include an example of gameplay, then provide a link to a source that describes the rules of your game. A narrated or captioned presentation exported from PowerPoint or similar presentation software will suffice. Address the following questions as appropriate (some questions may not apply to your game/approach).
- What is interesting about your game from a strategic point of view?
- What is challenging about your game in terms of developing an agent that plays your game (high branching factor, large state space, hidden information, etc.)?
- What simplifications have you made in order to improve your probability of success in a short period of time?
- Why is your approach reasonable?
- What other approaches might be better or worse?
- What are the similarities and differences between your approach and projects and examples from class? (This should be more detailed the closer your game is to one addressed in class. For example, for a variant of Yahtzee, the algorithm will be the same and so you should focus on the differences in the implementation like what information is contained in states for Yahtzee vs. what information is in states for your variant.)
- How do you plan to evaluate your agent?
Groups should submit one video, which may be slightly longer than the guideline to give each group member time to address the questions for their particular approach.
In the last week of class, watch 3 (graduate students: 4) videos from the Final Project Videos folder in the Canvas Media Library or from the YouTube links provided on Canvas and write short responses to them. There are several students or groups working independently on some games, so you should choose presentations on different games (don't watch 3 videos about Hoot Owl Hoot). Indicate which presentations you watched and then write your responses. Write at least 5 (graduate students: 7) total responses across the videos you watched (so 1-3 each). Each response should be a sentence or two giving your thoughts on the project, for example:
- agreement with the approach ("you say A should work because it is similar to what worked for B; I agree because it is also similar to C, which worked");
- disagreement with the approach ("you say A should work because it is similar to what worked for B; I disagree because it is more like C, which didn't work well");
- questions about the proposed approach ("how are you going to handle X?");
- any difficulties you foresee and any ideas you might have for overcoming them ("you're probably going to run into trouble with A but you can get around that with B");
- alternative approaches that you think might work ("you could also try C, which might work because of D"); or
- anything else interesting that comes to mind.
Course staff will not be able to collect, collate, and distribute your responses, so if you want the project authors to see them, please enter them as comments on Canvas or YouTube (optional).
Deliverables and Submissions
- Videos and Responses
- upload videos to Canvas by Mon Dec. 12 11:59pm
- upload responses to Canvas by Thu Dec. 15 5:00pm (no late days)
- Code and Results
- due Dec. 21 5:30pm (no late days per University policy)
- use the Zoo submission system, assignment number 6; for groups, only one member should submit, but a list of group members must be clearly given in the test script
- include a build script/makefile and a test script so I don't have to figure out how to run your code; the test script should include a brief description of your game, what your code is doing, and the results (for example, "I'm using Q-learning to play NFL Strategy. Here's my winning percentage over 250000 games" or "I'm using MCTS to play Kalah. Here's my winning percentage over 100 games when playing with a minimax player with a simple heuristic searching to depth 4, with both sides making random moves 10% of the time:")
Grading
- Video: 17%
- Responses: 17%
- Project: 66%
To assess the performance of an agent, you will need some baseline agent to compare against. Consider comparison to a random agent only as a fallback when you can't easily implement a better baseline agent (such as greedy or rule-based), unless there is some reasonable expectation that a random agent performs reasonably well. For agents where you can vary the computational resources available (CPU time, for example), a comparison using different levels of resources can be useful (for example, alpha-beta to depth 4 vs. alpha-beta to depth 6). These comparisons are most useful when sampled over many different games, but deterministic agents will always return the same result from the same initial position. In such cases, you can introduce some randomness by adding noise to the heuristic function if you're using one, having the agents select a random move some small percentage of the time, or by starting with a randomly chosen (but still balanced) initial position.
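The random-move trick in the last sentence can be sketched with a toy harness; 21-stone Nim is just a stand-in for whatever game you use, with a perfect player against a take-as-many-as-allowed "greedy" baseline, and seats swapped across runs to cancel the first-player advantage:

```python
import random

def optimal_move(pile):
    # Perfect play in Nim (take 1-3 stones; taking the last stone wins):
    # leave a multiple of 4 whenever possible.
    m = pile % 4
    return m if m else 1

def greedy_move(pile):
    return min(3, pile)  # baseline: always take as many as allowed

def play(first, second, eps, rng):
    pile, players, turn = 21, [first, second], 0
    while True:
        if rng.random() < eps:  # forced random move de-correlates the games
            move = rng.randint(1, min(3, pile))
        else:
            move = players[turn](pile)
        pile -= move
        if pile == 0:
            return turn  # index of the player who took the last stone
        turn = 1 - turn

rng = random.Random(0)
N, eps = 2000, 0.1
wins_as_first = sum(play(optimal_move, greedy_move, eps, rng) == 0 for _ in range(N))
wins_as_second = sum(play(greedy_move, optimal_move, eps, rng) == 1 for _ in range(N))
win_rate = (wins_as_first + wins_as_second) / (2 * N)  # optimal player's overall win rate
```

With `eps = 0` every game here would repeat the same deterministic result; the forced random moves are what make the win rate over many games informative.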
You will not be penalized if your approach results in poor performance, as long as your approach was otherwise sound.