{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## February 24 - Reasoning under Uncertainty\n", "\n", "AIMA Chapter 13" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "from probability import *\n", "from utils import print_table\n", "from notebook import psource, pseudocode, heatmap" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Failures of Logic\n", "\n", "First-order logic represents a certainty\n", "- ∀x Symptom(x,Toothache) ⇒ Disease(x,Cavity)\n", "\n", "
To make the rule true, we must add an almost unlimited set of causes\n", "\n", "- ∀x Symptom(x,Toothache) ⇒ Disease(x,Cavity) ∨ Disease(x,GumDisease) ∨ Disease(x,Abscess) ∨ ...\n", "\n", "
Conversion to a causal rule does not help\n",
"\n",
"- ∀x Disease(x,Cavity) ⇒ Symptom(x,Toothache) \n",
"\n",
"Not all cavities cause pain"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Why does Logic Fail?\n",
"\n",
"Laziness\n",
"\n",
"- Too much work to list entire sets of consequents or antecedents.\n",
"- Cannot list all possible causes for a toothache.\n",
"\n",
"Theoretical Ignorance\n",
"\n",
"- No complete theory for domain exists (pace **blocks world**, versus medicine).\n",
"- Cannot describe precisely the conditions that cause cancer.\n",
"\n",
"Practical Ignorance\n",
"\n",
"- Even if we know all the rules, we may be uncertain about a particular event.\n",
"- What was the white blood cell count of the patient two years ago?\n",
"\n",
"Probability provides a way of summarizing the uncertainty that comes from our laziness and ignorance. \n",
"\n",
"> We can say that it will probably take 2 hours to drive to the\n",
"airport, without listing all possible events that might intervene. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Basic Probability I\n",
"\n",
"We assume you have some basic understanding of probability. For example, by now, from homework 3, you should know that the only way to win consistently at blackjack is to own a casino.\n",
"\n",
"Here is quick review.\n",
"\n",
"A probability is a defined numerically as a real number between 0 and 1, inclusive.\n",
"\n",
"Unconditional (prior) probability:\n",
"\n",
"- P(Cavity) = 0.1\n",
"\n",
"Random variable - a variable that represents a probability of an event, e.g., *Weather*\n",
"\n",
"- P(*Weather* = snow) = 0.05\n",
"\n",
"Probability Distribution\n",
"\n",
"\n",
"\n",
"Examples of Probability Distributions:\n",
"\n",
"* **Normal**: heights of 14 year old girls, weights of apples (https://statisticsbyjim.com/basics/normal-distribution/) [Note: Yale College GPA's are **NOT** normal, in any meaningful sense.]\n",
"
\n", "* **Triangular**: \"lack of knowledge distribution\", use best-case, worst-case, average case, e.g., project management. (https://en.wikipedia.org/wiki/Three-point_estimation)\n", "
\n", "* **Uniform**: heads or tail for fair coin, random throw of fair die (https://corporatefinanceinstitute.com/resources/knowledge/other/uniform-distribution/)\n", "
\n", "* **Log-normal**: length of blog comments, length of hair or nails, time to solve a Rubik's cube (https://en.wikipedia.org/wiki/Log-normal_distribution)\n", "
\n", "\n", "Conditional (posterior) probability: (Note: \"|\" here is pronounced \"given\", not \"or\")\n", "\n", "- P(Cavity | Toothache) = 0.8\n", "\n", "That means, whenever *toothache* is true, *and we have no further information*, conclude that *cavity* is true with probability 0.8.\n", "\n", "Similarly, we have\n", "\n", "* P(Cavity | Dentures) = 0.0\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Basic Probability II\n", "\n", "\n", "Begin with a set $\\Omega$, the sample space,\n", "e.g., the 6 possible rolls of a die.\n", "\n", "$\\omega{} \\in{} \\Omega$ \n", "is a sample point/possible world/atomic event, e.g., a roll\n", "of the die.\n", "\n", "A probability space or probability model is a sample space\n", "with an assignment \n", "$P(\\omega)$ for every $\\omega ∈ \\Omega$ such that $0 ≤ P(\\omega) ≤ 1$ and\n", "\n", "$$\\sum\\nolimits_{\\omega}P(\\omega) = 1$$\n", "\n", "e.g., for a die, P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6\n", "\n", "#### Axioms\n", "\n", " $$0 \\leq P(A) \\leq 1$$\n", " \n", " $$P(True) = 1$$\n", "
\n", " $$P(False) = 0$$\n", "
\n",
"\n",
" $$P(A ∨ B) = P(A) + P(B) - P(A ∧ B)$$\n",
"\n",
"\n",
"\n",
" $$P(¬A) = 1 - P(A)$$\n",
"
\n", " $$P(A ∧ B) = P(A|B) P(B)$$ (the product rule)\n", " \n", "We can restate the product rule as: *for A and B to be true, we need B to be true, and we also need A to be true, given B.*\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Bayes' Rule\n", "\n", "I have known about Bayes' Rule since college. This is an easy\n", "way to remember it (and derive it).\n", "\n", "From the product rule\n", "\n", " $$P(A ∧ B) = P(A|B) P(B)$$\n", " $$P(A ∧ B) = P(B|A) P(A)$$\n", "\n", "Bayes' Rule\n", "\n", " $$P(A|B) P(B) = P(B|A) P(A)$$\n", "
solve for $P(B|A)$\n", " $$P(B|A) = \\frac{P(A|B) P(B)}{P(A)}$$\n", "\n", "\n", "For what is Bayes' Rule good? It provides a basis for diagnosing causes given the evidence, or effects we observe. We see a problem, like a fever or smoke coming out of our car's hood. We can treat the symptom, but it is usually better to treat the cause. That means we need to find the cause. Using this framework, Bayes' Rule becomes:\n", "\n", "\n", "
\n", " $$P(cause|symptom) = \\frac{P(symptom|cause) P(cause)}{P(symptom)}$$\n", "
\n", " \n", "If we know $P(symptom|cause)$, and $P(cause)$, and $P(symptom)$, we can calculate\n", "the odds of the particular diagnosis.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Application of Bayes' Rule\n", "\n", "Bayes was both a minister and a mathematician, as was his friend Richard Price. After Bayes' death, Price discovered the formula among Bayes' papers and used it as the basis of an argument for \n", "miracles (and the existence of God), trying to refute David Hume.\n", "\n", "Today Bayes' rule is pervasive in statistics, artificial intelligence, and machine learning. Not so much in religion.\n", "\n", "We can apply Bayes' rule to the current coronavirus epidemic, COVID-19.\n", "\n", "According to the Centers for Disease Control (CDC) \n", "\n", "> Current symptoms reported for patients with COVID-19 have included mild to severe respiratory illness with fever, cough, and difficulty breathing.\n", "\n", "Let's assume that 1 in 20 people report these symptoms.\n", "\n", " $$P(S) = 1/20$$\n", "\n", "We assume that 1 person in 100,000 has the coronavirus.\n", "\n", " $$P(CV) = 1/100000$$\n", "\n", "The coronavirus causes these symptoms in half the afflicted patients.\n", "\n", " $$P(S|CV) = 0.50$$\n", "\n", "If I have the symptoms of a respiratory illness, what is the chance that I have the coronavirus?\n", "\n", " $$P(CV|S) = \\frac{P(S|CV)P(CV)}{P(S)}$$\n", "\n" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "P_CV_S = 0.5 * (1/100000)/(1/20)" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.0001" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P_CV_S" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.01" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "P_CV_S * 100" ] 
}, { "cell_type": "markdown", "metadata": {}, "source": [ "That is, 0.01%. In 10,000 patients who report respiratory symptoms, only 1 would \n", "actually have the coronavirus, based on my fabricated data.\n", "\n", "\n", "We can refine the calculation by looking at other factors, like traveling in China in the last few weeks. There are also tests, but \n", "\n", "> A study in the journal Radiology showed five out of 167 patients tested negative for the disease despite lung scans showing they were ill. They then tested positive for the virus at a later date. (https://www.bbc.com/news/health-51491763 \"BBC News\")\n", "\n", "Thus, testing negative does not guarantee that you do NOT have the disease. Testing \n", "is an ongoing issue, involving both false negatives (showing that you do **not** have the \n", "disease when you do) and false positives (showing that you **have** the disease \n", "when you do not).\n", "\n", "In any event, assuming that you can nail down the respective probabilities, you can use Bayes' Rule to calculate the probability of infection. A lot of public health workers are now trying to \n", "do exactly that. (https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset) \n", "\n", "We should also caution that some of these numbers are moving targets. $P(CV)$ might be\n", "changing daily. $P(S|CV)$ is likely a more stable statistic. $P(S)$ might change every\n", "year during flu season." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Uncertainty and Rational Decisions\n", "\n", "I propose the following game. You roll a fair die, with 6 sides. If the number that appears is even, you get that number of dollars. If the number is odd, you pay that number of dollars. For example, you roll a 4, you get \\\\$4. 
You roll a 3, you pay me \\\\$3.\n", "\n", "How much would you be willing to pay to play this game?\n", "\n", "Using economic decision theory, we can calculate the expected value of the game by summing the expected value of all possible outcomes.\n", "\n", "* roll 1, pay \\\\$1, P(1) = 1/6, EV(1) = -\\\\$1 * P(1) = -\\\\$1/6\n", "\n", "* roll 2, get \\\\$2, P(2) = 1/6, EV(2) = \\\\$2 * P(2) = +\\\\$2/6\n", "\n", "* roll 3, pay \\\\$3, P(3) = 1/6, EV(3) = -\\\\$3 * P(3) = -\\\\$3/6\n", "\n", "* roll 4, get \\\\$4, P(4) = 1/6, EV(4) = \\\\$4 * P(4) = +\\\\$4/6\n", "\n", "* roll 5, pay \\\\$5, P(5) = 1/6, EV(5) = -\\\\$5 * P(5) = -\\\\$5/6\n", "\n", "* roll 6, get \\\\$6, P(6) = 1/6, EV(6) = \\\\$6 * P(6) = +\\\\$6/6\n", "\n", "The total expected value is -1/6 + 2/6 - 3/6 + 4/6 - 5/6 + 6/6 = \\\\$3/6 = \\\\$0.50\n", "\n", "That is, if you pay 50 cents each time, you should break even if you play enough games. \n", "\n", "The expected value calculation is at the heart\n", "of economic decision theory. It has the advantage that if you\n", "can accurately identify all the options, and their respective \n", "probabilities and payoffs, you can arrive at the optimum decision.\n", "It has the disadvantage that you can rarely find all the options and their accurate probabilities and payoffs, so you end up making up numbers (as I did in the coronavirus example above), and you reach erroneous conclusions.\n", "\n", "> This is largely what happened during the financial crisis when you were in elementary school, and boatloads of people lost their homes and their jobs. Thanks, in part, to good old economic decision theory.\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Suppose you have a loaded die, that is, one that is not fair. On this die an odd number is three times as likely as an even number. We can model such a die with a probability distribution." 
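One way to write down the loaded die, and to see what it does to the game above, is sketched here with exact fractions rather than the notebook's `ProbDist` class. The weights 3 and 1 encode "an odd number is three times as likely as an even number". The helper names `die_distribution`, `payoff`, and `expected_value` are hypothetical, chosen for this illustration only.

```python
from fractions import Fraction


def die_distribution(odd_weight=1, even_weight=1):
    """Return {face: probability} for a six-sided die, normalized to sum to 1."""
    weights = {face: (odd_weight if face % 2 else even_weight)
               for face in range(1, 7)}
    total = sum(weights.values())
    return {face: Fraction(w, total) for face, w in weights.items()}


def payoff(face):
    """Game payoff in dollars: win the face value if even, lose it if odd."""
    return face if face % 2 == 0 else -face


def expected_value(dist):
    """Sum of payoff times probability over all faces."""
    return sum(payoff(face) * p for face, p in dist.items())


fair = die_distribution()
loaded = die_distribution(odd_weight=3, even_weight=1)

print(expected_value(fair))    # 1/2
print(loaded[1], loaded[2])    # 1/4 1/12
print(expected_value(loaded))  # -5/4
```

Under these assumptions the fair game is worth 50 cents per play, as computed above, while the same game with the loaded die loses 1.25 dollars per play on average.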
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## PROBABILITY DISTRIBUTION\n", "\n", "Let us begin by specifying discrete probability distributions. The class **ProbDist** defines a discrete probability distribution. \n", "It is discrete because we can enumerate the finite number of $\\omega$ sample points. We name our random variable and then assign probabilities to the different values of the random variable. Assigning probabilities works like using a dictionary: each value of the random variable is a key, and its probability is what we store under that key. This is possible because of the magic methods `__getitem__` and `__setitem__`, which read and write the probabilities in the `prob` dict of the object. " ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "class ProbDist:\n",
"    \"\"\"A discrete probability distribution. You name the random variable\n",
"    in the constructor, then assign and query probability of values.\n",
"    >>> P = ProbDist('Flip'); P['H'], P['T'] = 0.25, 0.75; P['H']\n",
"    0.25\n",
"    >>> P = ProbDist('X', {'lo': 125, 'med': 375, 'hi': 500})\n",
"    >>> P['lo'], P['med'], P['hi']\n",
"    (0.125, 0.375, 0.5)\n",
"    \"\"\"\n",
"\n",
"    def __init__(self, varname='?', freqs=None):\n",
"        \"\"\"If freqs is given, it is a dictionary of values - frequency pairs,\n",
"        then ProbDist is normalized.\"\"\"\n",
"        self.prob = {}\n",
"        self.varname = varname\n",
"        self.values = []\n",
"        if freqs:\n",
"            for (v, p) in freqs.items():\n",
"                self[v] = p\n",
"            self.normalize()\n",
"\n",
"    def __getitem__(self, val):\n",
"        \"\"\"Given a value, return P(value).\"\"\"\n",
"        try:\n",
"            return self.prob[val]\n",
"        except KeyError:\n",
"            return 0\n",
"\n",
"    def __setitem__(self, val, p):\n",
"        \"\"\"Set P(val) = p.\"\"\"\n",
"        if val not in self.values:\n",
"            self.values.append(val)\n",
"        self.prob[val] = p\n",
"\n",
"    def normalize(self):\n",
"        \"\"\"Make sure the probabilities of all values sum to 1.\n",
"        Returns the normalized distribution.\n",
"        Raises a ZeroDivisionError if the sum of the values is 0.\"\"\"\n",
"        total = sum(self.prob.values())\n",
"        if not isclose(total, 1.0):\n",
"            for val in self.prob:\n",
"                self.prob[val] /= total\n",
"        return self\n",
"\n",
"    def show_approx(self, numfmt='{:.3g}'):\n",
"        \"\"\"Show the probabilities rounded and sorted by key, for the\n",
"        sake of portable doctests.\"\"\"\n",
"        return ', '.join([('{}: ' + numfmt).format(v, p)\n",
"                          for (v, p) in sorted(self.prob.items())])\n",
"\n",
"    def __repr__(self):\n",
"        return \"P({})\".format(self.varname)\n",
"
class JointProbDist(ProbDist):\n",
"    \"\"\"A discrete probability distribution over a set of variables.\n",
"    >>> P = JointProbDist(['X', 'Y']); P[1, 1] = 0.25\n",
"    >>> P[1, 1]\n",
"    0.25\n",
"    >>> P[dict(X=0, Y=1)] = 0.5\n",
"    >>> P[dict(X=0, Y=1)]\n",
"    0.5\"\"\"\n",
"\n",
"    def __init__(self, variables):\n",
"        self.prob = {}\n",
"        self.variables = variables\n",
"        self.vals = defaultdict(list)\n",
"\n",
"    def __getitem__(self, values):\n",
"        \"\"\"Given a tuple or dict of values, return P(values).\"\"\"\n",
"        values = event_values(values, self.variables)\n",
"        return ProbDist.__getitem__(self, values)\n",
"\n",
"    def __setitem__(self, values, p):\n",
"        \"\"\"Set P(values) = p. Values can be a tuple or a dict; it must\n",
"        have a value for each of the variables in the joint. Also keep track\n",
"        of the values we have seen so far for each variable.\"\"\"\n",
"        values = event_values(values, self.variables)\n",
"        self.prob[values] = p\n",
"        for var, val in zip(self.variables, values):\n",
"            if val not in self.vals[var]:\n",
"                self.vals[var].append(val)\n",
"\n",
"    def values(self, var):\n",
"        \"\"\"Return the set of possible values for a variable.\"\"\"\n",
"        return self.vals[var]\n",
"\n",
"    def __repr__(self):\n",
"        return \"P({})\".format(self.variables)\n",
"
\n", "$$\\textbf{P}(X | \\textbf{e}) = \\alpha \\textbf{P}(X, \\textbf{e}) = \\alpha \\sum_{\\textbf{y}} \\textbf{P}(X, \\textbf{e}, \\textbf{y})$$\n", "\n", "Here α is the normalizing factor, X is our query variable, and e is the evidence. According to the equation, we sum over the remaining variables y (those that are neither the query variable nor in the evidence), i.e., over all possible combinations of values of y.\n", "\n", "We will be using the same example as the book. Let us create the full joint distribution from **Figure 13.3**. " ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [], "source": [ "full_joint = JointProbDist(['Cavity', 'Toothache', 'Catch'])\n", "full_joint[dict(Cavity=True, Toothache=True, Catch=True)] = 0.108\n", "full_joint[dict(Cavity=True, Toothache=True, Catch=False)] = 0.012\n", "full_joint[dict(Cavity=True, Toothache=False, Catch=True)] = 0.072\n", "full_joint[dict(Cavity=True, Toothache=False, Catch=False)] = 0.008\n", "full_joint[dict(Cavity=False, Toothache=True, Catch=True)] = 0.016\n", "full_joint[dict(Cavity=False, Toothache=False, Catch=True)] = 0.144\n", "full_joint[dict(Cavity=False, Toothache=True, Catch=False)] = 0.064\n", "full_joint[dict(Cavity=False, Toothache=False, Catch=False)] = 0.576" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The above data corresponds to the following table:\n", "\n", "
| | toothache, catch | toothache, ¬catch | ¬toothache, catch | ¬toothache, ¬catch |\n",
"|---|---|---|---|---|\n",
"| **cavity** | .108 | .012 | .072 | .008 |\n",
"| **¬cavity** | .016 | .064 | .144 | .576 |\n",
"
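Any query about the three variables can be answered from the table by summing the entries that match the query. The sketch below does that with a plain dictionary keyed by (cavity, toothache, catch) tuples, using the table's values. It is a stand-alone illustration and not the notebook's `JointProbDist` API; `prob` and `conditional` are hypothetical helper names.

```python
from math import isclose

# Full joint distribution keyed by (cavity, toothache, catch); values from the table.
joint = {
    (True, True, True): 0.108, (True, True, False): 0.012,
    (True, False, True): 0.072, (True, False, False): 0.008,
    (False, True, True): 0.016, (False, True, False): 0.064,
    (False, False, True): 0.144, (False, False, False): 0.576,
}
NAMES = ("cavity", "toothache", "catch")

# A full joint distribution must sum to 1.
assert isclose(sum(joint.values()), 1.0)


def prob(**event):
    """Sum the joint entries consistent with the given variable=value pairs."""
    return sum(p for world, p in joint.items()
               if all(world[NAMES.index(var)] == val
                      for var, val in event.items()))


def conditional(query, **evidence):
    """P(query=True | evidence), using P(A | B) = P(A and B) / P(B)."""
    return prob(**{query: True}, **evidence) / prob(**evidence)


print(round(prob(toothache=True), 3))                   # 0.2
print(round(conditional("cavity", toothache=True), 3))  # 0.6
```

Dividing by `prob(**evidence)` plays the role of the normalizing factor α in the equation above.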
def enumerate_joint(variables, e, P):\n",
"    \"\"\"Return the sum of those entries in P consistent with e,\n",
"    provided variables is P's remaining variables (the ones not in e).\"\"\"\n",
"    if not variables:\n",
"        return P[e]\n",
"    Y, rest = variables[0], variables[1:]\n",
"    return sum([enumerate_joint(rest, extend(e, Y, y), P)\n",
"                for y in P.values(Y)])\n",
"
def enumerate_joint_ask(X, e, P):\n",
"    \"\"\"Return a probability distribution over the values of the variable X,\n",
"    given the {var:val} observations e, in the JointProbDist P. [Section 13.3]\n",
"    >>> P = JointProbDist(['X', 'Y'])\n",
"    >>> P[0,0] = 0.25; P[0,1] = 0.5; P[1,1] = P[2,1] = 0.125\n",
"    >>> enumerate_joint_ask('X', dict(Y=1), P).show_approx()\n",
"    '0: 0.667, 1: 0.167, 2: 0.167'\n",
"    \"\"\"\n",
"    assert X not in e, \"Query variable must be distinct from evidence\"\n",
"    Q = ProbDist(X)  # probability distribution for X, initially empty\n",
"    Y = [v for v in P.variables if v != X and v not in e]  # hidden variables.\n",
"    for xi in P.values(X):\n",
"        Q[xi] = enumerate_joint(Y, extend(e, X, xi), P)\n",
"    return Q.normalize()\n",
"