YALE UNIVERSITY

DEPARTMENT OF COMPUTER SCIENCE

CPSC 461b: Foundations of Cryptography | Notes 11 (rev. 1) | |

Professor M. J. Fischer | February 17, 2009 | |

Lecture Notes 11

Let X = {X_{n}}_{nℕ}, Y = {Y _{n}}_{nℕ} be probability ensembles. X, Y are statistically close if their statistical
difference Δ(n) is negligible, where

Here’s the proof that I only sketched in class.

Proof: We prove the contrapositive. Suppose X, Y are not indistinguishable in polynomial time. Then there exists a p.p.t. algorithm D and a positive polynomial p(⋅) such that for infinitely many n,

| (1) |

For α a length-n string, let p(α)Pr[D(α,1^{n}) = 1]. Then

| (2) |

| (3) |

Plugging (2) and (3) into (1) gives

The converse to theorem 1 does not hold.

Theorem 2 There exists X = {X_{n}}_{nℕ} that is indistinguishable from the uniform ensemble
U = {U_{n}}_{nℕ} in polynomial time, yet X and U are not statistically close. Furthermore, X_{n} assigns
all probability mass to a set S_{n} consisting of at most 2^{n∕2} strings of length n.

Proof: We construct the ensemble X = {X_{n}}_{nℕ} by choosing for each n a set S_{n} ⊆{0,1}^{n} of
cardinality N = 2^{n∕2} and letting X_{n} be the uniformly distributed on S_{n}. Thus, Pr[X_{n} = α] = 1∕N
for α S_{n}, and Pr[X_{n} = α] = 0 for α ⁄ S_{n}.

The fact that X, U are not statistically close is immediate from the above. Using the facts that 2^{n} = N^{2}
and |S_{n}| = N, and |S_{n}| = N^{2} - N, we get

The proof in the textbook supplies the low-level details needed to establish this theorem, but it is a little
unclear about the construction itself, particularly about how the set S_{n} is chosen.

We wish to choose a set S_{n} for which the corresponding distribution X_{n} is indistinguishable from U_{n}
by every polynomial size circuit C. We do this by diagonalizing over all circuits of size 2^{n∕8}.
We start with all size 2^{N} subsets of {0,1}^{n} as candidates for S_{n}. For each such circuit C,
we discard from consideration all candidates on which C is too successful at distinguishing
the corresponding ensemble from uniform. By a counting argument, we show that not very
many candidates get thrown out at each stage—so few in fact that there are still candidates left
after all of the size 2^{n∕8} circuits have been considered. We choose any remaining candidate
for S_{n} and conclude that no size 2^{n∕8} circuit is very successful at distinguishing X_{n} from
U_{n}.

More precisely, here’s how to determine which candidates to discard. First, consider an n-input circuit
C with at most 2^{n∕8} gates. Let p_{C} be C’s expected output on uniformly chosen inputs. Then C(x) = 1 for a
p_{C} fraction of all length n strings, and C(x) = 0 for the remainder.

Let _{n} = {S ⊆{0,1}^{n}∣|S| = 2^{N}}. This is the initial family of candidate sets. Let f_{C} : _{n} →{0,1},
where

Thus, f_{C}(S) is the amount that the average value of C(s) taken over strings s S differs from the average
value of C(u) taken over all length-n strings u. By the law of large numbers, we would expect f_{C}(S) to be
very small with high probability for randomly chosen S . Call a set S bad for C if f_{C}(S) ≥ 2^{-n∕8}.
Using the Chernoff bound, one shows that the fraction of sets S _{n} that are bad for C is less than 2^{-2n∕4
}.
(Details are in the book.)

Next, one argues that there are at most 2^{2n∕4
} circuits of size 2^{n∕8}. (This is by a counting argument.
Details are not in the book and should be verified.) From this, it follows that there is at least one set
S_{n} _{n} which is not bad for any such circuit. Fix such a set.

Now, let X_{n} be uniformly distributed over S_{n}. Observe that the following three quantities are all the
same: the expected value of C(X_{n}), Pr[C(X_{n}) = 1], and ∑
_{sS}C(s)∕N. Hence, for all circuits C of size
at most 2^{n∕8}, we have |Pr[C(X_{n}) = 1] -Pr[C(U_{n}) = 1]| = f_{C}(S_{n}) < 2^{-n∕8}, which grows more slowly
than 1∕p(n) for any polynomial p(⋅). We conclude that the probabilistic ensembles U and X are
indistinguishable by polynomial-size circuits, which also implies polynomial-time indistinguishability by
probabilistic polynomial-time Turing machines. __

We remark that a consequence of theorem 2 is that the set S_{n} on which X_{n} has non-zero probability
mass cannot be recognized in polynomial time. Assume to the contrary that it could be recognized by some
polynomial time algorithm A, that is, A(x) = 1 if x S_{n} and A(x) = 0 otherwise.. Then A itself would
distinguish X_{n} from U_{n}. Clearly, Pr[A(X_{n}) = 1] = 1 but Pr[A(U_{n}) = 1] = |S_{n}|∕2^{n}. Since |S_{n}| = 2^{n∕2},
these two probabilities differ by 1 - which is greater than for all sufficiently large n. (Note that the
constant 2 is also a polynomial!)

The definition of polynomial time indistinguishability given in section 26 gives the distinguishing algorithm D a single random sample from either X or Y and compare the two probabilities of it outputting a 1. We can generalize that definition in a straightforward way by providing D with multiple samples, as long as the number of samples is itself bounded by a polynomial m(n). If the difference in output probabilities in this case is a negligible function, we say that X, Y are indistinguishable by polynomial-time sampling. See Definition 3.2.4 of the textbook for details

Giving D multiple samples allows for new possible distinguishing algorithms. For example, consider the algorithm Eq(x,y) that outputs 1 if x = y and 0 otherwise. Eq able to distinguish the ensemble X of Theorem 2 from U. Let’s analyze the probabilities.

since no matter what value X_{n}^{1} assumes, there is a 1∕N chance that the second (independent) sample is
equal to it. (Recall that N = 2^{n∕2}.) On the other hand,

The difference of these two probabilities is clearly non-negligible.

However, it turns out that multiple samples are only helpful in cases such as this where at least one of the distributions cannot be constructed in polynomial time, as we shall see.

We say that an ensemble X = {X_{n}}_{nℕ} is polynomial-time constructible if there exists a
polynomial-time probabilistic algorithm S such that the output distribution S(1^{n}) and X_{n} are identically
distributed.

Theorem 3 Let probability ensembles X, Y be indistinguishable in polynomial time, and suppose both are polynomial-time constructible. Then X, Y are indistinguisable by polynomial-time sampling.

Proof: The proof is an example of the hybrid technique, also sometimes called an interpolation proof. Here’s the outline of it.

Assume X, Y are distinguishable by D using m = m(n) samples. Let X_{n}^{(1)},…,X_{n}^{(m)} be
independent random variables identically distributed to X_{n} and similarly for Y . Let

and let

By assumption, D can distinguish X, Y , so the difference δ(n) = |p(x) = p(y)| is non-negligible.

We now construct a sequence of hybrid m-tuples of random variables for k = 0,…,m:

Clearly, H_{n}^{0} consists of all Y ’s, and H_{n}^{m} consists of all X’s. Hence, D distinguishes between
H_{n}^{0} and H_{n}^{m} with probability δ(n).

Now let δ_{k}(n) be the absolute value of the difference in D’s probability of outputting a 1 given
H_{n}^{k} and H_{n}^{k+1}. It is easily seen that ∑
_{k=0}^{m-1}δ_{k}(n) ≥ δ(n); hence, for some particular value of
k = k_{0},

We now describe a single-sample distinguisher D′. On input α, it first chooses a random
number k from {0,…,m-1} Next, it generates k independent random numbers x_{1},…,x_{k} distributed
according to X_{n} and m - k - 1 random numbers y_{k+2},…,y_{m} distributed according to Y _{n}. It
can do this by the assumption that X and Y are polynomial-time constructible. It then constructs
h = (x_{1},…,x_{k},α,y_{k+2},…,y_{m}), runs D(h), and outputs the result.

Note that h is distributed according to H_{n}^{k} if α was chosen according to Y , and h is distributed
according to H_{n}^{k+1} if α was chosen according to X. Thus, the probability that D′ outputs 1 given
a sample from X or a sample from Y is at least 1∕m, the probability that D′ chooses k = k_{0}, times
δ_{k0}(n). Hence, D′ distinguishes with probability difference at least δ(n)∕m^{2}, which contradicts the
assumption that X, Y are indistinguishable in polynomial time. __