YALE UNIVERSITY
DEPARTMENT OF COMPUTER SCIENCE

	CPSC 461b: Foundations of Cryptography	Notes 14 (rev. 1)
Professor M. J. Fischer		February 26, 2009

Lecture Notes 14

35 Construction of a Pseudorandom Function Ensemble

Let ℓ(n) = n. We show how to construct an efficiently computable ℓ-bit pseudorandom function ensemble, starting from a pseudorandom generator G with expansion factor 2n.

Let s be an n-bit string, so G(s) is a 2n-bit string. Define G₀(s) (resp. G₁(s)) to be the first (resp. last) n bits of G(s). Define a function f_s : {0,1}ⁿ →{0,1}ⁿ as follows: For every σ = σ₁σ₂…σ_n of length n, let

    f_s(σ) = G_{σ_n}(⋯G_{σ₂}(G_{σ₁}(s))⋯).

The construction of f_s can be represented by a tree, the first three levels of which are shown in Figure 35.1. Each node on level k is labeled by an n-bit string s_τ, where τ is a string of length k. The left son of node s_τ is labeled by s_τ0 = G₀(s_τ), and the right son is labeled by s_τ1 = G₁(s_τ). The leaf label s_σ is the value of the function f_s(σ).

Figure 35.1: Tree representation of the function f_s.
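The tree walk described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function G below is a hypothetical stand-in built from SHA-256 (it is not a proven pseudorandom generator), seeds are 16 bytes so that n = 128, and the names label and f are introduced here for the example.

```python
import hashlib

N = 16  # seed length in bytes, so n = 128 bits (an assumption of this sketch)


def G(s: bytes) -> bytes:
    """Hypothetical length-doubling stand-in for G: n bits in, 2n bits out."""
    assert len(s) == N
    return hashlib.sha256(s).digest()  # 32 bytes = 2n bits


def G0(s: bytes) -> bytes:
    """First n bits of G(s)."""
    return G(s)[:N]


def G1(s: bytes) -> bytes:
    """Last n bits of G(s)."""
    return G(s)[N:]


def label(s: bytes, tau: str) -> bytes:
    """Label s_tau of the node reached from the root label s along bit string tau."""
    for bit in tau:
        s = G1(s) if bit == "1" else G0(s)
    return s


def f(s: bytes, sigma: str) -> bytes:
    """f_s(sigma): the leaf label for an n-bit input given as a string of '0'/'1'."""
    assert len(sigma) == 8 * N
    return label(s, sigma)
```

Each evaluation applies G once per input bit, so f_s is computable with n calls to G and never needs the whole tree in memory.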

Let F_n be the random variable that selects s uniformly from {0,1}ⁿ, places it at the root of the tree, and then generates the labels of the remaining nodes as described above to obtain the resulting function f_s. Let F = {F_n}_{n∈ℕ} be the function ensemble that corresponds to this random process.

Theorem 1 If G is a pseudorandom generator, then F is an efficiently computable ensemble of pseudorandom functions.

The k^th hybrid H_n^k is a random variable ranging over functions that assign labels uniformly at the k^th level of the tree. In more detail, H_n^k selects s_τ uniformly from {0,1}ⁿ for each τ of length k and places those 2^k values in the corresponding nodes of level k of the tree. It then generates the labels of the nodes below level k using the functions G₀ and G₁ as described above. The leaves of the tree define the resulting function, which in turn is uniquely determined by the level-k node labels s_{τ₁}, …, s_{τ_{2^k}}. We denote this function by

(Note that the nodes at levels less than k do not get labeled by this process, but only the leaves are relevant to the defined function.)

By this construction, we have H_n⁰ = F_n, the original pseudorandom function construction, and H_nⁿ = H_n, the uniform distribution over functions {0,1}ⁿ →{0,1}ⁿ.
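A hybrid can be sampled lazily, which makes the two endpoints easy to see. The sketch below is an illustration only: G is a hypothetical SHA-256 stand-in rather than a proven generator, n is fixed at 128 bits, and the names walk and Hybrid are introduced here. Level-k labels are drawn on first use instead of all 2^k at once; from the point of view of anyone who only queries the function, this yields the same distribution.

```python
import hashlib
import os

N = 16         # label length in bytes (n = 128 bits; an assumption of this sketch)
DEPTH = 8 * N  # the tree has n levels below the root


def G(s: bytes) -> bytes:
    """Hypothetical length-doubling stand-in for G."""
    return hashlib.sha256(s).digest()


def walk(s: bytes, bits: str) -> bytes:
    """Follow `bits` downward from a node labeled s, applying G0 or G1 per bit."""
    for b in bits:
        out = G(s)
        s = out[N:] if b == "1" else out[:N]
    return s


class Hybrid:
    """Lazily sampled instance of H_n^k."""

    def __init__(self, k: int):
        assert 0 <= k <= DEPTH
        self.k = k
        self.level_k = {}  # tau of length k -> uniformly chosen label s_tau

    def __call__(self, sigma: str) -> bytes:
        assert len(sigma) == DEPTH
        tau = sigma[: self.k]
        if tau not in self.level_k:
            self.level_k[tau] = os.urandom(N)  # uniform label at level k
        return walk(self.level_k[tau], sigma[self.k :])
```

With k = 0 the only random choice is the root label, so Hybrid(0) is a sample of F_n. With k = n every leaf gets its own uniform label, so Hybrid(DEPTH) is a lazily sampled uniformly random function.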

Now, assume that F is not pseudorandom. Then there exists a probabilistic polynomial-time oracle machine M and a positive polynomial p(⋅) such that for infinitely many n,

    |Pr[M^{F_n}(1ⁿ) = 1] − Pr[M^{H_n}(1ⁿ) = 1]| > 1/p(n).

Let t(⋅) be a polynomial bound on the running time of M(1ⁿ). It follows that M(1ⁿ) makes at most t(n) queries of its oracle.

We construct a probabilistic polynomial-time distinguisher D that distinguishes t(n) samples from {G(U_n)}_{n∈ℕ} from t(n) samples from {U_2n}_{n∈ℕ} with non-negligible advantage.

The idea is that the oracle queries should be answered according to a tree that is dynamically constructed during the execution of D. Initially, all nodes are unlabeled. For each query q_i = σ₁…σ_n, the nodes at levels > k that lie on the path q_i receive labels, as well as the brother of the level-(k + 1) node that lies on that path. Once a node is labeled, it retains that label throughout the duration of the run.

The labeling in turn works as follows. If the level-(k + 1) node σ₁…σ_kσ_{k+1} has not yet been labeled, it and its brother are labeled using input string α_i as follows:

    s_{σ₁…σ_k0} = P₀(α_i),    s_{σ₁…σ_k1} = P₁(α_i),

where for any string y of length 2n, P₀(y) is the left half of y and P₁(y) is the right half of y.¹ The nodes below σ₁…σ_{k+1} are labeled as usual using G, i.e., s_{τ0} = G₀(s_τ) and s_{τ1} = G₁(s_τ) for each node τ on the path at level k + 1 or below.

Observe that when the inputs to D are t(n) samples from U_2n, then P₀(α_i) and P₁(α_i) are both uniformly distributed. This means that whenever a node on level k + 1 receives a label, the label is uniformly distributed. Hence, D behaves as M would given oracle H_n^{k+1}.

When the inputs to D are t(n) samples from G(U_n), then the nodes on level k + 1 are labeled according to G₀(s) and G₁(s) for randomly chosen s. That is, if α_i is distributed as G(U_n), then there is some s such that α_i = G(s). The construction sets s_τ0 = P₀(α_i) = G₀(s) and s_τ1 = P₁(α_i) = G₁(s), so D behaves as M would given oracle H_n^k. Note that D does not actually label node τ with s, nor is it able to compute s from α_i, but the nodes below τ are labeled just as if node τ were labeled by s.
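The dynamic labeling can be sketched as a small oracle simulator. Several things here are assumptions rather than part of the notes: G is a hypothetical SHA-256 stand-in, n is fixed at 128 bits, M is modeled as any Python callable that takes an oracle and returns 0 or 1, the names walk, SimulatedOracle and D are introduced for the example, and K is assumed to be uniform over {0,…,n−1}.

```python
import hashlib
import secrets

N = 16         # n = 128 bits (an assumption of this sketch)
DEPTH = 8 * N


def G(s: bytes) -> bytes:
    """Hypothetical length-doubling stand-in for G."""
    return hashlib.sha256(s).digest()


def walk(s: bytes, bits: str) -> bytes:
    """Label the nodes below a node labeled s as usual, using G0 and G1."""
    for b in bits:
        out = G(s)
        s = out[N:] if b == "1" else out[:N]
    return s


class SimulatedOracle:
    """Answers M's queries from a tree that is labeled on demand."""

    def __init__(self, k: int, samples):
        assert 0 <= k < DEPTH
        self.k = k
        self.samples = iter(samples)  # alpha_1, alpha_2, ..., each 2n bits long
        self.level_k1 = {}            # level-(k+1) node -> label; never overwritten

    def __call__(self, sigma: str) -> bytes:
        assert len(sigma) == DEPTH
        tau, child = sigma[: self.k], sigma[: self.k + 1]
        if child not in self.level_k1:
            alpha = next(self.samples)              # next unused input string
            self.level_k1[tau + "0"] = alpha[:N]    # P0(alpha)
            self.level_k1[tau + "1"] = alpha[N:]    # P1(alpha), the brother
        return walk(self.level_k1[child], sigma[self.k + 1 :])


def D(M, samples, k=None):
    """Run M against the simulated oracle and output whatever M outputs."""
    if k is None:
        k = secrets.randbelow(DEPTH)  # assumed: K uniform over {0, ..., n-1}
    return M(SimulatedOracle(k, samples))
```

Each new level-(k + 1) pair consumes one input string, so the caller must supply at least as many samples as M makes queries. When every α_i equals G(s) for some s, the answers coincide with those of a tree whose level-k node τ carries the label s, even though s is never stored.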

The following claims relate the success probability of D to the probability that M outputs 1. In both claims, K is the random value chosen in step 1 of Algorithm D.

Claim 1 $\Pr[D(G(U_n^{(1)}), \ldots, G(U_n^{(t(n))})) = 1 \mid K = k] = \Pr[M^{H_n^k}(1^n) = 1]$.

Claim 2 $\Pr[D(U_{2n}^{(1)}, \ldots, U_{2n}^{(t(n))}) = 1 \mid K = k] = \Pr[M^{H_n^{k+1}}(1^n) = 1]$.
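The two claims combine through a telescoping sum. The derivation below is a sketch that assumes K is chosen uniformly from {0,…,n−1}; that range is an assumption here, since it is fixed in step 1 of Algorithm D. Write G^t for the t(n) samples from G(U_n) and U^t for the t(n) samples from U_2n.

```latex
\begin{align*}
\Pr[D(G^t) = 1] - \Pr[D(U^t) = 1]
  &= \frac{1}{n} \sum_{k=0}^{n-1}
     \Bigl( \Pr[M^{H_n^k}(1^n) = 1] - \Pr[M^{H_n^{k+1}}(1^n) = 1] \Bigr) \\
  &= \frac{1}{n}
     \Bigl( \Pr[M^{H_n^0}(1^n) = 1] - \Pr[M^{H_n^n}(1^n) = 1] \Bigr) \\
  &= \frac{1}{n}
     \Bigl( \Pr[M^{F_n}(1^n) = 1] - \Pr[M^{H_n}(1^n) = 1] \Bigr).
\end{align*}
```

By the assumption on M, the absolute value of the last expression exceeds 1/(n·p(n)) for infinitely many n.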

Thus, D distinguishes between G(U_n) and U_2n with inverse polynomial advantage, using polynomially many samples. Since {G(U_n)}_{n∈ℕ} and {U_2n}_{n∈ℕ} are both efficiently constructible ensembles, it follows from Theorem 3 in section 28.2 of lecture 11 (Theorem 3.2.6 of the textbook) that G(U_n) and U_2n are distinguishable in polynomial time from a single sample with inverse polynomial advantage. Hence, G is not pseudorandom, a contradiction.

We conclude that F is an efficiently computable ensemble of pseudorandom functions, as desired. □

36 An Application of a Pseudorandom Function Ensemble

Here’s a simple example from the textbook of an application of a pseudorandom function ensemble F. Suppose a secret society wants a way to identify its members. Imagine each member is given the secret seed s that defines a function f_s drawn from F_n. Members are instructed never to divulge s. Rather, to authenticate someone claiming to be a member, ask them to tell you the value of the secret function f_s on a random challenge string x. You also compute f_s(x) and accept the person as a member if the two values agree. Note that you are giving away very little information on each such authentication. It can be shown that an adversary succeeds with only negligible probability, even after a polynomial number of attempts. If not, the adversary could be used to distinguish F from H. (See section 3.6.3 of the textbook for further details.)
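The challenge-response exchange can be sketched as follows. This is an illustration, not a vetted protocol: f is the tree construction of section 35 instantiated with a hypothetical SHA-256 stand-in for G, n is fixed at 128 bits, and the function names challenge, respond and verify are introduced for the example.

```python
import hashlib
import hmac
import secrets

N = 16  # n = 128 bits (an assumption of this sketch)


def f(s: bytes, x: bytes) -> bytes:
    """f_s(x) via the tree walk, with SHA-256 as a hypothetical stand-in for G."""
    assert len(s) == N and len(x) == N
    for byte in x:
        for i in range(7, -1, -1):  # bits of x, most significant first
            out = hashlib.sha256(s).digest()
            s = out[N:] if (byte >> i) & 1 else out[:N]
    return s


def challenge() -> bytes:
    """Verifier: pick a fresh random challenge string x."""
    return secrets.token_bytes(N)


def respond(s: bytes, x: bytes) -> bytes:
    """Claimed member: answer the challenge using the secret seed s."""
    return f(s, x)


def verify(s: bytes, x: bytes, answer: bytes) -> bool:
    """Verifier: recompute f_s(x) and accept only if the two values agree."""
    return hmac.compare_digest(f(s, x), answer)
```

A typical run is x = challenge(), then answer = respond(s, x) on the member's side, then verify(s, x, answer) on the verifier's side. The comparison uses hmac.compare_digest so that its running time does not depend on where the two values first differ.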