YALE UNIVERSITY
DEPARTMENT OF COMPUTER SCIENCE
CPSC 467a: Cryptography and Computer Security          Notes 9 (rev. 1)
Professor M. J. Fischer                                 October 5, 2006
Lecture Notes 9
We showed in section 42 (lecture notes 8) that RSA decryption works for m ∈ Z*_n if e and d are chosen so that

    ed ≡ 1 (mod φ(n)),    (1)

that is, d is e^{-1} (the inverse of e) in Z*_φ(n).
We now turn to the question of how Alice chooses e and d to satisfy (1). One way she can do this is to choose a random integer d ∈ Z*_φ(n) and then solve (1) for e. We will show how to do this in Sections 45 and 46 below.

However, there is another issue, namely, how does Alice find a random d ∈ Z*_φ(n)? If Z*_φ(n) is large enough, then she can just choose random elements from Z_φ(n) until she encounters one that lies in Z*_φ(n). But how large is large enough? If φ(φ(n)) (the size of Z*_φ(n)) is much smaller than φ(n) (the size of Z_φ(n)), she might have to search for a long time before finding a suitable candidate for d.
In general, Z*_m can be considerably smaller than Z_m. For example, if m = |Z_m| = 210, then |Z*_m| = 48. In this case, the probability that a randomly chosen element of Z_m falls in Z*_m is only 48/210 = 8/35 = 0.228… .
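Such counts are easy to check by brute force. Here is a minimal C sketch (using the mod-based gcd developed below) that computes |Z*_m| by testing gcd(j, m) = 1 for each j ∈ Z_m; it is offered purely as an illustration for small m:

#include <stdio.h>

/* Greatest common divisor by the Euclidean algorithm (see below). */
static int gcd(int a, int b)
{
    return b == 0 ? a : gcd(b, a % b);
}

int main(void)
{
    int m = 210, count = 0;
    for (int j = 0; j < m; j++)
        if (gcd(j, m) == 1)     /* j lies in Z*_m iff gcd(j, m) = 1 */
            count++;
    /* Prints: |Z*_210| = 48, fraction 0.229 */
    printf("|Z*_%d| = %d, fraction %.3f\n", m, count, (double)count / m);
    return 0;
}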
The following theorem provides a crude lower bound on how small Z*_m can be relative to the size of Z_m; it is nevertheless sufficient for our purposes.

Theorem 1  For every integer m ≥ 2,

    φ(m)/m ≥ 1/(⌊log_2 m⌋ + 1).
Proof: Write m in factored form as m = ∏_{i=1}^{t} p_i^{e_i}, where p_i is the i-th prime that divides m and e_i ≥ 1. Then φ(m) = ∏_{i=1}^{t} (p_i − 1) p_i^{e_i − 1}, so

    φ(m)/m = ∏_{i=1}^{t} (p_i − 1)/p_i.    (2)

To estimate the size of ∏_{i=1}^{t} (p_i − 1)/p_i, note that (p_i − 1)/p_i ≥ i/(i + 1). This follows since (x − 1)/x is monotonically increasing in x and p_i ≥ i + 1. Then

    ∏_{i=1}^{t} (p_i − 1)/p_i ≥ ∏_{i=1}^{t} i/(i + 1) = 1/(t + 1).    (3)

Clearly t ≤ ⌊log_2 m⌋, since 2^t ≤ ∏_{i=1}^{t} p_i ≤ m and t is an integer. Combining this fact with equations (2) and (3) gives the desired result. □
For n a 1024-bit integer, φ(n) < n < 2^1024. Hence, log_2(φ(n)) < 1024, so ⌊log_2(φ(n))⌋ ≤ 1023. By Theorem 1, the fraction of elements in Z_φ(n) that also lie in Z*_φ(n) is at least 1/1024. Therefore, the expected number of random trials before Alice finds a number in Z*_φ(n) is provably at most 1024 and is most likely much smaller.
To test if d ∈ Z*_φ(n), Alice can test if gcd(d, φ(n)) = 1. How does she do this?
The basic ideas underlying the Euclidean algorithm were sketched in section 40.2 (lecture notes 8). Euclid’s algorithm is remarkable, not only because it was discovered a very long time ago, but also because it works without knowing the factorization of a and b. It relies on the equation
    gcd(a, b) = gcd(a − b, b),    (4)
which holds when a ≥ b ≥ 0. This allows the problem of computing gcd(a, b) to be reduced to the problem of computing gcd(a − b, b), which is "smaller" if b > 0. Here we measure the size of the problem (a, b) by the sum a + b of the two arguments. Equation (4) leads in turn to an easy recursive algorithm:
int gcd(int a, int b)
{
    if (a < b) return gcd(b, a);        /* ensure a >= b */
    else if (b == 0) return a;          /* gcd(a, 0) = a */
    else return gcd(a - b, b);          /* apply equation (4) */
}
Nevertheless, this algorithm is not very efficient, as you will quickly discover if you attempt to use it, say, to compute gcd(1000000,2).
Repeatedly applying (4) to the pair (a, b) until it can't be applied any more produces the sequence of pairs (a, b), (a − b, b), (a − 2b, b), …, (a − qb, b). The sequence stops when a − qb < b. But the number of times you can subtract b from a is just the quotient q = ⌊a/b⌋, and the amount a − qb that is left is just the remainder a mod b. Hence, one can go directly from the pair (a, b) to the pair (a mod b, b). Since a mod b < b, it is also convenient to swap the elements of the pair. This results in the Euclidean algorithm (in C notation):
int gcd(int a, int b)
{
    if (b == 0) return a;               /* gcd(a, 0) = a */
    else return gcd(b, a % b);          /* replace (a, b) by (a mod b, b) and swap */
}
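With gcd in hand, Alice's guess-and-check search for a random d ∈ Z*_φ(n) can be sketched in C as follows. This is only a minimal illustration: it assumes φ(n) fits in a long, and rand() stands in for a proper cryptographic random number generator (a real implementation would use a big-integer library and a secure source of randomness).

#include <stdlib.h>

static long gcd_l(long a, long b)
{
    return b == 0 ? a : gcd_l(b, a % b);    /* Euclidean algorithm, as above */
}

/* Keep drawing random elements of Z_phi_n until one lies in Z*_phi_n. */
long choose_d(long phi_n)
{
    long d;
    do {
        d = rand() % phi_n;                 /* stand-in for a secure RNG */
    } while (gcd_l(d, phi_n) != 1);         /* d in Z*_phi_n iff gcd(d, phi(n)) = 1 */
    return d;
}

By Theorem 1, the do-while loop is expected to run at most ⌊log_2 φ(n)⌋ + 1 times.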
Now that Alice knows how to choose d ∈ Z*_φ(n), how does she find e? That is, how does she solve (1)? Note that e, if it exists, is a multiplicative inverse of d (mod φ(n)), that is, a number that, when multiplied by d, gives 1 modulo φ(n).
Equation (1) is an instance of the general Diophantine equation
    ax + by = c.    (5)
Here, a,b,c are given integers. A solution consists of integer values for the unknowns x and y. To put (1) into this form, we note that ed ≡ 1 (mod φ(n)) iff ed + uφ(n) = 1 for some integer u. This is seen to be an equation in the form of (5) where the unknowns x and y are e and u, respectively, and the coefficients a,b,c are d, φ(n), and 1, respectively.
It turns out that (5) is closely related to the greatest common divisor, for it has a solution iff gcd(a,b)∣c. It can be solved by a process akin to the Euclidean algorithm, which we call the Extended Euclidean algorithm. Here’s how it works.
The algorithm generates a sequence of triples of numbers T_i = (r_i, u_i, v_i), each satisfying the invariant

    r_i = u_i a + v_i b.    (6)
The first triple T_1 is (a, 1, 0) if a ≥ 0 and (−a, −1, 0) if a < 0. The second triple T_2 is (b, 0, 1) if b ≥ 0 and (−b, 0, −1) if b < 0.
The algorithm generates T_{i+2} from T_i and T_{i+1} in much the same way as the Euclidean algorithm generates a mod b from a and b. More precisely, let q_{i+1} = ⌊r_i/r_{i+1}⌋. Then T_{i+2} = T_i − q_{i+1} T_{i+1}, that is,

    r_{i+2} = r_i − q_{i+1} r_{i+1}
    u_{i+2} = u_i − q_{i+1} u_{i+1}    (7)
    v_{i+2} = v_i − q_{i+1} v_{i+1}
The algorithm stops at the first triple T_t for which r_t = 0; at that point, r_{t−1} = gcd(a, b). Returning to equation (5), if c = gcd(a, b), then x = u_{t−1} and y = v_{t−1} is a solution, as invariant (6) shows. If c is a multiple of gcd(a, b), then c = k gcd(a, b) for some k, and x = k u_{t−1} and y = k v_{t−1} is a solution. Otherwise, gcd(a, b) does not divide c, and one can show that (5) has no solution. See Handout 5 for further details, as well as for a discussion of how many solutions (5) has and how to find all solutions.
Here’s an example. Suppose one wants to solve the equation
    31x − 45y = 3.    (8)
In this example, a = 31 and b = −45. We begin with the triples T_1 = (31, 1, 0) and T_2 = (45, 0, −1) (the signs in the second triple are flipped since b < 0). Applying (7) repeatedly gives

    T_3 = T_1 − 0·T_2 = (31, 1, 0)
    T_4 = T_2 − 1·T_3 = (14, −1, −1)
    T_5 = T_3 − 2·T_4 = (3, 3, 2)
    T_6 = T_4 − 4·T_5 = (2, −13, −9)
    T_7 = T_5 − 1·T_6 = (1, 16, 11)
    T_8 = T_6 − 2·T_7 = (0, −45, −31).
From T_7 = (1, 16, 11) and invariant (6), we obtain

    1 = 16a + 11b.

Plugging in the values a = 31 and b = −45, we compute

    16(31) + 11(−45) = 496 − 495 = 1,

as desired. Since c = 3 = 3·gcd(31, −45), the solution to (8) is then x = 3 × 16 = 48 and y = 3 × 11 = 33.
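The triple computation above translates directly into code. The following is a minimal C sketch of the Extended Euclidean algorithm (the function name ext_gcd and the use of long arithmetic are illustrative choices, not part of the notes); it returns gcd(a, b) and produces u and v with gcd(a, b) = u·a + v·b:

#include <stdio.h>
#include <stdlib.h>

/* Extended Euclidean algorithm: returns g = gcd(a, b) and sets *u, *v
   so that g = (*u)*a + (*v)*b, following the triple recurrence (7). */
long ext_gcd(long a, long b, long *u, long *v)
{
    /* T_1 and T_2, with signs adjusted so the first components are nonnegative. */
    long r0 = labs(a), u0 = (a < 0) ? -1 : 1, v0 = 0;
    long r1 = labs(b), u1 = 0, v1 = (b < 0) ? -1 : 1;

    while (r1 != 0) {                    /* stop at the first zero remainder */
        long q  = r0 / r1;               /* q = floor(r_i / r_{i+1}) */
        long r2 = r0 - q * r1;           /* T_{i+2} = T_i - q * T_{i+1} */
        long u2 = u0 - q * u1;
        long v2 = v0 - q * v1;
        r0 = r1; u0 = u1; v0 = v1;
        r1 = r2; u1 = u2; v1 = v2;
    }
    *u = u0; *v = v0;
    return r0;                           /* r_{t-1} = gcd(a, b) */
}

int main(void)
{
    long u, v;
    long g = ext_gcd(31, -45, &u, &v);
    /* Prints: gcd = 1, u = 16, v = 11; scaling by 3 solves 31x - 45y = 3. */
    printf("gcd = %ld, u = %ld, v = %ld\n", g, u, v);
    return 0;
}

In particular, to solve (1), Alice can apply this procedure to d and φ(n); if the gcd is 1, then e = u mod φ(n) is the desired inverse.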
We finally turn to the question of generating the RSA modulus n = pq. Recall that the numbers p and q should be random distinct primes of about the same length. The method for finding p and q is similar to the "guess-and-check" method used in Section 43 to find random numbers in Z*_φ(n). Namely, keep generating random numbers p of the right length until a prime is found. Then keep generating random numbers q of the right length until one is found that is prime and different from p.
To generate a random prime of a given length, say k bits long, generate k − 1 random bits, put a "1" at the front, regard the result as a binary number, and test whether it is prime. We defer the question of how to test if the number is prime and look now at the expected number of trials before this procedure terminates.
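Here is a minimal C sketch of this procedure for candidates small enough to fit in 64 bits (real RSA primes require a big-integer library). The function names and the trial-division test are illustrative stand-ins; the probabilistic tests discussed below would replace is_probable_prime for realistic key sizes, and rand() stands in for a secure random source.

#include <stdint.h>
#include <stdlib.h>

/* Stand-in primality test by trial division; adequate only for the
   small k handled by this sketch.  See the probabilistic tests below. */
static int is_probable_prime(uint64_t n)
{
    if (n < 2) return 0;
    for (uint64_t d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

/* A random k-bit candidate: k-1 random bits with a "1" put at the front. */
static uint64_t random_k_bits(int k)        /* requires 2 <= k <= 64 */
{
    uint64_t x = 0;
    for (int i = 0; i < k - 1; i++)
        x = (x << 1) | (rand() & 1);        /* append one random bit */
    return x | ((uint64_t)1 << (k - 1));    /* force the leading bit to 1 */
}

/* Guess-and-check: repeat until a candidate tests prime. */
uint64_t random_k_bit_prime(int k)
{
    uint64_t p;
    do {
        p = random_k_bits(k);
    } while (!is_probable_prime(p));
    return p;
}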
The above procedure samples uniformly from the set B_k = Z_{2^k} − Z_{2^{k−1}} of binary numbers of length exactly k. Let p_k be the fraction of elements in B_k that are prime. Then the expected number of trials to find a prime will be 1/p_k. While p_k is difficult to determine exactly, the celebrated Prime Number Theorem allows us to get a good estimate on that number.
Let π(n) be the number of numbers ≤ n that are prime. For example, π(10) = 4 since there are four primes ≤ 10, namely, 2, 3, 5, 7. The prime number theorem asserts that π(n) is "approximately" n/(ln n), where ln n is the natural logarithm (log_e) of n. The chance that a randomly picked number in Z_n is prime is then π(n − 1)/n ≈ ((n − 1)/ln(n − 1))/n ≈ 1/(ln n).
Since B_k = Z_{2^k} − Z_{2^{k−1}}, we have |B_k| = 2^{k−1} and

    p_k = (π(2^k − 1) − π(2^{k−1} − 1)) / 2^{k−1}
        ≈ (2^k/(k ln 2) − 2^{k−1}/((k − 1) ln 2)) / 2^{k−1}
        = 2/(k ln 2) − 1/((k − 1) ln 2)
        ≈ 1/(k ln 2)

for large k. Hence the expected number of trials 1/p_k before a prime is found is roughly k ln 2 ≈ 0.69 k.
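For instance, taking k = 512 (a representative RSA prime length, used here purely for illustration), the estimate evaluates to

    1/p_512 ≈ 512 ln 2 ≈ 512 × 0.693 ≈ 355,

so on the order of a few hundred random candidates are expected before a prime is found.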
The remaining problem for generating an RSA key is how to test if a large number is prime. Until very recently, no deterministic polynomial-time algorithm was known for testing primality, and even now it is not known whether any deterministic algorithm is feasible in practice. However, there do exist fast probabilistic algorithms for testing primality, which we now discuss.
A deterministic test for primality is a procedure that, given as input a number n, correctly returns the answer ‘composite’ or ‘prime’. To arrive at a probabilistic algorithm, we extend the notion of a deterministic primality test in two ways: we give it an extra “helper” string a, and we allow it to answer ‘?’, meaning “I don’t know”. Given input n and helper string a, such a test may correctly answer either ‘composite’ or ‘?’ when n is composite, and it may correctly answer either ‘prime’ or ‘?’ when n is prime. If the test gives a non-‘?’ answer, we say that the helper string a is a witness to that answer.
Given an extended primality test T(n,a), we can use it to build a strong probabilistic primality testing algorithm. On input n, do the following:
Algorithm P1(n):
    repeat forever {
        Generate a random helper string a;
        Let r = T(n, a);
        if (r ≠ ‘?’) return r;
    }
This algorithm has the property that it might not terminate (in case there are no witnesses to the correct answer for n), but when it does terminate, the answer is correct.
Unfortunately, we do not know of any test that results in an efficient strong probabilistic primality testing algorithm. However, the above algorithm can be weakened slightly and still be useful. We add a parameter t, the maximum number of trials that we are willing to perform. The algorithm then becomes:
Algorithm P2(n, t):
    repeat t times {
        Generate a random helper string a;
        Let r = T(n, a);
        if (r ≠ ‘?’) return r;
    }
    return ‘?’;
Now the algorithm is allowed to give up and return ‘?’, but only after trying t times to find the correct answer. If there are lots of witnesses to the correct answer, then the probability of finding one is high, so most of the time the algorithm will succeed. But even the assumption that there are many witnesses to the correct answer is stronger than what we know how to achieve.
The tests that we will present are asymmetric. When n is composite, there are many witnesses to that effect, but when n is prime, there are none. Hence, the test either outputs ‘composite’ or ‘?’ but never ‘prime’. We call these tests of compositeness since an answer of ‘composite’ means that n is definitely composite, but these tests can never say for sure that n is prime.
When algorithm P2 uses a test of compositeness, an answer of ‘composite’ likewise means that n is definitely composite. Moreover, if there are many witnesses to n’s being composite and t is sufficiently large, then the probability that P2(n,t) outputs ‘composite’ will be high. However, if n is prime, then both the test and P2 will always output ‘?’. It is tempting to interpret P2’s output of ‘?’ to mean “n is probably prime”, but of course, it makes no sense to say that n is probably prime; n either is or is not prime. But what does make sense is to say that the probability is very small that P2 answers ‘?’ when n is composite.
In practice, we will indeed interpret the output ‘?’ to mean ‘prime’, but we understand that the algorithm has the possibility of giving the wrong answer when n is composite. Whereas before our algorithm would only report an answer when it was sure and would answer ‘?’ otherwise, now we are considering algorithms that are allowed to make mistakes with (hopefully) small probability.
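To make the structure of algorithm P2 with a test of compositeness concrete, here is a minimal C sketch that instantiates T(n, a) with the Fermat test (a standard example of a compositeness test, chosen here only for illustration; it is not necessarily the test developed later in these notes, and it handles only numbers that fit in 64 bits):

#include <stdint.h>
#include <stdlib.h>

/* Modular multiplication and exponentiation on 64-bit values, using a
   128-bit intermediate (a GCC/Clang extension) to avoid overflow. */
static uint64_t mulmod(uint64_t x, uint64_t y, uint64_t n)
{
    return (uint64_t)((__uint128_t)x * y % n);
}

static uint64_t powmod(uint64_t b, uint64_t e, uint64_t n)
{
    uint64_t result = 1 % n;
    b %= n;
    while (e > 0) {
        if (e & 1) result = mulmod(result, b, n);
        b = mulmod(b, b, n);
        e >>= 1;
    }
    return result;
}

/* Fermat test of compositeness T(n, a): answers 'c' (composite) if the
   helper a witnesses compositeness, and '?' otherwise; never 'prime'. */
static char fermat_test(uint64_t n, uint64_t a)
{
    return powmod(a, n - 1, n) != 1 ? 'c' : '?';
}

/* Algorithm P2(n, t): try up to t random helper strings. */
char P2(uint64_t n, int t)
{
    if (n < 4) return '?';                            /* need n >= 4 to pick a in [2, n-2] */
    for (int i = 0; i < t; i++) {
        uint64_t a = 2 + (uint64_t)rand() % (n - 3);  /* random helper a in [2, n-2] */
        char r = fermat_test(n, a);
        if (r != '?') return r;                       /* a witnessed compositeness */
    }
    return '?';                                       /* in practice read as "probably prime" */
}

If n is prime, fermat_test always answers ‘?’, exactly as required of a test of compositeness; when n is composite, the usefulness of P2 depends on the test having many witnesses, as discussed above.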