CPSC 467a Lecture Notes 2

[Course home page] [Lecture notes] YALE UNIVERSITY
DEPARTMENT OF COMPUTER SCIENCE

CPSC 467a: Cryptography and Computer Security

Notes 2 (rev. 1)

Professor M. J. Fischer

September 6, 2005

Lecture Notes 2

1 Classical cryptography

A symmetric cryptosystem consists of a set M of plaintext messages, a set C of ciphertexts, a set K of keys, and two functions, an encryption function E: K ×M → C and a decryption function D: K ×C → M. We often write E_k(m) = E(k,m) and D_k(c) = D(k, c) and may talk about a cryptosystem consisting of a family of encryption and decryption functions indexed by k, but this should not obscure the fact that in order to be useful, it must be feasible to evaluate the functions given an arbitrary key k.

We require that D_k be the (one-sided) inverse of E_k, that is, for all keys k ∈ K and all plaintext messages m ∈ M, we have D_k(E_k(m)) = m.

To be useful, it should certainly be difficult to find k given only c = E_k(m). But we generally want much more. The goal is to hide m, not k, so a procedure that found m given c would compromise the cryptosystem even if k remained well hidden. Thus, we want the system to resist a ciphertext only attack: the attacker knows c and tries to find m.

In practice, this also might not be enough. The attacker might know a number of pairs plaintext-ciphertext pairs (m₁, c₁), …, (m_r,c_r) where c_i = E_k(m_i) for i = 1, …, r, in addition to knowing c. We want it to be difficult for the attacker to find m in this case also. Thus, we want the system to resist a known plaintext attack.

In many contexts, one needs even stronger forms of attack resistance, where the attacker not only knows a number of plaintext-ciphertext pairs but also gets to choose the plaintext or the cipher text, or both. Where the attacker gets to experiment with the encryption system, it should still be difficult to find the plaintext corresponding to any ciphertext c that differs from those in the known plaintext-ciphertext pairs. These so-called chosen plaintext, chosen ciphertext, and chosen text attacks are much stronger than the simple ciphertext only attack and much harder to defend against. For example, in a chosen ciphertext attack, the attacker would be allowed to obtain the decryption of c+1 when trying to decrypt c.

1.1 Caeser cipher

The Caeser cipher is said to go back to Roman times. Let the message space M and ciphertext space C consist of strings of uppercase letters from the Roman alphabet and spaces. Number the letters of the alphabet starting at 0, so A = 0, B = 1, ..., Z = 25. The key space K = {0, …, 25}. To encrypt, E_k replaces each letter m by the letter (m+k) mod 26.¹ To decrypt, D_k replaces each letter c by the letter (c−k) mod 26.

Consider the problem of breaking the Caeser cipher. Suppose you intercept the ciphertext JXQ. You quickly discover that E₃(GUN) = JXQ. But can you conclude that k=3 and GUN is the corresponding plaintext? Upon further investigation, you discover that E₂₃(MAT) = JXQ. So now you are in a quandary. You have two plausible decryptions and no way to tell for sure which is correct. Have you broken the system or haven't you? According to the "definition" above, you haven't since you haven't found the plaintext with certainty. But you've certainly inflicted serious damage. You've reduced the possible set of 3-letter words that the message might have been down to a small set of possibilities. In many contexts, this is nearly as good as having completely broken the system. This suggests that a formal definition of security must take into account the possibility of "partially breaking" a system, as we did here.

The longer the message, the more likely that only one key will lead to a sensible message. For example, if the ciphertext were "EXB JXQ", you would know that k=3, since D₃(EXB JXQ) = BUY GUN, whereas D₂₃(EXB JXQ) = HAE MAT is nonsense.

At the opposite extreme, consider the Caeser cipher with a one-letter message. If I tell you that the ciphertext is V, you learn absolutely nothing about the plaintext. You reason as follows: If k=0, then m=V. If k=1, then m=U. If k=2, then m=T. And so on through the alphabet. For every possible plaintext message m, there is exactly one key for which m encrypts to V.² So knowing that c = V doesn't give the attacker any information about the plaintext m.

We say such a system is information-theoretically secure. This means that there is no statistical correlation between the plaintext and the ciphertext. To make this definition precise, one must assume that the key k is chosen uniformly at random and is unknown to the attacker. What the attacker sees is a random variable c = E_k(m), where m is a constant and k a uniformly distributed random variable. c is identically distributed for each choice of m; hence, the distribution of c does not depend on m, so knowing c gives no information about m.

Thus, we see that the same Caeser cipher can be information-theoretically secure, or partially secure, or completely breakable, depending on the length of the message to which it is applied.

2 Unbreakable Codes

A code can be hard for Eve to break for two different reasons: (i) She may not have enough information to uniquely determine the plaintext, or (ii) she may have enough information, but it takes her too much time to find the plaintext. In the first case, we say the code is information-theoretically secure. In the second, it is computationally secure. The first notion is based on statistical inference; the second on computational complexity.

2.1 Statistical inference

Statistics is about guessing underlying parameter values from samples drawn from a random distribution. For example, given a biased coin with probability p of heads, one wants to determine p by observing the outcomes of several coin flips. Of course, one can never determine p exactly, so one tries instead to guess a value p′ that is "likely" to be "close" to p.

Eve's job in attempting to break a cryptosystem is analogous to that of the statistician. The analog of a coin flip is for Alice to choose a random key k and compute c = E_k(m). Eve tries to guess a value m′ that is "likely" to be "close" to m. Unlike the coin-flip example, Eve generally will only get one sample for a given message m. Neverless, the same notions apply, and the closer Eve can get to finding the true message m′, and the more likely she is to be correct, the weaker our cryptosystem is.

The strongest case of security is where Eve can get no information at all about m. This case obtains when c is completely independent of m, that is, when the distribution of c does not depend on m. Recall that a (discrete) probability distribution is a function that assigns a probability p(x) to each point x in the sample space. In our case, each value of m results in a potentially different probability distribution p_m(c), where p_m(c) is the probability of choosing a key for which E_k(m) = c. To say that the distribution of c does not depend on m is the same as saying that all of the functions p_m() are the same.³

For the Caeser cipher with one-letter messages, we have p_m(c) = 1/26 for each possible ciphertext letter c. This is because there are 26 equiprobable keys, and each key maps m to a different letter. In this case, c is independent of m, and therefore knowledge of c gives no information about m. We say the system is information-theoretically secure.

For the Caeser cipher with two-letter messages, we lose this independence. For example, let c = VV. Then p_AA(VV) = 1/26, but p_BC(VV) = 0. As functions, p_AA() ≠ p_BC(), and Eve can get information about m from observing c. In particular, Eve knows that m is one of the messages in the set {AA, BB, …ZZ}.

2.2 One-time pad

A one-time pad is a simple information-theoretically secure cryptosystem based on the bitwise exclusive-or operator (XOR), which we write as ⊕. Recall for Boolean values x and y that x ⊕y is true when exactly one of x and y is true, but is false if x and y are either both false or both true. Exclusive-or is therefore equivalent to sum modulo two, when true is represented by 1 and false by 0. In symbols,

x ⊕y = (x + y) mod 2.

With a one-time pad, the plaintext, ciphertext, and key are all assumed to be binary strings of the same length. The encryption function is E_k(m) = k ⊕m, where ⊕ is applied componentwise to corresponding bits of k and m. It's a basic fact of mod 2 addition that addition and subtraction are the same. This implies that E_k is its own inverse; hence D_k(c) = E_k(c) is the decryption function.

D_k(E_k(m)) = k ⊕( k ⊕m) = (k ⊕k) ⊕m = 0 ⊕m = m.

Like the one-letter Caeser cipher, the simple XOR cryptosystem has the property that for every ciphertext string c and every plaintext m, there is exactly one key k such that E_k(m) = c (namely, m ⊕c). Hence, every ciphertext is equally likely no matter what m is, so the one-time pad is information-theoretically secure.

The simple XOR cryptosystem would seem to be the perfect cryptosystem. It works for messages of any length (by choosing a key of the same length). It is simple, easy to encrypt and decrypt, and information-theoretically secure. In fact, it is sometimes used for highly sensitive data. However, it suffers from two major weaknesses that make it not so useful in practice:

The key k must be as long as the message to be encrypted. Key management and key distribution are both difficult problems which we will be discussing later in the course. The longer the keys, the more difficult these problems become.
The same key must never be used more than once. (Hence the term "one-time".) The reason is that the simple XOR cryptosystem immediately succumbs to a known plaintext attack. If Eve knows just one plaintext-ciphertext pair (m₁, c₁), then she can easily compute k = m₁ ⊕c₁, totally breaking the system and allowing her to decrypt all future messages sent with that key. Even in a ciphertext-only situation, if Eve has two ciphertexts c₁ and c₂ encrypted by the same key k, she can gain significant partial information about the corresponding plaintexts m₁ and m₂. In particular, she can compute m₁ ⊕m₂, since

m₁ ⊕m₂ = (c₁ ⊕k) ⊕(c₂ ⊕k) = c₁ ⊕c₂.

That information, together with other information she might have about the likely content of the messages, may be enough for her to seriously compromise the secrecy of the data.

3 Theoretically Breakable Codes

For any fixed message m, the number of possible ciphertexts is at most the size of the key space. (It will be even less if two or more keys map m to the same ciphertext, which is perfectly possible.) Often the message space and the ciphertext space are the same, in which case E_k(m) is a bijection for each fixed value of k. From these two facts, it follows that for each fixed ciphertext c, the number of possible plaintext messages is at most |K|. Hence, knowing the ciphertext c immediately reduces the uncertainty about m from being a member of a set of size |M| to a member of a set of size |K|. The key spaces of most codes used in practice are much smaller than the space of possible messages and ciphertext. Hence, information-theoretically, Eve gains a lot of information about m just knowing c. Moreover, if the set of meaningful messages is sparse in M and not correlated with the structure of E_k (the usual case), then it highly likely that the only meaningful message that maps to c is Alice's original message m.

3.1 Brute force attacks

A brute force attack is one that tries all possible keys k. For each k, Eve computes m_k = D_k(c) and tests if m_k is meaningful. If so, then m_k is a plausible decryption of c. If exactly one m_k is found that is meaningful, then Eve knows that m_k = m.

Given long enough messages, the Caeser cipher is easily broken by brute force-one simply tries all 26 possible keys and sees which leads to a sensible plaintext. The Caeser cipher succumbs because the key space is so small.

With modern computers, it is quite feasible for an attacker to try millions ( ∼ 2²⁰) or billions ( ∼ 2³⁰) of keys. Of course, the attacker also needs some automated test to determine when she has a likely candidate for the real key, but such tests are often easy to produce given a little knowledge of the probable message space.

How big is big enough? The DES (Data Encryption Standard) cryptosystem (which we will talk about shortly) has only 56-bit keys for a key space of size 2⁵⁶. A special DES Key Search Machine was built as a collaborative project by Cryptography Research, Advanced Wireless Technologies, and EFF. (See http://www.cryptography.com/resources/whitepapers/DES.html for details.) This machine was capable of searching 90 billion keys/second and discovered the RSA DES Challenge key on July 15, 1998, after searching for 56 hours. The entire project cost was under $250,000.

Today, 80-bit keys are probably still safe, but only barely so. 2⁸⁰ is only about 16 million times as large as the DES key space, and tens of thousand of machine on the internet could conceivably be deployed in breaking a code. If not quite feasible using today's PC's (and I'm not saying it isn't), it's not that far-fetched to imagine searching an 80-bit key space in the foreseeable future. Much better are systems like triple DES (with 112-bit keys) and AES (with 128-bit keys), which will probably always be safe from brute-force attacks (but not necessarily from other kinds of attacks).

Footnotes:

¹x mod 26 is the remainder of x divided by 26.

²Why does it matter that there is exactly one?

³One often assumes that the parameters themselves are also drawn from some distribution. This enables one to talk, for example, about the probability that the ciphertext = "QBA". But it seems somewhat counterintuitive to assume that Alice merely sends random message according to some fixed (but unknown) probability distribution. Therefore, we prefer to avoid such assumptions whenever possible.

File translated from T_EX by T_TH, version 3.66.
On 18 Sep 2005, 17:46.