CFB and OFB block chaining modes (see
Lecture Notes, Week 2) can be
naturally extended to stream ciphers on units smaller than full
blocks. The idea is to use a shift register R to accumulate the
feedback bits from previous stages of encryption so that the
full-sized blocks needed by the block chaining method are available.
R is initialized to some public initialization vector.
Assume for the sake of discussion a 64-bit block size for the underlying
block cipher and a character size of s bits. (Think of s = 8.) Let
B = {0, 1}. We define two operations: σ: B^64 → B^s
and μ: B^64 × B^s → B^64, where σ(x) is the
leftmost s bits of x, and μ(x, c) is the rightmost 64 bits of
the bit string xc (x concatenated with c).
The extended versions of CFB and OFB are very similar. Both compute a
byte key k_i and use it to encrypt message byte m_i with
a simple XOR cipher; that is, c_i = m_i ⊕ k_i. In both modes,
k_i can be computed knowing only the ciphertext and master key, so
Bob computes k_i and then decrypts by computing m_i = c_i ⊕ k_i.
Finally, both modes compute k_i = σ(E_k(R_i)), where
R_i is the contents of the shift register at stage i. The two
modes differ in how they update the shift register. In extended CFB
mode, R_i = μ(R_{i−1}, c_{i−1}); in extended OFB mode, R_i = μ(R_{i−1}, k_{i−1}).
Thus, CFB updates R using the previous
ciphertext byte, whereas OFB updates it using the previous byte key.
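To make the bookkeeping concrete, here is a minimal Python sketch of both
modes for s = 8. The function block_encrypt below is a made-up stand-in for
the 64-bit block cipher E_k (it is emphatically not a real cipher);
everything else is a direct transcription of σ and μ.

    import hashlib

    def block_encrypt(key: bytes, block: bytes) -> bytes:
        """Toy 64-bit 'block cipher' stand-in -- illustration only, not secure."""
        return hashlib.sha256(key + block).digest()[:8]

    def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
        r, out = iv, bytearray()            # r is the 8-byte shift register R
        for m in plaintext:
            k_i = block_encrypt(key, r)[0]  # sigma: leftmost byte of E_k(R_i)
            c = m ^ k_i
            out.append(c)
            r = r[1:] + bytes([c])          # mu: shift in previous ciphertext byte
        return bytes(out)

    def ofb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
        r, out = iv, bytearray()
        for m in plaintext:
            k_i = block_encrypt(key, r)[0]
            out.append(m ^ k_i)
            r = r[1:] + bytes([k_i])        # mu: shift in previous byte key
        return bytes(out)

Decryption in both modes runs the same loop and computes m_i = c_i ⊕ k_i;
the only care needed is that in CFB the received ciphertext byte, not the
recovered plaintext, is what shifts into R.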
The differences between the two modes seem minor, but they have
profound implications for the resulting cryptosystem. In CFB mode, the
loss of ciphertext byte c_i will cause m_i and several succeeding
message bytes to become undecipherable. At first sight it might seem
that all future message bytes would be lost, but if one looks
carefully at the shift register updating algorithm, one sees that
R_j = c_{j−8} c_{j−7} … c_{j−2} c_{j−1} (in our special case of
s = 8), so it depends on only the last eight ciphertext bytes. Hence,
once c_i has shifted entirely out of the register, Bob will again be
able to recover plaintext bytes, beginning with m_{i+9}. In OFB mode,
R_i depends only on i and the master key k (and the initialization
vector IV), so loss of a ciphertext byte causes loss of only the
corresponding plaintext byte.
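The resynchronization is easy to observe with the sketch above. Here we
corrupt (rather than drop) a single ciphertext byte, which keeps the streams
aligned but shows the same register-flush effect: exactly that byte and the
following eight are damaged, and recovery begins nine bytes later, matching
the register analysis.

    # Error propagation in CFB, using the toy sketch above.
    def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
        r, out = iv, bytearray()
        for c in ciphertext:
            out.append(c ^ block_encrypt(key, r)[0])
            r = r[1:] + bytes([c])      # shift in the *received* ciphertext byte
        return bytes(out)

    key, iv = b"master k", b"\x00" * 8
    m = bytes(range(32))
    c = bytearray(cfb_encrypt(key, iv, m))
    c[5] ^= 0xFF                        # corrupt ciphertext byte 5
    p = cfb_decrypt(key, iv, bytes(c))
    assert p[:5] == m[:5] and p[14:] == m[14:]   # bytes 5..13 garbled, rest intact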
The downside of OFB is the same as for the one-time pad and other
simple XOR ciphers: if two message streams are encrypted using
the same master key (and the same IV), then the XOR of their encryptions
equals the XOR of the plaintexts. This allows Eve to recover potentially
useful information about the plaintexts and renders the method
vulnerable to a known-plaintext attack. CFB does not suffer from this
problem, since different messages lead to different ciphertexts and
hence to different key streams. However, even CFB mode has the
undesirable property that the key streams will be the same up to and
including the first byte in which the two message streams differ.
This enables Eve to determine the length of the common prefix of
the two message streams and also the XOR of the first
bytes at which they differ.
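Both weaknesses can be demonstrated with the toy functions above; the key,
IV, and messages here are arbitrary illustrative values.

    key, iv = b"master k", b"\x00" * 8
    m1, m2 = b"PAY ALICE 100", b"PAY ALICE 999"

    # OFB with a repeated (key, IV): ciphertext XOR reveals plaintext XOR.
    c1, c2 = ofb_encrypt(key, iv, m1), ofb_encrypt(key, iv, m2)
    assert bytes(a ^ b for a, b in zip(c1, c2)) == \
           bytes(a ^ b for a, b in zip(m1, m2))

    # CFB: ciphertexts agree exactly on the common plaintext prefix.
    d1, d2 = cfb_encrypt(key, iv, m1), cfb_encrypt(key, iv, m2)
    prefix = next(i for i, (a, b) in enumerate(zip(d1, d2)) if a != b)
    assert m1[:prefix] == m2[:prefix] and m1[prefix] != m2[prefix]

Re-running the OFB half with two distinct IVs makes the first assertion
fail (almost surely), which is precisely the fix proposed next.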
One way around this problem in both ciphers is to use a different
initialization vector for each message. The IV is sent to Bob in the
clear, along with the ciphertext. R_0 is initialized to the IV, then
k_0 = σ(E_k(R_0)) is computed, and encryption proceeds as usual.
I have been talking loosely about the "information content" of a
message and the fact that normal English text is quite "redundant".
Information theory, pioneered by Claude Shannon half a century ago,
provides an elegant and rigorous framework for studying these notions.
I present just a bit of it here to give you a flavor of how one might
proceed.
Redundancy in language refers to the fact that most combinations of
letters do not form a meaningful English-language message. Meaning, of
course, exists at many different levels. "QXUUUOV FMMZ" is not
meaningful because the individual "words" are nonsense. "FUZZY
WILL ARE" contains three valid words, but they are combined in an
ungrammatical and nonsensical order. "PURPLE COWS SURF MODEMS" is
grammatically correct but lacks semantic meaning.
In our abstract theory, we assume some classification of the entire
message space ℳ into a meaningful subset ℳ′ and a
meaningless complement ℳ − ℳ′, and we assume Alice sends
only meaningful messages.
Consider the usual case where Σ is a finite alphabet and
ℳ = Σ* consists of all strings over Σ. For a set
of strings S, let S_n be the subset of strings in S of length
n. We define the "number of bits" of information in an arbitrary
member of S_n to be B(S_n) = log_2(|S_n|). This makes sense, for
if one imagines listing the elements of S_n in a table, the length
of the table is |S_n|, and integers of ⌈log_2(|S_n|)⌉ bits
are adequate to index the table. Let b(S, n) = B(S_n)/n, the average
number of bits per character for a string in S_n. Clearly, b(Σ*, n) = log_2 s,
where s = |Σ|. The ratio b(S, n)/b(Σ*, n) = b(S, n)/log_2 s
compares the amount of information per character for
strings in S_n to the amount of information per character for
arbitrary strings. This ratio is 1 when S_n = Σ^n and is 0
when |S_n| = 1.[1]
The above argues that there is a coding scheme for S_n such that the
average representation length is ≈ log_2(|S_n|). However, one
does not always want to assume that all words are equally likely.
When different words of S_n have different probabilities of
occurring, the expected representation length is minimized by a
Huffman code and can be much less than the average encoding length in
the unweighted case.
The notion of entropy extends the informal definition of
B(S_n) to this case. We imagine an idealized encoding scheme in
which each word x is represented by −log_2(p_x) bits, where p_x
is its probability of occurring.[2] Then the expected length of an
encoding of a random word in S_n is the weighted average, where we
weight the length of each representation of x by its probability p_x.
Formally, if X is a random variable ranging over S_n, we define
its entropy H(X) to be

    H(X) = ∑_{x ∈ S_n} −p_x log_2(p_x).
Note that H(X) = B(S_n) in the special case that all elements are
equally likely. That is, if p_x = 1/|S_n| for all x ∈ S_n,
we have

    H(X) = ∑_{x ∈ S_n} −(1/|S_n|) log_2(1/|S_n|)
         = −log_2(1/|S_n|)
         = log_2(|S_n|)
         = B(S_n).
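The definition translates directly into a few lines of Python; this sketch
simply evaluates the formula for an explicit probability vector.

    import math

    def entropy(probs):
        """H(X) = sum of -p * log2(p); zero-probability outcomes contribute 0."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Uniform case: p_x = 1/8 for 8 outcomes gives H = log2(8) = 3 bits = B(S_n).
    assert entropy([1/8] * 8) == 3.0
    # A skewed distribution carries less information per draw:
    print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits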
In a similar way, we extend the definition of b(S, n), the
"average" bits per character, by letting h(X, n) = H(X)/n. We now
define the redundancy of X to be ρ_n = 1 − h(X, n)/log_2 s.
Thus, the redundancy is 0 when X is uniformly distributed over all of
Σ^n, and it is 1 when one string x occurs with probability p_x = 1 and
all other strings have probability 0.
Now, suppose ℳ′ is the set of meaningful English-language
messages, however one wants to define that, and to keep this
discussion simple, assume that all meaningful messages of a given
length are equally likely. It is reasonable to assume in our model
that b(ℳ′, n) approaches a limit b as n → ∞. The reason is that
|ℳ′_{n+m}| ≈ |ℳ′_n| · |ℳ′_m| when n
and m are both large; hence b(ℳ′, n) = log_2(|ℳ′_n|)/n ≈ b.
Thus, we will assume that |ℳ′_n| = 2^{bn}, and the
redundancy is ρ_n = 1 − b/log_2 s.
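For a concrete (if rough) illustration: taking s = 26 and Shannon's classic
estimate of about b ≈ 1.5 bits of information per letter of English, the
redundancy is ρ_n ≈ 1 − 1.5/log_2 26 ≈ 1 − 1.5/4.70 ≈ 0.68. The precise
value of b will not matter below; all we need is b < log_2 s, so that
ρ_n > 0.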
Suppose Eve intercepts the ciphertext c = E_k(m) for a message m
of length n, where k ∈ 𝒦 is a randomly selected key. The
preimage C of c in ℳ under the various choices of k has
size at most |𝒦|, and possibly less, since in general we make
no assumption that c decrypts to different messages under
different keys. Obviously, m ∈ C. If it happens that C ∩ ℳ′
is the singleton set {m}, then Eve knows that m is the
correct plaintext decryption of c. The probability of this
occurring depends both on the length of the plaintext and on the
particular encryption function being used.
Two extreme cases are worth mentioning. Suppose the encryption
function is "well matched" to the meaningful messages in the sense
that E_k(m) is always in some subset 𝒞′ of 𝒞 whenever
m ∈ ℳ′, and E_k(m) is in 𝒞 − 𝒞′ whenever m ∈ ℳ − ℳ′.
Then the inverse image of every c ∈ 𝒞′ is
contained in ℳ′, so Eve is never able to uniquely identify m
from c, no matter how much time she spends (unless the inverse image
happens to be a singleton set, which is obviously very insecure).
At the other extreme, one could imagine an encryption function such that
the inverse image of every c ∈ 𝒞 contains at most one
meaningful message. In such a case, Eve can always find m from c,
given enough time.
In practice, it is often assumed that the encryption function is not
at all correlated with the notion of meaningful messages. This means
that the family of encryption functions {E_k}_{k ∈ 𝒦}
behaves like a randomly chosen such family. In that case, Eve has a
certain probability of unique decryption, which goes to 1 as n gets
large.
We can approximate this probability by a simple calculation. Let
m_0 ∈ ℳ′_n, k_0 ∈ 𝒦, and let c_0 = E_{k_0}(m_0).
We wish to estimate p_n, the probability that the only meaningful
message in the preimage of c_0 is the original message m_0. For this
to be the case, it must be that every other key k ≠ k_0 decrypts c_0
to a message in ℳ_n − ℳ′_n. Assuming |ℳ_n| = |𝒞_n|
and that each decryption function is a random permutation, the
probability that a single key causes c_0 to decrypt to a meaningless
message is

    q_n = |ℳ_n − ℳ′_n| / |ℳ_n| = 1 − |ℳ′_n| / |ℳ_n|.
But under our assumptions, |ℳ′_n| = 2^{bn} and |ℳ_n| = s^n,
so |ℳ′_n|/|ℳ_n| = 2^{bn}/s^n. Since b = (1 − ρ_n) log_2 s,
it follows that

    q_n = 1 − 2^{bn}/s^n = 1 − s^{(1−ρ_n)n}/s^n = 1 − s^{−ρ_n n}.
Now, the probability that all t = |𝒦| − 1 keys other than k_0
cause c_0 to decrypt to a meaningless message is

    p_n = (q_n)^t = (1 − s^{−ρ_n n})^t.

By the binomial theorem, p_n is approximately 1 − t·s^{−ρ_n n}
when s^{−ρ_n n} is small. Since t and s are constants independent
of n, and ρ_n approaches a positive constant, we see that
s^{−ρ_n n} → 0 as n → ∞. Hence,
p_n → 1 as n → ∞, so large messages can
be uniquely decrypted with probability approaching 1.
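Setting the expected number of spurious meaningful decryptions,
t·s^{−ρ_n n}, equal to 1 and solving for n gives what is usually called
the unicity distance, n = log_2(t)/(ρ log_2 s). A small Python sketch with
made-up but plausible parameters (a 56-bit key space and English-like
redundancy; these numbers are illustrative, not from the text):

    import math

    def spurious_decryptions(t: int, s: int, rho: float, n: int) -> float:
        """Expected number of keys besides k_0 that decrypt c_0 to something
        meaningful: t * s**(-rho*n). p_n ~ 1 - this quantity when it is small."""
        return t * s ** (-rho * n)

    def unicity_distance(t: int, s: int, rho: float) -> float:
        """Message length n at which the expectation above drops to 1."""
        return math.log2(t) / (rho * math.log2(s))

    print(unicity_distance(2**56, 26, 0.75))          # about 16 characters
    print(spurious_decryptions(2**56, 26, 0.75, 40))  # ~1e-26: unique decryption nearly certain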
The Data Encryption Standard (DES) is a block cipher that operates on 64-bit
blocks and uses a 56-bit key. It was the standard algorithm for data
encryption for over 20 years, until it became widely acknowledged that
the key length was too short and the cipher was subject to brute-force attack.
(The new standard uses the Rijndael algorithm and is called AES.)
DES is based on a Feistel network. This is a general method
for building an invertible function from any function f that
scrambles bits. It consists of some number of stages. Each stage i
maps a pair of 32-bit words (L_i, R_i) to a new pair (L_{i+1}, R_{i+1}).
By applying the stages in sequence, a t-stage network
maps (L_0, R_0) to (L_t, R_t). The pair (L_0, R_0) is the plaintext,
and (L_t, R_t) is the corresponding ciphertext.
Each stage works as follows:

    L_{i+1} = R_i                        (1)
    R_{i+1} = L_i ⊕ f(R_i, K_i)          (2)
Here, K_i is a subkey, which is generally derived in some
systematic way from the master key k.
The security of a Feistel-based cipher lies in the construction of the
function f and in the method for producing the subkeys K_i.
However, the invertibility follows just from properties of ⊕.
The inversion problem is to find (L_i, R_i) given (L_{i+1}, R_{i+1}).
Equation (1) gives us R_i. Knowing R_i
and K_i, we can compute f(R_i, K_i). We can then solve
equation (2) to get

    L_i = R_{i+1} ⊕ f(R_i, K_i).
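The invertibility argument translates directly into code. This is a sketch
of a generic Feistel network, not DES itself: the round function f and the
subkeys below are arbitrary stand-ins, and any choice whatsoever still
yields an invertible map.

    # A generic Feistel network. Security, not invertibility, depends on f.
    def feistel_encrypt(l, r, subkeys, f):
        # Equations (1) and (2): L_{i+1} = R_i, R_{i+1} = L_i XOR f(R_i, K_i)
        for k in subkeys:
            l, r = r, l ^ f(r, k)
        return l, r

    def feistel_decrypt(l, r, subkeys, f):
        # Undo the stages in reverse: R_i = L_{i+1}, L_i = R_{i+1} XOR f(R_i, K_i)
        for k in reversed(subkeys):
            l, r = r ^ f(l, k), l
        return l, r

    # Made-up 32-bit round function and subkeys, purely for illustration.
    f = lambda r, k: ((r * 2654435761) ^ k) & 0xFFFFFFFF
    keys = [0xDEADBEEF, 0x01234567, 0x89ABCDEF]
    l, r = feistel_encrypt(0x11111111, 0x22222222, keys, f)
    assert feistel_decrypt(l, r, keys, f) == (0x11111111, 0x22222222)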
DES uses a 16-stage Feistel network. The pair (L_0, R_0) is constructed
from a 64-bit message block by a fixed initial permutation IP. The ciphertext
output is obtained by applying IP^{−1} to R_{16} L_{16}.
The scrambling function f(R_i, K_i) is the heart of DES. It
operates on a 32-bit data block and a 48-bit key block. Thus, a total
of 48 × 16 = 768 key bits are used. They are all derived in a
systematic way from the 56-bit master key k and are far from
independent of each other. In a little more detail, k is split into
two 28-bit pieces C and D. At each stage, C and D are each rotated
by one or two bit positions. Subkey K_i is then obtained by
applying a fixed permutation (transposition) to CD. (See Table
3.4c of the text.)
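A sketch of this key-schedule skeleton in Python. The per-round rotation
amounts (one position in rounds 1, 2, 9, and 16; two positions elsewhere)
are the standard DES schedule; the selecting permutation is abstracted as a
caller-supplied function pc2, since Table 3.4c is not reproduced in these
notes.

    ROTATIONS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

    def rotl28(x: int, n: int) -> int:
        """Rotate a 28-bit word left by n positions."""
        return ((x << n) | (x >> (28 - n))) & 0xFFFFFFF

    def subkeys(c: int, d: int, pc2):
        """Yield K_1, ..., K_16 from the 28-bit register halves C and D."""
        for rot in ROTATIONS:
            c, d = rotl28(c, rot), rotl28(d, rot)
            yield pc2((c << 28) | d)    # select 48 of the 56 bits of CD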
The scrambling function itself is rather involved. However, at its
heart are eight "S-boxes". These are boxes with 6 binary inputs
c_0, x_1, x_2, x_3, x_4, c_1 and 4 binary outputs y_1, y_2, y_3, y_4.
Each computes some fixed function from {0,1}^6 to {0,1}^4.
Moreover, each S-box has the very special property that for each of
the four possible ways of fixing the values of (c_0, c_1) to Boolean
constants, the resulting function on the remaining four inputs
x_1, …, x_4 is a permutation of {0,1}^4.
Therefore, we can regard an S-box as performing a substitution on
four-bit "characters", where the substitution performed depends both
on the structure of the particular S-box and on the values of its
"control inputs" c_0 and c_1. The eight S-boxes are all
different and are specified by their truth tables.
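The control-input view of an S-box is easy to express in code. The box
below is made up, not one of the real eight (whose truth tables are in the
text); it merely has the same structure: four permutations of {0,1}^4,
selected by the control bits (c_0, c_1).

    import random

    _rng = random.Random(0)
    _rows = [_rng.sample(range(16), 16) for _ in range(4)]   # four permutations

    def sbox(bits6: int) -> int:
        """Input bits, high to low: c0 x1 x2 x3 x4 c1; output is 4 bits."""
        c0, c1 = (bits6 >> 5) & 1, bits6 & 1
        x = (bits6 >> 1) & 0xF          # the middle four bits x1..x4
        return _rows[(c0 << 1) | c1][x]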
The S-boxes together have a total of 48 input lines. Each of these
lines is the output of a corresponding ⊕-gate. One input of
each of these ⊕-gates is connected to a corresponding bit of
the 48-bit subkey K_i. (This is the only place that the key enters
into DES.) The other input of each ⊕-gate is connected to
one of the 32 bits of the first argument of f. Since there are 48
⊕-gates and only 32 bits in the first argument to f, some of
those bits get used more than once. The mapping of input bits to
⊕-gates is called the expansion permutation E and is
given by Table 3.2(c) in the text. By looking at the table, one sees
that the ⊕-gates connected to the six inputs c_0, x_1, x_2, x_3, x_4, c_1
of S-box 1 are in turn connected to input bits
32, 1, 2, 3, 4, 5, respectively. For S-box 2, they go to bits
4, 5, 6, 7, 8, 9, and so on. Thus, input bits 1, 4, 5, 8, 9, …, 28, 29, 32
are each used twice, and the remaining input bits are each used
once.
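Because the quoted entries follow a regular pattern, the whole expansion
can be generated rather than tabulated. The sketch below reproduces the
rows quoted above and checks the double-use claim; it assumes the same
pattern continues through the rest of Table 3.2(c).

    # Group j reads the six input bits 4j, 4j+1, ..., 4j+5, with bit
    # numbers wrapping around modulo 32 into the range 1..32.
    E = [((4 * j + i - 1) % 32) + 1 for j in range(8) for i in range(6)]
    assert E[:6] == [32, 1, 2, 3, 4, 5]       # S-box 1, as quoted above
    assert E[6:12] == [4, 5, 6, 7, 8, 9]      # S-box 2
    # The bits used twice are exactly those listed in the text:
    assert sorted(b for b in set(E) if E.count(b) == 2) == \
           [1, 4, 5, 8, 9, 12, 13, 16, 17, 20, 21, 24, 25, 28, 29, 32]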
Finally, the 32 bits of output from the S-boxes are passed through a
fixed permutation P (transposition) that spreads out the output
bits. The outputs of a single S-box at one stage of DES become inputs
to several different S-boxes at the next stage. This helps provide
the desirable "avalanche" effect described in the text.
We have mentioned previously that DES is vulnerable to a brute-force
attack because of its small key size of only 56 bits. However, it has
turned out to be remarkably resistant to two more recently discovered
cryptanalytic attacks, differential cryptanalysis and linear
cryptanalysis. The former can break DES using "only" 2^47
chosen plaintexts; the latter works with 2^43 known
plaintexts. Neither attack is feasible in practice.
DES has now been replaced as a national standard by the new AES
(Advanced Encryption Standard), based on the Rijndael algorithm,
developed by two Belgian cryptographers. AES supports key sizes of
128, 192, and 256 bits and works on 128-bit blocks. We will say more
about it later in the course.
Footnotes:

[1] It is undefined when S_n = ∅.

[2] This assumption is approximately true for Huffman encodings and
becomes exactly true in a "limiting" sense that is beyond the scope of
these notes.