YALE UNIVERSITY
DEPARTMENT OF COMPUTER SCIENCE
CPSC 467a: Cryptography and Computer Security
Notes 5 (rev. 1)

Professor M. J. Fischer September 15, 2005
 
Lecture Notes 5

1  Stream ciphers from CFB and OFB modes

CFB and OFB block chaining modes (see Lecture Notes 4) extend naturally to stream ciphers that operate on units smaller than a full block. The idea is to use a shift register X to accumulate the feedback bits from previous stages of encryption so that the full-sized blocks needed by the block chaining method are always available. X is initialized to some public initialization vector.
Assume for the sake of discussion a 64-bit block size for the underlying block cipher and a character size of s bits. (Think of s = 8.) Let B = {0, 1}. We define two operations L_m, R_m : B^64 → B^m: L_m(x) is the leftmost m bits of x, and R_m(x) is the rightmost m bits of x.
The extended versions of CFB and OFB are very similar. Both compute a byte key k_i and use it to encrypt message byte m_i with a simple XOR cipher; that is, c_i = m_i ⊕ k_i. In both modes, k_i can be computed knowing only the ciphertext and the master key, so Bob computes k_i and then decrypts by computing m_i = c_i ⊕ k_i. Finally, both modes compute k_i = L_s(E_k(X_i)), where X_i is the new contents of the shift register at stage i. The two modes differ only in how they update the shift register. In extended CFB mode,
X_i = R_{64−s}(X_{i−1}) · c_{i−1},
where `·' denotes concatenation. In extended OFB mode,
X_i = R_{64−s}(X_{i−1}) · k_{i−1}.
Thus, CFB updates X using the previous ciphertext byte, whereas OFB updates it using the previous byte key.
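To make the two update rules concrete, here is a small Python sketch of both modes for the case s = 8. It is only a sketch: the 64-bit "block cipher" E is a hash-based stand-in (any real block cipher would take its place), and all names here are ours, not from these notes.

    import hashlib

    BLOCK = 8  # bytes; the 64-bit block size assumed above

    def E(key: bytes, block: bytes) -> bytes:
        # Stand-in for the block cipher E_k: derive a 64-bit output from a hash.
        # NOT a real block cipher; it only makes the sketch self-contained.
        return hashlib.sha256(key + block).digest()[:BLOCK]

    def cfb_encrypt(key: bytes, iv: bytes, msg: bytes) -> bytes:
        # Extended CFB, s = 8: the shift register absorbs *ciphertext* bytes.
        x, out = bytearray(iv), bytearray()
        for m in msg:
            k_i = E(key, bytes(x))[0]   # k_i = L_8(E_k(X_i))
            c = m ^ k_i                 # c_i = m_i XOR k_i
            out.append(c)
            x = x[1:] + bytes([c])      # X_{i+1} = R_56(X_i) . c_i
        return bytes(out)

    def cfb_decrypt(key: bytes, iv: bytes, ct: bytes) -> bytes:
        x, out = bytearray(iv), bytearray()
        for c in ct:
            k_i = E(key, bytes(x))[0]
            out.append(c ^ k_i)         # m_i = c_i XOR k_i
            x = x[1:] + bytes([c])      # same feedback rule as encryption
        return bytes(out)

    def ofb_crypt(key: bytes, iv: bytes, data: bytes) -> bytes:
        # Extended OFB, s = 8: the shift register absorbs *key-stream* bytes,
        # so encryption and decryption are the same operation.
        x, out = bytearray(iv), bytearray()
        for d in data:
            k_i = E(key, bytes(x))[0]   # k_i = L_8(E_k(X_i))
            out.append(d ^ k_i)
            x = x[1:] + bytes([k_i])    # X_{i+1} = R_56(X_i) . k_i
        return bytes(out)

A round trip checks the sketch: cfb_decrypt(key, iv, cfb_encrypt(key, iv, msg)) == msg, and applying ofb_crypt twice is the identity.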
The differences between the two modes seem minor, but they have profound implications for the resulting cryptosystem. In CFB mode, the loss of ciphertext byte c_i will cause m_i and several succeeding message bytes to become undecipherable. At first sight it might seem that all future message bytes would be lost, but if one looks carefully at the shift register updating algorithm, one sees that X_j = c_{j−8} · c_{j−7} ⋯ c_{j−2} · c_{j−1} (in our special case of s = 8), so it depends on only the last eight ciphertext bytes. Hence, Bob will be able to recover plaintext bytes beginning with m_{i+8} after the loss of c_i. In OFB mode, X_i depends only on i, the master key k, and the initialization vector IV, so loss of a ciphertext byte causes loss of only the corresponding plaintext byte.
The downside of OFB is the same as for the one-time pad and other simple XOR ciphers: if two message streams are encrypted using the same master key (and, implicitly, the same initialization vector), then the XOR of their encryptions equals the XOR of the plaintexts. This lets Eve recover potentially useful information about the plaintexts and renders the method vulnerable to a known-plaintext attack. CFB does not suffer from this problem, since different messages lead to different ciphertexts and hence to different key streams. However, even CFB mode has the undesirable property that the two key streams will agree up to and including the first byte at which the two message streams differ. This enables Eve to determine the length of the common prefix of the two message streams and also the XOR of the first bytes at which they differ.
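The key-stream reuse problem is easy to see with the OFB sketch above (continuing with the hypothetical ofb_crypt helper): encrypting two messages with the same key and IV makes the key streams cancel in the XOR of the ciphertexts.

    key, iv = b"masterk!", bytes(8)   # same key and IV reused: the mistake
    m1, m2 = b"ATTACK AT DAWN", b"RETREAT AT SIX"
    c1, c2 = ofb_crypt(key, iv, m1), ofb_crypt(key, iv, m2)
    # c1 XOR c2 equals m1 XOR m2: Eve learns the XOR of the plaintexts
    # without ever touching the master key.
    assert bytes(a ^ b for a, b in zip(c1, c2)) == bytes(a ^ b for a, b in zip(m1, m2))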
One way around this problem, in both ciphers, is to use a different initialization vector for each message. The IV is sent to Bob in the clear along with the ciphertext. X = X_0 is initialized to the IV, then k_0 = L_s(E_k(X_0)) is computed, and normal encryption proceeds from there.
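In code, the per-message IV fix is a one-liner (again using the hypothetical ofb_crypt from the sketch above):

    import os

    key, msg = b"masterk!", b"ATTACK AT DAWN"
    iv = os.urandom(8)            # fresh random IV for every message
    c = ofb_crypt(key, iv, msg)   # transmit the pair (iv, c); the IV is not secret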

2  Entropy

I have been talking loosely about the "information content" of a message and the fact that normal English text is quite "redundant". Information theory, pioneered by Claude Shannon half a century ago, provides an elegant and rigorous framework for studying these notions. I present just a bit of it here to give you a flavor of how one might proceed.
Redundancy in language refers to the fact that most combinations of letters do not form a meaningful English-language message. Meaning, of course, exists at many different levels. "QXUUUOV FMMZ" is not meaningful because the individual "words" are nonsense. "FUZZY WILL ARE" contains three valid words, but they are combined in an ungrammatical and nonsensical order. "PURPLE COWS SURF MODEMS" is grammatically correct but lacks semantic meaning.
In our abstract theory, we assume some classification of the entire message space M into a meaningful subset M′ and a meaningless complement M − M′, and we assume Alice sends only meaningful messages.
Consider the usual case where Σ is a finite alphabet and M = Σ* consists of all strings over Σ. For a set of strings S, let S_n be the subset of strings in S of length n. We define the "number of bits" of information in an arbitrary member of S_n to be B(S_n) = log_2(|S_n|). This makes sense, for if one imagines listing the elements of S_n in a table, the length of the table is |S_n|, and integers of length ⌈log_2(|S_n|)⌉ are adequate to index the table. Let b(S, n) = B(S_n)/n, the average number of bits per character for a string in S_n. Clearly, b(Σ*, n) = log_2 s, where s = |Σ|. The ratio b(S, n)/b(Σ*, n) = b(S, n)/(log_2 s) compares the amount of information per character for strings in S_n to the amount of information per character for arbitrary strings. This ratio is 1 when S_n = Σ^n and is 0 when |S_n| = 1.¹
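A quick numeric illustration (the figure of 10,000 meaningful 5-letter strings is hypothetical, chosen only to make the arithmetic concrete):

    from math import ceil, log2

    s = 26           # alphabet size: Sigma = {A, ..., Z}
    n = 5
    words = 10_000   # hypothetical count of "meaningful" 5-letter strings

    B = log2(words)        # B(S_5) = log2 |S_5|   ~ 13.3 bits
    b = B / n              # b(S, 5)               ~ 2.66 bits per character
    ratio = b / log2(s)    # vs. log2 26 ~ 4.70;  ratio ~ 0.57
    print(ceil(B), "bits suffice to index the table")  # prints 14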
The above argues that there is a coding scheme for S_n such that the average representation length is ≈ log_2(|S_n|). However, one does not always want to assume that all words are equally likely. When different words of S_n have different probabilities of occurring, the expected representation length is minimized by a Huffman code and can be much less than the average encoding length in the unweighted case.
The notion of entropy extends the informal definition of B(S_n) to this case. We imagine an idealized encoding scheme in which each word x is represented by −log_2(p_x) bits, where p_x is its probability of occurring.² Then the expected length of an encoding of a random word in S_n is the weighted average, where we weight the length of the representation of each x by its probability p_x. Formally, if X is a random variable ranging over S_n, we define its entropy H(X) to be
H(X) = − ∑_{x∈S_n} p_x log_2(p_x).
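The definition is equally short as Python (a sketch; the function name is ours):

    from math import log2

    def entropy(probs):
        # H(X) = -(sum of p * log2(p)) over outcomes with nonzero probability
        return -sum(p * log2(p) for p in probs if p > 0)

    print(entropy([0.25] * 4))         # 2.0 = log2(4), the equally-likely case
    print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
    print(entropy([1.0]))              # 0.0: a certain outcome carries no information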
Note that H(X) = B(S_n) in the special case that all elements are equally likely. That is, if p_x = 1/|S_n| for all x ∈ S_n, we have

H(X) = − ∑_{x∈S_n} (1/|S_n|) log_2(1/|S_n|) = − log_2(1/|S_n|) = log_2(|S_n|) = B(S_n).
In a similar way, we extend the definition of b(S, n), the average number of bits per character, by letting h(X, n) = H(X)/n. We now define the redundancy of X to be ρ_n = 1 − h(X, n)/(log_2 s). Thus, the redundancy is 0 when all strings of Σ^n are equally likely (so S_n = Σ^n) and is 1 when one string x occurs with probability p_x = 1 and all other strings have probability 0.
Now, suppose M′ is the set of meaningful English-language messages, however one wants to define that; to keep this discussion simple, we assume that all meaningful messages of a given length are equally likely. It is reasonable to assume in our model that b(M′, n) approaches a limit b as n → ∞. The reason is that |M′_{n+m}| ≈ |M′_n| · |M′_m| when n and m are both large. Hence, b(M′, n) = log_2(|M′_n|)/n → b. Thus, we will assume that |M′_n| = 2^{bn}, and the redundancy is ρ_n = 1 − b/(log_2 s).
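For a rough sense of scale (these numbers are illustrative and not from these notes): estimates in the spirit of Shannon's experiments put b for English at roughly 1 to 1.5 bits per character. Taking b = 1.5 and s = 26 gives ρ_n = 1 − 1.5/log_2 26 ≈ 1 − 1.5/4.70 ≈ 0.68; that is, English text is roughly two-thirds redundant.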
Suppose Eve intercepts the ciphertext c = E_k(m) for a message m of length n, where k ∈ K is a randomly selected key. The preimage C of c in M under the various choices of k has size at most |K|, and possibly less, since in general we make no assumption that c decrypts to different messages under different keys. Obviously, m ∈ C. If it happens that C ∩ M′ is the singleton set {m}, then Eve knows that m is the correct plaintext decryption of c. The probability of this occurring depends both on the length of the plaintext and on the particular encryption function being used.
Two extreme cases are worth mentioning. Suppose the encryption function is "well matched" to the meaningful messages in the sense that E_k(m) always lies in some subset C′ of the ciphertext space whenever m ∈ M′, and outside C′ whenever m ∈ M − M′. Then the inverse image of every c ∈ C′ is contained in M′, so Eve is never able to uniquely identify m from c, no matter how much time she spends (unless the inverse image happens to be a singleton set, which is obviously very insecure).
At the other extreme, one could imagine an encryption function such that the inverse image of every c ∈ C contains at most one meaningful message. In such a case, Eve can always find m from c, given enough time.
In practice, it is often assumed that the encryption function is not at all correlated with the notion of meaningful messages; that is, the family of encryption functions {E_k}_{k∈K} behaves like a randomly chosen such family. In that case, Eve has a certain probability of unique decryption, which goes to 1 as n gets large.
We can approximate this probability by a simple calculation. Let m_0 ∈ M′_n, let k_0 ∈ K, and let c_0 = E_{k_0}(m_0). We wish to estimate
p_n = Pr[ { m ∈ M′_n | E_k(m) = c_0 for some k ∈ K } = {m_0} ].
This is the probability that the only meaningful message in the preimage of c_0 is the original message m_0. For this to be the case, every other key k ≠ k_0 must decrypt c_0 to a message in M_n − M′_n. Assuming |M_n| = |C_n| and that each decryption function is a random permutation, the probability that a single key causes c_0 to decrypt to a meaningless message is
q_n = |M_n − M′_n| / |M_n| = 1 − |M′_n| / |M_n|.
But under our assumptions, |M′_n| = 2^{bn} and |M_n| = s^n, so |M′_n|/|M_n| = 2^{bn}/s^n. Since b = (1 − ρ_n) log_2 s, it follows that
q_n = 1 − 2^{bn}/s^n = 1 − s^{(1−ρ_n)n}/s^n = 1 − s^{−ρ_n n}.
Now, the probability that all t = |K| − 1 keys other than k_0 cause c_0 to decrypt to a meaningless message is
p_n = (q_n)^t = (1 − s^{−ρ_n n})^t.
By the binomial theorem, p_n is approximately 1 − t·s^{−ρ_n n} when t·s^{−ρ_n n} is small. Since t, s, and ρ_n are all constants independent of n, we see that t·s^{−ρ_n n} → 0 as n → ∞. Hence, p_n → 1 as n → ∞, so large messages can be uniquely decrypted with probability approaching 1.
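To get a feel for the numbers, here is a small sketch under assumed, purely illustrative parameters: a 26-letter alphabet, the redundancy ρ ≈ 0.68 from the earlier illustration, and a hypothetical 56-bit key space.

    from math import exp, log1p, log2

    s = 26           # alphabet size (assumed)
    rho = 0.68       # redundancy estimate (illustrative)
    t = 2**56 - 1    # wrong keys for a hypothetical 56-bit key space

    def p_unique(n):
        # p_n = (1 - s**(-rho*n))**t, evaluated in log space so the tiny
        # per-key probability is not rounded away in double precision.
        return exp(t * log1p(-(s ** (-rho * n))))

    for n in (10, 20, 30):
        print(n, p_unique(n))   # roughly 0.0, 0.996, 1.0

    # The crossover near n = log2(t) / (rho * log2(s)), about 17.5 characters
    # here, is where unique decryption becomes likely.
    print(log2(t) / (rho * log2(s)))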

Footnotes:

¹ It is undefined when S_n = ∅.
² This assumption is approximately true for Huffman encodings and becomes exactly true in a "limiting" sense that is beyond the scope of these notes.

