CFB and OFB block chaining modes (see
Lecture Notes, Week 2) can be
naturally extended to stream ciphers on units smaller than full
blocks. The idea is to use a shift register R to accumulate the
feedback bits from previous stages of encryption so that the
full-sized blocks needed by the block chaining method are available.
R is initialized to some public initialization vector.
Assume for the sake of discussion a 64-bit block size for the underlying
block cipher and a character size of s bits. (Think of s = 8.) Let
B = {0, 1}. We define two operations: σ: B^64 → B^s
and μ: B^64 × B^s → B^64, where σ(x) is the
leftmost s bits of x, and μ(x, c) is the rightmost 64 bits of
the bit string xc (x concatenated with c).
The extended versions of CFB and OFB are very similar. Both compute a
byte key k_i and use it to encrypt message byte m_i with
a simple XOR cipher; that is, c_i = m_i ⊕ k_i. In both modes,
k_i can be computed knowing only the ciphertext and master key, so
Bob computes k_i and then decrypts by computing m_i = c_i ⊕ k_i.
Finally, both modes compute k_i = σ(E_k(R_i)), where
R_i is the contents of the shift register at stage i. The two
modes differ in how they update the shift register. In extended CFB
mode, R_i = μ(R_{i−1}, c_{i−1}); in extended OFB mode, R_i = μ(R_{i−1}, k_{i−1}).
Thus, CFB updates R using the previous
ciphertext byte, whereas OFB updates it using the previous byte key.
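To make the bookkeeping concrete, here is a minimal Python sketch of both
modes for s = 8. The function block_encrypt below is a made-up stand-in for
the 64-bit block cipher E_k (it is emphatically not a real cipher);
everything else is a direct transcription of σ and μ.

    import hashlib

    def block_encrypt(key: bytes, block: bytes) -> bytes:
        """Toy 64-bit 'block cipher' stand-in -- illustration only, not secure."""
        return hashlib.sha256(key + block).digest()[:8]

    def cfb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
        r, out = iv, bytearray()            # r is the 8-byte shift register R
        for m in plaintext:
            k_i = block_encrypt(key, r)[0]  # sigma: leftmost byte of E_k(R_i)
            c = m ^ k_i
            out.append(c)
            r = r[1:] + bytes([c])          # mu: shift in previous ciphertext byte
        return bytes(out)

    def ofb_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
        r, out = iv, bytearray()
        for m in plaintext:
            k_i = block_encrypt(key, r)[0]
            out.append(m ^ k_i)
            r = r[1:] + bytes([k_i])        # mu: shift in previous byte key
        return bytes(out)

Decryption in both modes runs the same loop and computes m_i = c_i ⊕ k_i;
the only care needed is that in CFB the received ciphertext byte, not the
recovered plaintext, is what shifts into R.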
The differences between the two modes seem minor, but they have
profound implications for the resulting cryptosystem. In CFB mode, the
loss of ciphertext byte c_i will cause m_i and several succeeding
message bytes to become undecipherable. At first sight it might seem
that all future message bytes would be lost, but if one looks
carefully at the shift register updating algorithm, one sees that
R_j = c_{j−8} c_{j−7} … c_{j−2} c_{j−1} (in our special case of
s = 8), so it depends on only the last eight ciphertext bytes. Hence,
once c_i has shifted entirely out of the register, Bob will again be
able to recover plaintext bytes, beginning with m_{i+9}. In OFB mode,
R_i depends only on i and the master key k (and the initialization
vector IV), so loss of a ciphertext byte causes loss of only the
corresponding plaintext byte.
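The resynchronization is easy to observe with the sketch above. Here we
corrupt (rather than drop) a single ciphertext byte, which keeps the streams
aligned but shows the same register-flush effect: exactly that byte and the
following eight are damaged, and recovery begins nine bytes later, matching
the register analysis.

    # Error propagation in CFB, using the toy sketch above.
    def cfb_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
        r, out = iv, bytearray()
        for c in ciphertext:
            out.append(c ^ block_encrypt(key, r)[0])
            r = r[1:] + bytes([c])      # shift in the *received* ciphertext byte
        return bytes(out)

    key, iv = b"master k", b"\x00" * 8
    m = bytes(range(32))
    c = bytearray(cfb_encrypt(key, iv, m))
    c[5] ^= 0xFF                        # corrupt ciphertext byte 5
    p = cfb_decrypt(key, iv, bytes(c))
    assert p[:5] == m[:5] and p[14:] == m[14:]   # bytes 5..13 garbled, rest intact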
The downside of OFB is the same as for the one-time pad and other
simple XOR ciphers: if two message streams are encrypted using
the same master key (and the same IV), then the XOR of their encryptions
equals the XOR of the plaintexts. This allows Eve to recover potentially
useful information about the plaintexts and renders the method
vulnerable to a known-plaintext attack. CFB does not suffer from this
problem, since different messages lead to different ciphertexts and
hence to different key streams. However, even CFB mode has the
undesirable property that the key streams will be the same up to and
including the first byte in which the two message streams differ.
This enables Eve to determine the length of the common prefix of
the two message streams and also the XOR of the first
bytes at which they differ.
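Both weaknesses can be demonstrated with the toy functions above; the key,
IV, and messages here are arbitrary illustrative values.

    key, iv = b"master k", b"\x00" * 8
    m1, m2 = b"PAY ALICE 100", b"PAY ALICE 999"

    # OFB with a repeated (key, IV): ciphertext XOR reveals plaintext XOR.
    c1, c2 = ofb_encrypt(key, iv, m1), ofb_encrypt(key, iv, m2)
    assert bytes(a ^ b for a, b in zip(c1, c2)) == \
           bytes(a ^ b for a, b in zip(m1, m2))

    # CFB: ciphertexts agree exactly on the common plaintext prefix.
    d1, d2 = cfb_encrypt(key, iv, m1), cfb_encrypt(key, iv, m2)
    prefix = next(i for i, (a, b) in enumerate(zip(d1, d2)) if a != b)
    assert m1[:prefix] == m2[:prefix] and m1[prefix] != m2[prefix]

Re-running the OFB half with two distinct IVs makes the first assertion
fail (almost surely), which is precisely the fix proposed next.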
One way around this problem in both ciphers is to use a different
initialization vector for each message. The IV is sent to Bob in the
clear, along with the ciphertext. R_0 is initialized to the IV, then
k_0 = σ(E_k(R_0)) is computed, and encryption proceeds as usual.
I have been talking loosely about the "information content" of a
message and the fact that normal English text is quite "redundant".
Information theory, pioneered by Claude Shannon half a century ago,
provides an elegant and rigorous framework for studying these notions.
I present just a bit of it here to give you a flavor of how one might
proceed.
Redundancy in language refers to the fact that most combinations of
letters do not form a meaningful English-language message. Meaning, of
course, exists at many different levels. "QXUUUOV FMMZ" is not
meaningful because the individual "words" are nonsense. "FUZZY
WILL ARE" contains three valid words, but they are combined in an
ungrammatical and nonsensical order. "PURPLE COWS SURF MODEMS" is
grammatically correct but lacks semantic meaning.
In our abstract theory, we assume some classification of the entire
message space ℳ into a meaningful subset ℳ′ and a
meaningless complement ℳ − ℳ′, and we assume Alice sends
only meaningful messages.
Consider the usual case where Σ is a finite alphabet and
ℳ = Σ* consists of all strings over Σ. For a set
of strings S, let S_n be the subset of strings in S of length
n. We define the "number of bits" of information in an arbitrary
member of S_n to be B(S_n) = log_2(|S_n|). This makes sense, for
if one imagines listing the elements of S_n in a table, the length
of the table is |S_n|, and integers of ⌈log_2(|S_n|)⌉ bits
are adequate to index the table. Let b(S, n) = B(S_n)/n, the average
number of bits per character for a string in S_n. Clearly, b(Σ*, n) = log_2 s,
where s = |Σ|. The ratio b(S, n)/b(Σ*, n) = b(S, n)/log_2 s
compares the amount of information per character for
strings in S_n to the amount of information per character for
arbitrary strings. This ratio is 1 when S_n = Σ^n and is 0
when |S_n| = 1.[1]
The above argues that there is a coding scheme for S_n such that the
average representation length is ≈ log_2(|S_n|). However, one
does not always want to assume that all words are equally likely.
When different words of S_n have different probabilities of
occurring, the expected representation length is minimized by a
Huffman code and can be much less than the average encoding length in
the unweighted case.
The notion of entropy extends the informal definition of
B(S_n) to this case. We imagine an idealized encoding scheme in
which each word x is represented by −log_2(p_x) bits, where p_x
is its probability of occurring.[2] Then the expected length of an
encoding of a random word in S_n is the weighted average, where we
weight the length of each representation of x by its probability p_x.
Formally, if X is a random variable ranging over S_n, we define
its entropy H(X) to be

    H(X) = ∑_{x ∈ S_n} −p_x log_2(p_x).
Note that H(X) = B(S_n) in the special case that all elements are
equally likely. That is, if p_x = 1/|S_n| for all x ∈ S_n,
we have

    H(X) = ∑_{x ∈ S_n} −(1/|S_n|) log_2(1/|S_n|)
         = −log_2(1/|S_n|)
         = log_2(|S_n|)
         = B(S_n).
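The definition translates directly into a few lines of Python; this sketch
simply evaluates the formula for an explicit probability vector.

    import math

    def entropy(probs):
        """H(X) = sum of -p * log2(p); zero-probability outcomes contribute 0."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    # Uniform case: p_x = 1/8 for 8 outcomes gives H = log2(8) = 3 bits = B(S_n).
    assert entropy([1/8] * 8) == 3.0
    # A skewed distribution carries less information per draw:
    print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75 bits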
In a similar way, we extend the definition of b(S, n), the
"average" bits per character, by letting h(X, n) = H(X)/n. We now
define the redundancy of X to be ρ_n = 1 − h(X, n)/log_2 s.
Thus, the redundancy is 0 when X is uniformly distributed over all of
Σ^n, and it is 1 when one string x occurs with probability p_x = 1 and
all other strings have probability 0.
Now, suppose ℳ′ is the set of meaningful English-language
messages, however one wants to define that, and to keep this
discussion simple, assume that all meaningful messages of a given
length are equally likely. It is reasonable to assume in our model
that b(ℳ′, n) approaches a limit b as n → ∞. The reason is that
|ℳ′_{n+m}| ≈ |ℳ′_n| · |ℳ′_m| when n
and m are both large; hence b(ℳ′, n) = log_2(|ℳ′_n|)/n ≈ b.
Thus, we will assume that |ℳ′_n| = 2^{bn}, and the
redundancy is ρ_n = 1 − b/log_2 s.
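For a concrete (if rough) illustration: taking s = 26 and Shannon's classic
estimate of about b ≈ 1.5 bits of information per letter of English, the
redundancy is ρ_n ≈ 1 − 1.5/log_2 26 ≈ 1 − 1.5/4.70 ≈ 0.68. The precise
value of b will not matter below; all we need is b < log_2 s, so that
ρ_n > 0.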
Suppose Eve intercepts the ciphertext c = E_k(m) for a message m
of length n, where k ∈ 𝒦 is a randomly selected key. The
preimage C of c in ℳ under the various choices of k has
size at most |𝒦|, and possibly less, since in general we make
no assumption that c decrypts to different messages under
different keys. Obviously, m ∈ C. If it happens that C ∩ ℳ′
is the singleton set {m}, then Eve knows that m is the
correct plaintext decryption of c. The probability of this
occurring depends both on the length of the plaintext and on the
particular encryption function being used.
Two extreme cases are worth mentioning. Suppose the encryption
function is "well matched" to the meaningful messages in the sense
that E_k(m) is always in some subset 𝒞′ of 𝒞 whenever
m ∈ ℳ′, and E_k(m) is in 𝒞 − 𝒞′ whenever m ∈ ℳ − ℳ′.
Then the inverse image of every c ∈ 𝒞′ is
contained in ℳ′, so Eve is never able to uniquely identify m
from c, no matter how much time she spends (unless the inverse image
happens to be a singleton set, which is obviously very insecure).
At the other extreme, one could imagine an encryption function such that
the inverse image of every c ∈ 𝒞 contains at most one
meaningful message. In such a case, Eve can always find m from c,
given enough time.
In practice, it is often assumed that the encryption function is not
at all correlated with the notion of meaningful messages. This means
that the family of encryption functions {E_k}_{k ∈ 𝒦}
behaves like a randomly chosen such family. In that case, Eve has a
certain probability of unique decryption, which goes to 1 as n gets
large.
We can approximate this probability by a simple calculation. Let
m_0 ∈ ℳ′_n, k_0 ∈ 𝒦, and let c_0 = E_{k_0}(m_0).
We wish to estimate p_n, the probability that the only meaningful
message in the preimage of c_0 is the original message m_0. For this
to be the case, it must be that every other key k ≠ k_0 decrypts c_0
to a message in ℳ_n − ℳ′_n. Assuming |ℳ_n| = |𝒞_n|
and that each decryption function is a random permutation, the
probability that a single key causes c_0 to decrypt to a meaningless
message is

    q_n = |ℳ_n − ℳ′_n| / |ℳ_n| = 1 − |ℳ′_n| / |ℳ_n|.
But under our assumptions, |ℳ′_n| = 2^{bn} and |ℳ_n| = s^n,
so |ℳ′_n|/|ℳ_n| = 2^{bn}/s^n. Since b = (1 − ρ_n) log_2 s,
it follows that

    q_n = 1 − 2^{bn}/s^n = 1 − s^{(1−ρ_n)n}/s^n = 1 − s^{−ρ_n n}.
Now, the probability that all t = |𝒦| − 1 keys other than k_0
cause c_0 to decrypt to a meaningless message is

    p_n = (q_n)^t = (1 − s^{−ρ_n n})^t.

By the binomial theorem, p_n is approximately 1 − t·s^{−ρ_n n}
when s^{−ρ_n n} is small. Since t and s are constants independent
of n, and ρ_n approaches a positive constant, we see that
s^{−ρ_n n} → 0 as n → ∞. Hence,
p_n → 1 as n → ∞, so large messages can
be uniquely decrypted with probability approaching 1.
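Setting the expected number of spurious meaningful decryptions,
t·s^{−ρ_n n}, equal to 1 and solving for n gives what is usually called
the unicity distance, n = log_2(t)/(ρ log_2 s). A small Python sketch with
made-up but plausible parameters (a 56-bit key space and English-like
redundancy; these numbers are illustrative, not from the text):

    import math

    def spurious_decryptions(t: int, s: int, rho: float, n: int) -> float:
        """Expected number of keys besides k_0 that decrypt c_0 to something
        meaningful: t * s**(-rho*n). p_n ~ 1 - this quantity when it is small."""
        return t * s ** (-rho * n)

    def unicity_distance(t: int, s: int, rho: float) -> float:
        """Message length n at which the expectation above drops to 1."""
        return math.log2(t) / (rho * math.log2(s))

    print(unicity_distance(2**56, 26, 0.75))          # about 16 characters
    print(spurious_decryptions(2**56, 26, 0.75, 40))  # ~1e-26: unique decryption nearly certain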
The Data Encryption Standard (DES) is a block cipher that operates on 64-bit
blocks and uses a 56-bit key. It was the standard algorithm for data
encryption for over 20 years, until it became widely acknowledged that
the key length was too short and the cipher was subject to brute-force attack.
(The new standard uses the Rijndael algorithm and is called AES.)
DES is based on a Feistel network. This is a general method
for building an invertible function from any function f that
scrambles bits. It consists of some number of stages. Each stage i
maps a pair of 32-bit words (L_i, R_i) to a new pair (L_{i+1}, R_{i+1}).
By applying the stages in sequence, a t-stage network
maps (L_0, R_0) to (L_t, R_t). The pair (L_0, R_0) is the plaintext,
and (L_t, R_t) is the corresponding ciphertext.
Each stage works as follows:

    L_{i+1} = R_i                        (1)
    R_{i+1} = L_i ⊕ f(R_i, K_i)          (2)
Here, K_i is a subkey, which is generally derived in some
systematic way from the master key k.
The security of a Feistel-based cipher lies in the construction of the
function f and in the method for producing the subkeys K_i.
However, the invertibility follows just from properties of ⊕.
The inversion problem is to find (L_i, R_i) given (L_{i+1}, R_{i+1}).
Equation (1) gives us R_i. Knowing R_i
and K_i, we can compute f(R_i, K_i). We can then solve
equation (2) to get

    L_i = R_{i+1} ⊕ f(R_i, K_i).
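The invertibility argument translates directly into code. This is a sketch
of a generic Feistel network, not DES itself: the round function f and the
subkeys below are arbitrary stand-ins, and any choice whatsoever still
yields an invertible map.

    # A generic Feistel network. Security, not invertibility, depends on f.
    def feistel_encrypt(l, r, subkeys, f):
        # Equations (1) and (2): L_{i+1} = R_i, R_{i+1} = L_i XOR f(R_i, K_i)
        for k in subkeys:
            l, r = r, l ^ f(r, k)
        return l, r

    def feistel_decrypt(l, r, subkeys, f):
        # Undo the stages in reverse: R_i = L_{i+1}, L_i = R_{i+1} XOR f(R_i, K_i)
        for k in reversed(subkeys):
            l, r = r ^ f(l, k), l
        return l, r

    # Made-up 32-bit round function and subkeys, purely for illustration.
    f = lambda r, k: ((r * 2654435761) ^ k) & 0xFFFFFFFF
    keys = [0xDEADBEEF, 0x01234567, 0x89ABCDEF]
    l, r = feistel_encrypt(0x11111111, 0x22222222, keys, f)
    assert feistel_decrypt(l, r, keys, f) == (0x11111111, 0x22222222)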
DES uses a 16-stage Feistel network. The pair (L_0, R_0) is constructed
from a 64-bit message block by a fixed initial permutation IP. The ciphertext
output is obtained by applying IP^{−1} to R_{16} L_{16}.
The scrambling function f(R_i, K_i) is the heart of DES. It
operates on a 32-bit data block and a 48-bit key block. Thus, a total
of 48 × 16 = 768 key bits are used. They are all derived in a
systematic way from the 56-bit master key k and are far from
independent of each other. In a little more detail, k is split into
two 28-bit pieces C and D. At each stage, C and D are each rotated
by one or two bit positions. Subkey K_i is then obtained by
applying a fixed permutation (transposition) to CD. (See Table
3.4c of the text.)
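A sketch of this key-schedule skeleton in Python. The per-round rotation
amounts (one position in rounds 1, 2, 9, and 16; two positions elsewhere)
are the standard DES schedule; the selecting permutation is abstracted as a
caller-supplied function pc2, since Table 3.4c is not reproduced in these
notes.

    ROTATIONS = [1, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]

    def rotl28(x: int, n: int) -> int:
        """Rotate a 28-bit word left by n positions."""
        return ((x << n) | (x >> (28 - n))) & 0xFFFFFFF

    def subkeys(c: int, d: int, pc2):
        """Yield K_1, ..., K_16 from the 28-bit register halves C and D."""
        for rot in ROTATIONS:
            c, d = rotl28(c, rot), rotl28(d, rot)
            yield pc2((c << 28) | d)    # select 48 of the 56 bits of CD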
The scrambling function itself is rather involved. However, at its
heart are eight "S-boxes". These are boxes with 6 binary inputs
c_0, x_1, x_2, x_3, x_4, c_1 and 4 binary outputs y_1, y_2, y_3, y_4.
Each computes some fixed function from {0,1}^6 to {0,1}^4.
Moreover, each S-box has the very special property that for each of
the four possible ways of fixing the values of (c_0, c_1) to Boolean
constants, the resulting function on the remaining four inputs
x_1, …, x_4 is a permutation of {0,1}^4.
Therefore, we can regard an S-box as performing a substitution on
four-bit "characters", where the substitution performed depends both
on the structure of the particular S-box and on the values of its
"control inputs" c_0 and c_1. The eight S-boxes are all
different and are specified by their truth tables.
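The control-input view of an S-box is easy to express in code. The box
below is made up, not one of the real eight (whose truth tables are in the
text); it merely has the same structure: four permutations of {0,1}^4,
selected by the control bits (c_0, c_1).

    import random

    _rng = random.Random(0)
    _rows = [_rng.sample(range(16), 16) for _ in range(4)]   # four permutations

    def sbox(bits6: int) -> int:
        """Input bits, high to low: c0 x1 x2 x3 x4 c1; output is 4 bits."""
        c0, c1 = (bits6 >> 5) & 1, bits6 & 1
        x = (bits6 >> 1) & 0xF          # the middle four bits x1..x4
        return _rows[(c0 << 1) | c1][x]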
The S-boxes together have a total of 48 input lines. Each of these
lines is the output of a corresponding ⊕-gate. One input of
each of these ⊕-gates is connected to a corresponding bit of
the 48-bit subkey K_i. (This is the only place that the key enters
into DES.) The other input of each ⊕-gate is connected to
one of the 32 bits of the first argument of f. Since there are 48
⊕-gates and only 32 bits in the first argument to f, some of
those bits get used more than once. The mapping of input bits to
⊕-gates is called the expansion permutation E and is
given by Table 3.2(c) in the text. By looking at the table, one sees
that the ⊕-gates connected to the six inputs c_0, x_1, x_2, x_3, x_4, c_1
of S-box 1 are in turn connected to input bits
32, 1, 2, 3, 4, 5, respectively. For S-box 2, they go to bits
4, 5, 6, 7, 8, 9, and so on. Thus, input bits 1, 4, 5, 8, 9, …, 28, 29, 32
are each used twice, and the remaining input bits are each used
once.
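Because the quoted entries follow a regular pattern, the whole expansion
can be generated rather than tabulated. The sketch below reproduces the
rows quoted above and checks the double-use claim; it assumes the same
pattern continues through the rest of Table 3.2(c).

    # Group j reads the six input bits 4j, 4j+1, ..., 4j+5, with bit
    # numbers wrapping around modulo 32 into the range 1..32.
    E = [((4 * j + i - 1) % 32) + 1 for j in range(8) for i in range(6)]
    assert E[:6] == [32, 1, 2, 3, 4, 5]       # S-box 1, as quoted above
    assert E[6:12] == [4, 5, 6, 7, 8, 9]      # S-box 2
    # The bits used twice are exactly those listed in the text:
    assert sorted(b for b in set(E) if E.count(b) == 2) == \
           [1, 4, 5, 8, 9, 12, 13, 16, 17, 20, 21, 24, 25, 28, 29, 32]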
Finally, the 32 bits of output from the S-boxes are passed through a
fixed permutation P (transposition) that spreads out the output
bits. The outputs of a single S-box at one stage of DES become inputs
to several different S-boxes at the next stage. This helps provide
the desirable "avalanche" effect described in the text.
We have mentioned previously that DES is vulnerable to a brute-force
attack because of its small key size of only 56 bits. However, it has
turned out to be remarkably resistant to two more recently discovered
cryptanalytic attacks, differential cryptanalysis and linear
cryptanalysis. The former can break DES using "only" 2^47
chosen plaintexts; the latter works with 2^43 known
plaintexts. Neither attack is feasible in practice.
DES has now been replaced as a national standard by the new AES
(Advanced Encryption Standard), based on the Rijndael algorithm,
developed by two Belgian cryptographers. AES supports key sizes of
128, 192, and 256 bits and works on 128-bit blocks. We will say more
about it later in the course.
Footnotes:

[1] It is undefined when S_n = ∅.

[2] This assumption is approximately true for Huffman encodings and
becomes exactly true in a "limiting" sense that is beyond the scope of
these notes.