Confusion and diffusion


In cryptography, confusion and diffusion are two properties of the operation of a secure cipher identified by Claude Shannon in his 1945 classified report A Mathematical Theory of Cryptography. These properties, when present, work to thwart the application of statistics and other methods of cryptanalysis.
These concepts are also important in the design of robust hash functions and pseudorandom number generators, where decorrelation of the generated values is of paramount importance.

Definition

Confusion
Confusion means that each binary digit of the ciphertext should depend on several parts of the key, obscuring the connections between the two.
The property of confusion hides the relationship between the ciphertext and the key.
This property makes it difficult to find the key from the ciphertext: if a single bit of the key is changed, the values of most or all of the bits in the ciphertext are affected.
Confusion increases the ambiguity of the ciphertext, and it is used by both block and stream ciphers.

Diffusion
Diffusion means that if we change a single bit of the plaintext, then about half of the bits in the ciphertext should change, and similarly, if we change one bit of the ciphertext, then about half of the plaintext bits should change. Since a bit can have only two states, when all the bits are re-evaluated and set to seemingly random values, about half of them will have changed state.
The idea of diffusion is to hide the relationship between the ciphertext and the plaintext.
This makes it hard for an attacker to recover the plaintext. Diffusion dissipates the redundancy of the plaintext by spreading it across the rows and columns; it is achieved through transposition and it is used by block ciphers only.

Theory

In Shannon's original definitions, confusion refers to making the relationship between the ciphertext and the symmetric key as complex and involved as possible; diffusion refers to dissipating the statistical structure of plaintext over the bulk of ciphertext. This complexity is generally implemented through a well-defined and repeatable series of substitutions and permutations. Substitution refers to the replacement of certain components with other components, following certain rules. Permutation refers to manipulation of the order of bits according to some algorithm. To be effective, any non-uniformity of plaintext bits needs to be redistributed across much larger structures in the ciphertext, making that non-uniformity much harder to detect.
In particular, for a randomly chosen input, if one flips the i-th bit, then the probability that the j-th output bit will change should be one half, for any i and j; this is termed the strict avalanche criterion. More generally, one may require that flipping a fixed set of bits should change each output bit with probability one half.
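The avalanche behaviour described above can be estimated empirically. The following is a minimal Python sketch, not a rigorous test: it measures only the average fraction of output bits that change when one random input bit is flipped, which is a weaker check than the strict avalanche criterion (that criterion requires probability one half for every pair i, j separately). The function names, the use of SHA-256 as the function under test, and the sample sizes are assumptions made for illustration.

```python
import hashlib
import random

def flip_bit(data, i):
    """Return a copy of data with bit i flipped (bit 0 is the top bit of byte 0)."""
    out = bytearray(data)
    out[i // 8] ^= 0x80 >> (i % 8)
    return bytes(out)

def hamming_distance(a, b):
    """Number of bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def avalanche_fraction(func, input_len=16, trials=200, seed=0):
    """Average fraction of output bits that change when one input bit is flipped."""
    rng = random.Random(seed)  # seeded only so that the experiment is repeatable
    changed = total = 0
    for _ in range(trials):
        msg = bytes(rng.getrandbits(8) for _ in range(input_len))
        i = rng.randrange(input_len * 8)
        before, after = func(msg), func(flip_bit(msg, i))
        changed += hamming_distance(before, after)
        total += 8 * len(before)
    return changed / total

def sha256(data):
    """Function with good diffusion, used here as the example under test."""
    return hashlib.sha256(data).digest()

def identity(data):
    """Function with no diffusion at all: one input bit changes one output bit."""
    return data

if __name__ == "__main__":
    print("SHA-256 :", avalanche_fraction(sha256))
    print("identity:", avalanche_fraction(identity))
```

For SHA-256 the estimate should be close to one half, whereas the identity function changes exactly one output bit in 128 for a 16-byte input.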
One aim of confusion is to make it very hard to find the key even if one has a large number of plaintext-ciphertext pairs produced with the same key. Therefore, each bit of the ciphertext should depend on the entire key, and in different ways on different bits of the key. In particular, changing one bit of the key should change the ciphertext unpredictably, flipping about half of its bits on average.
The simplest way to achieve both diffusion and confusion is to use a substitution-permutation network. In these systems, the plaintext and the key often have a very similar role in producing the output, hence the same mechanism ensures both diffusion and confusion.
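A toy substitution-permutation network can be sketched in Python as follows. Everything in it is an illustrative assumption rather than a real cipher: the 16-bit block, the particular 4-bit S-box, the 4x4 bit transposition, the fixed list of round keys (a real design would derive them from a master key with a key schedule), and all names. It is far too small to be secure. Each round XORs in a round key, substitutes each 4-bit group through the S-box (confusion), and then transposes the bits so that the outputs of one S-box feed four different S-boxes in the next round (diffusion).

```python
# Illustrative 4-bit S-box (a permutation of 0..15) and its inverse.
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]
INV_SBOX = [SBOX.index(v) for v in range(16)]

# Bit i (0 = leftmost) moves to position PERM[i]: a 4x4 transposition,
# which is its own inverse.
PERM = [4 * (i % 4) + i // 4 for i in range(16)]

# Hypothetical round keys, fixed here for the demonstration only.
DEMO_ROUND_KEYS = [0x3A94, 0xD63F, 0x1C2B, 0x7E05, 0xA5F1]

def substitute(block, box):
    """Replace each of the four 4-bit groups of a 16-bit block using box."""
    out = 0
    for shift in (0, 4, 8, 12):
        out |= box[(block >> shift) & 0xF] << shift
    return out

def permute(block):
    """Move each bit of a 16-bit block to the position given by PERM."""
    out = 0
    for i in range(16):
        if (block >> (15 - i)) & 1:
            out |= 1 << (15 - PERM[i])
    return out

def encrypt(block, round_keys):
    """Key mixing, substitution and permutation per round, then a final key mix."""
    state = block
    for key in round_keys[:-1]:
        state = permute(substitute(state ^ key, SBOX))
    return state ^ round_keys[-1]

def decrypt(block, round_keys):
    """Undo encrypt by applying the inverse steps in reverse order."""
    state = block ^ round_keys[-1]
    for key in reversed(round_keys[:-1]):
        state = substitute(permute(state), INV_SBOX) ^ key
    return state

if __name__ == "__main__":
    c = encrypt(0x1234, DEMO_ROUND_KEYS)
    print(hex(c), hex(decrypt(c, DEMO_ROUND_KEYS)))
```

Because the plaintext and each round key enter the state through the same XOR, a one-bit change in either is processed by the same later rounds, which is why the same mechanism provides both properties.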

Applied to encryption

The design of an encryption method draws on both principles, confusion and diffusion.
Confusion means that the process drastically changes data from the input to the output, for example, by translating the data through a non-linear table created from the key. There are many ways to reverse linear calculations, so the more non-linear the transformation is, the more analysis tools it defeats.
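One simple way to see the difference between a linear and a non-linear step is to test how often a function f satisfies f(a XOR b) = f(a) XOR f(b) XOR f(0), a relation that holds for every pair of inputs exactly when f is linear or affine over XOR. The Python sketch below is an illustration under stated assumptions: the 4-bit table is a fixed example rather than one created from a key, and the function names are hypothetical.

```python
# Illustrative fixed 4-bit look-up table (not derived from a key).
SBOX = [0xE, 0x4, 0xD, 0x1, 0x2, 0xF, 0xB, 0x8,
        0x3, 0xA, 0x6, 0xC, 0x5, 0x9, 0x0, 0x7]

def sbox(x):
    """Non-linear step: look the 4-bit value up in the table."""
    return SBOX[x]

def rotate_left(x):
    """Linear step: rotate a 4-bit value left by one position."""
    return ((x << 1) | (x >> 3)) & 0xF

def affine_fraction(f, bits=4):
    """Fraction of input pairs (a, b) with f(a ^ b) == f(a) ^ f(b) ^ f(0)."""
    size = 1 << bits
    hits = sum(f(a ^ b) == f(a) ^ f(b) ^ f(0)
               for a in range(size) for b in range(size))
    return hits / (size * size)

if __name__ == "__main__":
    print("rotation:", affine_fraction(rotate_left))
    print("table   :", affine_fraction(sbox))
```

The rotation satisfies the relation for every pair, so it can be described and inverted with linear algebra; the table fails it for many pairs, which is the sense in which it resists such tools.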
Diffusion means that changing a single character of the input will change many characters of the output. Done well, every part of the input affects every part of the output, making analysis much harder. No diffusion process is perfect: it always lets through some patterns. Good diffusion scatters those patterns widely through the output, and if several patterns make it through, they scramble each other. This makes patterns vastly harder to spot, and vastly increases the amount of data needed to break the cipher.

Analysis of AES

The Advanced Encryption Standard has both excellent confusion and excellent diffusion. Its confusion look-up tables are highly non-linear and good at destroying patterns. Its diffusion stages, applied over successive rounds, spread every part of the input to every part of the output: changing one bit of input changes half the output bits on average. Both confusion and diffusion are repeated multiple times for each input to increase the amount of scrambling. The secret key is mixed in at every stage so that an attacker cannot precalculate what the cipher does.
None of this happens when a simple one-stage scramble is based on a key. Input patterns would flow straight through to the output. It might look random to the eye but analysis would find obvious patterns and the cipher could be broken.
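The weakness of a one-stage scramble can be illustrated with a short Python sketch. It assumes, purely for illustration, that the single stage is a byte-for-byte substitution through a table shuffled from the key; the function names are hypothetical, and Python's random module is used only to build the table and is not cryptographically secure. Because each input byte always maps to the same output byte, the repetition structure and the frequency counts of the input pass straight through to the output.

```python
import random
from collections import Counter

def keyed_table(key):
    """Build a substitution table by shuffling 0..255 with a key-seeded generator."""
    table = list(range(256))
    random.Random(key).shuffle(table)  # illustration only, not a secure generator
    return table

def one_stage_scramble(data, key):
    """Replace every byte through the keyed table: one stage, no diffusion."""
    table = keyed_table(key)
    return bytes(table[b] for b in data)

def pattern(data):
    """Label each byte by the order of its first appearance (repetition structure)."""
    first_seen = {}
    return [first_seen.setdefault(b, len(first_seen)) for b in data]

if __name__ == "__main__":
    plaintext = b"ATTACK AT DAWN, ATTACK AT DUSK"
    ciphertext = one_stage_scramble(plaintext, key=1234)
    print(ciphertext.hex())
    print(pattern(plaintext) == pattern(ciphertext))
    print(sorted(Counter(plaintext).values()) == sorted(Counter(ciphertext).values()))
```

Both comparisons hold for any key, so frequency analysis of the output reveals as much as frequency analysis of the input would.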