Zero-knowledge proof


In cryptography, a zero-knowledge proof or zero-knowledge protocol is a method by which one party can prove to another party that they know a value, without conveying any information apart from the fact that they know the value. The essence of zero-knowledge proofs is that it is trivial to prove that one possesses knowledge of certain information by simply revealing it; the challenge is to prove such possession without revealing the information itself or any additional information.
If proving a statement requires that the prover possesses some secret information, then the verifier will not be able to prove the statement to anyone else without possessing the secret information.
The statement being proved must include the assertion that the prover has such knowledge, but not the knowledge itself. Otherwise, the statement would not be proved in zero-knowledge because it provides the verifier with additional information about the statement by the end of the protocol. A zero-knowledge proof of knowledge is a special case when the statement consists only of the fact that the prover possesses the secret information.
Interactive zero-knowledge proofs require interaction between the individual proving their knowledge and the individual validating the proof.
A protocol implementing zero-knowledge proofs of knowledge must necessarily require interactive input from the verifier. This interactive input is usually in the form of one or more challenges such that the responses from the prover will convince the verifier if and only if the statement is true, i.e., if the prover does possess the claimed knowledge. If this were not the case, the verifier could record the execution of the protocol and replay it to convince someone else that they possess the secret information. The new party's acceptance is either justified since the replayer does possess the information, or the acceptance is spurious, i.e., was accepted from someone who does not actually possess the information.
Some forms of non-interactive zero-knowledge proofs exist, but the validity of the proof relies on computational assumptions.

Abstract examples

The Ali Baba cave

There is a well-known story presenting the fundamental ideas of zero-knowledge proofs, first published by Jean-Jacques Quisquater and others in their paper "How to Explain Zero-Knowledge Protocols to Your Children". It is common practice to label the two parties in a zero-knowledge proof as Peggy and Victor.
In this story, Peggy has uncovered the secret word used to open a magic door in a cave. The cave is shaped like a ring, with the entrance on one side and the magic door blocking the opposite side. Victor wants to know whether Peggy knows the secret word; but Peggy, being a very private person, does not want to reveal her knowledge to Victor or to reveal the fact of her knowledge to the world in general.
They label the left and right paths from the entrance A and B. First, Victor waits outside the cave as Peggy goes in. Peggy takes either path A or B; Victor is not allowed to see which path she takes. Then, Victor enters the cave and shouts the name of the path he wants her to use to return, either A or B, chosen at random. Providing she really does know the magic word, this is easy: she opens the door, if necessary, and returns along the desired path.
However, suppose she did not know the word. Then, she would only be able to return by the named path if Victor were to give the name of the same path by which she had entered. Since Victor would choose A or B at random, she would have a 50% chance of guessing correctly. If they were to repeat this trick many times, say 20 times in a row, her chance of successfully anticipating all of Victor's requests would shrink to $2^{-20}$, roughly one in a million.
Thus, if Peggy repeatedly appears at the exit Victor names, he can conclude that it is extremely probable that Peggy does, in fact, know the secret word.
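The arithmetic behind the cave story can be checked with a small simulation. The sketch below is illustrative only: the function names are hypothetical, and it models a Peggy who does not know the word as someone who passes a round only when Victor happens to request the path she entered by.

```python
import random

def cheating_peggy_survives(rounds: int, rng: random.Random) -> bool:
    """Return True if a Peggy who does NOT know the word passes every round."""
    for _ in range(rounds):
        entered = rng.choice("AB")    # path Peggy picked before the challenge
        requested = rng.choice("AB")  # path Victor shouts, chosen at random
        if entered != requested:      # she cannot pass through the magic door
            return False
    return True

def exact_cheat_probability(rounds: int) -> float:
    """Probability of anticipating all of Victor's requests: (1/2)^rounds."""
    return 0.5 ** rounds

rng = random.Random(0)  # fixed seed so the run is reproducible
trials = 100_000
wins = sum(cheating_peggy_survives(3, rng) for _ in range(trials))
print(f"3 rounds: simulated {wins / trials:.3f}, exact {exact_cheat_probability(3):.3f}")
print(f"20 rounds: exact {exact_cheat_probability(20):.2e}")
```

Three rounds are simulated rather than twenty because a success over twenty rounds is too rare to observe in a short run; the exact formula covers that case.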
One side note with respect to third-party observers: even if Victor is wearing a hidden camera that records the whole transaction, the only thing the camera will record is in one case Victor shouting "A!" and Peggy appearing at A or in the other case Victor shouting "B!" and Peggy appearing at B. A recording of this type would be trivial for any two people to fake. Such a recording will certainly never be convincing to anyone but the original participants. In fact, even a person who was present as an observer at the original experiment would be unconvinced, since Victor and Peggy might have orchestrated the whole "experiment" from start to finish.
Further notice that if Victor chooses his A's and B's by flipping a coin on-camera, this protocol loses its zero-knowledge property; the on-camera coin flip would probably be convincing to any person watching the recording later. Thus, although this does not reveal the secret word to Victor, it does make it possible for Victor to convince the world in general that Peggy has that knowledge—counter to Peggy's stated wishes. However, digital cryptography generally "flips coins" by relying on a pseudo-random number generator, which is akin to a coin with a fixed pattern of heads and tails known only to the coin's owner. If Victor's coin behaved this way, then again it would be possible for Victor and Peggy to have faked the "experiment", so using a pseudo-random number generator would not reveal Peggy's knowledge to the world in the same way that using a flipped coin would.
Notice that Peggy could prove to Victor that she knows the magic word, without revealing it to him, in a single trial. If both Victor and Peggy go together to the mouth of the cave, Victor can watch Peggy go in through A and come out through B. This would prove with certainty that Peggy knows the magic word, without revealing the magic word to Victor. However, such a proof could be observed by a third party or recorded by Victor, and it would be convincing to anybody. In other words, Peggy could not refute such a proof by claiming she colluded with Victor, and she is therefore no longer in control of who is aware of her knowledge.

Two balls and the colour-blind friend

Imagine your friend is red-green colour-blind and you have two balls: one red and one green, but otherwise identical. To your friend they seem completely identical and he is skeptical that they are actually distinguishable. You want to prove to him they are in fact differently-coloured, but nothing else; in particular, you do not want to reveal which one is the red and which is the green ball.
Here is the proof system. You give the two balls to your friend and he puts them behind his back. Next, he takes one of the balls and brings it out from behind his back and displays it. He then places it behind his back again and then chooses to reveal just one of the two balls, picking one of the two at random with equal probability. He will ask you, "Did I switch the ball?" This whole procedure is then repeated as often as necessary.
By looking at their colours, you can, of course, say with certainty whether or not he switched them. On the other hand, if they were the same colour and hence indistinguishable, there is no way you could guess correctly with probability higher than 50%.
Since the probability that you would have randomly succeeded at identifying each switch/non-switch is 50%, the probability of having randomly succeeded at all of them is $2^{-n}$ after $n$ rounds, which approaches zero as $n$ grows. If you and your friend repeat this "proof" multiple times, your friend should become convinced that the balls are indeed differently coloured.
The above proof is zero-knowledge because your friend never learns which ball is green and which is red; indeed, he gains no knowledge about how to distinguish the balls.
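The two-ball procedure can be sketched in a few lines. This is a minimal model under stated assumptions: `run_colour_proof` is a hypothetical name, a prover who can see colours always answers correctly, and a prover facing identical balls can only guess.

```python
import random

def run_colour_proof(balls_differ: bool, rounds: int, rng: random.Random) -> bool:
    """Return True if the prover answers "Did I switch?" correctly every round."""
    for _ in range(rounds):
        switched = rng.random() < 0.5    # friend's secret choice behind his back
        if balls_differ:
            answer = switched            # prover sees whether the colour changed
        else:
            answer = rng.random() < 0.5  # identical balls: prover can only guess
        if answer != switched:
            return False                 # one wrong answer exposes the prover
    return True
```

Note that the prover's answers are only "switched" or "not switched"; which ball is red never appears anywhere in the exchange.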

Where's Wally?

Where's Wally? is a picture book where the reader is challenged to find a small character called Wally hidden somewhere on a double-spread page that is filled with many other characters. The pictures are designed so that it is hard to find Wally.
Imagine that you are a professional Where's Wally? solver. A company comes to you with a Where's Wally? book that they need solved. The company wants you to prove that you are actually a professional Where's Wally? solver and thus asks you to find Wally in a picture from their book. The problem is that you don't want to do work for them without being paid.
Both you and the company want to cooperate, but you don't trust each other. It doesn't seem like it's possible to satisfy the company's demand without doing free work for them, but in fact there is a zero-knowledge proof which allows you to prove to the company that you know where Wally is in the picture without revealing to them how you found him, or where he is.
The proof goes as follows: You ask the company representative to turn around, and then you place a very large piece of cardboard over the picture such that the center of the cardboard is positioned over Wally. You cut out a small window in the center of the cardboard such that Wally is visible. You can now ask the company representative to turn around and view the large piece of cardboard with the hole in the middle, and observe that Wally is visible through the hole. The cardboard is large enough that they cannot determine the position of the book under the cardboard. You then ask the representative to turn back around so that you can remove the cardboard and give back the book.
As described, this proof is an illustration only, and not completely rigorous. The company representative would need to be sure that you didn't smuggle a picture of Wally into the room. Something like a tamper-proof glovebox might be used in a more rigorous proof. The above proof also results in the body position of Wally being leaked to the company representative, which may help them find Wally if his body position changes in each Where's Wally? puzzle.

Definition

A zero-knowledge proof must satisfy three properties:
  1. Completeness: if the statement is true, the honest verifier will be convinced of this fact by an honest prover.
  2. Soundness: if the statement is false, no cheating prover can convince the honest verifier that it is true, except with some small probability.
  3. Zero-knowledge: if the statement is true, no verifier learns anything other than the fact that the statement is true. In other words, just knowing the statement is sufficient to imagine a scenario showing that the prover knows the secret. This is formalized by showing that every verifier has some simulator that, given only the statement to be proved, can produce a transcript that "looks like" an interaction between the honest prover and the verifier in question.
The first two of these are properties of more general interactive proof systems. The third is what makes the proof zero-knowledge.
Zero-knowledge proofs are not proofs in the mathematical sense of the term because there is some small probability, the soundness error, that a cheating prover will be able to convince the verifier of a false statement. In other words, zero-knowledge proofs are probabilistic "proofs" rather than deterministic proofs. However, there are techniques to decrease the soundness error to negligibly small values.
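One common way to shrink the soundness error is to repeat independent rounds: if a cheating prover survives one round with probability $\varepsilon$, she survives $n$ rounds with probability $\varepsilon^n$. The helper below is a hypothetical sketch (the name `rounds_needed` is not from any library) that computes how many rounds are required for a chosen target error, assuming the rounds are independent.

```python
import math

def rounds_needed(per_round_error: float, target_error: float) -> int:
    """Smallest n >= 1 with per_round_error ** n <= target_error."""
    if not (0 < per_round_error < 1 and 0 < target_error < 1):
        raise ValueError("both errors must lie strictly between 0 and 1")
    n = max(1, math.ceil(math.log(target_error) / math.log(per_round_error)))
    # Correct for floating-point rounding near exact powers.
    while per_round_error ** n > target_error:
        n += 1
    while n > 1 and per_round_error ** (n - 1) <= target_error:
        n -= 1
    return n

print(rounds_needed(0.5, 1e-6))      # rounds for a one-in-a-million error
print(rounds_needed(0.5, 2 ** -80))  # rounds for an error of 2^-80
```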
A formal definition of zero-knowledge has to use some computational model, the most common one being that of a Turing machine. Let $P$, $V$, and $S$ be Turing machines. An interactive proof system with $(P, V)$ for a language $L$ is zero-knowledge if for any probabilistic polynomial time (PPT) verifier $\hat{V}$ there exists a PPT simulator $S$ such that
$$\forall x \in L,\ z \in \{0,1\}^{*}:\quad \operatorname{View}_{\hat{V}}\left[P(x) \leftrightarrow \hat{V}(x, z)\right] = S(x, z),$$
where $\operatorname{View}_{\hat{V}}\left[P(x) \leftrightarrow \hat{V}(x, z)\right]$ is a record of the interactions between $P(x)$ and $\hat{V}(x, z)$. The prover $P$ is modeled as having unlimited computation power. Intuitively, the definition states that an interactive proof system $(P, V)$ is zero-knowledge if for any verifier $\hat{V}$ there exists an efficient simulator $S$ that can reproduce the conversation between $P$ and $\hat{V}$ on any given input. The auxiliary string $z$ in the definition plays the role of "prior knowledge". The definition implies that $\hat{V}$ cannot use any prior knowledge string $z$ to mine information out of its conversation with $P$, because if $S$ is also given this prior knowledge then it can reproduce the conversation between $\hat{V}$ and $P$ just as before.
The definition given is that of perfect zero-knowledge. Computational zero-knowledge is obtained by requiring that the views of the verifier and the simulator are only computationally indistinguishable, given the auxiliary string.

Practical examples

Discrete log of a given value

We can apply these ideas to a more realistic cryptography application. Peggy wants to prove to Victor that she knows the discrete log of a given value in a given group.
For example, given a value $y$, a large prime $p$ and a generator $g$, she wants to prove that she knows a value $x$ such that $g^x \equiv y \pmod{p}$, without revealing $x$. Indeed, knowledge of $x$ could be used as a proof of identity, in that Peggy could have such knowledge because she chose a random value $x$ that she didn't reveal to anyone, computed $y = g^x \bmod p$ and distributed the value of $y$ to all potential verifiers, such that at a later time, proving knowledge of $x$ is equivalent to proving identity as Peggy.
The protocol proceeds as follows: in each round, Peggy generates a random number $r$, computes $C = g^r \bmod p$ and discloses this to Victor. After receiving $C$, Victor randomly issues one of the following two requests: he either requests that Peggy discloses the value of $r$, or the value of $(x + r) \bmod (p - 1)$. With either answer, Peggy is only disclosing a random value, so no information is disclosed by a correct execution of one round of the protocol.
Victor can verify either answer; if he requested $r$, he can then compute $g^r \bmod p$ and verify that it matches $C$. If he requested $(x + r) \bmod (p - 1)$, he can verify that $C$ is consistent with this, by computing $g^{(x + r) \bmod (p - 1)} \bmod p$ and verifying that it matches $(C \cdot y) \bmod p$. If Peggy indeed knows the value of $x$, she can respond to either one of Victor's possible challenges.
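An honest run of this protocol can be sketched as follows. The parameters are toy values chosen for illustration ($p = 23$, $g = 5$, which generates the multiplicative group modulo 23); a real deployment would need a large prime. The function names are hypothetical, and challenge 0 stands for "reveal $r$" while challenge 1 stands for "reveal $(x + r) \bmod (p - 1)$".

```python
import secrets

P = 23  # toy prime (assumption for illustration; real use needs a large prime)
G = 5   # generator of the multiplicative group modulo 23

def commit():
    """Peggy: pick random r in [0, p-2] and return (r, C = g^r mod p)."""
    r = secrets.randbelow(P - 1)
    return r, pow(G, r, P)

def respond(x, r, challenge):
    """Peggy: answer r (challenge 0) or (x + r) mod (p-1) (challenge 1)."""
    return r if challenge == 0 else (x + r) % (P - 1)

def verify(y, C, challenge, answer):
    """Victor: check the answer against the commitment C and the public y."""
    if challenge == 0:
        return pow(G, answer, P) == C        # g^r must match C
    return pow(G, answer, P) == (C * y) % P  # g^(x+r) must match C*y

def run_protocol(x, rounds=20):
    """Run several rounds with an honest Peggy who knows x."""
    y = pow(G, x, P)                         # public value y = g^x mod p
    for _ in range(rounds):
        r, C = commit()
        challenge = secrets.randbelow(2)     # Victor's random request
        if not verify(y, C, challenge, respond(x, r, challenge)):
            return False
    return True

print(run_protocol(x=7))
```

The second check works because $g^{(x + r) \bmod (p - 1)} \equiv g^x \cdot g^r \equiv y \cdot C \pmod{p}$, using the fact that exponents of $g$ can be reduced modulo $p - 1$.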
If Peggy knew or could guess which challenge Victor is going to issue, then she could easily cheat and convince Victor that she knows $x$ when she does not: if she knows that Victor is going to request $r$, then she proceeds normally: she picks $r$, computes $C = g^r \bmod p$ and discloses $C$ to Victor; she will be able to respond to Victor's challenge. On the other hand, if she knows that Victor will request $(x + r) \bmod (p - 1)$, then she picks a random value $r'$, computes $C' \equiv g^{r'} \cdot (g^x)^{-1} \pmod{p}$, and discloses $C'$ to Victor as the value of $C$ that he is expecting. When Victor challenges her to reveal $(x + r) \bmod (p - 1)$, she reveals $r'$, for which Victor will verify consistency, since he will in turn compute $g^{r'} \bmod p$, which matches $C' \cdot y$, since Peggy multiplied by the modular inverse of $y$.
However, if in either one of the above scenarios Victor issues a challenge other than the one she was expecting and for which she manufactured the result, then she will be unable to respond to the challenge under the assumption of infeasibility of solving the discrete log for this group. If she picked $r$ and disclosed $C = g^r \bmod p$, then she will be unable to produce a valid $(x + r) \bmod (p - 1)$ that would pass Victor's verification, given that she does not know $x$. And if she picked a value $r'$ that poses as $(x + r) \bmod (p - 1)$, then she would have to respond with the discrete log of the value that she disclosed, but Peggy does not know this discrete log, since the value $C'$ she disclosed was obtained through arithmetic with known values, and not by computing a power with a known exponent.
Thus, a cheating prover has a 0.5 probability of successfully cheating in one round. By executing a large enough number of rounds, the probability of a cheating prover succeeding can be made arbitrarily low.
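The two cheating strategies can be illustrated with the same toy parameters ($p = 23$, $g = 5$, both assumptions for illustration). In this sketch the cheater knows only the public value $y$, guesses Victor's challenge in advance, and prepares a commitment that works for that guess alone. The function names are hypothetical, and `pow(y, -1, P)` requires Python 3.8 or later. In a group this small the discrete log can be brute-forced, so the example shows the structure of the attack rather than its hardness.

```python
import secrets

P, G = 23, 5  # toy parameters (assumed for illustration only)

def verify(y, C, challenge, answer):
    """Victor's check: challenge 0 asks for r, challenge 1 for (x + r) mod (p-1)."""
    if challenge == 0:
        return pow(G, answer, P) == C
    return pow(G, answer, P) == (C * y) % P

def cheat_commit(y, guess):
    """Prepare (answer, commitment) that passes only if the challenge equals guess."""
    r = secrets.randbelow(P - 1)
    if guess == 0:
        return r, pow(G, r, P)                 # honest-looking C = g^r
    # C' = g^r' * y^(-1) mod p, so that g^r' == C' * y without knowing x
    return r, (pow(G, r, P) * pow(y, -1, P)) % P

def cheater_passes_round(y):
    """One round: the cheater guesses, commits, then faces a random challenge."""
    guess = secrets.randbelow(2)
    answer, C = cheat_commit(y, guess)
    challenge = secrets.randbelow(2)
    return verify(y, C, challenge, answer)

y = pow(G, 7, P)  # public value; the cheater never uses the exponent 7
wins = sum(cheater_passes_round(y) for _ in range(10_000))
print(f"cheater passed {wins / 10_000:.2%} of single rounds")
```

As long as $y \neq 1$, the prepared commitment passes exactly when the guess matches the challenge, which is where the per-round probability of 0.5 comes from.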

Short summary

Peggy proves to know the value of $x$.
  1. Peggy and Victor agree on a prime $p$ and a generator $g$ of the multiplicative group of the field $\mathbb{Z}_p$.
  2. Peggy calculates the value $y = g^x \bmod p$ and transfers the value to Victor.
  3. The following two steps are repeated a number of times.
     a. Peggy picks a random value $r \in [0, p - 2]$ and calculates $C = g^r \bmod p$. She transfers the value $C$ to Victor.
     b. Victor asks Peggy to calculate and transfer either the value $(x + r) \bmod (p - 1)$ or the value $r$. In the first case Victor verifies $(C \cdot y) \bmod p \equiv g^{(x + r) \bmod (p - 1)} \bmod p$. In the second case he verifies $C \equiv g^r \bmod p$.
The value $(x + r) \bmod (p - 1)$ can be seen as the encrypted value of $x \bmod (p - 1)$. If $r$ is truly random, equally distributed between zero and $p - 2$, this does not leak any information about $x$.
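The claim that the masked response leaks nothing about $x$ can be checked exhaustively for a toy prime. The sketch below (with a hypothetical helper `masked_distribution` and the assumed value $p = 23$) lets $r$ take every value in $[0, p - 2]$ once and counts the resulting responses $(x + r) \bmod (p - 1)$.

```python
from collections import Counter

def masked_distribution(x: int, p: int) -> Counter:
    """Count each response (x + r) mod (p-1) as r ranges once over 0..p-2,
    which models r being uniformly distributed."""
    return Counter((x + r) % (p - 1) for r in range(p - 1))

P = 23  # toy prime, assumed for illustration
dist_a = masked_distribution(3, P)
dist_b = masked_distribution(17, P)
print(dist_a == dist_b)         # the two secrets give the same distribution
print(set(dist_a.values()))     # every response value occurs equally often
```

Because every response value occurs equally often regardless of $x$, a single response is uniformly distributed and carries no information about the secret, in the same way a one-time pad hides its plaintext.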

Hamiltonian cycle for a large graph

The following scheme is due to Manuel Blum.
In this scenario, Peggy knows a Hamiltonian cycle for a large graph $G$. Victor knows $G$ but not the cycle. Finding a Hamiltonian cycle given a large graph is believed to be computationally infeasible, since its corresponding decision version is known to be NP-complete. Peggy will prove that she knows the cycle without simply revealing it.
To show that Peggy knows this Hamiltonian cycle, she and Victor play several rounds of a game. In each round, Peggy creates a graph $H$ that is isomorphic to $G$ and commits to it. Victor then randomly asks her either to show the isomorphism between $G$ and $H$ (the first case) or to show a Hamiltonian cycle in $H$ (the second case), and Peggy opens her commitment accordingly.
It is important that the commitment to the graph be such that Victor can verify, in the second case, that the cycle is really made of edges from $H$. This can be done by, for example, committing to every edge separately.
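One round of such a game, with a separate commitment per edge, might look like the sketch below. Everything here is an assumption made for illustration: the graph $G$ is given as `n` vertices and a set of edges, $H$ is a random relabelling of $G$, the commitment is a simple salted SHA-256 hash (not a scheme taken from the source), and all function names are hypothetical. Challenge 0 asks for the isomorphism, challenge 1 for a Hamiltonian cycle in $H$.

```python
import hashlib
import random
import secrets
from itertools import combinations

def commit_bit(bit):
    """Salted-hash commitment to one bit; returns (commitment, opening)."""
    nonce = secrets.token_bytes(16)
    return hashlib.sha256(nonce + bytes([bit])).hexdigest(), (nonce, bit)

def opens_to(commitment, opening):
    nonce, bit = opening
    return hashlib.sha256(nonce + bytes([bit])).hexdigest() == commitment

def pair(u, v):
    """Canonical form of an undirected vertex pair."""
    return (u, v) if u < v else (v, u)

def peggy_commit(n, edges):
    """Build H = perm(G) and commit to every vertex pair (1 = edge, 0 = no edge)."""
    perm = list(range(n))
    random.SystemRandom().shuffle(perm)      # perm[v] is the name of v in H
    h_edges = {pair(perm[u], perm[v]) for u, v in edges}
    commitments, openings = {}, {}
    for e in combinations(range(n), 2):
        commitments[e], openings[e] = commit_bit(int(e in h_edges))
    return perm, commitments, openings

def peggy_respond(challenge, perm, openings, cycle_in_g):
    if challenge == 0:                       # reveal the isomorphism and all of H
        return perm, openings
    cycle_h = [perm[v] for v in cycle_in_g]  # translate the cycle from G into H
    k = len(cycle_h)
    cycle_edges = [pair(cycle_h[i], cycle_h[(i + 1) % k]) for i in range(k)]
    return cycle_h, {e: openings[e] for e in cycle_edges}

def victor_verify(n, edges, commitments, challenge, response):
    data, opened = response
    if sorted(data) != list(range(n)):       # must list each vertex exactly once
        return False
    if not all(e in commitments and opens_to(commitments[e], o)
               for e, o in opened.items()):
        return False
    if challenge == 0:                       # H must equal perm(G) on every pair
        h_edges = {pair(data[u], data[v]) for u, v in edges}
        return (set(opened) == set(commitments)
                and all(o[1] == int(e in h_edges) for e, o in opened.items()))
    # challenge 1: every consecutive pair on the cycle must be an opened edge of H
    cycle_edges = {pair(data[i], data[(i + 1) % n]) for i in range(n)}
    return cycle_edges <= set(opened) and all(opened[e][1] == 1 for e in cycle_edges)
```

Committing to each vertex pair separately is what lets Peggy open only the edges on the cycle in the second case while the rest of $H$ stays hidden.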

Completeness

If Peggy does know a Hamiltonian cycle in $G$, she can easily satisfy Victor's demand for either the graph isomorphism producing $H$ from $G$ or a Hamiltonian cycle in $H$.

Zero-knowledge

Peggy's answers do not reveal the original Hamiltonian cycle in $G$. Each round, Victor will learn only $H$'s isomorphism to $G$ or a Hamiltonian cycle in $H$. He would need both answers for a single $H$ to discover the cycle in $G$, so the information remains unknown as long as Peggy can generate a distinct $H$ every round. If Peggy does not know of a Hamiltonian cycle in $G$, but somehow knew in advance what Victor would ask to see each round, then she could cheat. For example, if Peggy knew ahead of time that Victor would ask to see the Hamiltonian cycle in $H$, then she could generate a Hamiltonian cycle for an unrelated graph. Similarly, if Peggy knew in advance that Victor would ask to see the isomorphism, then she could simply generate an isomorphic graph $H$. Victor could simulate the protocol by himself because he knows what he will ask to see. Therefore, Victor gains no information about the Hamiltonian cycle in $G$ from the information revealed in each round.

Soundness

If Peggy does not know the information, she can guess which question Victor will ask and generate either a graph isomorphic to $G$ or a Hamiltonian cycle for an unrelated graph, but since she does not know a Hamiltonian cycle for $G$ she cannot do both. With this guesswork, her chance of fooling Victor is $2^{-n}$, where $n$ is the number of rounds. For all realistic purposes, it is infeasibly difficult to defeat a zero-knowledge proof with a reasonable number of rounds in this way.

Variants of zero-knowledge

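Different variants of zero-knowledge can be defined by formalizing the intuitive concept of what is meant by the output of the simulator "looking like" the execution of the real proof protocol in the following ways:
  1. Perfect zero-knowledge: the distributions produced by the simulator and by the real proof protocol are exactly the same.
  2. Statistical zero-knowledge: the two distributions are not necessarily identical, but they are statistically close, meaning that their statistical difference is a negligible function.
  3. Computational zero-knowledge: no efficient algorithm can distinguish the two distributions.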

Authentication systems

Research in zero-knowledge proofs has been motivated by authentication systems where one party wants to prove its identity to a second party via some secret information but doesn't want the second party to learn anything about this secret. This is called a "zero-knowledge proof of knowledge". However, a password is typically too small or insufficiently random to be used in many schemes for zero-knowledge proofs of knowledge. A zero-knowledge password proof is a special kind of zero-knowledge proof of knowledge that addresses the limited size of passwords.

Ethical behavior

One of the uses of zero-knowledge proofs within cryptographic protocols is to enforce honest behavior while maintaining privacy. Roughly, the idea is to force a user to prove, using a zero-knowledge proof, that its behavior is correct according to the protocol. Because of soundness, we know that the user must really act honestly in order to be able to provide a valid proof. Because of zero knowledge, we know that the user does not compromise the privacy of its secrets in the process of providing the proof.

Nuclear disarmament

In 2016, the Princeton Plasma Physics Laboratory and Princeton University demonstrated a novel technique that may have applicability to future nuclear disarmament talks. It would allow inspectors to confirm whether or not an object is indeed a nuclear weapon without recording, sharing or revealing the internal workings which might be secret.

Blockchains

Zero-knowledge proofs (ZKPs) have been proposed as a way to guarantee that transactions are valid even though information about the sender, the recipient and other transaction details remains hidden.

History

Zero-knowledge proofs were first conceived in 1985 by Shafi Goldwasser, Silvio Micali, and Charles Rackoff in their paper "The Knowledge Complexity of Interactive Proof-Systems". This paper introduced the IP hierarchy of interactive proof systems and conceived the concept of knowledge complexity, a measurement of the amount of knowledge about the proof transferred from the prover to the verifier. They also gave the first zero-knowledge proof for a concrete problem, that of deciding quadratic nonresidues mod $m$. Together with a paper by László Babai and Shlomo Moran, this landmark paper invented interactive proof systems, for which all five authors won the first Gödel Prize in 1993.
In their own words, Goldwasser, Micali, and Rackoff say:

Of particular interest is the case where this additional knowledge is essentially 0 and we show that it is possible to interactively prove that a number is quadratic non residue mod m releasing 0 additional knowledge. This is surprising as no efficient algorithm for deciding quadratic residuosity mod m is known when m’s factorization is not given. Moreover, all known NP proofs for this problem exhibit the prime factorization of m. This indicates that adding interaction to the proving process, may decrease the amount of knowledge that must be communicated in order to prove a theorem.

The quadratic nonresidue problem has both an NP and a co-NP algorithm, and so lies in the intersection of NP and co-NP. This was also true of several other problems for which zero-knowledge proofs were subsequently discovered, such as an unpublished proof system by Oded Goldreich verifying that a two-prime modulus is not a Blum integer.
Oded Goldreich, Silvio Micali, and Avi Wigderson took this one step further, showing that, assuming the existence of unbreakable encryption, one can create a zero-knowledge proof system for the NP-complete graph coloring problem with three colors. Since every problem in NP can be efficiently reduced to this problem, this means that, under this assumption, all problems in NP have zero-knowledge proofs. The reason for the assumption is that, as in the above example, their protocols require encryption. A commonly cited sufficient condition for the existence of unbreakable encryption is the existence of one-way functions, but it is conceivable that some physical means might also achieve it.
On top of this, they also showed that the graph nonisomorphism problem, the complement of the graph isomorphism problem, has a zero-knowledge proof. This problem is in co-NP, but is not currently known to be in either NP or any practical class. More generally, Russell Impagliazzo and Moti Yung as well as Ben-Or et al. would go on to show that, also assuming one-way functions or unbreakable encryption, there are zero-knowledge proofs for all problems in IP = PSPACE, or in other words, anything that can be proved by an interactive proof system can be proved with zero knowledge.
Not liking to make unnecessary assumptions, many theorists sought a way to eliminate the necessity of one-way functions. One way this was done was with multi-prover interactive proof systems, which have multiple independent provers instead of only one, allowing the verifier to "cross-examine" the provers in isolation to avoid being misled. It can be shown that, without any intractability assumptions, all languages in NP have zero-knowledge proofs in such a system.
It turns out that in an Internet-like setting, where multiple protocols may be executed concurrently, building zero-knowledge proofs is more challenging. The line of research investigating concurrent zero-knowledge proofs was initiated by the work of Dwork, Naor, and Sahai. One particular development along these lines has been the development of witness-indistinguishable proof protocols. The property of witness-indistinguishability is related to that of zero-knowledge, yet witness-indistinguishable protocols do not suffer from the same problems of concurrent execution.
Another variant of zero-knowledge proofs is the non-interactive zero-knowledge proof. Blum, Feldman, and Micali showed that a common random string shared between the prover and the verifier is enough to achieve computational zero-knowledge without requiring interaction.