Paxos (computer science)


Paxos is a family of protocols for solving consensus in a network of unreliable processors.
Consensus is the process of agreeing on one result among a group of participants. This problem becomes difficult when the participants or their communication medium may experience failures.
Consensus protocols are the basis for the state machine replication approach to distributed computing, as suggested by Leslie Lamport and surveyed by Fred Schneider. State machine replication is a technique for converting an algorithm into a fault-tolerant, distributed implementation. Ad-hoc techniques may leave important cases of failures unresolved. The principled approach proposed by Lamport et al. ensures all cases are handled safely.
The Paxos protocol was first submitted in 1989 and named after a fictional legislative consensus system used on the Paxos island in Greece, where Lamport wrote that the parliament had to function "even though legislators continually wandered in and out of the parliamentary Chamber". It was later published as a journal article in 1998.
The Paxos family of protocols includes a spectrum of trade-offs between the number of processors, number of message delays before learning the agreed value, the activity level of individual participants, number of messages sent, and types of failures. Although no deterministic fault-tolerant consensus protocol can guarantee progress in an asynchronous network, Paxos guarantees safety, and the conditions that could prevent it from making progress are difficult to provoke.
Paxos is usually used where durability is required, in which the amount of durable state could be large. The protocol attempts to make progress even during periods when some bounded number of replicas are unresponsive. There is also a mechanism to drop a permanently failed replica or to add a new replica.

History

The topic predates the protocol. In 1988, Lynch, Dwork and Stockmeyer had demonstrated the solvability of consensus in a broad family of "partially synchronous" systems. Paxos has strong similarities to a protocol used for agreement in "viewstamped replication", first published by Oki and Liskov in 1988, in the context of distributed transactions. Notwithstanding this prior work, Paxos offered a particularly elegant formalism, and included one of the earliest proofs of safety for a fault-tolerant distributed consensus protocol.
Reconfigurable state machines have strong ties to prior work on reliable group multicast protocols that support dynamic group membership, for example Birman's work in 1985 and 1987 on the virtually synchronous gbcast protocol. However, gbcast is unusual in supporting durability and addressing partitioning failures.
Most reliable multicast protocols lack these properties, which are required for implementations of the state machine replication model.
This point is elaborated in a paper by Lamport, Malkhi and Zhou.
Paxos protocols are members of a theoretical class of solutions to a problem formalized as uniform agreement with crash failures.
Lower bounds for this problem have been proved by Keidar and Shraer. Derecho, a C++ software library for cloud-scale state machine replication, offers a Paxos protocol that has been integrated with self-managed virtually synchronous membership. This protocol matches the Keidar and Shraer optimality bounds, and maps efficiently to modern remote DMA datacenter hardware.

Assumptions

In order to simplify the presentation of Paxos, the following assumptions and definitions are made explicit. Techniques to broaden the applicability are known in the literature, and are not covered in this article.

Processors

In general, a consensus algorithm can make progress using processors, despite the simultaneous failure of any processors : in other words, the number of non-faulty processes must be strictly greater than the number of faulty processes. However, using reconfiguration, a protocol may be employed which survives any number of total failures as long as no more than F fail simultaneously.

Roles

Paxos describes the actions of the processors by their roles in the protocol: client, acceptor, proposer, learner, and leader. In typical implementations, a single processor may play one or more roles at the same time. This does not affect the correctness of the protocol—it is usual to coalesce roles to improve the latency and/or number of messages in the protocol.
; Client: The Client issues a request to the distributed system, and waits for a response. For instance, a write request on a file in a distributed file server.
; Acceptor : The Acceptors act as the fault-tolerant "memory" of the protocol. Acceptors are collected into groups called Quorums. Any message sent to an Acceptor must be sent to a Quorum of Acceptors. Any message received from an Acceptor is ignored unless a copy is received from each Acceptor in a Quorum.
; Proposer: A Proposer advocates a client request, attempting to convince the Acceptors to agree on it, and acting as a coordinator to move the protocol forward when conflicts occur.
; Learner: Learners act as the replication factor for the protocol. Once a Client request has been agreed upon by the Acceptors, the Learner may take action. To improve availability of processing, additional Learners can be added.
; Leader: Paxos requires a distinguished Proposer to make progress. Many processes may believe they are leaders, but the protocol only guarantees progress if one of them is eventually chosen. If two processes believe they are leaders, they may stall the protocol by continuously proposing conflicting updates. However, the [|safety properties] are still preserved in that case.

Quorums

Quorums express the safety properties of Paxos by ensuring at least some surviving processor retains knowledge of the results.
Quorums are defined as subsets of the set of Acceptors such that any two subsets share at least one member. Typically, a Quorum is any majority of participating Acceptors. For example, given the set of Acceptors, a majority Quorum would be any three Acceptors: ,,,. More generally, arbitrary positive weights can be assigned to Acceptors; in that case, a Quorum can be defined as any subset of Acceptors with the summary weight greater than half of the total weight of all Acceptors.

Proposal number and agreed value

Each attempt to define an agreed value v is performed with proposals which may or may not be accepted by Acceptors. Each proposal is uniquely numbered for a given Proposer. So, e.g., each proposal may be of the form , where n is the unique identifier of the proposal and v is the actual proposed value. The value corresponding to a numbered proposal can be computed as part of running the Paxos protocol, but need not be.

Safety and liveness properties

In order to guarantee safety, Paxos defines three properties and ensures the first two are always held, regardless of the pattern of failures:
; Validity : Only proposed values can be chosen and learned.
; Agreement : No two distinct learners can learn different values
; Termination : If value C has been proposed, then eventually learner L will learn some value.
Note that Paxos is not guaranteed to terminate, and thus does not have the liveness property. This is supported by the Fischer Lynch Paterson impossibility result which states that a consistency protocol can only have two of safety, liveness, and fault tolerance. As Paxos's point is to ensure fault tolerance and it guarantees safety, it cannot also guarantee liveness.

Typical deployment

In most deployments of Paxos, each participating process acts in three roles; Proposer, Acceptor and Learner. This reduces the message complexity significantly, without sacrificing correctness:
By merging roles, the protocol "collapses" into an efficient client-master-replica style deployment, typical of the database community. The benefit of the Paxos protocols is the guarantee of its safety properties.
A typical implementation's message flow is covered in the section [|Multi-Paxos].

[|Basic Paxos]

This protocol is the most basic of the Paxos family. Each "instance" of the basic Paxos protocol decides on a single output value. The protocol proceeds over several rounds. A successful round has 2 phases: phase 1 and phase 2. See below the description of the phases. Remember that we assume an asynchronous model, so e.g. a processor may be in one phase while another processor may be in another.

Phase 1

Phase 1a: ''Prepare''

Phase 1b: ''Promise''

Phase 2

Phase 2a: ''Accept''

This Accept message should be interpreted as a "request", as in "Accept this proposal, please!".

Phase 2b: ''Accepted''

Note that an Acceptor can accept multiple proposals. This can happen when another Proposer, unaware of the new value being decided, starts a new round with a higher identification number n. In that case, the Acceptor can promise and later accept the new proposed value even though it has accepted another one earlier. These proposals may even have different values in the presence of certain failures. However, the Paxos protocol will guarantee that the Acceptors will ultimately agree on a single value.

When rounds fail

Paxos can be used to select a leader

Graphic representation of the flow of messages in the basic Paxos

The following diagrams represent several cases/situations of the application of the Basic Paxos protocol. Some cases show how the Basic Paxos protocol copes with the failure of certain components of the distributed system.
Note that the values returned in the Promise message are "null" the first time a proposal is made.

Basic Paxos without failures

In the diagram below, there is 1 Client, 1 Proposer, 3 Acceptors and 2 Learners. This diagram represents the case of a first round, which is successful.
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare
| |<---------X--X--X | | Promise
| X--------->|->|->| | | Accept!
| |<---------X--X--X------>|->| Accepted
|<---------------------------------X--X Response
| | | | | | |

Here, V is the last of.

Error cases in basic Paxos

The simplest error cases are the failure of an Acceptor and failure of a redundant Learner. In these cases, the protocol requires no "recovery" : no additional rounds or messages are required, as shown below.

Basic Paxos when an Acceptor fails

In the following diagram, one of the Acceptors in the Quorum fails, so the Quorum size becomes 2. In this case, the Basic Paxos protocol still succeeds.
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare
| | | | ! | | !! FAIL !!
| |<---------X--X | | Promise
| X--------->|->| | | Accept!
| |<---------X--X--------->|->| Accepted
|<---------------------------------X--X Response
| | | | | |

Basic Paxos when a redundant learner fails

In the following case, one of the Learners fails, but the Basic Paxos protocol still succeeds.
Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare
| |<---------X--X--X | | Promise
| X--------->|->|->| | | Accept!
| |<---------X--X--X------>|->| Accepted
| | | | | | ! !! FAIL !!
|<---------------------------------X Response
| | | | | |

Basic Paxos when a Proposer fails

In this case, a Proposer fails after proposing a value, but before the agreement is reached. Specifically, it fails in the middle of the Accept message, so only one Acceptor of the Quorum receives the value. Meanwhile, a new Leader is elected. Note that there are 2 rounds in this case.
Client Proposer Acceptor Learner
| | | | | | |
X----->| | | | | | Request
| X------------>|->|->| | | Prepare
| |<------------X--X--X | | Promise
| | | | | | |
| | | | | | | !! Leader fails during broadcast !!
| X------------>| | | | | Accept!
| ! | | | | |
| | | | | | | !! NEW LEADER !!
| X--------->|->|->| | | Prepare
| |<---------X--X--X | | Promise
| X--------->|->|->| | | Accept!
| |<---------X--X--X------>|->| Accepted
|<---------------------------------X--X Response
| | | | | | |

Basic Paxos when multiple Proposers conflict

The most complex case is when multiple Proposers believe themselves to be Leaders. For instance, the current leader may fail and later recover, but the other Proposers have already re-selected a new leader. The recovered leader has not learned this yet and attempts to begin one round in conflict with the current leader. In the diagram below, 4 unsuccessful rounds are shown, but there could be more.
Client Proposer Acceptor Learner
| | | | | | |
X----->| | | | | | Request
| X------------>|->|->| | | Prepare
| |<------------X--X--X | | Promise
| ! | | | | | !! LEADER FAILS
| | | | | | | !! NEW LEADER
| X--------->|->|->| | | Prepare
| |<---------X--X--X | | Promise
| | | | | | | | !! OLD LEADER recovers
| | | | | | | | !! OLD LEADER tries 2, denied
| X------------>|->|->| | | Prepare
| |<------------X--X--X | | Nack
| | | | | | | | !! OLD LEADER tries 3
| X------------>|->|->| | | Prepare
| |<------------X--X--X | | Promise
| | | | | | | | !! NEW LEADER proposes, denied
| | X--------->|->|->| | | Accept!
| | |<---------X--X--X | | Nack
| | | | | | | | !! NEW LEADER tries 4
| | X--------->|->|->| | | Prepare
| | |<---------X--X--X | | Promise
| | | | | | | | !! OLD LEADER proposes, denied
| X------------>|->|->| | | Accept!
| |<------------X--X--X | | Nack
| | | | | | | | ... and so on...

Multi-Paxos

A typical deployment of Paxos requires a continuous stream of agreed values acting as commands to a distributed state machine. If each command is the result of a single instance of the Basic Paxos protocol, a significant amount of overhead would result.
If the leader is relatively stable, phase 1 becomes unnecessary. Thus, it is possible to skip phase 1 for future instances of the protocol with the same leader.
To achieve this, the round number I is included along with each value which is incremented in each round by the same Leader. Multi-Paxos reduces the failure-free message delay from 4 delays to 2 delays.

Graphic representation of the flow of messages in the Multi-Paxos

Multi-Paxos without failures

In the following diagram, only one instance of the basic Paxos protocol, with an initial Leader, is shown. Note that a Multi-Paxos consists of several instances of the basic Paxos protocol.
Client Proposer Acceptor Learner
| | | | | | | --- First Request ---
X-------->| | | | | | Request
| X--------->|->|->| | | Prepare
| |<---------X--X--X | | Promise
| X--------->|->|->| | | Accept!
| |<---------X--X--X------>|->| Accepted
|<---------------------------------X--X Response
| | | | | | |

where V = last of.

Multi-Paxos when phase 1 can be skipped

In this case, subsequence instances of the basic Paxos protocol use the same leader, so the phase 1, which consist in the Prepare and Promise sub-phases, is skipped. Note that the Leader should be stable, i.e. it should not crash or change.
Client Proposer Acceptor Learner
| | | | | | | --- Following Requests ---
X-------->| | | | | | Request
| X--------->|->|->| | | Accept!
| |<---------X--X--X------>|->| Accepted
|<---------------------------------X--X Response
| | | | | | |

Multi-Paxos when roles are collapsed

A common deployment of the Multi-Paxos consists in collapsing the role of the Proposers, Acceptors and Learners to "Servers". So, in the end, there are only "Clients" and "Servers".
The following diagram represents the first "instance" of a basic Paxos protocol, when the roles of the Proposer, Acceptor and Learner are collapsed to a single role, called the "Server".
Client Servers
| | | | --- First Request ---
X-------->| | | Request
| X->|->| Prepare
| |<-X--X Promise
| X->|->| Accept!
| X<>X<>X Accepted
|<--------X | | Response
| | | |

Multi-Paxos when roles are collapsed and the leader is steady

In the subsequent instances of the basic Paxos protocol, with the same leader as in the previous instances of the basic Paxos protocol, the phase 1 can be skipped.
Client Servers
X-------->| | | Request
| X->|->| Accept!
| X<>X<>X Accepted
|<--------X | | Response
| | | |

Optimizations

A number of optimisations can be performed to reduce the number of exchanged messages, to improve the performance of the protocol, etc. A few of these optimisations are reported below.

Cheap Paxos

Cheap Paxos extends Basic Paxos to tolerate F failures with F+1 main processors and F auxiliary processors by dynamically reconfiguring after each failure.
This reduction in processor requirements comes at the expense of liveness; if too many main processors fail in a short time, the system must halt until the auxiliary processors can reconfigure the system. During stable periods, the auxiliary processors take no part in the protocol.
"With only two processors p and q, one processor cannot distinguish failure of the other processor from failure of the communication medium. A third processor is needed. However, that third processor does not have to participate in choosing the sequence of commands. It must take action only in case p or q fails, after which it does nothing while either p or q continues to operate the system by itself. The third processor can therefore be a small/slow/cheap one, or a processor primarily devoted to other tasks."

Message flow: Cheap Multi-Paxos

An example involving three main acceptors, one auxiliary acceptor and quorum size of three, showing failure of one main processor and subsequent reconfiguration:

Proposer Main Aux Learner
X----------->|->|->| | | Accept!
X----------->|->|------->| | Accept!
X----------->|->| | | Accept!

Fast Paxos

Fast Paxos generalizes Basic Paxos to reduce end-to-end message delays. In Basic Paxos, the message delay from client request to learning is 3 message delays. Fast Paxos allows 2 message delays, but requires that the system be comprised by 3f+ 1 acceptors to tolerate up to f faults, and the Client to send its request to multiple destinations.
Intuitively, if the leader has no value to propose, then a client could send an Accept! message to the Acceptors directly. The Acceptors would respond as in Basic Paxos, sending Accepted messages to the leader and every Learner achieving two message delays from Client to Learner.
If the leader detects a collision, it resolves the collision by sending Accept! messages for a new round which are Accepted as usual. This coordinated recovery technique requires four message delays from Client to Learner.
The final optimization occurs when the leader specifies a recovery technique in advance, allowing the Acceptors to perform the collision recovery themselves. Thus, uncoordinated collision recovery can occur in three message delays.

Message flow: Fast Paxos, non-conflicting

Client Leader Acceptor Learner
| | | | | | | |
| X--------->|->|->|->| | | Any
| | | | | | | |
X------------------->|->|->|->| | | Accept!
| |<---------X--X--X--X------>|->| Accepted
|<------------------------------------X--X Response
| | | | | | | |

Message flow: Fast Paxos, conflicting proposals

Conflicting proposals with coordinated recovery. Note: the protocol does not specify how to handle the dropped client request.
Client Leader Acceptor Learner
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | !! Concurrent conflicting proposals
| | | | | | | | | !! received in different order
| | | | | | | | | !! by the Acceptors
| X--------------?|-?|-?|-?| | | Accept!
X-----------------?|-?|-?|-?| | | Accept!
| | | | | | | | |
| | | | | | | | | !! Acceptors disagree on value
| | |<-------X--X->|->|----->|->| Accepted
| | |<-------|<-|<-X--X----->|->| Accepted
| | | | | | | | |
| | | | | | | | | !! Detect collision & recover
| | X------->|->|->|->| | | Accept!
| | |<-------X--X--X--X----->|->| Accepted
|<---------------------------------X--X Response
| | | | | | | | |

Conflicting proposals with uncoordinated recovery.
Client Leader Acceptor Learner
| | | | | | | | |
| | X------->|->|->|->| | | Any
| | | | | | | | |
| | | | | | | | | !! Concurrent conflicting proposals
| | | | | | | | | !! received in different order
| | | | | | | | | !! by the Acceptors
| X--------------?|-?|-?|-?| | | Accept!
X-----------------?|-?|-?|-?| | | Accept!
| | | | | | | | |
| | | | | | | | | !! Acceptors disagree on value
| | |<-------X--X->|->|----->|->| Accepted
| | |<-------|<-|<-X--X----->|->| Accepted
| | | | | | | | |
| | | | | | | | | !! Detect collision & recover
| | |<-------X--X--X--X----->|->| Accepted
|<---------------------------------X--X Response
| | | | | | | | |

Message flow: Fast Paxos with uncoordinated recovery, collapsed roles

Client Servers
| | | | | |
| | X->|->|->| Any
| | | | | |
| | | | | | !! Concurrent conflicting proposals
| | | | | | !! received in different order
| | | | | | !! by the Servers
| X--------?|-?|-?|-?| Accept!
X-----------?|-?|-?|-?| Accept!
| | | | | |
| | | | | | !! Servers disagree on value
| | X<>X->|->| Accepted
| | |<-|<-X<>X Accepted
| | | | | |
| | | | | | !! Detect collision & recover
| | X<>X<>X<>X Accepted
|<-----------X--X--X--X Response
| | | | | |

Generalized Paxos

Generalized consensus explores the relationship between the operations of the replicated state machine and the consensus protocol that implements it. The main discovery involves optimizations of Paxos when conflicting proposals could be applied in any order. i.e., when the proposed operations are commutative operations for the state machine. In such cases, the conflicting operations can both be accepted, avoiding the delays required for resolving conflicts and re-proposing the rejected operations.
This concept is further generalized into ever-growing sequences of commutative operations, some of which are known to be stable. The protocol tracks these sequences ensuring that all proposed operations of one sequence are stabilized before allowing any operation non-commuting with them to become stable.

Example

In order to illustrate Generalized Paxos, the example below shows a message flow between two concurrently executing clients and a replicated state machine implementing read/write operations over two distinct registers A and B.
ReadWriteReadWrite
Read
Write
Read
Write

Note that in this table indicates operations which are non-commutative.
A possible sequence of operations :
 <1:Read, 2:Read, 3:Write, 4:Read, 5:Read, 6:Write> 

Since 5:Read commutes with both 3:Write and 4:Read, one possible permutation equivalent to the previous order is the following:
 <1:Read, 2:Read, 5:Read, 3:Write, 4:Read, 6:Write> 

In practice, a commute occurs only when operations are proposed concurrently.

Message flow: Generalized Paxos (example)

Responses not shown. Note: message abbreviations differ from previous message flows due to specifics of the protocol, see for a full discussion.

Client Leader Acceptor Learner
| | | | | | | | !! New Leader Begins Round
| | X----->|->|->| | | Prepare
| | |<-----X- X- X | | Promise
| | X----->|->|->| | | Phase2Start
| | | | | | | |
| | | | | | | | !! Concurrent commuting proposals
| X------- ?|-----?|-?|-?| | | Propose
X-----------?|-----?|-?|-?| | | Propose
| | X------X-------------->|->| Accepted
| | |<--------X--X-------->|->| Accepted
| | | | | | | |
| | | | | | | | !! No Conflict, both accepted
| | | | | | | | Stable =
| | | | | | | |
| | | | | | | | !! Concurrent conflicting proposals
X-----------?|-----?|-?|-?| | | Propose
| X--------?|-----?|-?|-?| | | Propose
| | | | | | | |
| | X------X-------------->|->| Accepted
| | |<--------X--X-------->|->| Accepted
| | | | | | | |
| | | | | | | | !! Conflict detected, leader chooses
| | | | | | | | commutative order:
| | | | | | | | V =
| | | | | | | |
| | X----->|->|->| | | Phase2Start
| | |<-----X- X- X-------->|->| Accepted
| | | | | | | | Stable = .
| | | | | | | |
| | | | | | | |
| | | | | | | | !! More conflicting proposals
X-----------?|-----?|-?|-?| | | Propose
| X--------?|-----?|-?|-?| | | Propose
| | | | | | | |
| | X------X-------------->|->| Accepted
| | |<--------X- X-------->|->| Accepted
| | | | | | | |
| | | | | | | | !! Leader chooses order:
| | | | | | | | W =
| | | | | | | |
| | X----->|->|->| | | Phase2Start
| | |<-----X- X- X-------->|->| Accepted
| | | | | | | | Stable = .
| | | | | | | | .
| | | | | | | |
| | | | | | | |

Performance

The above message flow shows us that Generalized Paxos can leverage operation semantics to avoid collisions when the spontaneous ordering of the network fails. This allows the protocol to be in practice quicker than Fast Paxos. However, when a collision occurs, Generalized Paxos needs two additional round trips to recover. This situation is illustrated with operations WriteB and ReadB in the above schema.
In the general case, such round trips are unavoidable and come from the fact that multiple commands can be accepted during a round. This makes the protocol more expensive than Paxos when conflicts are frequent. Hopefully two possible refinements of Generalized Paxos are possible to improve recovery time.
Paxos may also be extended to support arbitrary failures of the participants, including lying, fabrication of messages, collusion with other participants, selective non-participation, etc. These types of failures are called Byzantine failures, after the solution popularized by Lamport.
Byzantine Paxos introduced by Castro and Liskov adds an extra message which acts to distribute knowledge and verify the actions of the other processors:

Message flow: Byzantine Multi-Paxos, steady state

Client Proposer Acceptor Learner
| | | | | | |
X-------->| | | | | | Request
| X--------->|->|->| | | Accept!
| | X<>X<>X | | Verify - BROADCAST
| |<---------X--X--X------>|->| Accepted
|<---------------------------------X--X Response
| | | | | | |

Fast Byzantine Paxos introduced by Martin and Alvisi removes this extra delay, since the client sends commands directly to the Acceptors.
Note the Accepted message in Fast Byzantine Paxos is sent to all Acceptors and all Learners, while Fast Paxos sends Accepted messages only to Learners):

Message flow: Fast Byzantine Multi-Paxos, steady state

Client Acceptor Learner
| | | | | |
X----->|->|->| | | Accept!
| X<>X<>X------>|->| Accepted - BROADCAST
|<-------------------X--X Response
| | | | | |

The failure scenario is the same for both protocols; Each Learner waits to receive F+1 identical messages from different Acceptors. If this does not occur, the Acceptors themselves will also be aware of it, and correct Acceptors will re-broadcast the agreed value:

Message flow: Fast Byzantine Multi-Paxos, failure

Client Acceptor Learner
| | | ! | | !! One Acceptor is faulty
X----->|->|->! | | Accept!
| X<>X<>X------>|->| Accepted - BROADCAST
| | | ! | | !! Learners receive 2 different commands
| | | ! | | !! Correct Acceptors notice error and choose
| X<>X<>X------>|->| Accepted - BROADCAST
|<-------------------X--X Response
| | | ! | |

Adapting Paxos for RDMA networks

With the emergence of very high speed reliable datacenter networks that support remote DMA, there has been substantial interest in optimizing Paxos to leverage hardware offloading, in which the network interface card and network routers provide reliability and network-layer congestion control, freeing the host CPU for other tasks. The is an open-source Paxos implementation that explores this option.
Derecho offers both a classic Paxos, with data durability across full shutdown/restart sequences, and vertical Paxos, for in-memory replication and state-machine synchronization. The Paxos protocols employed by Derecho needed to be adapted to maximize asynchronous data streaming and remove other sources of delay on the leader's critical path. So doing enables Derecho to sustain the full bidirectional RDMA data rate. In contrast, although traditional Paxos protocols can be migrated to an RDMA network by simply mapping the message send operations to native RDMA operations, doing so leaves round-trip delays on the critical path. In high-speed RDMA networks, even small delays can be large enough to prevent utilization of the full potential bandwidth.

Production use of Paxos