TCP hole punching


TCP NAT traversal and TCP hole punching occur when two hosts behind network address translation (NAT) try to connect to each other with outbound TCP connections. This scenario is particularly important for peer-to-peer communications such as Voice-over-IP, file sharing, teleconferencing, and chat systems.
TCP hole punching is an experimentally used NAT traversal technique for establishing a TCP connection between two peers that sit behind NAT devices on the Internet. NAT traversal is a general term for techniques that establish and maintain TCP/IP network and/or TCP connections traversing NAT gateways.

Terms used

In the following, the terms host, client and peer are used almost interchangeably.
; local endpoint, internal endpoint
; public endpoint, external endpoint
; remote endpoint

Description

NAT traversal, through TCP hole punching, establishes bidirectional TCP connections between Internet hosts in private networks using NAT. It does not work with all types of NATs, as their behavior is not standardized.
When two hosts connect to each other over TCP, both via outbound connections, they are in the "simultaneous TCP open" case of the TCP state machine diagram.
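As an illustration of the simultaneous open case, the following minimal sketch walks one peer through the relevant states. The state names follow the TCP specification; the event names (`active_open`, `recv_SYN`, `recv_ACK_of_SYN`) and the `run` helper are hypothetical labels chosen for this example, and the table covers only the simultaneous-open path, not the full state machine.

```python
# Subset of the TCP state machine used by a simultaneous open.
# Keys are (current state, event); values are (next state, segment sent or None).
# Event names are illustrative labels, not part of any API.
TRANSITIONS = {
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("SYN_SENT", "recv_SYN"): ("SYN_RECEIVED", "SYN+ACK"),
    ("SYN_RECEIVED", "recv_ACK_of_SYN"): ("ESTABLISHED", None),
}


def run(events, state="CLOSED"):
    """Apply events in order and return (final state, segments sent)."""
    sent = []
    for event in events:
        state, segment = TRANSITIONS[(state, event)]
        if segment is not None:
            sent.append(segment)
    return state, sent


# Each peer follows the same path: it sends a SYN, receives the other
# peer's SYN while still in SYN_SENT, answers with SYN+ACK, and becomes
# ESTABLISHED once its own SYN has been acknowledged.
final_state, segments = run(["active_open", "recv_SYN", "recv_ACK_of_SYN"])
```

Both peers run the same sequence, which is what distinguishes a simultaneous open from the usual client/server handshake where only one side starts in SYN_SENT.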

Network Drawing

Types of NAT

The availability of TCP hole punching depends on the type of port allocation used by the NAT.
For two peers behind a NAT to connect to each other via TCP simultaneous open, each needs to know the "location" of the other peer, or the remote endpoint. The remote endpoint is the IP address and port that the peer will connect to. So when two peers, A and B, initiate TCP connections by binding to local ports Pa and Pb, respectively, each needs to know the other's port as mapped by the NAT in order to make the connection.
When both peers are behind a NAT, discovering the public remote endpoint of the other peer is a problem called NAT port prediction. All TCP NAT traversal and hole punching techniques have to solve the port prediction problem.
A NAT port allocation falls into one of two categories:
; predictable: the gateway uses a simple algorithm to map the local port to the NAT port. Most of the time a NAT will use port preservation, which means that the local port is mapped to the same port on the NAT.
; non-predictable: the gateway uses an algorithm that is either random or too impractical to predict.
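The distinction above can be sketched as a small heuristic that inspects a few observed mappings. This is only an illustration: the function name `classify_allocation`, the returned labels, and the "constant increment" test are assumptions made for this example, not a standard classification procedure, and a real NAT may behave differently under load.

```python
def classify_allocation(observations):
    """Guess the allocation behavior from observed mappings.

    observations: list of (local_port, public_port) pairs, in the order
    the connections were made. Hypothetical helper for illustration.
    """
    if not observations:
        raise ValueError("at least one observation is required")

    # Port preservation: every local port kept its value on the NAT.
    if all(local == public for local, public in observations):
        return "predictable (port preservation)"

    # Simple algorithm: successive public ports differ by a constant step.
    publics = [public for _, public in observations]
    steps = {later - earlier for earlier, later in zip(publics, publics[1:])}
    if len(observations) >= 3 and len(steps) == 1:
        return "predictable (constant increment)"

    return "non-predictable"


preserved = classify_allocation([(5000, 5000), (5001, 5001)])
stepped = classify_allocation([(5000, 40000), (5001, 40001), (5002, 40002)])
scattered = classify_allocation([(5000, 40000), (5001, 51234), (5002, 1030)])
```

Requiring at least three observations before reporting a constant increment is a design choice of the sketch: two points always share a single step, so they cannot distinguish a pattern from chance.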
Depending on whether the NATs exhibit predictable or non-predictable behavior, it may or may not be possible to establish the TCP connection via a TCP simultaneous open, as shown below by the connection matrix representing the different cases and their impact on end-to-end communication:

Techniques

Methods of Port Prediction (with predictable NATs)

Here are some of the methods used by NATs to allow peers to perform port prediction:
; the NAT assigns sequential public ports to successive connections: If the remote peer knows one mapping, it can guess the value of subsequent mappings. The TCP connection happens in two steps. First, the peers each make a connection to a third party and learn their mapping. Second, both peers guess what the NAT port mapping will be for subsequent connections, which solves port prediction. This method requires at least two consecutive connections per peer and the use of a third party. It does not work properly with Carrier-grade NAT when many subscribers share each IP address, because only a limited number of ports is available and allocating consecutive ports to the same internal host might be impractical or impossible.
; the NAT uses port preservation: In this case, port prediction is trivial, and the peers simply have to exchange the port to which they are bound through another communication channel before making the outbound connections of the TCP simultaneous open. This method requires only one connection per peer and does not require a third party to perform port prediction.
; the NAT maps successive connections from the same local endpoint to the same public endpoint: With this solution, the peers first connect to a third-party server that saves their port mapping value and gives each peer the port mapping value of the other. In a second step, both peers reuse the same local endpoint to perform a TCP simultaneous open with each other. This requires setting SO_REUSEADDR on the TCP sockets; such use violates the TCP standard and can lead to data corruption, so it should only be used if the application can protect itself against such corruption.
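The last method above (connect to a third party, then reuse the same local endpoint) can be sketched with the standard socket API. This is a minimal sketch under stated assumptions: the helper names `open_reusable_socket` and `connect_from` are hypothetical, the exchange of mappings with the third-party server is left out entirely, and some platforms may need additional socket options beyond SO_REUSEADDR for two sockets to share a local port.

```python
import socket


def open_reusable_socket(local_port=0):
    """Create a TCP socket bound to local_port with SO_REUSEADDR set.

    local_port=0 lets the operating system pick a free port.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind() so that a second socket can later be
    # bound to the same local endpoint.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("0.0.0.0", local_port))
    return sock


def connect_from(local_port, remote, timeout=5.0):
    """Open an outbound connection to remote from a fixed local port."""
    sock = open_reusable_socket(local_port)
    sock.settimeout(timeout)
    sock.connect(remote)
    return sock


# Intended use (addresses are placeholders, not real services):
#   server = connect_from(0, ("rendezvous.example", 3478))  # step 1
#   local_port = server.getsockname()[1]
#   ... learn the other peer's public endpoint from the server ...
#   peer = connect_from(local_port, peer_public_endpoint)   # step 2
```

Keeping the first connection open while the second one is attempted is what makes the shared local port, and therefore SO_REUSEADDR, necessary.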

Details of a typical TCP connection instantiation with TCP Hole Punching

We assume here that port prediction has already taken place through one of the methods outlined above, and that each peer knows the remote peer endpoint. Both peers make a POSIX connect call to the other peer endpoint. TCP simultaneous open will happen as follows:
* Peer A sends a SYN to Peer B
* Peer B sends a SYN to Peer A
* When NAT-a receives the outgoing SYN from Peer A, it creates a mapping in its state machine.
* When NAT-b receives the outgoing SYN from Peer B, it creates a mapping in its state machine.
Both SYNs cross somewhere along the network path, then:
* The SYN from Peer A reaches NAT-b, and the SYN from Peer B reaches NAT-a
* Depending on the timing of these events, at least one of the NATs will let the incoming SYN through and map it to the internal destination peer
Upon receipt of the SYN, the peer sends a SYN+ACK back and the connection is established.
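The timing argument in the steps above can be checked with a toy simulation. It is a sketch under simplifying assumptions: each NAT is modeled only as a set of remote endpoints for which it has seen an outgoing SYN, an unmatched incoming SYN is silently dropped, retransmissions are ignored, and the addresses and event names are invented for the example.

```python
from itertools import permutations

# Placeholder public endpoints (documentation address ranges).
A_PUBLIC = ("203.0.113.1", 40000)
B_PUBLIC = ("198.51.100.2", 50000)


def simulate(events):
    """Return the set of peers ("A", "B") that receive the other's SYN.

    Events: "A_out"/"B_out" = the peer's SYN leaves its own NAT,
            "A_in"/"B_in"   = that SYN arrives at the other peer's NAT.
    """
    nat_a, nat_b = set(), set()  # remote endpoints with an active mapping
    received = set()
    for event in events:
        if event == "A_out":
            nat_a.add(B_PUBLIC)       # NAT-a creates a mapping
        elif event == "B_out":
            nat_b.add(A_PUBLIC)       # NAT-b creates a mapping
        elif event == "A_in" and A_PUBLIC in nat_b:
            received.add("B")         # NAT-b lets A's SYN through
        elif event == "B_in" and B_PUBLIC in nat_a:
            received.add("A")         # NAT-a lets B's SYN through
    return received


def valid_orders():
    """All orderings in which each SYN is sent before it arrives."""
    for order in permutations(["A_out", "A_in", "B_out", "B_in"]):
        if (order.index("A_out") < order.index("A_in")
                and order.index("B_out") < order.index("B_in")):
            yield order


crossing = simulate(["A_out", "B_out", "A_in", "B_in"])  # both SYNs pass
a_early = simulate(["A_out", "A_in", "B_out", "B_in"])   # only B's SYN passes
```

In this model every valid ordering delivers at least one SYN, because a SYN can only be dropped if it arrives before the other side has sent its own, and that cannot happen in both directions at once.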

Interoperability requirements on the NAT for TCP Hole Punching

Other requirements on the NAT to comply with TCP simultaneous open

For the TCP simultaneous open to work, the NAT should:
This is enough to guarantee that NATs behave nicely with respect to the TCP simultaneous open.

TCP Hole Punching and Carrier-grade NAT (CGN)

The technique described above works within a CGN. A CGN can also use port overloading, which means that distinct internal endpoints with the same port value can be mapped to the same public endpoint. This does not break the uniqueness of the quintuple (protocol, public address, public port, remote address, remote port) and is therefore acceptable. TCP port preservation can also lead to cases where the CGN ports are overloaded; this is not an issue for protocol soundness.
Port overloading for TCP allows the CGN to fit more hosts internally while preserving TCP end-to-end communication guarantees.
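The quintuple argument can be made concrete with a small model. This is an illustrative sketch, not a description of any real CGN: the class `OverloadingCgn`, its method, and the addresses are hypothetical, and it assumes that overloading is only safe when the two internal hosts talk to different remote endpoints.

```python
from collections import namedtuple

Quintuple = namedtuple(
    "Quintuple", "protocol public_ip public_port remote_ip remote_port"
)


class OverloadingCgn:
    """Toy CGN that preserves ports and allows port overloading."""

    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.sessions = {}  # Quintuple -> internal (ip, port)

    def map_outgoing(self, internal, remote):
        """Map an outgoing TCP connection and return its quintuple."""
        # Port preservation: the public port equals the internal port.
        key = Quintuple("tcp", self.public_ip, internal[1], remote[0], remote[1])
        owner = self.sessions.setdefault(key, internal)
        if owner != internal:
            # Two internal hosts would be indistinguishable on return traffic.
            raise RuntimeError("quintuple already in use by %s:%d" % owner)
        return key


cgn = OverloadingCgn("192.0.2.10")
# Two internal hosts use the same local port but different remote peers:
q1 = cgn.map_outgoing(("10.0.0.1", 5000), ("198.51.100.2", 443))
q2 = cgn.map_outgoing(("10.0.0.2", 5000), ("203.0.113.7", 443))
```

Both sessions share the public endpoint 192.0.2.10:5000, yet return traffic stays unambiguous because the remote half of each quintuple differs. A third host using port 5000 toward one of the same remote endpoints would raise the collision error in this model.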