Chunked transfer encoding


Chunked transfer encoding is a streaming data transfer mechanism available in version 1.1 of the Hypertext Transfer Protocol (HTTP). In chunked transfer encoding, the data stream is divided into a series of non-overlapping "chunks". The chunks are sent out and received independently of one another. At any given time, neither the sender nor the receiver needs any knowledge of the data stream outside the chunk currently being processed.
Each chunk is preceded by its size in bytes. The transmission ends when a zero-length chunk is received. The chunked keyword in the Transfer-Encoding header is used to indicate chunked transfer.
An early form of the chunked transfer encoding was proposed in 1994. Chunked transfer encoding is not supported in HTTP/2, which provides its own mechanisms for data streaming.

Rationale

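The introduction of chunked encoding provided various benefits. Because each chunk carries its own size, a sender can begin transmitting content before the total length of the data stream is known, and it can send additional header fields after the message body.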
For version 1.1 of the HTTP protocol, the chunked transfer mechanism is always considered acceptable, even if it is not listed in the TE request header field. When it is used together with other transfer mechanisms, it should always be applied last to the transferred data and never more than once. This transfer coding method also allows additional entity header fields to be sent after the last chunk if the client specified the "trailers" parameter as an argument of the TE field. The origin server of the response can also decide to send additional entity trailers even if the client did not specify the "trailers" option in the TE request field, but only if the metadata is optional. Whenever trailers are used, the server should list their names in the Trailer header field. Three header field types are specifically prohibited from appearing as a trailer field: Transfer-Encoding, Content-Length and Trailer.
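The trailer rules above can be sketched as two small checks. This is a minimal illustration, not a complete implementation: the function names client_accepts_trailers and build_trailer_header are hypothetical, and the parsing of the TE value is deliberately simplified.

```python
from typing import Iterable

# The three field names the text above says may not appear as trailer fields.
FORBIDDEN_IN_TRAILER = {"transfer-encoding", "content-length", "trailer"}


def client_accepts_trailers(te_value: str) -> bool:
    """Return True if a TE request header value lists the "trailers" keyword."""
    # Drop any parameters such as ";q=0.5" and compare case-insensitively.
    items = [item.split(";", 1)[0].strip().lower() for item in te_value.split(",")]
    return "trailers" in items


def build_trailer_header(field_names: Iterable[str]) -> str:
    """Build a Trailer header line announcing the given trailer field names."""
    names = [name.strip() for name in field_names]
    for name in names:
        if name.lower() in FORBIDDEN_IN_TRAILER:
            raise ValueError(f"{name} must not be sent as a trailer field")
    return "Trailer: " + ", ".join(names)


print(client_accepts_trailers("trailers, deflate;q=0.5"))
print(build_trailer_header(["Expires"]))
```

A server following the text above would send non-optional metadata in the trailer only when the first check succeeds.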

Format

If a Transfer-Encoding field with a value of "chunked" is specified in an HTTP message, the body of the message consists of an unspecified number of chunks, a terminating chunk, a trailer, and a final CRLF sequence.
Each chunk starts with the number of octets of the data it embeds, expressed as a hexadecimal number in ASCII, followed by optional parameters (chunk extensions) and a terminating CRLF sequence. The chunk data follows and is itself terminated by CRLF.
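As a minimal sketch of this framing, the following function wraps a piece of data as one chunk without extensions. The name encode_chunk is an assumption made for illustration; it is not part of any library.

```python
def encode_chunk(data: bytes) -> bytes:
    """Frame one non-empty chunk: hexadecimal size, CRLF, data, CRLF."""
    if not data:
        # A zero-length chunk is reserved for terminating the body.
        raise ValueError("chunk data must not be empty")
    return b"%X\r\n" % len(data) + data + b"\r\n"


print(encode_chunk(b"Wiki"))  # b'4\r\nWiki\r\n'
```

The size covers only the data octets; the CRLF that closes the chunk is added after the data and is not counted.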
If chunk extensions are provided, the chunk size is terminated by a semicolon and followed by the parameters, each also delimited by semicolons. Each parameter is encoded as an extension name followed by an optional equal sign and value. These parameters could be used for a running message digest or digital signature, or to indicate an estimated transfer progress, for instance.
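The layout of the size line with extensions can be illustrated with a small parser. This is a sketch under simplifying assumptions: parse_chunk_header is a hypothetical name, the extension names "progress" and "final" are invented for the example, and quoted values that themselves contain semicolons are not handled. It is also more lenient than a validating parser would be.

```python
from typing import Dict, Optional, Tuple


def parse_chunk_header(line: bytes) -> Tuple[int, Dict[str, Optional[str]]]:
    """Split a chunk size line (without its CRLF) into size and extensions."""
    size_part, *ext_parts = line.decode("ascii").split(";")
    size = int(size_part.strip(), 16)  # the size is written in hexadecimal
    extensions: Dict[str, Optional[str]] = {}
    for part in ext_parts:
        name, sep, value = part.strip().partition("=")
        # An extension may be a bare name, in which case it has no value.
        extensions[name.strip()] = value.strip().strip('"') if sep else None
    return size, extensions


print(parse_chunk_header(b"E;progress=50;final"))
# (14, {'progress': '50', 'final': None})
```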
The terminating chunk is a regular chunk, with the exception that its length is zero. It is followed by the trailer, which consists of a sequence of entity header fields. Normally, such header fields would be sent in the message's header; however, it may be more efficient to determine them after processing the entire message entity. In that case, it is useful to send those headers in the trailer.
The header fields that regulate the use of trailers are TE and Trailer.
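Putting the parts of the format together, a decoder reads chunks until the zero-length chunk, then reads trailer fields until an empty line. The following is a minimal sketch that assumes the complete body is already in memory; decode_chunked is a hypothetical name, chunk extensions are skipped rather than interpreted, and error handling is reduced to the bare minimum.

```python
from typing import Dict, Tuple


def decode_chunked(body: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Decode a complete chunked body into (data, trailer fields)."""
    pos = 0
    data = bytearray()
    while True:
        eol = body.index(b"\r\n", pos)
        # Ignore any chunk extensions after the first semicolon.
        size_text = body[pos:eol].split(b";", 1)[0].strip()
        size = int(size_text.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break  # terminating chunk reached
        data += body[pos:pos + size]
        if body[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data is not terminated by CRLF")
        pos += size + 2

    trailers: Dict[str, str] = {}
    while True:
        eol = body.index(b"\r\n", pos)
        line = body[pos:eol]
        pos = eol + 2
        if not line:
            break  # the final CRLF ends the message body
        name, _, value = line.decode("latin-1").partition(":")
        trailers[name.strip().lower()] = value.strip()
    return bytes(data), trailers


body = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\nExpires: never\r\n\r\n"
print(decode_chunked(body))  # (b'Wikipedia', {'expires': 'never'})
```

A streaming implementation would read from a socket incrementally instead of indexing into a complete byte string, but the order of the steps would be the same.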

Use with compression

HTTP servers often use compression to optimize transmission, for example with Content-Encoding: gzip or Content-Encoding: deflate. If both compression and chunked encoding are enabled, then the content stream is first compressed, then chunked; so the chunk encoding itself is not compressed, and the data in each chunk is not compressed individually. The remote endpoint then decodes the stream by concatenating the chunks and uncompressing the result.
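The order of operations can be sketched with Python's standard gzip module. The function names and the chunk_size parameter are assumptions chosen for illustration. The whole payload is compressed once, and the compressed bytes are then cut into chunks.

```python
import gzip


def gzip_then_chunk(payload: bytes, chunk_size: int = 8) -> bytes:
    """Compress the whole payload, then frame the compressed bytes as chunks."""
    compressed = gzip.compress(payload)
    out = bytearray()
    for start in range(0, len(compressed), chunk_size):
        piece = compressed[start:start + chunk_size]
        out += b"%X\r\n" % len(piece) + piece + b"\r\n"
    out += b"0\r\n\r\n"  # terminating chunk with an empty trailer
    return bytes(out)


def dechunk_then_gunzip(body: bytes) -> bytes:
    """Concatenate the chunk data, then decompress the result as one stream."""
    pos = 0
    compressed = bytearray()
    while True:
        eol = body.index(b"\r\n", pos)
        size = int(body[pos:eol].decode("ascii"), 16)
        if size == 0:
            break
        start = eol + 2
        compressed += body[start:start + size]
        pos = start + size + 2  # skip the data and its trailing CRLF
    return gzip.decompress(bytes(compressed))


message = b"Wikipedia in chunks. " * 4
print(dechunk_then_gunzip(gzip_then_chunk(message)) == message)  # True
```

An individual chunk taken from the middle of the body is generally not a valid gzip stream on its own, which is why the receiver has to concatenate before decompressing.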

Example

Encoded data

In the following example, three chunks of length 4, 5 and 14 are shown. The chunk size is transferred as a hexadecimal number followed by \r\n as a line separator, followed by a chunk of data of the given size.

4\r\n
Wiki\r\n
5\r\n
pedia\r\n
E\r\n
 in\r\n
\r\n
chunks.\r\n
0\r\n
\r\n

Note: the chunk size indicates the size of the chunk data and excludes the trailing CRLF. In this example, the data of the third chunk consists of a leading space (1 octet), "in" (2 octets), the CRLF following "in" (2 octets), the CRLF on its own line (2 octets) and "chunks." (7 octets), for a total of 14 octets, or 0xE. The period at the end of "chunks." is the 14th octet, so it is the last data character in that chunk. The CRLF following the period is the trailing CRLF, so it is not counted toward the chunk size.
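The octet count in the note can be checked directly. The variable name third_chunk_data is only illustrative, and the byte string assumes the third chunk's data starts with a space, as the size of 0xE requires.

```python
# Data of the third chunk: space, "in", CRLF, CRLF, "chunks."
third_chunk_data = b" in\r\n\r\nchunks."

print(len(third_chunk_data))               # 14
print(format(len(third_chunk_data), "X"))  # E
print(third_chunk_data[13:14])             # b'.'
```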

Decoded data


Wikipedia in

chunks.