MIME
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol, and the Internet Message Access Protocol.
The MIME standard is specified in a series of requests for comments (RFCs): RFC 2045, RFC 2046, RFC 2047, RFC 4288, RFC 4289 and RFC 2049. The integration with SMTP email is specified in separate RFCs.
Although the MIME formalism was designed mainly for SMTP, its content types are also important in other communication protocols. In the HyperText Transfer Protocol for the World Wide Web, servers insert a MIME header field at the beginning of any Web transmission. Clients use the content type or media type header to select an appropriate viewer application for the type of data indicated. Browsers typically contain GIF and JPEG image viewers.
MIME header fields
MIME-Version
The presence of this header field indicates the message is MIME-formatted. The value is typically "1.0". The field appears as follows:
MIME-Version: 1.0
According to MIME co-creator Nathaniel Borenstein, the version number was introduced to permit changes to the MIME protocol in subsequent versions. However, Borenstein admitted shortcomings in the specification that hindered the implementation of this feature: "We did not adequately specify how to handle a future MIME version.... So if you write something that knows 1.0, what should you do if you encounter 2.0 or 1.1? I sort of thought it was obvious but it turned out everyone implemented that in different ways. And the result is that it would be just about impossible for the Internet to ever define a 2.0 or a 1.1."
Content-Type
This header field indicates the media type of the message content, consisting of a type and subtype, for example:
Content-Type: text/plain
Through the use of the multipart type, MIME allows mail messages to have parts arranged in a tree structure where the leaf nodes are any non-multipart content type and the non-leaf nodes are any of a variety of multipart types.
This mechanism supports:
- simple text messages using text/plain
- text plus attachments. A MIME message including an attached file generally indicates the file's original name with the field "Content-Disposition", so that the type of file is indicated both by the MIME content-type and the filename extension
- reply with original attached
- alternative content, such as a message sent in both plain text and another format such as HTML
- image, audio, video and application
- many other message constructs
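As a minimal sketch of this tree structure, the following Python example uses the standard library's email package to build a text message with one attachment and then print the resulting tree. The subject, body text, attachment bytes, file name and the describe helper are all made up for illustration.

```python
from email.message import EmailMessage

# Build a small message: a text body plus one binary attachment.
msg = EmailMessage()
msg["Subject"] = "Report"                      # illustrative value
msg.set_content("See the attached file.")      # becomes a text/plain leaf
msg.add_attachment(b"\x00\x01\x02",            # becomes an application/octet-stream leaf
                   maintype="application",
                   subtype="octet-stream",
                   filename="data.bin")


def describe(part, depth=0):
    """Return one indented line per node of the MIME tree (hypothetical helper)."""
    lines = ["  " * depth + part.get_content_type()]
    if part.is_multipart():
        for child in part.iter_parts():
            lines.extend(describe(child, depth + 1))
    return lines


tree = describe(msg)
print("\n".join(tree))
```

Adding the attachment converts the message into a multipart/mixed container, so the non-leaf node is the multipart type and the two leaves are the text and the binary part.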
Content-Disposition
This header field specifies the presentation style of a MIME part. A part can have:
- an inline content disposition, which means that it should be automatically displayed when the message is displayed, or
- an attachment content disposition, in which case it is not displayed automatically and requires some form of action from the user to open it.
The following example is taken from RFC 2183, where the header field is defined:
Content-Disposition: attachment; filename=genome.jpeg;
  modification-date="Wed, 12 Feb 1997 16:29:51 -0500";
The filename may be encoded as defined in RFC 2231.
As of 2010, a majority of mail user agents did not follow this prescription fully. The widely used Mozilla Thunderbird mail client ignores the content-disposition fields in the messages and uses independent algorithms for selecting the MIME parts to display automatically. Thunderbird prior to version 3 also sends out newly composed messages with inline content disposition for all MIME parts. Most users are unaware of how to set the content disposition to attachment. Many mail user agents also send messages with the file name in the name parameter of the content-type header instead of the filename parameter of the header field Content-Disposition. This practice is discouraged, as the file name should be specified either with the parameter filename, or with both the parameters filename and name.
In HTTP, the response header field Content-Disposition: attachment is usually used as a hint to the client to present the response body as a downloadable file. Typically, when receiving such a response, a Web browser prompts the user to save its content as a file, instead of displaying it as a page in a browser window, with filename suggesting the default file name.
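To illustrate how a mail program might set and read this field, here is a short sketch using Python's standard email package. The body text and the three placeholder bytes standing in for image data are assumptions; only the file name genome.jpeg comes from the example above.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg.set_content("Photo attached.")                  # illustrative body text
msg.add_attachment(b"\xff\xd8\xff",                 # placeholder bytes, not a real image
                   maintype="image", subtype="jpeg",
                   filename="genome.jpeg")

# After add_attachment the message has two parts: the text body and the file.
body, attachment = msg.iter_parts()

disposition = attachment.get_content_disposition()  # "attachment"
filename = attachment.get_filename()                 # "genome.jpeg"
print(attachment["Content-Disposition"])
```

The text body carries no Content-Disposition field at all in this sketch, which is one reason receiving clients need their own rules for deciding what to display automatically.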
Content-Transfer-Encoding
In June 1992, MIME defined a set of methods for representing binary data as ASCII text. The Content-Transfer-Encoding header field serves two purposes. It indicates whether or not a binary-to-text encoding scheme has been used on top of the original encoding as specified within the Content-Type header:
- If such a binary-to-text encoding method has been used, it states which one.
- If not, it provides a descriptive label for the format of content, with respect to the presence of 8-bit or binary content.
The defined values fall into three groups.
Suitable for use with normal SMTP:
- 7bit – up to 998 octets per line of the code range 1..127 with CR and LF only allowed to appear as part of a CRLF line ending. This is the default value.
- quoted-printable – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient and mostly human readable when used for text data consisting primarily of US-ASCII characters but also containing a small proportion of bytes with values outside that range.
- base64 – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8-bit and binary data. Sometimes used for text data that frequently uses non-US-ASCII characters.
Suitable for use with SMTP servers that support the 8BITMIME SMTP extension:
- 8bit – up to 998 octets per line with CR and LF only allowed to appear as part of a CRLF line ending.
Suitable for use with SMTP servers that support the BINARYMIME SMTP extension:
- binary – any sequence of octets.
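The difference between the two binary-to-text encodings can be seen in a small Python sketch using the standard base64 and quopri modules. The sample text and the is_7bit helper are illustrative assumptions; the helper checks only the octet range, not the line-length or CRLF rules.

```python
import base64
import quopri

# Latin-1 text containing two bytes outside the 7-bit range (0xA1 and 0xF1).
data = "¡Hola, señor!".encode("iso-8859-1")

qp = quopri.encodestring(data)   # stays mostly readable; non-ASCII bytes become =XX
b64 = base64.b64encode(data)     # compact for binary data, but not human readable


def is_7bit(octets):
    """True if every octet lies in 1..127 (range check only, hypothetical helper)."""
    return all(1 <= b <= 127 for b in octets)


print(is_7bit(data), is_7bit(qp), is_7bit(b64))
print(qp.decode("ascii"))
print(b64.decode("ascii"))
```

Both encoded forms decode back to the original octets, so the choice between them is mainly about readability and size for a given kind of content.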
Encoded-Word
Since RFC 2822, conforming message header field names and values use ASCII characters; values that contain non-ASCII data should use the MIME encoded-word syntax instead of a literal string. This syntax uses a string of ASCII characters indicating both the original character encoding and the content-transfer-encoding used to map the bytes of the charset into ASCII characters. The form is: "=?charset?encoding?encoded text?=".
- charset may be any character set registered with IANA. Typically it would be the same charset as the message body.
- encoding can be either "Q", denoting Q-encoding that is similar to the quoted-printable encoding, or "B", denoting base64 encoding.
- encoded text is the Q-encoded or base64-encoded text.
- An encoded-word may not be more than 75 characters long, including charset, encoding, encoded text, and delimiters. If it is desirable to encode more text than will fit in an encoded-word of 75 characters, multiple encoded-words may be used.
Difference between Q-encoding and quoted-printable
The main differences are that in Q-encoding a space may be represented as "_" (underscore), and the characters "?" and "_" must themselves be encoded, because "?" delimits the fields of an encoded-word.
For example,
Subject: =?iso-8859-1?Q?=A1Hola,_se=F1or!?=
is interpreted as "Subject: ¡Hola, señor!".
The encoded-word format is not used for the names of the header fields. These names are usually English terms and always in ASCII in the raw message. When viewing a message with a non-English email client, the header field names might be translated by the client.
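The Subject example above can be decoded with Python's standard email.header module, as in the following sketch. The re-encoding step is included only for illustration; a library is free to produce a different but equivalent encoded-word.

```python
from email.header import Header, decode_header, make_header

raw = "=?iso-8859-1?Q?=A1Hola,_se=F1or!?="

# decode_header returns (bytes, charset) pairs; make_header joins them into text.
decoded = str(make_header(decode_header(raw)))
print(decoded)

# Encoding in the other direction. The exact output may differ from `raw`
# (for example in letter case or in which characters are escaped).
encoded = Header(decoded, charset="iso-8859-1").encode()
print(encoded)
```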
Multipart messages
A MIME multipart message contains a boundary in the Content-Type header field; this boundary, which must not occur in any of the parts, is placed between the parts, and at the beginning and end of the body of the message, as follows:
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=frontier

This is a message with multiple parts in MIME format.
--frontier
Content-Type: text/plain

This is the body of the message.
--frontier
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg
Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==
--frontier--
Each part consists of its own content header and a body. Multipart content can be nested. The Content-Transfer-Encoding of a multipart type must always be "7bit", "8bit" or "binary" to avoid the complications that would be posed by multiple levels of decoding. The multipart block as a whole does not have a charset; non-ASCII characters in the part headers are handled by the Encoded-Word system, and the part bodies can have charsets specified if appropriate for their content-type.
Notes:
- Before the first boundary is an area that is ignored by MIME-compliant clients. This area is generally used to put a message to users of old non-MIME clients.
- It is up to the sending mail client to choose a boundary string that doesn't clash with the body text. Typically this is done by inserting a long random string.
- The last boundary must have two hyphens at the end.
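The following sketch parses the example message with Python's standard email package to show how a MIME-aware client might see it. The blank lines that separate headers from bodies are written out explicitly, and the variable names are illustrative.

```python
from email import message_from_string

raw = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=frontier

This is a message with multiple parts in MIME format.
--frontier
Content-Type: text/plain

This is the body of the message.
--frontier
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg
Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==
--frontier--
"""

msg = message_from_string(raw)

boundary = msg.get_boundary()               # the string announced in Content-Type
preamble = msg.preamble                     # text before the first boundary
parts = msg.get_payload()                   # list of the two body parts
text = parts[0].get_payload()               # first part, plain text
binary = parts[1].get_payload(decode=True)  # second part, base64 decoded to bytes

print(boundary, len(parts))
print(binary.decode("ascii"))
```

The preamble is kept separate from the parts, which is why MIME-compliant clients can ignore it while older clients show it as ordinary text.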
Multipart subtypes
The RFC initially defined four subtypes: mixed, digest, alternative and parallel. A minimally compliant application must support mixed and digest; other subtypes are optional. Applications must treat unrecognized subtypes as "multipart/mixed". Additional subtypes, such as signed and form-data, have since been separately defined in other RFCs.
Mixed
Multipart/mixed is used for sending files with different Content-Type header fields inline or as attachments. When the parts are pictures or other easily readable files, most mail clients display them inline; otherwise, they are offered as attachments. The default content-type for each part is "text/plain". The type is defined in RFC 2046.
Digest
Multipart/digest is a simple way to send multiple text messages. The default content-type for each part is "message/rfc822". The MIME type is defined in RFC 2046.
Alternative
The multipart/alternative subtype indicates that each part is an "alternative" version of the same content, each in a different format denoted by its "Content-Type" header. The order of the parts is significant. RFC 1341 states: "In general, user agents that compose multipart/alternative entities should place the body parts in increasing order of preference, that is, with the preferred format last." Systems can then choose the "best" representation they are capable of processing; in general, this will be the last part that the system can understand, although other factors may affect this.
Since a client is unlikely to want to send a version that is less faithful than the plain text version, this structure places the plain text version first. This makes life easier for users of clients that do not understand multipart messages.
Most commonly, multipart/alternative is used for email with two parts, one plain text and one HTML. The plain text part provides backwards compatibility while the HTML part allows use of formatting and hyperlinks. Most email clients offer a user option to prefer plain text over HTML; this is an example of how local factors may affect how an application chooses which "best" part of the message to display.
While it is intended that each part of the message represent the same content, the standard does not require this to be enforced in any way. At one time, anti-spam filters would only examine the text/plain part of a message, because it is easier to parse than the text/html part. Spammers eventually took advantage of this, creating messages with an innocuous-looking text/plain part and advertising in the text/html part. Anti-spam software eventually caught on to this trick, penalizing messages with very different text in the parts of a multipart/alternative message.
The type is defined in RFC 2046.
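A plain text plus HTML message of the kind described above can be sketched with Python's standard email package. The subject and both bodies are made-up values; the preference lists passed to get_body stand in for a user's local display setting.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Newsletter"                       # illustrative value
msg.set_content("Hello, plain-text readers.")       # least preferred, placed first
msg.add_alternative("<p>Hello, <b>HTML</b> readers.</p>",
                    subtype="html")                 # preferred, placed last

# The parts appear in increasing order of preference.
order = [part.get_content_type() for part in msg.iter_parts()]
print(msg.get_content_type(), order)

# A receiving client picks the "best" part it is willing to display.
preferred = msg.get_body(preferencelist=("html", "plain"))
fallback = msg.get_body(preferencelist=("plain",))
print(preferred.get_content_type(), fallback.get_content_type())
```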
Related
A multipart/related message is used to indicate that each message part is a component of an aggregate whole. It is for compound objects consisting of several inter-related components, where proper display cannot be achieved by individually displaying the constituent parts. The message consists of a root part which references other parts inline, which may in turn reference other parts. Message parts are commonly referenced by Content-ID. The syntax of a reference is unspecified and is instead dictated by the encoding or protocol used in the part. One common usage of this subtype is to send a web page complete with images in a single message. The root part would contain the HTML document, and use image tags to reference images stored in the later parts.
The type is defined in RFC 2387.
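The web page use case might look like the following sketch, again using Python's standard email package. The domain example.com, the HTML snippet and the placeholder image bytes are assumptions made for illustration; the cid: URL form is the convention HTML mail uses to point at a Content-ID.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Generate a unique identifier of the form "<...@example.com>" (hypothetical domain).
cid = make_msgid(domain="example.com")

msg = EmailMessage()
# Root part: an HTML document that refers to the image by its Content-ID,
# written without the surrounding angle brackets.
msg.set_content(f'<p>Logo: <img src="cid:{cid[1:-1]}"></p>', subtype="html")

# Related part: placeholder bytes standing in for real PNG data.
msg.add_related(b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png", cid=cid)

root, image = msg.iter_parts()
print(msg.get_content_type())
print(root.get_content_type(), image.get_content_type(), image["Content-ID"])
```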
Report
Multipart/report is a message type that contains data formatted for a mail server to read. It is split between a text/plain part and a message/delivery-status part, which contains the data formatted for the mail server to read. The type is defined in RFC 6522.
Signed
A multipart/signed message is used to attach a digital signature to a message. It has exactly two body parts, a body part and a signature part. The whole of the body part, including MIME fields, is used to create the signature part. Many signature types are possible, such as "application/pgp-signature" and "application/pkcs7-signature". The type is defined in RFC 1847.
Encrypted
A multipart/encrypted message has two parts. The first part has control information that is needed to decrypt the application/octet-stream second part. Similar to signed messages, there are different implementations, which are identified by their separate content types for the control part. The most common types are "application/pgp-encrypted" and "application/pkcs7-mime".
The MIME type is defined in RFC 1847.
Form-Data
The MIME type multipart/form-data is used to express values submitted through a form. Originally defined as part of HTML 4.0, it is most commonly used for submitting files with HTTP. It is specified in RFC 7578, superseding RFC 2388.
Mixed-Replace
The content type multipart/x-mixed-replace was developed as part of a technology to emulate server push and streaming over HTTP. All parts of a mixed-replace message have the same semantic meaning. However, each part invalidates ("replaces") the previous parts as soon as it is received completely. Clients should process the individual parts as soon as they arrive and should not wait for the whole message to finish.
Originally developed by Netscape, it is still supported by Mozilla, Firefox, Safari, and Opera. It is commonly used in IP cameras as the MIME type for MJPEG streams. It was supported by Chrome for main resources until 2013.
Byteranges
The multipart/byteranges type is used to represent noncontiguous byte ranges of a single message. It is used by HTTP when a server returns multiple byte ranges and is defined in RFC 2616.
RFC documentation
- RFC 1426, SMTP Service Extension for 8bit-MIMEtransport. J. Klensin, N. Freed, M. Rose, E. Stefferud, D. Crocker. February 1993.
- RFC 1847, Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
- RFC 3156, MIME Security with OpenPGP
- RFC 2045, MIME Part One: Format of Internet Message Bodies
- RFC 2046, MIME Part Two: Media Types. N. Freed, N. Borenstein. November 1996.
- RFC 2047, MIME Part Three: Message Header Extensions for Non-ASCII Text. K. Moore. November 1996.
- RFC 4288, MIME Part Four: Media Type Specifications and Registration Procedures.
- RFC 4289, MIME Part Four: Registration Procedures. N. Freed, J. Klensin. December 2005.
- RFC 2049, MIME Part Five: Conformance Criteria and Examples. N. Freed, N. Borenstein. November 1996.
- RFC 2183, Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field. R. Troost, S. Dorner, K. Moore. August 1997.
- RFC 2231, MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations. N. Freed, K. Moore. November 1997.
- RFC 2387, The MIME Multipart/Related Content-type
- RFC 1521, Mechanisms for Specifying and Describing the Format of Internet Message Bodies