MPEG-4 Part 3

MPEG-4 Part 3 or MPEG-4 Audio is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.
MPEG-4 Part 3 consists of a variety of audio coding technologies, ranging from lossy speech coding, general audio coding and lossless audio compression to a Text-To-Speech Interface, Structured Audio and many additional audio synthesis and coding techniques.
MPEG-4 Audio does not target a single application such as real-time telephony or high-quality audio compression. It applies to every application which requires the use of advanced sound compression, synthesis, manipulation, or playback.
MPEG-4 Audio integrates numerous different types of audio coding in a single standard: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content.

Subparts

MPEG-4 Part 3 contains the following subparts:

Subpart 1: Main
Subpart 2: Speech coding – HVXC
Subpart 3: Speech coding – CELP
Subpart 4: General Audio Coding – AAC, TwinVQ, BSAC
Subpart 5: Structured Audio
Subpart 6: Text to Speech Interface
Subpart 7: Parametric Audio Coding – HILN
Subpart 8: Technical description of parametric coding for high quality audio
Subpart 9: MPEG-1/MPEG-2 Audio in MPEG-4
Subpart 10: Technical description of lossless coding of oversampled audio
Subpart 11: Audio Lossless Coding
Subpart 12: Scalable Lossless Coding

MPEG-4 Audio Object Types

MPEG-4 Audio includes a system for handling a diverse group of audio formats in a uniform manner. Each format is assigned a unique Audio Object Type to represent it. The Object Type distinguishes between different coding methods and directly determines the MPEG-4 tool subset required to decode a specific object. The MPEG-4 profiles are based on the object types, and each profile supports a different list of object types.

Object Type ID	Audio Object Type	First public release date	Description
1	AAC Main	1999	contains AAC LC
2	AAC LC	1999	Used in the "AAC Profile". MPEG-4 AAC LC Audio Object Type is based on the MPEG-2 Part 7 Low Complexity profile combined with Perceptual Noise Substitution.
3	AAC SSR	1999	MPEG-4 AAC SSR Audio Object Type is based on the MPEG-2 Part 7 Scalable Sampling Rate profile combined with Perceptual Noise Substitution.
4	AAC LTP	1999	contains AAC LC
5	SBR	2003	used with AAC LC in the "High Efficiency AAC Profile"
6	AAC Scalable	1999
7	TwinVQ	1999	audio coding at very low bitrates
8	CELP	1999	speech coding
9	HVXC	1999	speech coding
10
11
12	TTSI	1999
13	Main synthesis	1999	contains 'wavetable' sample-based synthesis and Algorithmic Synthesis and Audio Effects
14	'wavetable' sample-based synthesis	1999	based on SoundFont and DownLoadable Sounds, contains General MIDI
15	General MIDI	1999
16	Algorithmic Synthesis and Audio Effects	1999
17	ER AAC LC	2000	Error Resilient
18
19	ER AAC LTP	2000	Error Resilient
20	ER AAC Scalable	2000	Error Resilient
21	ER TwinVQ	2000	Error Resilient
22	ER BSAC	2000	Error Resilient. Also known as "Fine Granule Audio" or the fine grain scalability tool. It is used in combination with the AAC coding tools and replaces the noiseless coding and the bitstream formatting of the MPEG-4 Version 1 GA coder.
23	ER AAC LD	2000	Error Resilient, used with CELP, ER CELP, HVXC, ER HVXC and TTSI in the "Low Delay Profile"
24	ER CELP	2000	Error Resilient
25	ER HVXC	2000	Error Resilient
26	ER HILN	2000	Error Resilient
27	ER Parametric	2000	Error Resilient
28	SSC	2004
29	PS	2004 and 2006	used with AAC LC and SBR in the "HE-AAC v2 Profile". PS coding tool was defined in 2004 and Object Type defined in 2006.
30	MPEG Surround	2007	also known as MPEG Spatial Audio Coding, it is a type of spatial audio coding
31
32	MPEG-1/2 Layer-1	2005
33	MPEG-1/2 Layer-2	2005
34	MPEG-1/2 Layer-3	2005	also known as "MP3onMP4"
35	DST	2005	lossless audio coding, used on Super Audio CD
36	ALS	2006	lossless audio coding
37	SLS	2006	two-layer audio coding with lossless layer and lossy General Audio core/layer
38	SLS non-core	2006	lossless audio coding without lossy General Audio core/layer
39	ER AAC ELD	2008	Error Resilient
40	SMR Simple	2008	note: Symbolic Music Representation is also the MPEG-4 Part 23 standard
41	SMR Main	2008
42	USAC	2012	Unified Speech and Audio Coding is defined in MPEG-D Part 3
43	SAOC	2010	note: Spatial Audio Object Coding is also the MPEG-D Part 2 standard
44	LD MPEG Surround	2010	This object type conveys Low Delay MPEG Surround Coding side information in the MPEG-4 Audio framework.
45	SAOC-DE	2013	Spatial Audio Object Coding Dialogue Enhancement
46	Audio Sync	2015	The audio synchronization tool provides capability of synchronizing multiple contents in multiple devices.
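
As a rough illustration of how software might work with these identifiers, the following Python sketch maps a handful of Object Type IDs from the table above to their names. The dictionary `AUDIO_OBJECT_TYPES` and the helper `object_type_name` are hypothetical names chosen for this example, and only a subset of the table is included.

```python
# Subset of the Audio Object Type table above (ID -> name).
# AUDIO_OBJECT_TYPES and object_type_name are illustrative names,
# not identifiers defined by the standard.
AUDIO_OBJECT_TYPES = {
    1: "AAC Main",
    2: "AAC LC",
    3: "AAC SSR",
    4: "AAC LTP",
    5: "SBR",
    6: "AAC Scalable",
    22: "ER BSAC",
    23: "ER AAC LD",
    29: "PS",
    36: "ALS",
    37: "SLS",
    42: "USAC",
}


def object_type_name(object_type_id: int) -> str:
    """Return the name for an ID, or a placeholder if it is not in this subset."""
    return AUDIO_OBJECT_TYPES.get(object_type_id, f"unknown ({object_type_id})")


if __name__ == "__main__":
    for type_id in (2, 5, 29, 10):
        print(type_id, object_type_name(type_id))
```

IDs that are absent from the dictionary, including those with no entry in the table, fall through to the placeholder rather than raising an error.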

Audio Profiles

The MPEG-4 Audio standard defines several profiles. These profiles are based on the object types, and each profile supports a different list of object types. Each profile may also have several levels, which limit some parameters of the tools present in a profile. These parameters are usually the sampling rate and the number of audio channels decoded at the same time.

Audio Profile	Audio Object Types	First public release date
AAC Profile	AAC LC	2003
High Efficiency AAC Profile	AAC LC, SBR	2003
HE-AAC v2 Profile	AAC LC, SBR, PS	2006
Main Audio Profile	AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, Main synthesis	1999
Scalable Audio Profile	AAC LC, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI	1999
Speech Audio Profile	CELP, HVXC, TTSI	1999
Synthetic Audio Profile	TTSI, Main synthesis	1999
High Quality Audio Profile	AAC LC, AAC LTP, AAC Scalable, CELP, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER CELP	2000
Low Delay Audio Profile	CELP, HVXC, TTSI, ER AAC LD, ER CELP, ER HVXC	2000
Natural Audio Profile	AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD, ER CELP, ER HVXC, ER HILN, ER Parametric	2000
Mobile Audio Internetworking Profile	ER AAC LC, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD	2000
HD-AAC Profile	AAC LC, SLS	2009
ALS Simple Profile	ALS	2010
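
The relationship between profiles and object types can be sketched as a set-membership check. The Python example below copies a few rows of the table above and lists the profiles that cover a given set of object types. `PROFILES` and `profiles_supporting` are hypothetical names used only for this illustration, and levels (sampling rate and channel limits) are deliberately ignored.

```python
# Object types per profile, copied from a few rows of the profile table above.
# PROFILES and profiles_supporting are illustrative names, not part of the standard.
PROFILES = {
    "AAC Profile": {"AAC LC"},
    "High Efficiency AAC Profile": {"AAC LC", "SBR"},
    "HE-AAC v2 Profile": {"AAC LC", "SBR", "PS"},
    "Speech Audio Profile": {"CELP", "HVXC", "TTSI"},
    "Synthetic Audio Profile": {"TTSI", "Main synthesis"},
    "HD-AAC Profile": {"AAC LC", "SLS"},
    "ALS Simple Profile": {"ALS"},
}


def profiles_supporting(required_types):
    """Return the sorted names of profiles that list every required object type."""
    needed = set(required_types)
    return sorted(name for name, types in PROFILES.items() if needed <= types)


if __name__ == "__main__":
    print(profiles_supporting({"AAC LC", "SBR"}))
    print(profiles_supporting({"TTSI"}))
```

A stream that uses both AAC LC and SBR, for instance, is covered by the High Efficiency AAC Profile and the HE-AAC v2 Profile in this subset, but not by the plain AAC Profile.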

Audio storage and transport

There is no standard for transport of elementary streams over a channel, because the delivery requirements of the broad range of MPEG-4 applications are too varied to be met by a single solution.
The capabilities of a transport layer and the communication between transport, multiplex, and demultiplex functions are described in the Delivery Multimedia Integration Framework in ISO/IEC 14496-6. A wide variety of delivery mechanisms exist below this interface, e.g., MPEG transport stream, Real-time Transport Protocol, etc.
Transport in Real-time Transport Protocol is defined in RFC 3016, RFC 3640, RFC 4281 and RFC 4337.
LATM (Low-overhead MPEG-4 Audio Transport Multiplex) and LOAS (Low Overhead Audio Stream) were defined for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems.

Bifurcation in the AAC technical standard

The Advanced Audio Coding in MPEG-4 Part 3 Subpart 4 was enhanced relative to the previous standard MPEG-2 Part 7, in order to provide better sound quality for a given encoding bitrate.
Any differences between Part 3 and Part 7 are expected to be reconciled by the ISO standards body to avoid future bitstream incompatibilities. No player or codec incompatibilities resulting from these differences are currently known.
The MPEG-2 Part 7 standard was first published in 1997 and offers three default profiles: Low Complexity profile, Main profile and Scalable Sampling Rate profile.
The MPEG-4 Part 3 Subpart 4 combined the profiles from MPEG-2 Part 7 with Perceptual Noise Substitution and defined them as Audio Object Types.

HE-AAC

HE-AAC (High-Efficiency Advanced Audio Coding) is an extension of AAC LC using spectral band replication (SBR) and Parametric Stereo (PS). It is designed to increase coding efficiency at low bitrates by using a partial parametric representation of audio.

AAC-SSR

AAC Scalable Sample Rate was introduced by Sony to the MPEG-2 Part 7 and MPEG-4 Part 3 standards. It was first published in ISO/IEC 13818-7, Part 7: Advanced Audio Coding in 1997. The audio signal is first split into 4 bands using a 4-band polyphase quadrature filter (PQF) bank. These 4 bands are then further split using MDCTs with a size k of 32 or 256 samples. This is similar to normal AAC LC, which uses MDCTs with a size k of 128 or 1024 directly on the audio signal.
The advantage of this technique is that short block switching can be done separately for every PQF band, so high frequencies can be encoded using a short block to enhance temporal resolution while low frequencies are still encoded with high spectral resolution. However, due to aliasing between the 4 PQF bands, coding efficiency near the band boundaries (multiples of fs/8, where fs is the sampling rate) is worse than with normal MPEG-4 AAC LC.
MPEG-4 AAC-SSR is very similar to ATRAC and ATRAC-3.
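
The arithmetic behind this comparison can be checked with a short Python sketch. It assumes that the MDCT "size" quoted above is the number of spectral coefficients per block and that the four PQF bands split the range from 0 to fs/2 evenly. The function name `ssr_layout` is hypothetical.

```python
def ssr_layout(sample_rate_hz, mdct_size, bands=4):
    """Return (band boundaries in Hz, coefficients per block, Hz per coefficient).

    Assumes `bands` equal-width PQF bands covering 0..fs/2, each followed by an
    MDCT that yields `mdct_size` coefficients (an assumption for this sketch).
    """
    band_width_hz = sample_rate_hz / 2 / bands            # fs/8 for 4 bands
    boundaries_hz = [k * band_width_hz for k in range(1, bands)]
    coefficients = bands * mdct_size
    hz_per_coefficient = band_width_hz / mdct_size
    return boundaries_hz, coefficients, hz_per_coefficient


if __name__ == "__main__":
    fs = 48_000
    for size in (256, 32):
        print("SSR", size, ssr_layout(fs, size))
    # AAC LC applies a single MDCT to the full band, so bands=1.
    for size in (1024, 128):
        print("LC ", size, ssr_layout(fs, size, bands=1))
```

Under these assumptions, 4 bands of 256 coefficients give the same 1024 coefficients per long block as AAC LC, and 4 bands of 32 give the same 128 per short block. The difference lies in the ability to choose the block length per band, and in the aliasing at the internal boundaries.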

Why AAC-SSR was introduced

The idea behind AAC-SSR was not only the advantage listed above, but also the possibility of reducing the data rate by removing 1, 2 or 3 of the upper PQF bands. A very simple bitstream splitter can remove these bands and thus reduce the bitrate and sample rate.
Example:

4 subbands: bitrate = 128 kbit/s, sample rate = 48 kHz, f_lowpass = 20 kHz
3 subbands: bitrate ~ 120 kbit/s, sample rate = 48 kHz, f_lowpass = 18 kHz
2 subbands: bitrate ~ 100 kbit/s, sample rate = 24 kHz, f_lowpass = 12 kHz
1 subband: bitrate ~ 65 kbit/s, sample rate = 12 kHz, f_lowpass = 6 kHz

Note: although possible, the resulting quality is much worse than is typical
for the bitrate. For comparison, normal 64 kbit/s AAC LC achieves a bandwidth
of 14–16 kHz by using intensity stereo and reduced noise-to-mask ratios (NMRs).
This degrades audible quality less than transmitting 6 kHz bandwidth with perfect quality.
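
The bandwidth figures in the example follow from each PQF band covering fs/8. The Python sketch below computes the nominal upper band edge that remains after dropping upper bands. The function name `retained_bandwidth_hz` is hypothetical, and the sketch does not model the encoder's own low-pass filter, which is why the 4-subband case in the example lists 20 kHz rather than the nominal 24 kHz.

```python
def retained_bandwidth_hz(sample_rate_hz, kept_bands, total_bands=4):
    """Nominal upper frequency limit when only the lowest `kept_bands` PQF bands remain."""
    if not 1 <= kept_bands <= total_bands:
        raise ValueError("kept_bands must be between 1 and total_bands")
    return kept_bands * sample_rate_hz / (2 * total_bands)


if __name__ == "__main__":
    for kept in (4, 3, 2, 1):
        print(kept, "subband(s):", retained_bandwidth_hz(48_000, kept), "Hz")
```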

BSAC

Bit Sliced Arithmetic Coding is an MPEG-4 standard for scalable audio coding. BSAC uses an alternative noiseless coding to AAC, with the rest of the processing being identical to AAC. This support for scalability allows for nearly transparent sound quality at 64 kbit/s and graceful degradation at lower bit rates. BSAC coding is best performed in the range of 40 kbit/s to 64 kbit/s, though it operates in the range of 16 kbit/s to 64 kbit/s. The AAC-BSAC codec is used in Digital Multimedia Broadcasting applications.

Licensing

In 2002, the MPEG-4 Audio Licensing Committee selected the Via Licensing Corporation as the Licensing Administrator for the MPEG-4 Audio patent pool.
