MPEG-4 Part 3
MPEG-4 Part 3 or MPEG-4 Audio is the third part of the ISO/IEC MPEG-4 international standard developed by the Moving Picture Experts Group. It specifies audio coding methods. The first version of ISO/IEC 14496-3 was published in 1999.
MPEG-4 Part 3 consists of a variety of audio coding technologies: lossy speech coding, general audio coding, lossless audio compression, a Text-To-Speech Interface, Structured Audio, and many additional audio synthesis and coding techniques.
MPEG-4 Audio does not target a single application such as real-time telephony or high-quality audio compression. It applies to every application which requires the use of advanced sound compression, synthesis, manipulation, or playback.
MPEG-4 Audio is a new type of audio standard that integrates numerous different types of audio coding: natural sound and synthetic sound, low bitrate delivery and high-quality delivery, speech and music, complex soundtracks and simple ones, traditional content and interactive content.
Versions
Subparts
MPEG-4 Part 3 contains the following subparts:
- Subpart 1: Main
- Subpart 2: Speech coding – HVXC
- Subpart 3: Speech coding – CELP
- Subpart 4: General Audio Coding – AAC, TwinVQ, BSAC
- Subpart 5: Structured Audio
- Subpart 6: Text to Speech Interface
- Subpart 7: Parametric Audio Coding – HILN
- Subpart 8: Technical description of parametric coding for high quality audio
- Subpart 9: MPEG-1/MPEG-2 Audio in MPEG-4
- Subpart 10: Technical description of lossless coding of oversampled audio
- Subpart 11: Audio Lossless Coding
- Subpart 12: Scalable Lossless Coding
MPEG-4 Audio Object Types
Object Type ID | Audio Object Type | First public release date | Description |
1 | AAC Main | 1999 | contains AAC LC |
2 | AAC LC | 1999 | Used in the "AAC Profile". MPEG-4 AAC LC Audio Object Type is based on the MPEG-2 Part 7 Low Complexity profile combined with Perceptual Noise Substitution. |
3 | AAC SSR | 1999 | MPEG-4 AAC SSR Audio Object Type is based on the MPEG-2 Part 7 Scalable Sampling Rate profile combined with Perceptual Noise Substitution. |
4 | AAC LTP | 1999 | contains AAC LC |
5 | SBR | 2003 | used with AAC LC in the "High Efficiency AAC Profile" |
6 | AAC Scalable | 1999 | |
7 | TwinVQ | 1999 | audio coding at very low bitrates |
8 | CELP | 1999 | speech coding |
9 | HVXC | 1999 | speech coding |
10 | |||
11 | |||
12 | TTSI | 1999 | |
13 | Main synthesis | 1999 | contains 'wavetable' sample-based synthesis and Algorithmic Synthesis and Audio Effects |
14 | 'wavetable' sample-based synthesis | 1999 | based on SoundFont and DownLoadable Sounds, contains General MIDI |
15 | General MIDI | 1999 | |
16 | Algorithmic Synthesis and Audio Effects | 1999 | |
17 | ER AAC LC | 2000 | Error Resilient |
18 | |||
19 | ER AAC LTP | 2000 | Error Resilient |
20 | ER AAC Scalable | 2000 | Error Resilient |
21 | ER TwinVQ | 2000 | Error Resilient |
22 | ER BSAC | 2000 | Error Resilient. Also known as "Fine Granule Audio" or the fine grain scalability tool. It is used in combination with the AAC coding tools and replaces the noiseless coding and the bitstream formatting of the MPEG-4 Version 1 GA coder. |
23 | ER AAC LD | 2000 | Error Resilient, used with CELP, ER CELP, HVXC, ER HVXC and TTSI in the "Low Delay Profile" |
24 | ER CELP | 2000 | Error Resilient |
25 | ER HVXC | 2000 | Error Resilient |
26 | ER HILN | 2000 | Error Resilient |
27 | ER Parametric | 2000 | Error Resilient |
28 | SSC | 2004 | |
29 | PS | 2004 and 2006 | used with AAC LC and SBR in the "HE-AAC v2 Profile". PS coding tool was defined in 2004 and Object Type defined in 2006. |
30 | MPEG Surround | 2007 | also known as MPEG Spatial Audio Coding, it is a type of spatial audio coding |
31 | |||
32 | MPEG-1/2 Layer-1 | 2005 | |
33 | MPEG-1/2 Layer-2 | 2005 | |
34 | MPEG-1/2 Layer-3 | 2005 | also known as "MP3onMP4" |
35 | DST | 2005 | lossless audio coding, used on Super Audio CD |
36 | ALS | 2006 | lossless audio coding |
37 | SLS | 2006 | two-layer audio coding with lossless layer and lossy General Audio core/layer |
38 | SLS non-core | 2006 | lossless audio coding without lossy General Audio core/layer |
39 | ER AAC ELD | 2008 | Error Resilient |
40 | SMR Simple | 2008 | note: Symbolic Music Representation is also the MPEG-4 Part 23 standard |
41 | SMR Main | 2008 | |
42 | USAC | 2012 | Unified Speech and Audio Coding is defined in MPEG-D Part 3 |
43 | SAOC | 2010 | note: Spatial Audio Object Coding is also the MPEG-D Part 2 standard |
44 | LD MPEG Surround | 2010 | This object type conveys Low Delay MPEG Surround Coding side information in the MPEG-4 Audio framework. |
45 | SAOC-DE | 2013 | Spatial Audio Object Coding Dialogue Enhancement |
46 | Audio Sync | 2015 | The audio synchronization tool provides capability of synchronizing multiple contents in multiple devices. |
Audio Profiles
The MPEG-4 Audio standard defines several profiles. These profiles are based on the object types, and each profile supports a different list of object types. Each profile may also have several levels, which limit some parameters of the tools present in a profile. These parameters are usually the sampling rate and the number of audio channels decoded at the same time.
Audio Profile | Audio Object Types | First public release date |
AAC Profile | AAC LC | 2003 |
High Efficiency AAC Profile | AAC LC, SBR | 2003 |
HE-AAC v2 Profile | AAC LC, SBR, PS | 2006 |
Main Audio Profile | AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, Main synthesis | 1999 |
Scalable Audio Profile | AAC LC, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI | 1999 |
Speech Audio Profile | CELP, HVXC, TTSI | 1999 |
Synthetic Audio Profile | TTSI, Main synthesis | 1999 |
High Quality Audio Profile | AAC LC, AAC LTP, AAC Scalable, CELP, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER CELP | 2000 |
Low Delay Audio Profile | CELP, HVXC, TTSI, ER AAC LD, ER CELP, ER HVXC | 2000 |
Natural Audio Profile | AAC Main, AAC LC, AAC SSR, AAC LTP, AAC Scalable, TwinVQ, CELP, HVXC, TTSI, ER AAC LC, ER AAC LTP, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD, ER CELP, ER HVXC, ER HILN, ER Parametric | 2000 |
Mobile Audio Internetworking Profile | ER AAC LC, ER AAC Scalable, ER TwinVQ, ER BSAC, ER AAC LD | 2000 |
HD-AAC Profile | AAC LC, SLS | 2009 |
ALS Simple Profile | ALS | 2010 |
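The relationship between profiles and object types in the table above is set containment: a profile can cover a stream only if it lists every object type the stream uses. The following is a minimal sketch of that check in Python. The dictionary copies a few rows of the table, the function name `profiles_covering` is hypothetical, and levels (sampling rate and channel limits) are deliberately ignored.

```python
# A few rows copied from the Audio Profiles table above (profile -> object types).
# This is an illustrative subset, not the full list defined by the standard.
PROFILES = {
    "AAC Profile": {"AAC LC"},
    "High Efficiency AAC Profile": {"AAC LC", "SBR"},
    "HE-AAC v2 Profile": {"AAC LC", "SBR", "PS"},
    "Speech Audio Profile": {"CELP", "HVXC", "TTSI"},
    "HD-AAC Profile": {"AAC LC", "SLS"},
}


def profiles_covering(object_types):
    """Return the profiles whose object type list contains every given type.

    Levels are not modelled here, so a match only means the object types
    are listed for the profile, not that a given decoder level suffices.
    """
    needed = set(object_types)
    return [name for name, supported in PROFILES.items() if needed <= supported]


if __name__ == "__main__":
    for stream in ({"AAC LC"}, {"AAC LC", "SBR"}, {"CELP", "SBR"}):
        print(sorted(stream), "->", profiles_covering(stream))
```

A stream that uses AAC LC together with SBR is covered by both the High Efficiency AAC Profile and the HE-AAC v2 Profile in this subset, because the latter lists a superset of the former's object types.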
Audio storage and transport
There is no standard for transport of elementary streams over a channel, because the broad range of MPEG-4 applications has delivery requirements that are too wide to characterize easily with a single solution.
The capabilities of a transport layer and the communication between transport, multiplex, and demultiplex functions are described in the Delivery Multimedia Integration Framework in ISO/IEC 14496-6. A wide variety of delivery mechanisms exist below this interface, e.g., MPEG transport stream, Real-time Transport Protocol, etc.
Transport in Real-time Transport Protocol is defined in RFC 3016, RFC 3640, RFC 4281 and RFC 4337.
LATM and LOAS were defined for natural audio applications, which do not require sophisticated object-based coding or other functions provided by MPEG-4 Systems.
Bifurcation in the AAC technical standard
The Advanced Audio Coding in MPEG-4 Part 3 Subpart 4 was enhanced relative to the previous standard, MPEG-2 Part 7, in order to provide better sound quality for a given encoding bitrate.
It is assumed that any Part 3 and Part 7 differences will be ironed out by the ISO standards body in the near future to avoid the possibility of future bitstream incompatibilities. At present there are no known player or codec incompatibilities due to the newness of the standard.
The MPEG-2 Part 7 standard was first published in 1997 and offers three default profiles: Low Complexity profile, Main profile and Scalable Sampling Rate profile.
The MPEG-4 Part 3 Subpart 4 combined the profiles from MPEG-2 Part 7 with Perceptual Noise Substitution and defined them as Audio Object Types.
HE-AAC
HE-AAC is an extension of AAC LC using spectral band replication (SBR) and Parametric Stereo (PS). It is designed to increase coding efficiency at low bitrates by using a partial parametric representation of audio.
AAC-SSR
AAC Scalable Sample Rate was introduced by Sony to the MPEG-2 Part 7 and MPEG-4 Part 3 standards. It was first published in ISO/IEC 13818-7, Part 7: Advanced Audio Coding in 1997. The audio signal is first split into 4 bands using a 4-band polyphase quadrature filter (PQF) bank. These 4 bands are then further split using MDCTs with a size k of 32 or 256 samples. This is similar to normal AAC LC, which uses MDCTs with a size k of 128 or 1024 directly on the audio signal.
The advantage of this technique is that short block switching can be done separately for every PQF band. High frequencies can thus be encoded using a short block to enhance temporal resolution, while low frequencies can still be encoded with high spectral resolution. However, due to aliasing between the 4 PQF bands, coding efficiency around the band edges (multiples of fs/8) is worse than in normal MPEG-4 AAC LC.
MPEG-4 AAC-SSR is very similar to ATRAC and ATRAC-3.
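The block sizes quoted above for AAC-SSR (32 or 256 per PQF band) and AAC LC (128 or 1024 on the full signal) can be compared with a little arithmetic. The sketch below is illustrative only. It assumes that each of the 4 PQF bands is critically sampled at one quarter of the input rate and that the MDCT size k is the number of coefficients per block. The 48 kHz sample rate and the function name `mdct_resolution` are also assumptions made for the example.

```python
def mdct_resolution(sample_rate_hz, mdct_size, pqf_bands=1):
    """Return (coefficient spacing in Hz, block duration in ms).

    Assumes each PQF band runs at sample_rate_hz / pqf_bands (critical
    sampling) and that mdct_size is the number of coefficients per block.
    pqf_bands=1 models AAC LC, which applies the MDCT to the full signal.
    """
    band_rate_hz = sample_rate_hz / pqf_bands
    spacing_hz = (band_rate_hz / 2) / mdct_size   # band bandwidth / coefficients
    block_ms = 1000 * mdct_size / band_rate_hz    # new samples consumed per block
    return spacing_hz, block_ms


if __name__ == "__main__":
    fs = 48_000  # example rate, an assumption for illustration
    cases = [
        ("AAC LC long", 1024, 1),
        ("AAC LC short", 128, 1),
        ("AAC-SSR long, per band", 256, 4),
        ("AAC-SSR short, per band", 32, 4),
    ]
    for label, size, bands in cases:
        spacing, duration = mdct_resolution(fs, size, bands)
        print(f"{label}: {spacing:.1f} Hz per coefficient, {duration:.2f} ms per block")
```

Under these assumptions the SSR long and short blocks have the same coefficient spacing and block duration as the corresponding AAC LC blocks, and 4 bands of 256 (or 32) coefficients add up to 1024 (or 128). The difference is that SSR can choose long or short blocks independently in each band.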
Why AAC-SSR was introduced
The idea behind AAC-SSR was not only the advantage listed above, but also the possibility of reducing the data rate by removing 1, 2 or 3 of the upper PQF bands. A very simple bitstream splitter can remove these bands and thus reduce the bitrate and sample rate.
Example:
- 4 subbands: bitrate = 128 kbit/s, sample rate = 48 kHz, f_lowpass = 20 kHz
- 3 subbands: bitrate ~ 120 kbit/s, sample rate = 48 kHz, f_lowpass = 18 kHz
- 2 subbands: bitrate ~ 100 kbit/s, sample rate = 24 kHz, f_lowpass = 12 kHz
- 1 subband: bitrate ~ 65 kbit/s, sample rate = 12 kHz, f_lowpass = 6 kHz
Although this is possible, the resulting quality is much worse than is typical for this bitrate. For normal 64 kbit/s AAC LC, a bandwidth of 14–16 kHz is achieved by using intensity stereo and reduced NMRs. This degrades audible quality less than transmitting 6 kHz bandwidth with perfect quality.
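The low-pass frequencies in the band-dropping example above follow from the PQF layout: 4 equal bands across half the sample rate give 6 kHz per band at 48 kHz. The sketch below reproduces only those f_lowpass values. It assumes the 20 kHz limit of the full 4-band case comes from an encoder low-pass setting, and the function name `lowpass_after_dropping` is hypothetical. Bitrates and output sample rates are not modelled, since they depend on the signal and on the decoder.

```python
def lowpass_after_dropping(kept_bands, sample_rate_hz=48_000,
                           encoder_lowpass_hz=20_000, total_bands=4):
    """Upper audio bandwidth in Hz after keeping the lowest `kept_bands` PQF bands.

    Each band spans (sample_rate_hz / 2) / total_bands. The result is capped
    by encoder_lowpass_hz, an assumed encoder setting that explains the
    20 kHz figure quoted for the full 4-band stream.
    """
    if not 1 <= kept_bands <= total_bands:
        raise ValueError("kept_bands must be between 1 and total_bands")
    band_width_hz = sample_rate_hz / 2 / total_bands
    return min(encoder_lowpass_hz, kept_bands * band_width_hz)


if __name__ == "__main__":
    for kept in (4, 3, 2, 1):
        print(f"{kept} subband(s): f_lowpass = {lowpass_after_dropping(kept) / 1000:g} kHz")
```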