MP3 Page
Resources:
http://www.winamp.com
My favourite MP3 player site. Get Winamp plugins, skins and goodies
here. It is freeware.
http://www.mp3.com
Get everything else about MP3 that you want from here: players, encoders,
decoders, CD rippers etc. and, of course, songs.
Frequently Asked Questions about MPEG Audio Layer-3, Fraunhofer-IIS, and
all the rest...
Version 2.60
This text will be continuously upgraded: step by step, more answers and
more information will be included. Yes, we definitely know that there are
a lot more questions to answer! But we cannot do that all at once. So,
some parts may remain "under construction" for a while, and other parts
may be modified due to new results of our research work or new applications.
You can find the latest release at
http://www.iis.fhg.de/departs/amm/layer3/sw/
or
ftp://ftp.fhg.de/pub/layer3/l3faq.html
Introduction - or: What is "MPEG Audio Layer-3"?
Today, efficient coding techniques are a must for cost-effective processing
of digital audio and video data by computers. Data reduction of moving
pictures and sound is a key technology for any application with limited
transmission or storage capacity. In recent years, a lot of progress
has been achieved. While there (still) exist several proprietary formats
for audio and video coding, the ISO/IEC standardisation body has released
an international standard ("MPEG") for powerful audio and video coding
tools (see: Overview about the ISO-MPEG Standard - or: What is MPEG all about?).
Without data reduction, digital audio signals typically consist
of 16 bit samples recorded at a sampling rate more than twice the actual
audio bandwidth (e.g. 44.1 kHz for Compact Discs). So you end up with more
than 1400 kbit to represent just one second of stereo music in
CD quality. By using MPEG audio coding, you may shrink down the original
sound data from a CD by a factor of 12, without losing sound quality. Factors
of 24 and even more still maintain a sound quality that is significantly
better than what you get by just reducing the sampling rate and the resolution
of your samples. Basically, this is realized by "perceptual coding" techniques
addressing the perception of sound waves by the human ear (see: Basics
of Perceptual Audio Coding - or: What is the trick?).
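The figure of "more than 1400 kbit" per second follows directly from the CD parameters given above (44.1 kHz, 16 bit, stereo). A minimal Python sketch of that arithmetic; the function name is purely illustrative and not part of any MPEG tool:

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# CD audio: 44.1 kHz sampling rate, 16 bit samples, two channels
cd_kbps = pcm_bitrate_kbps(44_100, 16, 2)

# Shrinking the data by a factor of 12, as quoted for MPEG audio coding
coded_kbps = cd_kbps / 12

print(f"CD PCM: {cd_kbps:.1f} kbit/s, after 12:1 reduction: {coded_kbps:.1f} kbit/s")
```

The result lands between the 112 and 128 kbps quoted for Layer-3 at CD quality.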
Using MPEG audio, one may achieve a typical data reduction of
1:4 by Layer-1 (corresponds to 384 kbps for a stereo signal),
1:6...1:8 by Layer-2 (corresponds to 256..192 kbps for a stereo signal),
1:10...1:12 by Layer-3 (corresponds to 128..112 kbps for a stereo signal),
while still maintaining the original CD sound quality.
By exploiting stereo effects and by limiting the audio bandwidth, the
coding schemes may achieve an acceptable sound quality at even lower bitrates.
Layer-3 is the most powerful member of the MPEG audio coding family. For
a given sound quality level, it requires the lowest bitrate - or for a
given bitrate, it achieves the highest sound quality (see: Advanced
Features of Layer-3 - or: Why does Layer-3 perform so well?).
Some typical performance data of Layer-3 are:
sound quality            bandwidth   mode     bitrate         reduction ratio
"telephone sound"        2.5 kHz     mono     8 kbps*         96:1
"better than shortwave"  4.5 kHz     mono     16 kbps*        48:1
"better than AM radio"   7.5 kHz     mono     32 kbps         24:1
"similar to FM radio"    11 kHz      stereo   56..64 kbps     26..24:1
"near-CD"                15 kHz      stereo   96 kbps         16:1
"CD"                     > 15 kHz    stereo   112..128 kbps   14..12:1
*: Fraunhofer uses a non-ISO extension of Layer-3 for enhanced performance ("MPEG 2.5")
All in all, Layer-3 is the key for numerous low-bitrate, high-quality sound
applications (see: Applications - or: Layer-3, what is it good for?).
Applications - or: Layer-3, what is it good for?
A key technology like Layer-3 is useful for a wide range of
applications - almost any system with limited channel capacity
may benefit from it. The following chapters identify some main areas and
list some companies that are actively exploiting the Layer-3 technology.
For product-related information, please contact these companies
directly.
Music Links via ISDN
Digital telephone networks (ISDN = Integrated Services Digital Network)
offer reliable dial-up links with two 64 kbps data channels per basic rate
adapter; other regional networks (in North America) use 56 kbps data links.
Transmission fees are often similar or identical to those of traditional
analog phone lines, which carry up to 28.8 kbps (V.34 modem)
or even 32 kbps ("V.34+").
Using Layer-3, a low-cost narrowband ISDN connection can carry
CD-quality sound. Audio professionals, like broadcasting stations and sound
studios, benefit from the "music-by-phone" application in various ways.
They save money, as they only pay transmission fees for the actual time
of usage (not 24 h a day in case of a leased phone line) and for a rather
small data channel (one ISDN phone connector for a stereo music link).
Radio stations increase the attractiveness of their programs, as reporters
transmit high-quality takes (e.g. an interview) or live news without annoying
"telephone sound". And new applications become possible, e.g. a "virtual
studio", where remote artists may play along with some preproduced material,
without actually travelling to the studio.
Examples:
-
In 1992, Radio FFN, a private broadcasting station in Niedersachsen, Germany,
replaced its leased phone lines with ISDN and Layer-3 codecs, to transmit
8 local programs 20 min per day to the central broadcasting studio. This
move saved them transmission fees of more than 300,000 US$ per year.
-
As one of the first real-world trials, all private radio stations of Germany
very successfully used Layer-3 codecs during the Winter Olympic Games in
Albertville (France) as reporter links between the various sporting events
and their central studio in Meribel.
-
At the International Music Festival 92 in Bergen, Arne Nordheim composed
a piece of music, where an organ in the church of Trondheim played along
with the symphony orchestra in Bergen; the sound of the organ was transmitted
via ISDN and a Layer-3 codec.
Since 1992, various manufacturers have been producing equipment ("codecs") for
studio applications: Dialog 4, Lucent, Telos.
Digital Satellite Broadcasting
Pioneered by WorldSpace,
a worldwide satellite digital audio broadcasting system is under construction.
Its name is WorldStar, and it will use three geostationary orbit satellites
called AfriStar (21 East), CaribStar (95 West), and AsiaStar (105 East),
with AfriStar being launched in mid-1998. The other satellites will follow
by mid-1999. Each satellite is equipped with three downlink spot beams
that are pointed so as to cover populations that provide the greatest radio
listener base (radio set population of 1 billion, with annual sales of
more than 100 million radios). Each downlink uses TDM (time division multiplexing)
to carry 96 prime rate channels (16.056 kbps each). The prime rate channels
are combined to carry broadcast channels ranging from 16 kbps to 128 kbps;
the broadcast channels are coded using MPEG Layer-3. The prime rate channels
may even be dynamically allocated to meet the demands of the broadcast
service (e.g. 4 channels combined for 1 hour to allow FM-quality stereo
(64 kbps) for the transmission of a classical music concert, followed
by 1 hour with 4 separate news channels (16 kbps) in 4 different native
tongues).
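As a rough illustration of the channel bundling described above, the following Python sketch counts how many prime rate channels a broadcast channel would occupy. It assumes, as a simplification not stated in the text, that each prime rate channel contributes a nominal 16 kbps of audio payload (with the remaining 0.056 kbps treated as overhead); all names are hypothetical.

```python
import math

PRIME_RATE_KBPS = 16         # assumed nominal audio payload per prime rate channel
CHANNELS_PER_DOWNLINK = 96   # prime rate channels carried by one TDM downlink

def prime_channels_needed(broadcast_kbps):
    """Number of prime rate channels bundled into one broadcast channel."""
    if not 16 <= broadcast_kbps <= 128:
        raise ValueError("broadcast channels range from 16 to 128 kbps")
    return math.ceil(broadcast_kbps / PRIME_RATE_KBPS)

# One hour of FM-quality stereo, followed by four separate news channels:
concert = prime_channels_needed(64)
news = sum(prime_channels_needed(16) for _ in range(4))

# How many 64 kbps stereo programs would fit into one downlink
stereo_programs = CHANNELS_PER_DOWNLINK // concert

print(concert, news, stereo_programs)
```

Under this assumption, the concert and the four news channels occupy the same four prime rate channels, which is what makes the dynamic reallocation in the example possible.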
WorldSpace is offering channels on its three satellites for lease to
international and national broadcasters. Agreements have already been signed
with a number of broadcasters, and negotiations are underway with numerous
other system users. Nearly $1 billion in private financing has been raised
to cover acquisition of the satellites and most of the operational
costs through full system implementation in 1999. France's Alcatel
Espace is the spacecraft prime contractor and supplies the telecommunications
payload.
The radio receivers (named StarMan) will be designed for maximum convenience
of use at minimum cost. Low-cost receivers will use a small, compact patch
antenna, will require practically no pointing, and will tune automatically
to selected channels. Higher-end receivers are also envisioned. In a press
release of 5 June 1996 (Montreux, Switzerland), WorldSpace declared that
it had awarded production contracts for two million receiver chips; the
contracts were issued to SGS-Thomson and ITT Intermetall, authorizing each
company for an initial production of one million StarMan chip-sets.
ITT Intermetall has already gained Layer-3 know-how by using its mask-programmed
DSP technology to develop a single-chip Layer-3 decoder named "MAS 3503
C". This chip supports only MPEG-1 Layer-3.
Audio-on-Demand
The Internet is a world-wide packet-switched network of computers linked
together by various types of data communications systems. Professional
Internet providers usually access the network through rather high bit-rate
links (e.g., primary rate ISDN with 2 Mbps or ATM with up to 2 Gbps). However,
the average consumer uses low cost, low bit-rate connections (e.g., basic
rate ISDN with 64 kbps or phone line modems with 28.8 or 14.4 kbps). The
actual transmission rate depends on the current user load and the infrastructure
of the part of the Internet in use. From a client's point of view,
it may unpredictably vary between zero and the maximum bit-rate of its
network modem, with an average bit-rate somewhere in between.
Without audio coding, downloading uncompressed high-quality audio files
from a remote Internet server would result in unfavourably long transmission
times. For example, with an average transmission rate of 28.8 kbps (an optimistic
guess), a single 3-min stereo track from a CD (31.7 Mbyte) would require
a download time of more than 2 hours. Therefore, audio on the Internet
calls for an audio coding scheme that maintains sound quality as far as
possible and allows real-time decoding on a large number of computer platforms
without special add-on hardware. Layer-3 fits very well into this scenario
- real-time players (like WinPlay3) are available. Intranets present an
interesting special case, as they usually provide sufficient bitrate to
allow a number of real-time audio links. Furthermore, our experiments
indicate that, using the HTTP protocol, a real-time connection with
56 (112) kbps is possible with one (two) ISDN phone line(s).
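The download-time estimate above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the function name is illustrative, the link is assumed to deliver its full 28.8 kbps continuously, and the Layer-3 case assumes the 128 kbps quoted earlier for CD quality.

```python
def download_seconds(size_bytes, link_bps):
    """Transfer time in seconds over a link running steadily at link_bps bit/s."""
    return size_bytes * 8 / link_bps

MODEM_BPS = 28_800

# 3 minutes of CD audio: 44.1 kHz, 2 bytes per sample, 2 channels
pcm_bytes = 3 * 60 * 44_100 * 2 * 2
pcm_seconds = download_seconds(pcm_bytes, MODEM_BPS)

# The same 3 minutes coded with Layer-3 at 128 kbit/s
layer3_bytes = 3 * 60 * 128_000 // 8
layer3_seconds = download_seconds(layer3_bytes, MODEM_BPS)

print(f"PCM:     {pcm_bytes / 1e6:.1f} Mbyte, {pcm_seconds / 3600:.2f} h")
print(f"Layer-3: {layer3_bytes / 1e6:.2f} Mbyte, {layer3_seconds / 60:.1f} min")
```

The uncompressed track needs well over two hours, while the Layer-3 version of the same track downloads in a fraction of that time, although still not in real time over such a modem.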
If content providers are willing to add audio data onto their Internet
servers, they have to consider carefully the copyright aspects of the music
industry (e.g., artists, producers, record companies). They must not violate
these rights by their actions! In the framework of a European project called
MODE (for "Music-on-Demand"), we developed a flexible protection scheme called
MMP (for "multimedia protection format") that effectively addresses this issue.
Furthermore, MMP allows real-time players to be distributed "virtually free".
Audio servers may be used purely for promotional purposes. E.g., museums
may increase the attractiveness of their WWW pages by adding some sound
files, or mail-order services may add sound excerpts to their server to
increase their CD sales numbers. Opticom, a spin-off from Fraunhofer,
offers system solutions for this type of application. In spring 1996 (CeBIT
Hannover), they successfully demonstrated an "audio-on-demand" application
via T-Online together with Deutsche Telekom and a broadcasting station,
the Südwestfunk Baden-Baden.
Audio servers may also be used for music sales systems. Cerberus
Sound & Vision uses a personalized real-time Layer-3 player and a
proprietary encryption scheme to sell sound files via the Internet.
"Audio-on-the-Internet" is currently a very popular topic. It comprises not
only audio file transfers with download times as low as possible, but also
streaming audio applications, like "Internet Radio". As Layer-3 offers a sound
quality "better than shortwave" at a bitrate of 16 kbps (and, with some
modifications, may even be useful at 8 kbps), various companies are currently
working on this Internet subject - e.g., Opticom or Telos. As the first
multimedia authoring tools, "Director Multimedia Studio 2" and "SoundEdit
16" (from Macromedia) exploit Layer-3 to generate compressed sound files for
"Shockwave" movies.
Layer-3 encoders and decoders are not only available as studio equipment,
but also as ISA-bus PC boards from Dialog 4, along with application
software; recording and playback tools are also available from Proton
Data, along with a special decoder module (called "CenLay3") that allows
Layer-3 files to be played back via the parallel printer port.
In addition, a file-oriented Layer-3 encoder and decoder (called "L3ENC"
and "L3DEC") are available as shareware for various platforms. Registration
is processed by Opticom.
Real-time Layer-3 players
WinPlay3
"WinPlay3" decodes Layer-3 in real time, purely in software, on any Pentium
PC. An 80486 class CPU with a built-in floating-point unit will also
allow some limited operation. For the availability of supported modes,
please refer to the following performance matrix:
                   Pentium   486DX2-66   486DX-50   486DX-33
MPEG-1 stereo      ok        -           -          -
MPEG-1 downmix*    ok        ok          -          -
MPEG-1 mono        ok        ok          ok         -
MPEG-2 stereo      ok        ok          ok         -
MPEG-2 downmix*    ok        ok          ok         ok
MPEG-2 mono        ok        ok          ok         ok
*downmix: the original stereo signal will be played back as a mono signal
"MPEG-1" = "MPEG-1 Layer-3", i.e. sample rates 32, 44.1 or 48 kHz
"MPEG-2" = "MPEG-2 Layer-3", i.e. sample rates 16, 22.05 or 24 kHz
On a Pentium-90, WinPlay3 consumes less than 30 % of the CPU power to decode
Layer-3 stereo @ 44.1 kHz, or around 5 % of the CPU power to decode Layer-3
mono @ 16 kHz.
At least an 8-bit stereo sound card is required. For full quality audio,
a 16-bit card is recommended. The card's MCI driver should support
sampling frequencies from 8 kHz to 48 kHz.
A standard VGA graphics card is required.
As WinPlay3 buffers up to 4 seconds of sound data due to the limitations
of the Microsoft Windows multitasking architecture, around 1 MByte free
physical memory must be available.
WinPlay3 runs with the following operating systems: Microsoft Windows
3.1/3.11 (in 386 enhanced mode), Windows 95 and Windows NT (long file names
not yet supported).
WinPlay3 supports playback of *.mp3 files and direct play from
a URL via HTTP. WinPlay3 can easily be integrated as a helper application
in common browsers, for example Netscape or Mosaic.
WinPlay3 is available at http://www.iis.fhg.de/departs/amm/layer3/winplay3/.
The unregistered player is limited to a reproduction time of 20 sec, i.e.
it will play back each plain Layer-3 file only for this time. If you want
to use your player without limitation, you have to register your player
with Opticom.
MMP
As many applications require a player that is "free" for the user, the
latest versions of WinPlay3 (starting with version 2.0) also support the
new "MMP" ("multimedia protection") format.
MMP is a very flexible data format that may support the following functions:
-
"unlocking" of the 20 sec playback time limitation
-
"copyright protection" by applying encryption methods to (part of) the
data
-
"title associated data" (e.g. ISRC code, user data)
-
"expiry date" to allow only a limited use
More detailed information is available at http://www.iis.fhg.de/departs/amm/layer3/mmp/.
In a typical "audio-on-demand" application, the content provider may
convert its plain Layer-3 data into MMP data "on the fly", by using "MMP
tagger" software (available from Opticom).
The client may use its unregistered player to play back these files without
limitation - the player is "virtually free". The client need not pay fees
- this issue may now be covered on the server side.
MPEG Layer 3 Player
For Mac OS users, a real-time player called "MPEG Layer 3 Player" with
a look and feel (and features) similar to those of "WinPlay3" will soon
be released, too. This new player will (finally!) replace the much simpler
(and somewhat buggy) pre-version 0.99 beta that has been available from
http://www.iis.fhg.de/departs/amm/layer3/macplay3/.
Layer-3 Sound on CD-ROMs
CD-ROMs (and hard disks) have become very popular for storing "multimedia"
data. Even with the advent of the new DVD standard, memory capacity will
remain a precious resource for many applications. For uncompressed stereo
signals from a CD, more than 10 MByte are necessary to store one minute
of music. Using Layer-3, less than 1 MByte is enough for the same playing
time. And significantly less memory is necessary, if some limitations in
performance are acceptable. As CD-ROM readers (and pretty soon, writers
too) have already gained a significant market share, typical applications
focus today on storing compressed sound files on CD-ROMs, introducing more
or better sound tracks into the product. Real application examples are
video games, music catalogues or encyclopedias with sound excerpts (e.g.,
"MusicFinder" by Sygna), or talking books for blind people.
Layer-3 Sound on Silicon
Up to now, solid-state memories (RAMs, Flash-ROMs) have been used as audio
storage devices only in special (niche) applications, as the costs per byte
are much higher than with other types of media (magneto-optical disks or
magnetic tapes). Speech announcement systems for mass transit vehicles
(e.g., buses, subways or trains) are an example of such special applications,
as the rough environment requires the use of ROM-based memories. Since 1993,
Meister Electronic has manufactured speech announcement systems with Layer-3,
significantly reducing the required memory capacity and, at the same time,
significantly improving the sound quality (compared with their older 64 kbps
PCM "phone sound").
Today, PC-Cards with Flash-ROMs are available, offering a memory capacity
of up to 100 MByte and more, but at prohibitively high cost for a consumer
application. Here, further advances in memory and card technology may trigger
a new, interesting market segment of "audio-chip-card" applications. At
a press conference in August 95 in Munich, Siemens Germany announced the
advent of a new cost-effective ROM technology called
the "ROS chip" (ROS = Record-on-Silicon). The first generation of ROS chips
will be in production in 1997, with a storage capacity of 64 Mbit; a next
generation with 256 Mbit as well as a one-time user programmable version
will follow. The ROS chips will be embedded in the new "MultiMedia-Card"
from Siemens, a cost-effective card media that will store data, text, graphics,
images and sound. Siemens has already demonstrated a battery-powered audio
player using a prototype "Audio-Card" containing sound tracks coded with
MPEG-Layer-3.
General Questions and Answers
-
Q: O.K., Layer-3 is obviously a key to many applications. Where
are its limitations?
-
A: Well, Layer-3 is a perceptual audio coding scheme, exploiting
the properties of the human ear, and trying to maintain the original sound
quality as far as possible.
In contrast, a dedicated speech codec exploits the properties of the
human vocal tract, trying to maintain the intelligibility of the voice
signals as far as possible. Advanced speech coding schemes (e.g., ACELP
[LD-CELP] as standardised by ITU as G.723.1 [G.728]) achieve a useful voice
reproduction at bitrates as low as 5.3 [16] kbps, with a codec delay below
40 [1] ms. At such very low bitrates, they perform better than Layer-3 for
pure voice signals, and they offer the low delay that is necessary for
full-duplex voice communications.
In the framework of MPEG-4, scalable audio coding schemes are devised
that combine speech coding and perceptual audio coding.
-
Q: You mentioned the codec delay. May I have some figures?
-
A: Well, the standard gives some figures of the theoretical minimum
delay:
Layer-1: 19 ms (<50 ms)
Layer-2: 35 ms (100 ms)
Layer-3: 59 ms (150 ms)
Practical values are significantly above that. As they depend on the
implementation, precise figures are hard to give. So the numbers in brackets
are just rough rule-of-thumb values - real codecs may show even higher values.
So yes, there are certain applications that may suffer from such a delay
(like feedback links for remote reporter units). For many other applications
(like the ones mentioned above), delay is of minor interest.
Overview about the ISO-MPEG Standard - or: What is MPEG all about?
-
Q: What is "MPEG"?
-
A: MPEG is the "Moving Picture Experts Group", working under the
joint direction of the International Organization for Standardization (ISO) and the
International Electrotechnical Commission (IEC). This group works on standards
for the coding of moving pictures and audio. MPEG has created its own
homepage, providing information on the what, where, when
and how of the standards.
-
Q: What is MPEG-1, -2, and so on?
-
A: MPEG approaches the growing need for multimedia standards step-by-step.
Today, three main "steps" are defined (MPEG-1, MPEG-2, MPEG-4).
-
MPEG-1: "Coding of Moving Pictures and Associated Audio for Digital Storage
Media at up to about 1.5 Mbit/s"
-
MPEG-2: "Generic Coding of Moving Pictures and Associated Audio Information"
-
MPEG-3: originally planned mainly for HDTV applications; later on, it was
merged into MPEG-2
-
MPEG-4: "Coding of Audio-Visual Objects"
-
Q: Are MPEG-3 and Layer-3 the same thing?
-
A: No! Layer-3 is a powerful audio coding scheme which certainly
is part of the MPEG standard. Layer-3 is defined within the audio part
of both existing international standards, MPEG-1 and MPEG-2. So please
do not mix audio layers and MPEG standards!
-
Q: What is the status of MPEG-1?
-
A: Work on MPEG-1 is finished. The first three parts have been standardized
since 1992. MPEG-1 consists of five parts:
-
IS-11172-1 ("System") describes synchronization and multiplexing of video
and audio signals.
-
IS-11172-2 ("Video") describes compression of video signals, focussing
on progressive scan video (and mainly aiming at "Video-on-CD" applications).
-
IS-11172-3 ("Audio") describes a generic audio coding family, with three
hierarchically compatible members (called "Layer-1", "Layer-2" and "Layer-3").
-
IS-11172-4 ("Compliance Testing") describes procedures for determining
the characteristics of coded bitstreams and the decoding process and for
testing compliance with the requirements stated in the other parts.
-
DTR-11172-5 ("Software Simulation") is a technical report about a full
software implementation of the first three parts of MPEG-1.
-
Q: What is the status of MPEG-2?
-
A: MPEG-2 currently consists of nine parts. The first three parts
have been standardized since 1994, with some amendments included later on. Other
parts are at different levels of completion.
-
IS-13818-1 ("System") describes synchronization and multiplexing of video
and audio signals; it is also standardised by ITU-T as H.222.0.
-
IS-13818-2 ("Video") describes a generic video coding tool set, supporting
interlaced scan; it is also standardised by ITU-T as H.262.
-
IS-13818-3 ("Audio") describes a backward compatible extension of MPEG-1
for multichannel audio coding ("surround sound", "multilingual sound")
and a non-backward compatible extension to lower sample rates, to support
sound applications with limited audio bandwidth requirements.
-
IS-13818-4 ("Conformance Testing") describes procedures for determining
the characteristics of coded bitstreams and the decoding process and for
testing compliance with the requirements stated in the other parts.
-
DTR-13818-5 ("Software Simulation") is a technical report about a full
software implementation of the first three parts of MPEG-2.
-
IS-13818-6 ("System Extensions - Digital Storage Media Command and Control
(DSM-CC)") describes a set of protocols for client-server applications.
-
CD-13818-7 ("Audio, Non-Backwards-Compatible (NBC) - Coding") describes
an improved audio coding scheme for mono- and stereophonic signals as well
as for multichannel sound.
-
13818-8 ("Video, extension to 10-bit input samples") has been withdrawn,
due to insufficient interest.
-
IS-13818-9 ("Real-Time Interface Specification for Low-Jitter Applications")
defines timing constraints on the real-time delivery of MPEG-2 transport
bitstreams.
-
WD-13818-10 ("Conformance Extensions - DSM-CC") describes the addendum
to IS-13818-4 for DSM-CC.
-
Q: "NBC audio"? What is the motivation for this working group?
What are the results?
-
A: Well, during the work on multichannel audio coding (IS-13818-3),
it turned out that backwards compatible (BC) schemes suffer from the matrixing
process. Matrixing is required to allow an MPEG-1 decoder to play back all
surround channels via its two stereophonic channels. Unfortunately, some
of the introduced quantisation noise may become audible after dematrixing.
All in all, during an ISO listening test in spring 1994, BC multichannel
coding performed worse than non-ISO coding schemes (e.g., Dolby's
AC-3). So the NBC working group is currently developing a new audio coding scheme.
NBC audio achieves significantly better performance, not only for multichannel
surround sound, but even for monophonic signals (here targeting "true transparency"
at 64 kbps). In spring 1996, ISO performed a listening test for 5-channel
surround sound, and NBC audio using a total bit-rate of 320 kbps scored
better than Layer-2 BC at a bit-rate of 640 kbps. NBC audio will also become
one of the MPEG-4 audio coding algorithms.
-
Q: How do I get the MPEG documents?
-
A: Well, you may contact ISO,
or order them from your national standards body. E.g., in Germany, please
contact DIN.
-
Q: Is some public C source available?
-
A: Well, there is "public C source" available on various sites,
e.g. at ftp://ftp.fhg.de/pub/layer3/
or at ftp://ftp.tnt.uni-hannover.de/pub/MPEG/audio/mpeg2/public_software/.
This code has been written mainly for explanation purposes, so do not
expect too much performance.
Some Basics about MPEG Audio - or: What about Layer-1, Layer-2, Layer-3?
-
Q: Talking about MPEG audio, I always hear "Layer 1, 2 and 3". What
does it mean?
-
A: MPEG describes the compression of audio signals using high performance
perceptual coding schemes. It specifies a family of three audio coding
schemes, simply called Layer-1, Layer-2, and Layer-3. From Layer-1 to Layer-3,
encoder complexity and performance (sound quality per bitrate) increase.
The three codecs are compatible in a hierarchical way, i.e. a Layer-N
decoder may be able to decode bitstream data encoded in Layer-N and all
Layers below N (e.g., a Layer-3 decoder may accept Layer-1,-2,-3, whereas
a Layer-2 decoder may accept only Layer-1 and -2.)
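The hierarchical compatibility rule described above can be written down in a few lines. The following Python sketch only restates that rule; the function name is hypothetical and does not come from the standard or from any decoder.

```python
def decoder_accepts(decoder_layer, stream_layer):
    """Return True if a Layer-`decoder_layer` decoder may decode a
    bitstream encoded in Layer-`stream_layer` (layers numbered 1 to 3)."""
    for layer in (decoder_layer, stream_layer):
        if layer not in (1, 2, 3):
            raise ValueError(f"unknown MPEG audio layer: {layer}")
    # A Layer-N decoder handles Layer-N and all Layers below N.
    return stream_layer <= decoder_layer

for decoder in (1, 2, 3):
    accepted = [s for s in (1, 2, 3) if decoder_accepts(decoder, s)]
    print(f"Layer-{decoder} decoder accepts Layers {accepted}")
```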
-
Q: So we have a family of three audio coding schemes. What does
the MPEG standard define, exactly?
-
A: For each Layer, the standard specifies the bitstream format and
the decoder. To allow for future improvements, it does not specify the
encoder, but an informative chapter gives an example for an encoder for
each Layer.
-
Q: What do the three audio Layers have in common?
-
A: All Layers use the same basic structure. The coding scheme can
be described as "perceptual noise shaping" or "perceptual subband / transform
coding". The encoder analyzes the spectral components of the audio signal
by calculating a filterbank (transform) and applies a psychoacoustic model
to estimate the just noticeable noise-level. In its quantization and coding
stage, the encoder tries to allocate the available number of data bits
in a way to meet both the bitrate and masking requirements.
The decoder is much less complex. Its only task is to synthesize an
audio signal out of the coded spectral components.
All Layers use the same analysis filterbank (polyphase with 32 subbands).
Layer-3 adds an MDCT to increase the frequency resolution.
All Layers use the same "header information" in their bitstream, to
support the hierarchical structure of the standard.
All Layers have a similar sensitivity to bit errors. They use a bitstream
structure that contains parts that are more sensitive to bit errors ("header",
"bit allocation", "scalefactors", "side information") and parts that are
less sensitive ("data of spectral components").
All Layers support the insertion of programme-associated information
("ancillary data") into their audio data bitstream.
All Layers may use 32, 44.1 or 48 kHz sampling frequency.
All Layers are allowed to work with similar bitrates:
Layer-1: from 32 kbps to 448 kbps
Layer-2: from 32 kbps to 384 kbps
Layer-3: from 32 kbps to 320 kbps
The last two statements refer to MPEG-1; with MPEG-2, there is an extension
for the sampling frequencies and bitrates (see below).
-
Q: What are the main differences between the three Layers, from
a global view?
-
A: From Layer-1 to Layer-3, complexity increases (mainly true for
the encoder), overall codec delay increases, and performance increases
(sound quality per bitrate).
-
Q: What are the main differences between MPEG-1 and MPEG-2 in the
audio part?
-
A: MPEG-1 and MPEG-2 use the same family of audio codecs, Layer-1,
-2 and -3. The new audio features of MPEG-2 are a "low sample rate extension"
to address very low bitrate applications with limited bandwidth requirements
(the new sampling frequencies are 16, 22.05 or 24 kHz, the bitrates extend
down to 8 kbps), and a "multichannel extension" to address surround sound
applications with up to 5 main audio channels (left, center, right, left
surround, right surround) and optionally 1 extra "low frequency enhancement
(LFE)" channel for subwoofer signals; in addition, a "multilingual extension"
allows the inclusion of up to 7 more audio channels.
-
Q: Are all of these compatible with each other?
-
A: Well, more or less, yes - with the exception of the low sample
rate extension. Obviously, a pure MPEG-1 decoder is not able to handle
the new "half" sample rates.
-
Q: You mean: compatible!? With all these extra audio channels? Please
explain!
-
A: Compatibility has been a major topic during the MPEG-2 definition
phase. The main idea is to use the same basic bitstream format as defined
in MPEG-1, with the main data field carrying two audio signals (called
L0 and R0) as before, and the ancillary data field carrying the multichannel
extension information. Without going further into details, two terms should
be explained here:
"forwards compatible": the MPEG-2 decoder has to accept any MPEG-1 audio
bitstream (that represents one or two audio channels).
"backwards compatible": the MPEG-1 decoder should be able to decode the
audio signals in the main data field (L0 and R0) of the MPEG-2 bitstream.
"Matrixing" may be used to get the surround information into L0 and R0:
L0 = left signal + a * center signal + b * left surround signal
R0 = right signal + a * center signal + b * right surround signal
Therefore, an MPEG-1 decoder can reproduce a comprehensive downmix of the
full 5-channel information. An MPEG-2 decoder uses the multichannel extension
information (3 more audio signals) to reconstruct the five surround channels.
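The two matrixing equations translate directly into code. The Python sketch below applies them to a single sample per channel. The text does not give values for the coefficients a and b, so the default of 1/sqrt(2) used here is only an assumption for illustration, and the function name is hypothetical.

```python
import math

ASSUMED_COEFF = 1 / math.sqrt(2)   # illustrative value; a and b are not given in the text

def matrix_downmix(left, center, right, left_sur, right_sur,
                   a=ASSUMED_COEFF, b=ASSUMED_COEFF):
    """Fold five surround samples into the stereo pair (L0, R0)."""
    l0 = left + a * center + b * left_sur
    r0 = right + a * center + b * right_sur
    return l0, r0

# A sound present only in the center channel ends up equally in L0 and R0
l0, r0 = matrix_downmix(left=0.0, center=0.5, right=0.0,
                        left_sur=0.0, right_sur=0.0)
print(l0, r0)
```

An MPEG-1 decoder would simply play L0 and R0. The dematrixing step that an MPEG-2 decoder performs with the three extra signals is not shown here.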
-
Q: In your footnotes, you indicate the use of some "non-ISO" extension
inside your Fraunhofer codec, called "MPEG 2.5", to further improve the
performance at very low bitrates (e.g. 8 kbps mono). What do you mean by
this?
-
A: Oh, yes. Well, the MPEG-2 standard allows bitrates as low as
8 kbps, for the low sample rate extension. At such a low bitrate, the useful
audio bandwidth has to be limited anyway, e.g. to 3 kHz. Therefore, the
actual sample rate could be reduced, e.g. to 8 kHz. The lower the sample
rate, the better the frequency resolution, the worse the time resolution,
and the better the ratio between control information and audio payload
inside the bitstream format. As the MPEG-2 standard defines 16 kHz as the lowest
sample rate, we introduced a further extension, again dividing the low
sample rates of MPEG-2 by 2, i.e. we introduced 8, 11.025, and 12 kHz -
and we named this extension to the extension "MPEG 2.5". "Layer-3" performs
significantly better with 8 kbps @ 8 kHz or 16 kbps @ 11 kHz than with
8 or 16 kbps @ 16 kHz.
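The trade-off between frequency and time resolution can be made concrete with a short Python sketch. It derives the "MPEG 2.5" sample rates by halving, as described above, and assumes 576 spectral lines per coding block (32 subbands times the 18-fold finer MDCT resolution of Layer-3, long blocks only). That block size and all names are assumptions made for illustration.

```python
MPEG1_RATES_HZ = (32_000, 44_100, 48_000)
MPEG2_RATES_HZ = tuple(r // 2 for r in MPEG1_RATES_HZ)    # low sample rate extension
MPEG25_RATES_HZ = tuple(r // 2 for r in MPEG2_RATES_HZ)   # non-ISO "MPEG 2.5"

LINES_PER_BLOCK = 576   # assumed: 32 subbands x 18 MDCT lines (long blocks)

def resolution(sample_rate_hz):
    """Return (spacing of spectral lines in Hz, block duration in ms)."""
    line_spacing_hz = (sample_rate_hz / 2) / LINES_PER_BLOCK
    block_ms = LINES_PER_BLOCK / sample_rate_hz * 1000
    return line_spacing_hz, block_ms

for rate in (16_000, 8_000):
    spacing, duration = resolution(rate)
    print(f"{rate} Hz: {spacing:.2f} Hz per line, {duration:.0f} ms per block")
```

Halving the sample rate halves the spacing between spectral lines (better frequency resolution) and doubles the duration of each block (worse time resolution).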
Advanced Features of Layer-3 - or: Why does Layer-3 perform so well?
-
Q: Well, I read your statement about "CD-like" performance, achieved
at a data reduction of 4:1 (or 384 kbps total bitrate) with Layer-1, 6..8:1
(or 256..192 kbps total bitrate) with Layer-2, and 12..14:1 (or 128..112
kbps total bitrate) with Layer-3. Can you explain a little further?
-
A: Well, each audio Layer extends the features of the Layer with
the lower number. The simplest form is Layer-1. It has been designed mainly
for the DCC (Digital Compact Cassette), where it is used at 384 kbps (called
"PASC"). Layer-2 has been designed as a trade-off between complexity and
performance. It achieves a good sound quality at bitrates down to 192 kbps.
Below, sound quality suffers. Layer-3 has been designed for low bitrates
right from the start. It adds a number of "advanced features" to Layer-2:
the frequency resolution is 18 times higher, which allows a Layer-3 encoder
  to adapt the quantisation noise much better to the masking threshold
only Layer-3 uses entropy coding (like MPEG video) to further reduce redundancy
only Layer-3 uses a bit reservoir (like MPEG video) to suppress artefacts
  in critical moments
Layer-3 may use more advanced joint-stereo coding methods
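The data reduction factors quoted in the question can be checked with simple arithmetic. The sketch below assumes that the reference is 16-bit stereo PCM at a 48 kHz sample rate. This is an assumption, as the text does not state the reference rate, but it reproduces the quoted factors. With the 44.1 kHz of a CD the factors come out slightly smaller. The function names are hypothetical.

```python
def pcm_bitrate_kbps(sample_rate_hz=48000, bits_per_sample=16, channels=2):
    """Bitrate of uncompressed PCM audio in kbit/s."""
    return sample_rate_hz * bits_per_sample * channels / 1000

def reduction_factor(coded_kbps, **pcm_params):
    """Ratio of the uncompressed PCM bitrate to the coded bitrate."""
    return pcm_bitrate_kbps(**pcm_params) / coded_kbps

for layer, kbps in [("Layer-1", 384), ("Layer-2", 256), ("Layer-2", 192),
                    ("Layer-3", 128), ("Layer-3", 112)]:
    print(f"{layer} at {kbps} kbps: {reduction_factor(kbps):.1f}:1")
```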
-
Q: I see. Sounds to me as if Layer-3 is something like a "Layer-2++".
Now, tell me more about sound quality. How do you assess that?
-
A: Today, there is no alternative to expensive listening tests.
During the ISO-MPEG process, a number of international listening tests
have been performed, with a lot of trained listeners. All these tests used
the "triple stimulus, hidden reference" method and the "CCIR impairment
scale" to assess the sound quality. The listening sequence is "ABC", with
A = original, BC = pair of original / coded signal with random sequence,
and the listener has to evaluate both B and C with a number between 1.0
and 5.0. The meaning of these values is:
5.0 = transparent (this should be the original signal)
4.0 = perceptible, but not annoying (first differences noticeable)
3.0 = slightly annoying
2.0 = annoying
1.0 = very annoying
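The structure of one "ABC" trial can be sketched in a few lines of Python. This is a hypothetical illustration of the procedure described above, not the software used in the ISO-MPEG tests. All names are assumptions.

```python
import random

# Anchor points of the impairment scale as listed in the text.
IMPAIRMENT_SCALE = {
    5.0: "transparent",
    4.0: "perceptible, but not annoying",
    3.0: "slightly annoying",
    2.0: "annoying",
    1.0: "very annoying",
}

def make_trial(rng):
    """Return the hidden assignment of the stimuli for one trial.

    A is always the known original; the hidden original and the coded
    signal are placed on B and C in random order.
    """
    pair = ["original", "coded"]
    rng.shuffle(pair)
    return {"A": "original", "B": pair[0], "C": pair[1]}

def coded_grade(trial, grade_b, grade_c):
    """Pick out the grade that the listener gave to the coded signal."""
    for grade in (grade_b, grade_c):
        if not 1.0 <= grade <= 5.0:
            raise ValueError("grades must lie between 1.0 and 5.0")
    return grade_b if trial["B"] == "coded" else grade_c

trial = make_trial(random.Random(0))
print(trial, coded_grade(trial, 5.0, 4.0))
```

Because the listener does not know whether B or C is the coded signal, a grade below 5.0 for the hidden original reveals a guess rather than an audible difference.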
-
Q: Listening tests are certainly an expensive task. Is there really
no alternative?
-
A: Well, at least not today. Tomorrow may be different. To assess
sound quality with perceptual codecs, all traditional "quality" parameters
(like signal-to-noise ratio, total harmonic distortion, bandwidth) are
rather useless, as any codec may introduce noise and distortions as long
as these do not affect the perceived sound quality. So, listening tests
are necessary, and, if carefully prepared and performed, they lead to rather
reliable results.
Nevertheless, Fraunhofer-IIS works on the development and standardisation
of objective sound quality assessment tools, too. And there is already
a first product available (contact Opticom),
a real-time measurement tool that nicely supports the analysis of perceptual
audio codecs. If you need more information about the Noise-to-Mask-Ratio
(NMR) technology, feel free to contact nmr@iis.fhg.de.
-
Q: O.K., back to these listening tests and the performance evaluation.
Come on, tell me some results.
-
A: Well, for more details you should study one of these AES
papers or the MPEG documents. For Layer-3, the main result is that it always
performed best at low bitrates (64 kbps per audio channel or below).
Well, this is not completely surprising, as Layer-3 uses the same tool
set as Layer-2, but with some additional advanced coding features that
all address the demands of very low bitrate coding. One impressive example
is the ISO-MPEG listening test carried out in September 94 at NTT Japan
(doc. ISO/IEC JTC1/SC29/WG11 N0848, 11.Nov. 94). Another interesting result
is the conclusion of the task group TG 10/2 within the ITU-R, which recommends
the use of low bit-rate audio coding schemes for digital sound-broadcasting
applications (ITU-R doc. BS.1115).
-
Q: Very interesting! Tell me more about this recommendation!
-
A: The task group TG 10/2 finished its work in 10/93. The recommendation
defines three fields of broadcast applications and recommends Layer-2 with
180 kbps per channel for distribution and contribution links (20 kHz bandwidth,
no audible impairments with up to 5 cascaded codecs), Layer-2 with 128 kbps
per channel for emission (20 kHz bandwidth), and Layer-3 with 60 (120)
kbps for mono (stereo) signals for commentary links (15 kHz bandwidth).
Basics of Perceptual Audio Coding - or: What is the trick?
Sorry - under construction...
References - or: Where to find more information?
For around 10 years, perceptual audio coding has been a regular topic at various
scientific conferences; e.g., the AES (Audio Engineering Society) organizes
two conventions per year. You may find the following papers helpful:
-
Brandenburg, Stoll, et al.: "The ISO/MPEG-Audio Codec: A Generic Standard
for Coding of High Quality Digital Audio", 92nd AES, Vienna Mar. 92, pp.
3336; revised version ("ISO-MPEG-1 Audio: A Generic Standard...") published
in the Journal of AES, Vol.42, No. 10, Oct. 94
-
Eberlein, Popp, et al.: "Layer-3, a Flexible Coding Standard", 94th AES,
Berlin Mar. 93, pp. 3493
-
Church, Grill, et al.: "ISDN and ISO/MPEG Layer-3 Audio Coding: Powerful
New tools for Broadcast and Audio Production", 95th AES, New York Oct. 93,
pp. 3743
-
Grill, Herre, et al.: "Improved MPEG-2 Audio Multi-Channel Encoding", 96th
AES, Amsterdam Feb. 94, pp. 3865
-
Witte, Dietz, et al.: "Single Chip Implementation of an ISO/MPEG Layer-3
Decoder", 96th AES, Amsterdam Feb. 94, pp. 3805
-
Herre, Brandenburg, et al.: "Second Generation ISO/MPEG Audio Layer-3 Coding",
98th AES, Paris Feb. 95
-
Dietz, Popp, et al.: "Audio Compression for Network Transmission", 99th
AES, New York Oct. 95, pp. 4129
-
Brandenburg, Bosi: "Overview of MPEG-Audio: Current and Future Standards
for Low Bit-Rate Audio Coding", 99th AES, New York Oct. 95, pp. 4130
Please note that these papers are not available electronically. You have
to order the preprints ("pp. xxxx") directly from the AES.
Addresses
AES, 60 East 42nd Street, Suite 2520 New York, NY 10165-2520,
USA
fax: +1 212 682 0477
email: hq@aes.org
http://www.aes.org/
Cerberus Sound & Vision,
21 Denmark Street
London WC2H 8NE, UK
fax: +44 171 497 0679
http://www.cdj.co.uk/
Deutsche Telekom AG, Technologiezentrum
Darmstadt
Aussenstelle Berlin, Abteilung EK 21
Oranienburger Str. 70, D-10117 Berlin, Germany
fax: +49 30 2845 4146
Dialog 4 System Engineering GmbH, Monreposstr. 55
D-71634 Ludwigsburg, Germany
fax: +49 7141 22667
email: dialog4@proaudio.de
http://win.bda.de/bda/int/proaudio/dialog4/
DIN Beuth Verlag, Auslandsnormen
D-10772 Berlin, Germany
fax: +49 30 2601 1231
email: postmaster@din.de
Fraunhofer-IIS, Am Weichselgarten 3
D-91058 Erlangen, Germany
contact: Harald Popp
fax: +49 9131 776 399
email: layer3@iis.fhg.de
http://www.iis.fhg.de/departs/amm/layer3/
ISO Central Secretariat, Case postale 56,
CH-1211 Geneva 20, Switzerland
fax: +41 22 733 3430
email: central@isocs.iso.ch
http://www.iso.ch/
ITT Intermetall GmbH, Hans-Bunte-Str. 19
D-79108 Freiburg, Germany
fax: +49 761 517 2395
email: info@itt-sc.de
Lucent Technologies, Thurn-und-Taxis-Str. 10
D-90411 Nürnberg, Germany
contact: Wolfgang Peters
fax: +49 911 526 2278
email: WolfgangPeters@lucent.com
Macromedia Inc., 600 Townsend
San Francisco, CA 94103, USA
fax: +1 415 626 0554
http://www.macromedia.com/
Meister Electronic GmbH, Kölner Str. 37
D-51149 Köln, Germany
fax: +49 2203 1701 30
MODE
http://www.mode.net/
MPEG
http://www.cselt.stet.it/mpeg/
Opticom, Am Weichselgarten 7
D-91058 Erlangen, Germany
fax: +49 9131 691325
email: info@opticom.de
http://www.opticom.de
Proton Data, Marrensdamm 12 b
D-24944 Flensburg, Germany
fax: +49 461 3816948
email: proton.data@t-online.de
Siemens AG Halbleiter, P.O. Box 80 17 09
D-81617 Muenchen, Germany
fax: +49 89 4144 4697
email: Christine.Born@hl.siemens.de
Sygna A/S, P.O.Box 191
N-5801 Sogndal, Norway
fax: +47 5767 6190
email: bach@sygna.no
http://www.mode.net/partners/sygna.html
Telos Systems, 2101 Superior Avenue
Cleveland, OH 44114, USA
fax: +1 216 241 4103
email: info@zephyr.com
http://www.zephyr.com/
WorldSpace, 11 Dupont Circle, N.W., 9th Floor
Washington, DC 20036, USA
fax: +1 202 884 7900
email: gene@mail.worldspace.com
http://www.worldspace.com
About us - or: What is going on at our Fraunhofer Institute?
-
Q: Who is or was Fraunhofer? And what does your institute do?
-
A: As a researcher, inventor and entrepreneur, Joseph von Fraunhofer
(1787 - 1826) won high acclaim for his scientific and commercial achievements.
When the Fraunhofer-Gesellschaft was founded in Munich in 1949, his name
was chosen as the "guiding light" of the association.
Today, the Fraunhofer-Gesellschaft employs a staff of around 8,000
persons and operates 46 research institutes in Germany and one resource
centre in the United States, with a research volume of around 1 billion
DM. 70 % of its income is obtained by contract research for public authorities
as well as for industrial clients.
The Fraunhofer Institut Integrierte Schaltungen (IIS) was founded in Erlangen
in 1985. It is headed by Prof. Dr.-Ing. Dieter Seitzer and Dr. Heinz Gerhäuser.
Today, a staff of 160 persons works on projects in the field of information
electronics, developing microelectronic solutions at chip-, board- and
system level. In its department "Audio & Multimedia", headed by Dr.
Karlheinz Brandenburg, around 40 skilled engineers concentrate on the development
and real-time implementation of signal processing algorithms in the field
of audiovisual communications.
-
Q: So you focus on "contract research". What does this mean exactly?
-
A: Simply put: we have to earn our money. In the case of our institute,
less than 20 % of our funding is public money - the rest of our budget
has to be financed by research & development projects. You may call
this work "applied research", i.e. in contrast to a university, we focus
on real-world applications, and in contrast to an engineering office,
we focus on state-of-the-art applications that bear some technical risks
(and therefore need some further research). In other words, we are always
trying to stay at the leading edge of technology. Take audio coding as
an example. We started in 1987, in close cooperation with the University
of Erlangen, to develop an advanced audio coding scheme for future broadcast
services (Eureka 147, DAB radio). In 1991, our algorithm ("Layer-3") became
the most powerful member of the audio coding schemes of the international
ISO-MPEG standard. Since then, we have worked on industrial applications as
well as on further audiovisual research projects, e.g. MPEG-4 scalable audio
coding, MPEG-2 NBC audio coding, or MPEG-4 audiovisual terminals.
-
Q: I am interested in your Layer-3 technology. What can you do for
me?
-
A: Well - basically, you may use our know-how as a cost-effective
road to your application. We expect a certain remuneration for the development
work that we carried out in advance. We call this a "know-how share". In
addition, you may want us to work on some special R&D tasks for you,
so you have to pay for this extra effort, too. This is the principle. In
the case of Layer-3, we have advanced simulation sources (C) for encoder and
decoder as well as DSP source and assembler code for decoders on DSP 5600x
(Motorola), DSP 32C (AT&T), TMS320C30 (TI), and MAS 3503 C (ITT), and
for encoders on a hybrid solution (32C + 5600x) as well as on a pure 5600x
(2 DSPs) solution. We expect a single 5630x Layer-3 encoder by the end
of 1996. Depending on your specific technical needs, the know-how share
may range from several tens of thousands of dollars to more than $100,000.
In any case, we expect significantly more money for the encoder, as this is the
part that is responsible for the performance of a Layer-3 system (and so
it is the part where most of our know-how is concentrated). So you know
the framework. We are open to any discussion and any new ideas - so feel
free to contact us.
Oh - by the way, in case you are interested in some rough ASIC estimations for
a Layer-3 stereo decoder: you will need a computation power of around 12
MIPS, a Data ROM of around 2.5 Kwords, a Data RAM of around 4.5 Kwords,
and a Program ROM of around 2 to 4 Kwords (depending on the instruction
set). The word length should be at least 20 bit.
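To get a feeling for these memory figures, the data memory sizes can be converted to bytes. The sketch below assumes that 1 Kword means 1024 words and that the minimum word length of 20 bit applies to both data memories. The program ROM is left out because its word length depends on the instruction set. The function name is hypothetical.

```python
def kwords_to_bytes(kwords, word_length_bits=20, words_per_kword=1024):
    """Convert a memory size in Kwords to bytes for a given word length.

    words_per_kword=1024 and word_length_bits=20 are assumptions taken
    as a lower bound from the estimate in the text.
    """
    return kwords * words_per_kword * word_length_bits / 8

data_rom_bytes = kwords_to_bytes(2.5)
data_ram_bytes = kwords_to_bytes(4.5)
print(data_rom_bytes, data_ram_bytes)
```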
-
Q: What else do I have to keep in mind, if I want to use Layer-3
in my application? Are there patents involved? How may I address this topic?
-
A: You are right. For all MPEG audio coding schemes, patent rights
exist. Using MPEG audio, you use these rights - and in order not to violate
them, you should establish a license contract with the patent holders.
This is true for all MPEG audio Layers. In case of Layer-3, there are currently
two entities that may give licenses, Thomson Multimedia, Paris, and Fraunhofer-IIS,
Erlangen. Due to an agreement between them, Thomson is in charge of consumer-oriented
applications, and Fraunhofer-IIS is in charge of professional-oriented
applications. License contracts typically address only the patent issue.
Due to the rules of ISO-MPEG, the license has to be given non-exclusively
on fair and reasonable terms. Of course, details depend on the specific
business model.
So there are four steps for a Layer-3 application. First, defining
the technical requirements and finding the most cost-effective road to
meet them. Second, following that road to the final solution. Third, defining
the license rules depending on the business model. Fourth, signing the resulting
license contract.
Fraunhofer Institut Integrierte Schaltungen IIS, Am Weichselgarten 3, D-91058
Erlangen, Germany, Fax: +49-9131-776-399


