MP3 Compression

The audio stream of an audio CD stores more data than your brain can actually perceive. For example, if two very similar notes are played simultaneously, you may perceive only one of them. If two sounds are different but one is much louder than the other, you may never perceive the quieter one. This effect is called “masking”. Most humans cannot hear frequencies below about 20 Hz or above about 20 kHz. While hearing ability varies among individuals, humans perceive midrange frequencies more strongly than very high or very low frequencies. These are simple and well-established empirical observations about the human hearing mechanism, and the study of such auditory phenomena is called “psychoacoustics”. Human hearing has been researched so thoroughly that much of it has been translated into mathematical models, which can be represented as tables and charts. Reference tables and models of this kind are built into the MP3 encoder.
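To make the midrange sensitivity concrete, here is a minimal Python sketch of a widely cited closed-form approximation of the threshold of hearing in quiet (often attributed to Terhardt). This is an assumption for illustration: it is not necessarily the table any particular MP3 encoder ships with, and the function name is made up for this example. Lower values mean the ear is more sensitive at that frequency.

```python
import math


def threshold_in_quiet_db(freq_hz: float) -> float:
    """Approximate quietest audible level (dB SPL) of a pure tone at freq_hz.

    Illustrative approximation only; real encoders may use different models.
    """
    khz = freq_hz / 1000.0
    return (
        3.64 * khz ** -0.8                           # rising threshold at low frequencies
        - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)    # sensitivity dip around 3 to 4 kHz
        + 1e-3 * khz ** 4                            # steep rise toward high frequencies
    )


if __name__ == "__main__":
    for f in (50, 1000, 3500, 16000):
        print(f"{f:>6} Hz: {threshold_in_quiet_db(f):6.1f} dB SPL")
```

Under this approximation, a tone near 3.5 kHz can be heard at a much lower level than one at 50 Hz or 16 kHz, which is why an encoder can afford to be less precise at the extremes of the spectrum.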

 

Frames and bit rate

When you click on the “Rip” or “Grab” button after inserting an audio disc into your optical drive, the audio signal is broken into fragments called “frames”, each of which typically lasts a fraction of a second. You can think of them as being like the frames of a filmstrip, except that they hold sound rather than pictures. Next, the encoder calculates how to distribute the available bits across the audio to be encoded. This is necessary because different portions of the frequency spectrum are encoded most efficiently using slight variants of the same encoding algorithm. Before moving on to compression, the MP3 encoder takes into account the bit rate specified by the user. This setting determines how much of the available audio data will be stored and how much will be dropped. Bit rate is the number of bits used to store each second of audio. It is comparable to the resolution of an image: the higher the resolution, the better the quality of the image. Similarly, in an MP3 file the bit rate determines the audio resolution: the higher the bit rate, the greater the audio resolution. While you cannot directly control what gets lost, you can control the number of bits per second devoted to storing the audio, which has a similar net result.
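The arithmetic behind frames and bit rate can be sketched in a few lines of Python. The figures here are assumptions not stated above: MPEG-1 Layer III (the usual MP3 flavour for CD rips) with 1152 samples per frame, and CD audio at 44.1 kHz, 16 bits, stereo. The function names are made up for this example.

```python
SAMPLES_PER_FRAME = 1152  # assumption: MPEG-1 Layer III frame length in samples
CD_SAMPLE_RATE_HZ = 44_100
CD_BIT_RATE_BPS = CD_SAMPLE_RATE_HZ * 16 * 2  # 16-bit samples, two channels


def frame_duration_ms(sample_rate_hz: int) -> float:
    """How much audio one frame covers, in milliseconds."""
    return 1000.0 * SAMPLES_PER_FRAME / sample_rate_hz


def frame_size_bytes(bit_rate_bps: int, sample_rate_hz: int, padding: int = 0) -> int:
    """Size of one frame in bytes at a constant bit rate (padding is 0 or 1)."""
    return (SAMPLES_PER_FRAME // 8) * bit_rate_bps // sample_rate_hz + padding


if __name__ == "__main__":
    print(f"frame duration: {frame_duration_ms(CD_SAMPLE_RATE_HZ):.2f} ms")
    print(f"frame size at 128 kbps: {frame_size_bytes(128_000, CD_SAMPLE_RATE_HZ)} bytes")
    print(f"CD bit rate: {CD_BIT_RATE_BPS} bits per second")
    print(f"ratio at 128 kbps: {CD_BIT_RATE_BPS / 128_000:.1f} to 1")
```

Doubling the bit rate doubles the bytes available per frame while the frame still covers the same slice of time, which is the sense in which bit rate acts like resolution.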

 

Dropping redundant frequencies

The frequency content of each frame is compared to the mathematical models of psychoacoustics, which are stored as reference tables in the MP3 encoder. From these tables, the encoder determines which frequencies need to be rendered accurately because they will be perceptible to humans. It also decides which frequencies can be dropped or allocated fewer bits because we would not be able to hear them anyway. The encoder takes the bit rate into account as it writes each frame to the bit stream. If the bit rate is low, the redundancy criteria are applied strictly, and the output quality is lower because a greater number of frequencies are dropped. If the bit rate is high, the encoder can be more lenient, and the end result will sound better.
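The trade-off described above can be illustrated with a toy bit allocator in Python. This is a deliberately simplified sketch, not the algorithm a real MP3 encoder runs: the band levels, the masking thresholds, and the function name are all hypothetical, and it assumes the common rule of thumb that each extra bit lowers quantization noise by about 6 dB. Bands whose signal already sits below the masking threshold receive no bits at all.

```python
def allocate_bits(signal_db, mask_db, bit_budget, db_per_bit=6.0):
    """Greedily hand out bits to the band where noise is most audible.

    signal_db: hypothetical signal level per frequency band (dB)
    mask_db:   hypothetical masking threshold per band (dB)
    Returns the number of bits given to each band.
    """
    bits = [0] * len(signal_db)
    # With zero bits the noise is as loud as the signal itself, so the
    # noise-to-mask ratio starts at (signal level - masking threshold).
    noise_to_mask = [s - m for s, m in zip(signal_db, mask_db)]
    for _ in range(bit_budget):
        worst = max(range(len(noise_to_mask)), key=noise_to_mask.__getitem__)
        if noise_to_mask[worst] <= 0:
            break  # all remaining noise is inaudible; leftover bits are unused
        bits[worst] += 1
        noise_to_mask[worst] -= db_per_bit
    return bits


if __name__ == "__main__":
    signal = [60, 40, 20, 55]   # made-up band levels
    mask = [30, 45, 25, 20]     # made-up masking thresholds
    print("low budget: ", allocate_bits(signal, mask, bit_budget=4))
    print("high budget:", allocate_bits(signal, mask, bit_budget=20))
```

With a small budget the audible bands get a coarse representation; with a large budget they are refined until the noise drops below the mask, while the masked bands stay at zero bits in both cases.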

 

Final compression

The serial bit stream of frames is then run through a process called “Huffman coding”, which compresses the remaining information further. Huffman coding does not rely on the psychoacoustic model. It is a lossless step that gives shorter codes to values that occur more often, so it achieves additional compression without discarding anything. Thus, MP3 encoding is a two-stage process: first the redundant data is discarded, and then the data that remains is shrunk further. At a typical bit rate such as 128 kbps, the resulting file is roughly 10 times smaller than the original CD audio, which runs at about 1,411 kbps.
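The principle behind Huffman coding can be shown with a short Python sketch. One caveat: this example builds its code from the data it is given, which is the textbook form of the technique, whereas the MP3 format chooses from a set of predefined Huffman tables rather than building one per file. The sample values and the function name are made up for illustration.

```python
import heapq
from collections import Counter


def huffman_code(symbols):
    """Return a dict mapping each symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie breaker, {symbol: code so far}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes.
        w1, _, group1 = heapq.heappop(heap)
        w2, _, group2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in group1.items()}
        merged.update({s: "1" + c for s, c in group2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


if __name__ == "__main__":
    # Made-up quantized values: small numbers dominate, as they tend to
    # after the inaudible detail has been removed.
    values = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2
    code = huffman_code(values)
    encoded = "".join(code[v] for v in values)
    print(code)
    print(len(encoded), "bits instead of", 2 * len(values), "with fixed 2-bit codes")
```

The most common value gets the shortest code, so the stream shrinks without any information being lost. The saving comes purely from the uneven distribution of values.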
