AudioUtils

How Audio Compression Works

A clear explanation of how audio compression works, from psychoacoustics to bitrate allocation. No jargon overload.

Audio compression makes files smaller. Lossless compression does it without losing data. Lossy compression does it by throwing data away. Both are clever. Here's how they work under the hood.

The Problem

Uncompressed CD audio uses about 1,411 kilobits per second: 44,100 samples per second × 16 bits × 2 channels. That's roughly 10.6 MB per minute, so a 45-minute album takes about 475 MB. That's a lot of data for 45 minutes of music.
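The arithmetic behind those figures can be checked in a few lines. This is a minimal sketch; the function names are made up for illustration, and megabytes are taken as 1,000,000 bytes.

```python
SAMPLE_RATE_HZ = 44_100   # CD sample rate
BITS_PER_SAMPLE = 16      # CD bit depth
CHANNELS = 2              # stereo


def pcm_bitrate_bps(sample_rate_hz, bits_per_sample, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels


def size_megabytes(bitrate_bps, seconds):
    """File size in MB (1 MB = 1,000,000 bytes) for a given duration."""
    return bitrate_bps * seconds / 8 / 1_000_000


cd_bps = pcm_bitrate_bps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
print(cd_bps)                                       # 1411200
print(round(size_megabytes(cd_bps, 60), 2))         # 10.58 (one minute)
print(round(size_megabytes(cd_bps, 45 * 60), 1))    # 476.3 (45-minute album)
```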

Compression reduces that. A lossless codec like FLAC typically cuts it to somewhere around 800 kbps, depending on the music. A lossy codec like MP3 can cut it to 128 kbps. That's a reduction of about 91%. How?

Lossless Compression

Lossless compression finds patterns in the data and encodes them more efficiently. It doesn't remove anything.

Audio signals have predictable properties. The next sample usually isn't far from the current one. A lossless encoder predicts each sample, then stores only the prediction error. Since errors are small numbers, they compress well.

Think of it like this: instead of storing "100, 101, 103, 102, 104," you store "100, +1, +2, -1, +2." The differences are smaller numbers that need fewer bits.
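Here is that example as a small round trip in Python. It's a sketch of plain difference coding only, with made-up function names; real lossless codecs use better predictors and a dedicated entropy coder for the differences.

```python
from itertools import accumulate


def delta_encode(samples):
    """Keep the first sample, then store each sample's difference from the previous one."""
    if not samples:
        return []
    return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]


def delta_decode(deltas):
    """A running sum of the differences rebuilds the original samples exactly."""
    return list(accumulate(deltas))


samples = [100, 101, 103, 102, 104]
encoded = delta_encode(samples)
print(encoded)                # [100, 1, 2, -1, 2]
print(delta_decode(encoded))  # [100, 101, 103, 102, 104]
```

The decoded list matches the input exactly, which is what makes the scheme lossless.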

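FLAC builds on this principle. When you convert WAV to FLAC, the encoder uses more elaborate predictors than a simple difference and packs the prediction errors compactly. The FLAC file is smaller, but decode it and you get the exact original audio back.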

Lossy Compression: Psychoacoustics

Lossy compression uses a different strategy entirely. It models human hearing and removes sounds you can't perceive.

Auditory Masking

Your ear has limits. When a loud sound plays, quiet sounds near it become inaudible. This is called masking.

A loud snare hit masks quiet room noise at similar frequencies. The encoder detects this, removes the masked sounds, and saves the bits.

Frequency Masking

If a loud 1 kHz tone plays, you probably won't hear a much quieter 1.1 kHz tone at the same time. The loud tone masks the quiet one. The encoder removes the quiet tone.
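A toy model makes the idea concrete. The sketch below converts frequencies to the Bark scale (using Traunmüller's approximation) and lets the masking threshold fall off with distance from the loud tone. The offset and slope values are illustrative assumptions, not numbers from any codec or standard, and the function names are invented for this example.

```python
def hz_to_bark(freq_hz):
    """Traunmüller's approximation of the Bark critical-band scale."""
    return 26.81 * freq_hz / (1960.0 + freq_hz) - 0.53


def masked_threshold_db(masker_hz, masker_db, probe_hz,
                        offset_db=10.0, lower_slope=27.0, upper_slope=12.0):
    """Toy threshold: masker level minus an offset, dropping per Bark of distance.

    offset_db and both slopes (dB per Bark) are assumed values for illustration.
    """
    distance = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    slope = upper_slope if distance >= 0 else lower_slope
    return masker_db - offset_db - slope * abs(distance)


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe tone falls below the toy masking threshold."""
    return probe_db < masked_threshold_db(masker_hz, masker_db, probe_hz)


print(is_masked(1000, 80, 1100, 40))  # True: close in frequency and much quieter
print(is_masked(1000, 80, 4000, 40))  # False: too far away to be masked
```

Real psychoacoustic models are far more detailed, but the shape is the same: the closer and quieter a sound is relative to a loud neighbor, the more likely it can be dropped.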

Temporal Masking

After a loud sound, your ear needs time to recover sensitivity. For a brief moment, quiet sounds after the loud one are inaudible too. The encoder can remove those.

The Encoding Process

Here's how a typical lossy encoder (like MP3) works:

1. Split audio into frames -- Usually 26 milliseconds each
2. Transform to frequency domain -- Convert time-domain samples to frequency-domain using MDCT
3. Apply psychoacoustic model -- Determine what's audible and what's masked
4. Allocate bits -- Give more bits to important sounds, fewer to masked ones
5. Quantize -- Round frequency values to fit the bit budget
6. Encode -- Pack everything into the output format
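Steps 1, 2, and 5 can be sketched in plain Python. This is a toy under stated assumptions, not an MP3 encoder: the frames are tiny, the MDCT is the slow textbook formula, and a single hand-picked step size stands in for the psychoacoustic model and bit allocation (steps 3 and 4). Step 6 is omitted, and real MP3 encoders also run a filter bank ahead of the MDCT. All function names are illustrative.

```python
import math


def frames(samples, half):
    """Yield 50%-overlapping frames of 2*half samples (hop size = half)."""
    for start in range(0, len(samples) - 2 * half + 1, half):
        yield samples[start:start + 2 * half]


def sine_window(length):
    """Sine window, a common choice for MDCT-based codecs."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def mdct(frame):
    """Direct O(N^2) MDCT: 2N time samples in, N frequency coefficients out."""
    n = len(frame) // 2
    return [
        sum(x * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i, x in enumerate(frame))
        for k in range(n)
    ]


def quantize(coeffs, step):
    """Uniform quantizer: a larger step means coarser values and fewer bits."""
    return [round(c / step) for c in coeffs]


def encode_frame(frame, step):
    """Window the frame, transform it, and quantize the coefficients."""
    window = sine_window(len(frame))
    windowed = [x * w for x, w in zip(frame, window)]
    return quantize(mdct(windowed), step)


def count_nonzero(coded_frames):
    """Count coefficients that survived quantization."""
    return sum(1 for frame in coded_frames for q in frame if q != 0)


HALF = 32  # toy size; a real MP3 frame holds 1152 samples
signal = [math.sin(2 * math.pi * 0.05 * t) for t in range(256)]

fine = [encode_frame(f, 0.05) for f in frames(signal, HALF)]    # gentle quantization
coarse = [encode_frame(f, 0.4) for f in frames(signal, HALF)]   # harsh quantization

print(len(fine), "frames of", HALF, "coefficients each")
print("nonzero coefficients:", count_nonzero(fine), "fine vs", count_nonzero(coarse), "coarse")
```

With the coarser step, more coefficients round to zero, and zeros are cheap to store. A real encoder makes that choice per frequency band, guided by the masking model.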

The psychoacoustic model is where the magic happens. A good model wastes fewer bits on inaudible content. That's a big part of why newer codecs like AAC and Vorbis outperform MP3 at the same bitrate: they pair better models with more flexible coding tools.

Quality vs Compression

More compression means more aggressive removal. At 320 kbps, the encoder has enough bits to be gentle. At 64 kbps, it makes harsh cuts. You can hear the difference.
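To see how tight the budget gets, work out the bits available per frame. The sketch below assumes 44.1 kHz audio and 1,152 samples per frame (the MPEG-1 Layer III frame length, which is where the 26-millisecond figure comes from). It ignores headers and other overhead, so the real budget for audio data is somewhat smaller.

```python
SAMPLE_RATE_HZ = 44_100
SAMPLES_PER_FRAME = 1152  # MPEG-1 Layer III frame length


def frame_duration_ms():
    """Length of one frame in milliseconds."""
    return SAMPLES_PER_FRAME / SAMPLE_RATE_HZ * 1000


def bits_per_frame(bitrate_kbps):
    """Average bits available per frame at a constant bitrate (overhead ignored)."""
    return bitrate_kbps * 1000 * SAMPLES_PER_FRAME / SAMPLE_RATE_HZ


print(round(frame_duration_ms(), 1))   # 26.1
print(round(bits_per_frame(320)))      # 8359
print(round(bits_per_frame(128)))      # 3344
print(round(bits_per_frame(64)))       # 1672
```

For comparison, the same frame of uncompressed stereo CD audio holds 1,152 × 16 × 2 = 36,864 bits.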

Convert WAV to MP3 at different bitrates and listen. The tradeoff becomes obvious.

Lossy + Lossy = Bad

Converting between lossy formats compounds the losses. MP3 to OGG means two rounds of psychoacoustic modeling, each removing different data. The result sounds worse than either format alone.
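A toy model shows why the losses stack. Below, two "codecs" are stand-ins that simply round values to different grids (multiples of 4 and multiples of 6). Real codecs quantize frequency coefficients rather than raw samples, so treat this as an illustration of the effect, not a simulation of MP3 or OGG.

```python
def quantize(x, step):
    """Round an integer to the nearest multiple of step (ties round up)."""
    return ((x + step // 2) // step) * step


def sum_squared_error(original, processed):
    """Total squared difference between the original and processed values."""
    return sum((o - p) ** 2 for o, p in zip(original, processed))


samples = list(range(120))

only_a = [quantize(x, 4) for x in samples]     # "codec A" applied to the original
only_b = [quantize(x, 6) for x in samples]     # "codec B" applied to the original
a_then_b = [quantize(x, 6) for x in only_a]    # B applied to A's output

print(sum_squared_error(samples, only_a))     # 180
print(sum_squared_error(samples, only_b))     # 380
print(sum_squared_error(samples, a_then_b))   # 500
```

The second pass rounds values that were already rounded to a different grid, so its errors land on top of the first pass's errors instead of replacing them.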

Always convert from a lossless source when you have one: convert WAV to MP3 or convert WAV to OGG. Convert MP3 to OGG with our converter only when no lossless source exists.

The Takeaway

Lossless compression is math. It finds patterns and encodes them efficiently. Nothing is lost.

Lossy compression is psychoacoustics. It models your ears and removes what you can't hear. The better the model, the better the sound at the same file size.

Both approaches are remarkable engineering. Understanding them helps you make better decisions about formats and quality settings.