PCM Audio Explained: Why WAV Files Are So Large

PCM (Pulse Code Modulation) is raw digital audio — no compression, every sample stored exactly. Learn what bit depth and sample rate mean and why DAWs prefer PCM.

What PCM Audio Actually Is

PCM stands for Pulse-Code Modulation. It is the method by which a continuous analog sound wave is turned into a stream of numbers a computer can store, and it is the foundation of essentially all digital audio you have ever heard. CDs are PCM. WAV files are PCM. The audio your DAW processes internally is PCM. Bluetooth, when it is not using a separate codec like aptX or LDAC, ultimately decodes back to PCM. The microphone in your laptop converts analog voltage into PCM samples before the operating system ever touches the data.

PCM is not a compression scheme and not a file format. It is a representation. The samples produced by PCM can be wrapped in any container — WAV, AIFF, BWF, Matroska, MP4 — or written to a CD as raw audio with no container at all. Whenever you read about "uncompressed audio," "raw audio," or "lossless PCM," you are reading about the same underlying thing.

How Sampling Turns Sound Into Numbers

Sound is a pressure wave. A microphone converts that pressure wave into a continuously varying voltage. A digital audio system has to turn that continuous voltage into a finite list of numbers, and PCM does this with two simple operations performed many thousands of times per second:

1. Measure the voltage at a precise instant. This is called sampling.
2. Round that voltage to the nearest available numerical value. This is called quantization.

That is the entire idea. Repeat it 44,100 times every second and you have a CD. The two parameters you can vary are how often you sample and how many distinct values you can round to.
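The two steps can be sketched in a few lines of Python (a toy illustration of the idea, not production DSP; the function name is invented for this example):

```python
import math

def sample_and_quantize(freq_hz, sample_rate, bit_depth, seconds):
    """Sample a sine wave and round each sample to the nearest integer
    level available at the given bit depth -- the two PCM operations
    described above."""
    levels = 2 ** (bit_depth - 1) - 1           # e.g. 32767 for 16-bit
    n_samples = int(sample_rate * seconds)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                      # 1. sample at a precise instant
        voltage = math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(voltage * levels))  # 2. quantize to the nearest level
    return samples

# One millisecond of a 1 kHz tone at CD settings: 44 integer samples
tone = sample_and_quantize(1000, 44100, 16, 0.001)
```

Everything that follows in this article is just a question of the two parameters fed to a function like this: how often you call step 1, and how many levels step 2 may round to.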

Sample Rate

Sample rate is the number of samples taken per second, expressed in hertz (Hz) or kilohertz (kHz). The Nyquist-Shannon sampling theorem proves that to faithfully capture a signal containing frequencies up to F hertz, you must sample at a rate greater than 2F samples per second. Human hearing tops out around 20 kHz, so a rate above 40 kHz is needed to capture everything the ear can hear; in practice a little more, to leave room for the anti-aliasing filter to roll off, which is part of why the standard landed above 40 kHz rather than exactly at it.

Common sample rates and where you encounter them:

  • 8 kHz — landline telephony and old voicemail. Captures up to 4 kHz, fine for intelligible speech and useless for music.
  • 16 kHz — wideband VoIP (Microsoft Teams, Zoom in HD mode, modern smartphone codecs).
  • 22.05 kHz — half of CD rate. Used in old multimedia and some game audio engines.
  • 32 kHz — broadcast radio and miniDV camcorders.
  • 44.1 kHz — Audio CD, the de facto standard for consumer music files.
  • 48 kHz — the universal standard for video and broadcast. YouTube, the major NLEs (Premiere, Resolve, Final Cut), virtually every camera, and most game engines work at 48 kHz internally.
  • 88.2 / 96 kHz — high-resolution music and professional recording. Useful headroom for pitch shifting and time stretching.
  • 176.4 / 192 kHz — high-end mastering and audiophile distribution. Most engineers consider anything above 96 kHz to be a marketing exercise rather than an audible improvement.

The 44.1 kHz figure is not arbitrary. It was chosen in the late 1970s because early digital recorders stored audio as pseudo-video on video tape transports, and 44,100 samples per second maps exactly onto the usable line structure of both NTSC and PAL video.
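That video-tape heritage can be checked with simple arithmetic: three samples stored per video line, across the usable lines of each field, gives exactly 44,100 per second in both television systems (the line counts below are the commonly cited figures for the early PCM adaptors):

```python
# NTSC: 60 fields/s x 245 usable lines/field x 3 samples/line
ntsc = 60 * 245 * 3
# PAL:  50 fields/s x 294 usable lines/field x 3 samples/line
pal = 50 * 294 * 3

print(ntsc, pal)  # 44100 44100
```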

Bit Depth

Bit depth is the number of bits used to represent each sample. It controls how many distinct amplitude values you can quantize to, and therefore how small a level change you can represent before rounding kicks in.

  • 8-bit — 256 possible values per sample. Audible quantization noise. Used historically in early computers and still seen in retro game audio.
  • 16-bit — 65,536 values per sample. Roughly 96 dB of dynamic range. The CD standard, and inaudibly clean for any final listening scenario.
  • 24-bit — 16,777,216 values per sample. Roughly 144 dB of dynamic range. The recording and mixing standard. The extra headroom is not for listening — it is for the dozens of times the signal will be scaled, summed and processed inside a mix.
  • 32-bit float — uses IEEE 754 floating-point numbers instead of integers. The dynamic range is effectively unlimited and clipping inside the mix bus becomes mathematically reversible. Used internally by every modern DAW and increasingly as a recording format on devices like the Zoom F3 and Sound Devices MixPre.

A common misconception is that higher bit depth makes recordings sound "more detailed." It does not — it lowers the noise floor. For final listening, 16-bit is transparent. For a recording you intend to process, 24-bit (or 32-bit float) protects you from the cumulative quantization error of a long signal chain.
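The "roughly 96 dB" and "roughly 144 dB" figures above come straight from the size of one quantization step relative to full scale, about 6.02 dB per bit. A quick sketch (helper name is made up):

```python
import math

def dynamic_range_db(bits):
    """Theoretical dynamic range of linear PCM: the ratio of full scale
    to one quantization step, expressed in decibels (~6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

for bits in (8, 16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
# 8-bit: 48.2 dB, 16-bit: 96.3 dB, 24-bit: 144.5 dB
```

Real 24-bit converters do not reach the full theoretical figure; analog noise in the signal chain sets the practical floor well before the math does.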

How PCM Differs From MP3, AAC and Other Lossy Codecs

PCM stores every sample. MP3, AAC, Opus and Vorbis throw most of them away.

A lossy codec uses a psychoacoustic model — a body of research about which sounds the human ear cannot perceive when other sounds are present — to discard whatever the model predicts you will not miss. A 320 kbps MP3 is about a quarter of the size of the equivalent CD-quality PCM, and a 96 kbps Opus stream is about a fifteenth. The audio that comes out of the decoder is not the audio that went in. It is an approximation that fits within the bit budget you specified.

This matters in three concrete ways for anyone working with audio:

1. Compounding loss. Each time you re-encode a lossy file you lose more information. PCM has nothing to lose — copy it a thousand times and the thousandth copy is bit-identical to the first.
2. Editing precision. Lossy codecs work on frames (typically 20-50 ms blocks). Sample-accurate edits require decoding to PCM, editing, and re-encoding, which is itself a generation loss.
3. Effects processing. Every plugin and every DAW operates on PCM internally. An MP3 dropped onto a track is decoded to PCM the moment it loads.

If you want to read more about the difference, see lossless vs lossy audio and the WAV format explainer.

Why WAV Files Are So Large: The File Size Math

The math behind PCM file size is exact. Multiply sample rate, bit depth, channels and duration:

bytes = sample_rate × (bit_depth / 8) × channels × seconds

A four-minute stereo song at CD quality (44.1 kHz, 16-bit, 2 channels):

  • 44,100 × 2 × 2 × 240 = 42,336,000 bytes ≈ 40.4 MB

Bumping to 48 kHz / 24-bit, the same four minutes becomes 66 MB. At 96 kHz / 24-bit it is 132 MB. A one-hour 96 kHz / 24-bit stereo recording is just under 2 GB. Multitrack sessions multiply that by the number of tracks: a 32-track album recording at 96/24 will eat through 60 GB of disk for raw audio alone before any takes, comps or alternates are stored.
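The formula drops straight into code. A small sketch (function name invented for this example) that reproduces the figures in this section:

```python
def pcm_bytes(sample_rate, bit_depth, channels, seconds):
    """bytes = sample_rate x (bit_depth / 8) x channels x seconds"""
    return sample_rate * (bit_depth // 8) * channels * seconds

FOUR_MINUTES = 240
print(pcm_bytes(44100, 16, 2, FOUR_MINUTES))           # 42336000 bytes (CD quality)
print(pcm_bytes(48000, 24, 2, FOUR_MINUTES) / 2**20)   # ~65.9 MB
print(pcm_bytes(96000, 24, 2, 3600) / 2**30)           # ~1.93 GB for one stereo hour
```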

By comparison, the same four-minute song as a 320 kbps MP3 is about 9.2 MB and as a typical FLAC is about 22 MB. MP3 is throwing away about 77% of the bits. FLAC is losslessly removing redundancy in the PCM stream — every FLAC decoder produces bit-identical PCM output, no matter who encoded the file or which compression level was used. Read more in what is FLAC.

WAV, AIFF, BWF and the Container Question

PCM samples have to live inside something. The file you actually open is the container — a wrapper that adds a header describing how to interpret the bytes that follow.

  • WAV — Microsoft and IBM, 1991. Built on the RIFF chunk format. Universal compatibility. Native 4 GB file size limit (RF64 extends this).
  • AIFF — Apple, 1988. Built on Electronic Arts' IFF format. Functionally identical PCM data; different chunk structure. Slightly better metadata in the original spec.
  • BWF (Broadcast WAV) — EBU, 1997. WAV plus a bext chunk that stores timecode, scene/take, originator and other production metadata. The standard delivery format for film and TV post-production.
  • CAF — Apple Core Audio Format. Used by Logic and some iOS apps. No 4 GB limit.
  • W64 (Sony Wave64) — extends WAV past the 4 GB barrier for very long recordings.
  • Raw PCM — no header at all. The receiver must already know the sample rate, bit depth and channel count.

When you convert MP3 to WAV, FLAC to WAV, or WAV to MP3, the WAV side of the operation is always the PCM side. Decoders produce PCM, encoders consume PCM.
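Python's standard-library wave module makes the container/payload split concrete: the module writes and parses only the RIFF wrapper, while the frames you pass in are the raw PCM itself. A minimal sketch (filename is arbitrary):

```python
import math
import struct
import wave

# Write one second of a 440 Hz sine as 16-bit mono PCM inside a WAV
# container, then read the header back.
with wave.open("tone.wav", "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)        # 16-bit = 2 bytes per sample
    w.setframerate(44100)
    frames = b"".join(
        struct.pack("<h", int(32767 * 0.5 * math.sin(2 * math.pi * 440 * n / 44100)))
        for n in range(44100)
    )
    w.writeframes(frames)    # the PCM payload, verbatim

with wave.open("tone.wav", "rb") as r:
    print(r.getframerate(), r.getsampwidth() * 8, r.getnframes())
    # 44100 16 44100
```

Strip the 44-byte header off that file and what remains is playable raw PCM, provided the receiver is told it is 44.1 kHz, 16-bit, mono.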

PCM Variants: LPCM, DPCM, ADPCM, μ-law and A-law

When someone says "PCM," they almost always mean Linear PCM (LPCM) — uniform quantization steps, every sample takes the same number of bits, samples are stored end to end. This is what is in WAV, AIFF and CDs.

But the umbrella covers a few variations worth knowing about:

  • DPCM (Differential PCM) stores the difference between successive samples instead of the absolute values. Because consecutive samples in real audio are usually similar, the differences fit in fewer bits.
  • ADPCM (Adaptive DPCM) extends DPCM by adapting the step size dynamically based on signal content. Used in older game audio, voice memos on some legacy devices, and the IMA ADPCM and Microsoft ADPCM variants you sometimes see inside WAV files.
  • μ-law (mu-law) is a non-linear 8-bit encoding used in North American and Japanese telephony. It allocates more resolution to quiet signals where the ear is more sensitive, giving the perceived dynamic range of roughly 14 linear bits in 8 bits per sample.
  • A-law is the European telephony equivalent, with a slightly different companding curve.
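The companding idea behind μ-law can be sketched with its continuous curve (G.711 itself specifies a piecewise-linear 8-bit approximation of this; the function names here are illustrative):

```python
import math

MU = 255.0  # mu-law parameter from G.711

def mulaw_compress(x):
    """Continuous mu-law companding curve for a sample x in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mulaw_expand(y):
    """Inverse curve: recover the linear sample from the companded value."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)

# Quiet signals get a disproportionate share of the output range:
print(round(mulaw_compress(0.01), 3))  # 0.228 -- 1% of input uses ~23% of range
print(round(mulaw_compress(0.50), 3))  # 0.876
```

Quantizing the compressed value to 8 bits and expanding on playback is what yields the perceived dynamic range of roughly 14 linear bits mentioned above.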

If a WAV file refuses to import into a DAW with a "format not supported" error, it is usually because the WAV contains μ-law, A-law or ADPCM rather than plain LPCM. Decoding it to a standard 16-bit LPCM WAV with our WAV converter or ffmpeg fixes the issue.

Why DAWs Decode Everything to PCM Before Processing

Open Logic, Pro Tools, Ableton, Reaper, Cubase or any other DAW, drag an MP3 onto a track, and the file gets decoded to PCM the moment the track is created. Some DAWs do this lazily and stream-decode on playback; others bake out a PCM rendered file in the project folder. Either way, no plugin is ever fed MP3 frames.

There are four practical reasons for this:

1. Plugin signal flow assumes PCM. Every VST, AU and AAX plugin operates on a buffer of float samples. There is no MP3-aware EQ.
2. Sample-accurate timing. PCM lets you cut, fade and quantize to a specific sample. MP3 frames are 1,152 samples long and you cannot edit inside one.
3. Mathematical reversibility. Many operations (gain, panning, summing) are loss-free in float PCM and lossy in any compressed format.
4. Predictable CPU. PCM playback is essentially memory-bound. Decoding 80 MP3 tracks in parallel would dwarf the cost of the actual mixing.

This is why "best practice" advice for music production and Audacity is always: keep your working files in WAV (or FLAC if disk space is tight), and only export to MP3, AAC or Opus at the very end of the chain.

When PCM Is the Right Choice — And When It Is Not

Use PCM when:

  • Recording. Capture in 24-bit at the project's sample rate (48 kHz for video, 44.1 or 48 kHz for music).
  • Mixing and mastering. Every working file should be PCM. Bouncing between FLAC and PCM is fine; bouncing through MP3 is not.
  • Delivery to clients, broadcasters, mastering engineers and aggregators. They expect WAV or BWF.
  • Archiving project sessions. PCM in a 24-bit WAV will still open in 50 years.

Use a compressed format when:

  • Distributing to listeners. Convert your PCM master to AAC, MP3 or Opus. A typical 256 kbps AAC distributed via Spotify or Apple Music is indistinguishable from PCM for the overwhelming majority of listeners.
  • Embedding in video for the web. Use AAC inside MP4 or Opus inside WebM.
  • Storing voice memos and meeting recordings, where size matters far more than fidelity.

PCM is the correct internal representation. It is rarely the correct distribution format.