Audio Glossary

What Is an Audio Channel?

An audio channel is one independent stream of audio samples within a file. Mono carries one; stereo carries two; surround carries six (5.1), eight (7.1), or more; immersive formats add height channels. The channel count determines file size, playback compatibility, and which creative uses the file supports. This page is the reference card.

Mono: One Channel

Mono is a single audio stream: one set of samples, played identically on every speaker the playback system has. All sound appears to come from one virtual location. Where mono is the right choice: phone calls and VoIP (telephone bandwidth assumes mono), AM radio, podcasts and audiobooks (a single voice has no stereo information to preserve), public address systems, voice memos, IVR systems, ringtones, and sound effects in game engines (the engine spatializes per instance). File size advantage: uncompressed mono is exactly half the size of stereo at the same sample rate and bit depth. For lossy codecs the bitrate sets the file size, so the saving comes from mono needing fewer bits for the same quality: a 192 kbps stereo MP3 of voice content can become a 96 kbps mono MP3 with no perceptual loss; alternatively, run mono at 128 kbps for a genuine quality improvement in a file that is still smaller. Don't think of mono as 'lower quality'; it is the correct format for a class of content.
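The size arithmetic can be sketched in a few lines. This is a minimal illustration, not a tool: the function names are made up for the example, it assumes CD-style 44.1 kHz / 16-bit PCM as the default, and it ignores container headers and metadata.

```python
def pcm_bytes(seconds, sample_rate=44_100, bit_depth=16, channels=1):
    """Uncompressed PCM payload: every channel stores its own samples."""
    return int(seconds * sample_rate * (bit_depth // 8) * channels)


def cbr_bytes(seconds, kbps):
    """Constant-bitrate lossy payload: set by the bitrate, not the channel count."""
    return int(seconds * kbps * 1000 / 8)


one_hour = 3600
mono_wav = pcm_bytes(one_hour, channels=1)
stereo_wav = pcm_bytes(one_hour, channels=2)
mono_mp3 = cbr_bytes(one_hour, 96)
stereo_mp3 = cbr_bytes(one_hour, 192)

print(f"WAV mono   {mono_wav / 1e6:.1f} MB")    # 317.5 MB
print(f"WAV stereo {stereo_wav / 1e6:.1f} MB")  # 635.0 MB
print(f"MP3  96k   {mono_mp3 / 1e6:.1f} MB")    # 43.2 MB
print(f"MP3 192k   {stereo_mp3 / 1e6:.1f} MB")  # 86.4 MB
```

The WAV figures halve because a channel was removed; the MP3 figures halve only because the bitrate was halved.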

Stereo: Two Channels (Left and Right)

Stereo has two independent channels, left and right, usually intended for two-speaker reproduction. Stereo is the universal music format: virtually every track on Spotify, Apple Music, and YouTube Music is delivered in stereo. The two channels carry different content: instruments panned across the field, ambience captured by spaced microphones, lead vocals centered, room reverb spread wide. Stereo recording techniques include XY (two coincident cardioids, angled), ORTF (cardioids spaced and angled), mid-side (mid mic plus side figure-eight, decoded to L/R), and spaced pair (two omnis). All end up as a two-channel file; only mid-side needs a decoding step first. File size: exactly 2x mono for uncompressed audio at the same sample rate and bit depth; for lossy codecs, size follows the bitrate rather than the channel count. Stereo through headphones creates the spatial 'image' that mono cannot. For voice with music (podcasts with intro music, talk shows): record voice mono, music stereo, mix to stereo for delivery. See [audio-for-podcasters](/guide/audio-for-podcasters).
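The mid-side decoding step mentioned above is a sum and a difference. The sketch below assumes the two microphone signals are already time-aligned lists of float samples; the `width` parameter is a hypothetical convenience for scaling the side signal, not part of any standard.

```python
def decode_mid_side(mid, side, width=1.0):
    """Convert mid and side microphone signals to left/right channels.

    left = mid + side, right = mid - side. `width` scales the side
    signal: 0.0 collapses to mono, larger values widen the image.
    """
    left = [m + width * s for m, s in zip(mid, side)]
    right = [m - width * s for m, s in zip(mid, side)]
    return left, right


mid = [0.5, 0.25, -0.5]
side = [0.25, 0.0, -0.25]

left, right = decode_mid_side(mid, side)
mono_left, mono_right = decode_mid_side(mid, side, width=0.0)
print(left, right)
```

In practice the result is usually gain-adjusted afterwards so that the sum does not clip.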

5.1 and 7.1 Surround

Surround sound uses a discrete-channel layout for theatrical and home cinema reproduction. 5.1: front-left, front-centre, front-right, surround-left, surround-right, plus the .1 LFE (low-frequency effects, the subwoofer channel). 7.1: adds two rear-surround speakers behind the listener. Dolby and DTS codecs are the dominant surround formats. Dolby Digital (AC-3) carries up to 5.1 at typical bitrates of 384-640 kbps; 7.1 requires a newer codec such as Dolby Digital Plus (E-AC-3), Dolby TrueHD, or DTS-HD. Channel mapping must be declared correctly in the file metadata: a surround file with mismapped channels can still sound acceptable when folded down to mono or stereo, but on a 5.1 system it routes content to the wrong speakers. Use cases: film soundtracks, broadcast TV (sports especially), some video games, immersive concerts. Music in surround (DTS-HD Master Audio, Dolby TrueHD) exists but is niche; most listeners do not have a 5.1 system at home.

Atmos and Immersive Audio

Dolby Atmos and DTS:X are object-based immersive formats that add height channels above the listener. Atmos in cinema supports up to 64 speaker feeds; consumer Atmos systems (Atmos soundbars, dedicated home theatre installs) deliver 5.1.4 to 7.1.4 layouts (the .4 indicating four height channels), while headphones such as Apple AirPods Pro receive a binaural render instead. Apple Music's Atmos catalog and Tidal's HiFi Plus tier deliver Atmos masters as MP4 with E-AC-3 JOC (Joint Object Coding) at ~768 kbps. Music recording in Atmos involves placing instruments as 3D objects rather than panning across stereo; the renderer maps objects to whatever speaker layout the listener has. Atmos production requires a Dolby Atmos renderer, either built into the DAW (Logic Pro 10.7+, Nuendo, recent Pro Tools versions) or run as the separate Dolby Atmos Renderer application. An Atmos master can be rendered down to stereo, but the automatic render is not equivalent to a dedicated stereo mix, so the stereo release is usually authored and checked separately.
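The layout notation reads as ear-level speakers, then LFE channels, then height speakers. A small sketch of that reading follows; the function name and the returned keys are invented for this example.

```python
def parse_layout(label):
    """Split a label such as '5.1' or '7.1.4' into speaker counts."""
    parts = [int(p) for p in label.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'N.N' or 'N.N.N', got {label!r}")
    ear_level, lfe = parts[0], parts[1]
    height = parts[2] if len(parts) == 3 else 0
    return {
        "ear_level": ear_level,
        "lfe": lfe,
        "height": height,
        "total": ear_level + lfe + height,
    }


for label in ("5.1", "5.1.4", "7.1.4"):
    print(label, parse_layout(label))
```

The totals describe speaker feeds at playback; the Atmos master itself stores objects and a bed rather than a fixed channel count.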

Joint Stereo and Channel Coupling

MP3, AAC, and Opus all support joint stereo encoding modes. Instead of independently encoding left and right channels, joint stereo encodes the sum (mid) and the difference (side). When L and R are similar (as in mono-recorded vocals on a stereo bus, or music mixed mostly toward the centre), the side signal is small and compresses to far fewer bits than encoding L and R separately. The M/S transform itself is exactly invertible, so the decoder recovers L and R with no loss beyond the codec's usual quantization. Result: the same perceived quality at a lower bitrate, or higher quality at the same bitrate. MP3 defines two joint stereo tools: M/S, and intensity stereo, a coarser variant that some encoders use at low bitrates. AAC can switch M/S on and off per frame and per frequency band. Modern encoders default to joint stereo for music; binaural and audiophile material may benefit from forced 'simple stereo' to preserve precise L/R encoding. See [vbr-vs-cbr-mp3](/blog/vbr-vs-cbr-mp3) for related encoding decisions.
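To see why the side signal is cheap to encode, compare signal energy before and after the sum/difference transform. This is only an illustration on made-up integer samples: it uses the unscaled sum and difference so the round trip is exact, whereas real codecs scale the transform and then quantize.

```python
def to_mid_side(left, right):
    """Unscaled sum/difference transform on integer samples."""
    mid = [l + r for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Inverse transform; m + s and m - s are always even, so // is exact."""
    left = [(m + s) // 2 for m, s in zip(mid, side)]
    right = [(m - s) // 2 for m, s in zip(mid, side)]
    return left, right


def energy(samples):
    return sum(x * x for x in samples)


# Nearly identical channels, as with a centred vocal (invented values).
left = [1000, -800, 600, -400]
right = [990, -805, 610, -395]

mid, side = to_mid_side(left, right)
print("side samples:", side)
print("energy R vs S:", energy(right), energy(side))
print("round trip ok:", from_mid_side(mid, side) == (left, right))
```

The right channel carries about as much energy as the left, while the side signal carries almost none, which is where the bit savings come from.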

Channel Mapping in File Metadata

Multichannel files declare a channel layout in the container header. WAV uses the WAVE_FORMAT_EXTENSIBLE structure with a channel mask; common masks are 0x3 (stereo), 0x33 (4.0), 0x3F (5.1), 0x63F (7.1). MP4/M4A carries the layout in the codec configuration or in a channel layout atom ('chnl' in ISO MP4, 'chan' in QuickTime-derived files); FLAC implies a fixed layout from the channel count for up to eight channels and can store a WAVEFORMATEXTENSIBLE_CHANNEL_MASK tag for anything else. Wrong mapping means audio plays through the wrong speakers, which is common in DIY surround mixes where someone produced a 6-channel WAV without setting the mask correctly. Verify with 'ffprobe input.wav', which prints the channel layout in plain English. Some DAWs (Pro Tools, Nuendo) handle layouts strictly; others (Reaper, Audacity) require explicit configuration. For consumer delivery, stereo and 5.1 are the safe layouts; anything else risks playback issues on at least some destinations.
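The masks listed above are bit fields, one bit per speaker position. The following sketch decodes them; the bit values follow the WAVE_FORMAT_EXTENSIBLE speaker-position constants, while the short names and both function names are labels chosen for this example.

```python
# Speaker-position bits, lowest bit first. Channels in the file are
# interleaved in this same ascending-bit order.
SPEAKER_BITS = [
    (0x001, "FL"),   # front left
    (0x002, "FR"),   # front right
    (0x004, "FC"),   # front centre
    (0x008, "LFE"),  # low-frequency effects
    (0x010, "BL"),   # back left
    (0x020, "BR"),   # back right
    (0x040, "FLC"),  # front left of centre
    (0x080, "FRC"),  # front right of centre
    (0x100, "BC"),   # back centre
    (0x200, "SL"),   # side left
    (0x400, "SR"),   # side right
]


def decode_channel_mask(mask):
    """Return speaker names for every bit set in the mask."""
    return [name for bit, name in SPEAKER_BITS if mask & bit]


def mask_fits_channel_count(mask, channels):
    """A valid mask sets exactly one bit per channel in the file."""
    return bin(mask).count("1") == channels


for mask in (0x3, 0x33, 0x3F, 0x63F):
    print(f"{mask:#06x}", decode_channel_mask(mask))
```

A 6-channel WAV whose mask fails `mask_fits_channel_count(mask, 6)`, for example one left at 0x3 or 0, is the kind of file that tends to play through the wrong speakers.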

Choosing Channel Count by Use Case

Quick reference. Podcast (voice): mono, 64-128 kbps. Podcast with music bed: stereo, 128-192 kbps. Audiobook for ACX: mono, 192 kbps or higher CBR (firm spec, see [audio-for-audiobooks](/guide/audio-for-audiobooks)). Music: stereo, 192-320 kbps lossy or [FLAC](/wav-to-flac) lossless. Voice-over delivery: mono WAV unless the client specifies stereo. Game SFX: mono WAV (engine spatialises). Game music: stereo WAV or FLAC. Film dialogue: mono WAV per character per take. Film music score: stereo or 5.1. Atmos releases: object-based, no fixed channel count. Live concert recording: stereo (XY or ORTF) for documentary, multitrack (each mic to its own track) for studio mixdown later. Voice memos: mono. The default rule: use the smallest channel count that supports your content, which means stereo for music and mono for voice. Convert appropriately via [WAV to MP3](/wav-to-mp3) or other tools.