Audio Formats for Voiceover Artists
Voiceover delivery specs are not optional: getting the format wrong wastes your client's time and delays payment. Whether you record for ACX audiobooks, commercial broadcast, corporate e-learning, or explainer videos, each platform has specific format, bitrate, and loudness requirements. This guide covers the major delivery specs you are likely to encounter as a voiceover artist.
ACX (Amazon/Audible) Audiobook Requirements
ACX is a major marketplace for audiobook narration and has some of the most precise specifications in the industry. Required format: MP3 at 192 kbps or higher, constant bitrate (CBR), 44.1 kHz sample rate. Mono is strongly recommended for narration; stereo is accepted, but every file in a title must use the same channel format. Loudness requirements: RMS level between -23 dB and -18 dB, with peaks no higher than -3 dBFS. The noise floor must be no higher than -60 dB RMS, measured on the room tone itself. Keeping room noise under that limit is critical: ACX will reject files with audible HVAC, computer fan noise, or traffic bleed. Room tone requirements: each audio file must have 0.5 to 1 second of room tone at the beginning and 1 to 5 seconds at the end. Use recorded room tone rather than digital silence. ACX uses automated QC tools that will flag deviations from these specs before a human reviewer even listens.
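The level limits above are easy to pre-check before uploading. The following is a minimal sketch, assuming the audio has already been decoded to floating-point samples between -1.0 and 1.0. The function names (`measure`, `check_levels`, `sine`) are illustrative, and the whole-file RMS computed here may not match the exact measurement ACX's own tools perform.

```python
import math

# Thresholds as stated in this section.
RMS_MIN_DB = -23.0
RMS_MAX_DB = -18.0
PEAK_MAX_DB = -3.0


def to_db(value):
    """Convert a linear amplitude (1.0 = full scale) to dBFS."""
    return 20.0 * math.log10(value) if value > 0 else float("-inf")


def measure(samples):
    """Return (rms_db, peak_db) for a sequence of floats in [-1.0, 1.0]."""
    if not samples:
        raise ValueError("no samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    peak = max(abs(s) for s in samples)
    return to_db(rms), to_db(peak)


def check_levels(samples):
    """Return a list of problems; an empty list means the levels pass."""
    rms_db, peak_db = measure(samples)
    problems = []
    if not RMS_MIN_DB <= rms_db <= RMS_MAX_DB:
        problems.append(
            f"RMS {rms_db:.1f} dB outside {RMS_MIN_DB} to {RMS_MAX_DB} dB"
        )
    if peak_db > PEAK_MAX_DB:
        problems.append(f"peak {peak_db:.1f} dBFS above {PEAK_MAX_DB} dBFS")
    return problems


def sine(amplitude, freq=220.0, rate=44100, seconds=1.0):
    """Synthetic test tone standing in for real narration."""
    n = int(rate * seconds)
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]


print(check_levels(sine(0.14)))  # quiet tone: no problems reported
print(check_levels(sine(0.9)))   # hot tone: fails both RMS and peak
```

The same `measure` function can be pointed at a stretch of room tone alone: if its RMS reads above -60 dB, the noise floor is too high.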
Commercial Broadcast and Agency Delivery
Broadcast television and radio delivery specs vary by network but follow general standards. For US television: 48 kHz sample rate, 24-bit depth, WAV or AIFF format, and -24 LKFS integrated loudness (shown as -24 LUFS on most meters), the ATSC A/85 target that the CALM Act makes mandatory for TV commercials. For radio commercial delivery: WAV at 44.1 kHz, 16-bit or 24-bit, with loudness set by the individual station (ask for their spec sheet before delivering). Advertising agencies often request WAV files at 44.1 kHz, 24-bit, stereo, at -14 to -16 LUFS depending on the deliverable. Always confirm with the agency, because some production houses have proprietary format requirements. Provide both a dry voice file (voice only, no music or SFX) and a mixed file if you have been asked to mix.
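Hitting a loudness target is a matter of simple gain arithmetic once a loudness meter has reported the integrated value, because one loudness unit corresponds to one decibel of gain. The sketch below assumes the measured figure comes from a meter you already trust; it does not measure loudness itself, and `gain_to_target` is an illustrative name.

```python
def gain_to_target(measured_lufs, target_lufs):
    """Return (gain_db, linear_factor) that moves measured loudness to target.

    A change of 1 LU equals a gain change of 1 dB, so the offset is a
    plain subtraction. Peaks move by the same number of dB.
    """
    gain_db = target_lufs - measured_lufs
    linear_factor = 10 ** (gain_db / 20.0)
    return gain_db, linear_factor


# Example: a read that measures -19.5 LUFS, delivered for a -24 LUFS TV spec.
gain_db, factor = gain_to_target(-19.5, -24.0)
print(f"apply {gain_db:+.1f} dB (multiply samples by {factor:.3f})")
```

Because peaks shift by the same amount, a positive gain needs a second look at the peak ceiling before delivery.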
E-Learning and Corporate Narration
E-learning platforms have more flexible requirements but still reward clean, properly formatted audio. Standard corporate e-learning spec: MP3 at 128 kbps (for voice-only content) or 192 kbps (for voice with music beds), 44.1 kHz, mono or stereo depending on whether music is present. Articulate Storyline, Adobe Captivate, and Lectora Inspire all accept MP3 and WAV natively. WAV at 44.1 kHz, 16-bit is ideal if the client will be doing post-editing (adding music or sound effects), because a lossless format can be edited and re-exported without further quality loss. Corporate narration for internal training often has looser specs, but 44.1 kHz WAV at 16-bit is the safe universal default. Always deliver a clean dry track and keep your raw session files until the project is fully approved.
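The bitrate choice mostly affects file size, which matters when a course contains hundreds of clips. As a rough sketch (ignoring file headers and metadata, and using 1 MB = 1,000,000 bytes), sizes can be estimated like this; the function names are illustrative.

```python
def mp3_size_mb(bitrate_kbps, seconds):
    """Approximate size of a constant-bitrate MP3 in megabytes."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


def wav_size_mb(sample_rate, bit_depth, channels, seconds):
    """Approximate size of uncompressed PCM WAV audio in megabytes."""
    return sample_rate * (bit_depth / 8) * channels * seconds / 1_000_000


ten_minutes = 10 * 60
print(mp3_size_mb(128, ten_minutes))            # about 9.6 MB
print(mp3_size_mb(192, ten_minutes))            # about 14.4 MB
print(wav_size_mb(44100, 16, 1, ten_minutes))   # about 52.9 MB
```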
Explainer Video and YouTube Content
Video production companies creating explainer videos, YouTube content, or social media videos typically request WAV or AIFF at 48 kHz, 24-bit, matching the audio settings of the video timeline. The 48 kHz sample rate is the video production standard (television and film both use 48 kHz), and audio at a different rate can cause pitch and sync problems in the edit if it is not resampled correctly. Loudness target: -16 LUFS integrated, -1 dBFS peak for YouTube delivery. For creators editing their own videos, 44.1 kHz WAV at 16-bit is fine. When in doubt, ask the editor what sample rate their project is running; delivering audio at a different sample rate creates unnecessary work for them.
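To see why a mismatch matters, consider the worst case: a 44.1 kHz file played back as if it were 48 kHz, with no resampling. The sketch below works out the resulting speed, pitch, and duration change. Many editing applications resample automatically and avoid this, so treat it as an illustration of the failure mode rather than what every editor will experience; `mismatch_effect` is an illustrative name.

```python
import math


def mismatch_effect(file_rate, project_rate, seconds):
    """Effect of playing a file at the project's rate without resampling.

    Returns (speed_factor, pitch_shift_semitones, new_duration_seconds).
    """
    speed = project_rate / file_rate
    semitones = 12 * math.log2(speed)
    new_duration = seconds / speed
    return speed, semitones, new_duration


speed, semitones, duration = mismatch_effect(44100, 48000, 60.0)
print(f"speed x{speed:.3f}, pitch {semitones:+.2f} semitones, "
      f"60 s becomes {duration:.3f} s")
```

For a one-minute read, that is nearly five seconds of drift against picture and a voice pitched up by roughly a semitone and a half.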
Recording Tips and Format Conversion
Record your voice tracks in WAV at 48 kHz, 24-bit. This capture format meets or exceeds every delivery spec covered in this guide. From the WAV master, convert to MP3 for ACX or e-learning delivery, resampling to 44.1 kHz where the spec calls for it. MP3 is a lossy format, so converting from WAV to MP3 in AudioUtils always discards some audio data, but at 192 kbps the loss is generally hard to hear on spoken voice. Never record directly to MP3: every edit and re-export of an MP3 re-encodes it, so retakes and fixes add generation loss. Always work from your WAV master. To meet noise floor requirements, record in the quietest room available and use a dynamic microphone (like the Shure SM7B or Electro-Voice RE20) if room acoustics are imperfect, since condensers tend to pick up more room noise. Use a high-pass filter at 80–100 Hz to remove low-frequency rumble from traffic and HVAC before any other processing.
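For readers who prefer to script the conversion, here is a minimal sketch that uses the ffmpeg command-line tool instead of the AudioUtils converter mentioned above. It assumes ffmpeg is installed with the libmp3lame encoder; the file names and the `mp3_command` and `convert` helpers are hypothetical. The defaults follow the ACX-style settings discussed earlier (192 kbps, 44.1 kHz, mono).

```python
import subprocess


def mp3_command(src, dst, bitrate_kbps=192, sample_rate=44100, channels=1):
    """Build an ffmpeg argument list for a WAV-to-MP3 conversion."""
    return [
        "ffmpeg",
        "-i", src,                    # input: the WAV master
        "-ac", str(channels),         # channel count (1 = mono)
        "-ar", str(sample_rate),      # resample to the delivery rate
        "-codec:a", "libmp3lame",     # MP3 encoder
        "-b:a", f"{bitrate_kbps}k",   # fixed audio bitrate
        dst,
    ]


def convert(src, dst, **options):
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(mp3_command(src, dst, **options), check=True)


# Print the command instead of running it, so it can be reviewed first.
print(" ".join(mp3_command("chapter01_master.wav", "chapter01.mp3")))
```

Building the argument list separately from running it makes it easy to inspect the exact settings before touching any files, and the WAV master is never overwritten.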