
Audio Formats for Voice Acting and Voiceover

Voice-over work is delivered to a spec. Get the format wrong and the file may be rejected, an audition lost, or a session re-recorded. This guide is a practical reference: recording settings that capture the cleanest source, the processing chain that turns that source into a finished master, the standard deliverables clients expect, and the firm rules of ACX submission.

Recording Format: 24-bit WAV at 48 kHz Mono

Industry default: WAV, 48 kHz, 24-bit, mono. The 48 kHz rate matches the video production standard (TV, film, and most online video carry 48 kHz audio) and avoids resampling at delivery. 24-bit gives a theoretical 144 dB of dynamic range (roughly 6 dB per bit), which leaves comfortable headroom even when an excited line peaks 18 dB above your average. Mono because a single voice has no stereo information to preserve, and clients want mono files that sit cleanly in stereo mixes. Some clients (ACX audiobooks, music industry) prefer 44.1 kHz; always confirm before tracking. Never record directly to MP3 or AAC: every subsequent edit, EQ move, and noise reduction pass would then operate on already-degraded audio. Save each raw take as its own WAV file and treat those files as your masters. Reference: [audio-for-broadcasting](/guide/audio-for-broadcasting).
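
The 144 dB figure and the storage cost of the recommended format both follow from simple arithmetic. The sketch below is illustrative only; the function names are made up for this example and the size estimate ignores the small WAV header.

```python
import math


def dynamic_range_db(bits: int) -> float:
    """Theoretical dynamic range of linear PCM: 20 * log10(2 ** bits)."""
    return 20 * math.log10(2 ** bits)


def wav_bytes_per_minute(sample_rate_hz: int, bits: int, channels: int) -> int:
    """Uncompressed PCM payload per minute (header not counted)."""
    return sample_rate_hz * (bits // 8) * channels * 60


print(round(dynamic_range_db(24), 1))        # 144.5
print(round(dynamic_range_db(16), 1))        # 96.3
print(wav_bytes_per_minute(48_000, 24, 1))   # 8640000 bytes, about 8.6 MB per minute
```

At under 9 MB per minute for mono 24-bit, there is little storage pressure to record in anything lossy.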

Noise Floor and Booth Acoustics

ACX requires a noise floor no higher than -60 dBFS (RMS); broadcast typically wants -65 dBFS; e-learning is more forgiving at -50 to -55 dBFS. To hit these targets at source, the recording space matters more than the microphone. Treated booth or walk-in closet packed with clothing: -55 to -65 dBFS achievable. Untreated bedroom: -45 to -55 dBFS, marginal for ACX. Open desk in an office: -40 to -50 dBFS, fails ACX. Test before each session: record 30 seconds of silence under normal session conditions (HVAC running, computer powered, no talking), then measure it in Audacity. The free ACX Check plugin reports the noise floor directly; Effect > Amplify shows the gain needed to reach 0 dB, so a suggested +52 dB means the selection peaks at -52 dBFS. To reduce ambient noise, turn off HVAC during takes if possible, switch off LED dimmers (they radiate RFI), unplug switching power supplies, and mute desk fans. See [what-is-audio-noise-floor](/guide/what-is-audio-noise-floor).
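
The silence test can also be scripted. The following is a minimal sketch, assuming a mono 16-bit or 24-bit PCM WAV and using only the Python standard library; `read_mono_pcm` and `rms_dbfs` are names invented here, and the result is plain RMS relative to digital full scale, which may differ by a few dB from meters that use a sine-wave reference.

```python
import math
import wave


def rms_dbfs(samples, full_scale):
    """RMS level in dBFS of integer PCM samples (full_scale = 2**23 for 24-bit)."""
    if not samples:
        raise ValueError("no samples to measure")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / (full_scale ** 2))


def read_mono_pcm(path):
    """Read a mono 16- or 24-bit PCM WAV; return (samples, full_scale)."""
    with wave.open(path, "rb") as wf:
        if wf.getnchannels() != 1:
            raise ValueError("expected a mono file")
        width = wf.getsampwidth()
        if width not in (2, 3):
            raise ValueError("expected 16- or 24-bit PCM")
        raw = wf.readframes(wf.getnframes())
    samples = [
        int.from_bytes(raw[i:i + width], "little", signed=True)
        for i in range(0, len(raw), width)
    ]
    return samples, 2 ** (8 * width - 1)


# Synthetic check: a 24-bit signal at about 1/1000 of full scale sits near -60 dBFS.
fake_room_tone = [8389, -8389] * 1000
print(round(rms_dbfs(fake_room_tone, 2 ** 23), 1))  # -60.0
```

For a real session, something like `samples, fs = read_mono_pcm("room_tone.wav")` followed by `rms_dbfs(samples, fs)` would give the figure to compare against the targets above; the file name is a placeholder.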

Processing Chain in Order

Apply effects in this sequence; reordering causes problems (for example, compressing before noise reduction lifts the very noise the reducer then has to remove).
(1) Noise reduction first: capture a noise profile from a silent passage and reduce by 6-10 dB; more than that starts to remove voice character.
(2) High-pass filter at 80-100 Hz: removes rumble and footsteps without affecting the voice.
(3) De-esser at 5-8 kHz: tames the sibilance that close-miked condensers exaggerate.
(4) EQ for tonal shaping: a typical voice curve is -2 dB at 250 Hz (mud) and +1 dB at 4 kHz (presence).
(5) Compressor: 3:1 to 4:1 ratio, attack 5-10 ms, release 50-100 ms, threshold set for 3-6 dB of gain reduction on average.
(6) Limiter at -3 dBFS sample-peak (or -1 dBTP for streaming-bound deliveries).
(7) Normalize to the loudness target: inside the -23 to -18 dBFS RMS window for ACX, -16 LUFS for podcast inserts. Re-check the peak afterwards, because gain added after the limiter can push peaks back over the ceiling.
Export from this processed master to whichever delivery format the client requests.
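
Two of the steps above reduce to simple level arithmetic: how much gain reduction a given ratio produces, and how much normalization gain is safe once a peak ceiling is in place. The sketch below assumes a hard-knee compressor and treats all levels as already-measured dB values; the function names and the example numbers (including the -20 dBFS RMS target) are illustrative, not settings taken from any particular plugin.

```python
def compressor_gain_db(level_db: float, threshold_db: float, ratio: float) -> float:
    """Static gain (0 or negative, in dB) of a hard-knee downward compressor."""
    if level_db <= threshold_db:
        return 0.0
    over = level_db - threshold_db
    # Above threshold, every `ratio` dB of input yields 1 dB of output.
    return over / ratio - over


def normalize_gain_db(rms_db: float, target_rms_db: float,
                      peak_db: float, peak_ceiling_db: float) -> float:
    """Gain toward the RMS target, capped so the peak stays at or below the ceiling."""
    wanted = target_rms_db - rms_db
    allowed = peak_ceiling_db - peak_db
    return min(wanted, allowed)


# A line peaking 12 dB over a -20 dB threshold at 4:1 is pulled down by 9 dB.
print(compressor_gain_db(-8.0, -20.0, 4.0))          # -9.0

# Master measures -26 dB RMS with a -7 dB peak; a +6 dB move would break a -3 dB ceiling.
print(normalize_gain_db(-26.0, -20.0, -7.0, -3.0))   # 4.0
```

In the second case the capped gain leaves the file at -22 dB RMS, which is still inside the ACX window; if it were not, the usual fix would be more compression or limiting before normalizing rather than more gain.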

Standard Client Deliverables

Common deliverables by category:
Commercial radio/TV spots: WAV 48 kHz / 24-bit mono, normalized to a -3 dBFS peak, often in two versions: dry (no music bed) and mixed.
E-learning and corporate narration: WAV 48 kHz / 24-bit mono, sometimes with an MP3 192 kbps stereo preview.
Audiobook: MP3 192 kbps CBR mono per chapter (ACX spec; firm) or WAV per platform (Findaway, Authors Republic accept WAV).
Video game: WAV 48 kHz / 24-bit mono per line, often labelled by character + line ID.
IVR and phone systems: 8 kHz / 8-bit mono μ-law or A-law (uncommon, always confirm).
Anime/dub: WAV 48 kHz / 24-bit mono with timecode.
Podcast inserts: MP3 128-192 kbps mono.
Always confirm against the project spec sheet before delivery; assumptions cost re-records.
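
One way to keep these specs from living only in memory is a small preset table that a rendered file can be compared against. This is a hypothetical sketch: the `DeliverySpec` class, the preset names, and the `mismatches` helper are invented for illustration, and only three of the categories above are encoded.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class DeliverySpec:
    container: str
    sample_rate_hz: int
    channels: int
    bit_depth: Optional[int] = None      # set for PCM deliverables
    bitrate_kbps: Optional[int] = None   # set for lossy deliverables


# Values follow the categories listed above; always override from the client's spec sheet.
PRESETS = {
    "commercial_spot": DeliverySpec("wav", 48_000, 1, bit_depth=24),
    "game_line": DeliverySpec("wav", 48_000, 1, bit_depth=24),
    "acx_chapter": DeliverySpec("mp3", 44_100, 1, bitrate_kbps=192),
}


def mismatches(actual: DeliverySpec, wanted: DeliverySpec) -> list:
    """Names of the fields where the rendered file differs from the preset."""
    return [f.name for f in fields(wanted)
            if getattr(actual, f.name) != getattr(wanted, f.name)]


rendered = DeliverySpec("wav", 44_100, 1, bit_depth=24)
print(mismatches(rendered, PRESETS["commercial_spot"]))  # ['sample_rate_hz']
```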

ACX Audiobook Submission: Firm Requirements

Amazon's ACX (Audiobook Creation Exchange) enforces firm technical requirements; non-compliant files are rejected. Specs: MP3 format only (WAV, FLAC, AAC rejected). 192 kbps or higher CBR (constant bitrate; VBR rejected). Mono or stereo is accepted as long as every file in the title uses the same layout; mono is the sensible choice for a single narrator. 44.1 kHz sample rate. RMS loudness between -23 dBFS and -18 dBFS (use Audacity ACX Check or Auphonic to verify). Peak no higher than -3 dBFS (sample-peak, not true-peak). Noise floor no higher than -60 dBFS RMS, measured during silent passages. Each chapter file no longer than 120 minutes, with 0.5-1 sec of room tone at the head and 1-5 sec at the tail. Convert your processed WAV master via [WAV to MP3](/wav-to-mp3), selecting 192 kbps CBR and mono explicitly. See [audio-for-audiobooks](/guide/audio-for-audiobooks) for full guidance.
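
The numeric limits above lend themselves to a pre-upload checklist. The sketch below assumes the measurements have already been taken with a tool such as ACX Check; the `ChapterMeasurement` class and `acx_failures` function are hypothetical names, channel layout is left out, and passing this check is no guarantee that ACX's own review will accept the file.

```python
from dataclasses import dataclass


@dataclass
class ChapterMeasurement:
    codec: str
    cbr: bool
    bitrate_kbps: int
    sample_rate_hz: int
    rms_db: float
    peak_db: float
    noise_floor_db: float
    duration_min: float


def acx_failures(m: ChapterMeasurement) -> list:
    """Return a human-readable list of limits the chapter misses (empty if none)."""
    failures = []
    if m.codec.lower() != "mp3":
        failures.append("format must be MP3")
    if not m.cbr:
        failures.append("bitrate mode must be CBR")
    if m.bitrate_kbps < 192:
        failures.append("bitrate below 192 kbps")
    if m.sample_rate_hz != 44_100:
        failures.append("sample rate must be 44.1 kHz")
    if not -23.0 <= m.rms_db <= -18.0:
        failures.append("RMS outside -23 to -18 dB")
    if m.peak_db > -3.0:
        failures.append("peak above -3 dB")
    if m.noise_floor_db > -60.0:
        failures.append("noise floor above -60 dB")
    if m.duration_min > 120.0:
        failures.append("file longer than 120 minutes")
    return failures


chapter = ChapterMeasurement("mp3", True, 192, 44_100, -25.0, -1.2, -62.0, 38.0)
for problem in acx_failures(chapter):
    print(problem)
```

The example chapter is too quiet and peaks too high at the same time, which usually points to too little compression rather than a gain problem.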

Self-Tape vs Studio Booth

Two recording contexts dominate VO work in 2026. Self-tape from home: most auditions and many remote sessions. Standard kit: large-diaphragm condenser (Neumann TLM 103, Audio-Technica AT4040, Rode NT1) on a shock mount with a pop filter; audio interface (Focusrite Scarlett 2i2, Apogee Duet, RME Babyface); treated room or portable booth (sE Reflexion Filter, Aston Halo). Achievable noise floor: -55 to -65 dBFS. Studio booth or commercial facility: typically a dialogue session with director and engineer; you provide voice, they handle gain staging, treatment, recording. Achievable noise floor: -65 to -75 dBFS. For demos and auditions, self-tape is universal. For union work, broadcast, and high-budget productions, studio is still common. Either way, the deliverable spec is identical — the difference is just the noise floor cushion.

Audition Workflow

Auditions are time-critical. Workflow: read the breakdown carefully (note tone, pacing, character notes); record 2-3 takes of the audition copy in WAV; pick the best; apply minimal processing (noise reduction, light EQ, compression, limiter, normalize to a -3 dBFS peak); convert to MP3 192 kbps mono via [WAV to MP3](/wav-to-mp3); rename per platform spec (Voices.com: 'YourName_AuditionTitle.mp3'; Voice123: similar; Bodalgo: similar). Most platforms accept MP3 up to ~5 MB and 60-90 seconds. Keep the WAV master in case the client requests a higher-quality version. Save a processing chain or template in your DAW so audition turnaround stays under 15 minutes; on competitive briefs, slow turnaround is a common way to lose out, because early submissions tend to be heard first. See [audio-for-broadcasting](/guide/audio-for-broadcasting) for related broadcast specifications and [vbr-vs-cbr-mp3](/blog/vbr-vs-cbr-mp3) for encoding choices.
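
A quick calculation shows that a 192 kbps audition fits comfortably under the ~5 MB limit mentioned above. The sketch below is illustrative: `mp3_size_mb` ignores ID3 tags and container overhead, and `audition_filename` is a hypothetical helper that simply mirrors the 'YourName_AuditionTitle.mp3' pattern; check each platform's current naming rules.

```python
def mp3_size_mb(bitrate_kbps: int, seconds: float) -> float:
    """Approximate CBR MP3 size in MB (10**6 bytes), tags not counted."""
    return bitrate_kbps * 1000 / 8 * seconds / 1_000_000


def audition_filename(name: str, title: str) -> str:
    """Build 'YourName_AuditionTitle.mp3', keeping only letters and digits."""
    def clean(text: str) -> str:
        return "".join(ch for ch in text if ch.isalnum())
    return f"{clean(name)}_{clean(title)}.mp3"


print(round(mp3_size_mb(192, 90), 2))             # 2.16
print(audition_filename("Jane Doe", "Acme Spot"))  # JaneDoe_AcmeSpot.mp3
```

Even a 90-second read at 192 kbps comes in at roughly 2 MB, so there is no need to drop the bitrate to satisfy upload limits.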