AudioUtils

Best Audio Format for Speech-to-Text Transcription

Choose the right audio format for transcription services and speech-to-text engines. Covers WAV, MP3, FLAC, and bitrate recommendations for maximum accuracy.

Audio format affects transcription accuracy more than most people expect. The codec, bitrate, sample rate, channel count, and noise floor all influence how well a speech-to-text engine can recover words from a waveform. This guide covers what each major transcription service prefers, why, and how to prepare audio for maximum accuracy.

The Headline Answer

For best accuracy: mono WAV at 16 kHz 16-bit, denoised, with -16 to -23 LUFS integrated loudness. For convenience: mono MP3 at 128 kbps is acceptable on every modern engine. It is about half the size of that 16 kHz WAV and roughly 90% smaller than a 48 kHz 24-bit mono recording. Stereo, high sample rates, and lossless beyond 16-bit mono add cost without measurable accuracy gain on speech.

What Each Major Service Prefers

  • OpenAI Whisper (and whisper.cpp, Whisper API): WAV / MP3 / FLAC / OGG / M4A all work. The model resamples everything to 16 kHz mono (the reference implementation decodes to 16-bit PCM and converts it to 32-bit float). File size cap on the API: 25 MB. For long audio, chunk into 10-20 minute segments at 16 kHz mono MP3 128 kbps to stay under the cap.
  • Google Speech-to-Text v2: FLAC and LINEAR16 (raw PCM 16-bit) recommended for highest accuracy. MP3 and OGG accepted. Sample rate 8-48 kHz; documented sweet spot is 16 kHz for narrowband models, 24 kHz for the latest 'long' models. Mono required for diarization.
  • Amazon Transcribe: WAV, MP3, MP4, FLAC, AMR, OGG, WebM, M4A. 8-48 kHz. Per-job cap of 2 GB or 4 hours, whichever limit is reached first. Recommends FLAC for medical / legal where accuracy matters.
  • Microsoft Azure Speech: WAV PCM 16-bit mono 16 kHz is the documented optimum. MP3, OGG Opus, FLAC, ALAW also accepted via the GStreamer pipeline.
  • Rev.ai: MP3, MP4, WAV, M4A, FLAC, OGG, WMA. Accuracy is essentially identical across formats above 96 kbps; below that, lossy artifacts cost a few percent WER.
  • Otter.ai: MP3, AAC, M4A, MP4, MOV, WAV. Internal pipeline resamples to 16 kHz mono. Web upload cap typically 1.9 GB / 4 hours per file.
  • AssemblyAI: All common formats. Recommends 16 kHz+ for best results; below 8 kHz quality degrades sharply.

Why 16 kHz Is the Sweet Spot

Human speech contains intelligible information up to about 8 kHz: sibilants ('s', 'sh', 'f') sit between 4-8 kHz and consonants like 'th' depend on energy near 6 kHz. The Nyquist sampling theorem says the sample rate must be at least 2x the highest frequency you want to capture, so a 16 kHz sample rate (capturing up to 8 kHz) covers every phoneme that matters for transcription. Going higher (44.1 or 48 kHz) captures music-band frequencies that speech engines discard during preprocessing. The engine downsamples internally regardless, so supplying 16 kHz directly is slightly more efficient and costs no accuracy.

Going lower is dangerous. 8 kHz audio (telephone bandwidth) cuts off at 4 kHz, losing the 4-8 kHz range where fricatives live. Word error rate on 8 kHz audio is typically 2-5x higher than on 16 kHz audio for the same speaker.
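The Nyquist arithmetic behind these numbers can be checked in a few lines. This is a minimal sketch; the function names and the 8 kHz band edge constant are illustrative choices taken from the figures above, not part of any transcription API.

```python
def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2.0


def captures_band(sample_rate_hz: float, band_top_hz: float) -> bool:
    """True if the sample rate is high enough to capture band_top_hz."""
    return nyquist_limit_hz(sample_rate_hz) >= band_top_hz


# Upper edge of the speech band discussed above (assumed 8 kHz).
SPEECH_BAND_TOP_HZ = 8_000

if __name__ == "__main__":
    for rate in (8_000, 16_000, 48_000):
        limit = nyquist_limit_hz(rate)
        ok = captures_band(rate, SPEECH_BAND_TOP_HZ)
        print(f"{rate} Hz -> captures up to {limit:.0f} Hz, full speech band: {ok}")
```

An 8 kHz file tops out at 4 kHz and misses the fricative range, while 16 kHz reaches the 8 kHz edge exactly.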

Mono vs Stereo: Mono Wins

For single-speaker content, mono is unambiguously better. Stereo doubles the file size, and most engines just sum to mono internally. Some services charge by 'audio minute' regardless of channel count, but a stereo file may upload twice the bytes for the same content.

For multi-speaker content with a separate mic per speaker (interview where each lavalier records to its own track), you can preserve speaker separation by transcribing each channel as a separate mono file and merging the timestamps. This produces cleaner diarization than letting the engine guess from a stereo mix.
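A rough sketch of the per-channel approach, assuming 16-bit interleaved stereo PCM and a transcription step that returns timestamped segments for each mono file. The `Segment` type and both function names are hypothetical, and the transcription call itself is left out because it depends on the service.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    speaker: str   # label you assign per channel, e.g. "host" or "guest"
    start: float   # seconds from the start of the recording
    end: float
    text: str


def split_stereo_16bit(frames: bytes) -> tuple[bytes, bytes]:
    """De-interleave 16-bit stereo PCM (L R L R ...) into two mono streams."""
    left = bytearray()
    right = bytearray()
    # Each stereo frame is 4 bytes: 2 bytes left sample, 2 bytes right sample.
    for i in range(0, len(frames) - 3, 4):
        left += frames[i:i + 2]
        right += frames[i + 2:i + 4]
    return bytes(left), bytes(right)


def merge_segments(*tracks: list[Segment]) -> list[Segment]:
    """Interleave per-channel transcripts into one timeline ordered by start time."""
    merged = [seg for track in tracks for seg in track]
    return sorted(merged, key=lambda seg: seg.start)
```

Because both channels share one clock, sorting by start time is enough to rebuild the conversation order. Overlapping speech simply appears as adjacent segments with overlapping times.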

Lossy vs Lossless: Where It Actually Matters

For clean studio audio at 96 kbps MP3 or higher, no major engine shows measurable accuracy loss versus FLAC or WAV. Below 96 kbps, fricatives start to fall apart and accuracy drops by roughly 1-3% with each further step down in bitrate.

The accuracy gap shows up in three specific cases:

1. Heavily accented or non-native English speakers: every dB of detail matters; FLAC or WAV gives the model the cleanest signal.
2. Multiple overlapping speakers: lossy compression artifacts confuse diarization and word boundary detection.
3. Low-volume background speech: quiet voices in meeting recordings; lossy encoding tends to throw away the quietest content.

For all three cases, transcribe from FLAC or WAV. For solo voiceover, podcast interviews mic'd cleanly, or call-center recordings, MP3 128 kbps mono is fine.

File Size Limits Across Services

  • Whisper API: 25 MB / file
  • Google Speech-to-Text sync: 60 seconds; async: up to 480 minutes
  • Amazon Transcribe: 2 GB or 4 hours per job
  • Otter.ai: ~1.9 GB / 4 hours web upload
  • Rev.ai: 4 GB / 17 hours

For files near a service's cap, mono MP3 128 kbps gives the best size/accuracy ratio: a 2-hour recording is about 115 MB (110 MiB), comfortably under every limit above except the Whisper API's 25 MB cap, which still requires chunking.
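The size arithmetic is easy to verify. The sketch below assumes constant-bitrate audio and decimal megabytes (10**6 bytes), and ignores container overhead; the function names are illustrative. It also shows how long a chunk can run before hitting a given cap, such as the 25 MB Whisper API limit listed above.

```python
def encoded_size_mb(duration_s: float, bitrate_kbps: float) -> float:
    """Approximate size of a constant-bitrate file in MB (10**6 bytes)."""
    total_bits = duration_s * bitrate_kbps * 1000
    return total_bits / 8 / 1_000_000


def max_duration_min(cap_mb: float, bitrate_kbps: float) -> float:
    """Longest duration, in minutes, that fits under a size cap."""
    cap_bits = cap_mb * 1_000_000 * 8
    return cap_bits / (bitrate_kbps * 1000) / 60


if __name__ == "__main__":
    two_hours = 2 * 60 * 60
    print(f"2 h at 128 kbps: {encoded_size_mb(two_hours, 128):.1f} MB")
    print(f"25 MB cap at 128 kbps: {max_duration_min(25, 128):.1f} min per chunk")
```

At 128 kbps a chunk can run to about 26 minutes before reaching 25 MB, so the 10-20 minute segments suggested earlier leave a safe margin.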

Denoising Before Transcription

Denoising before transcription reliably improves accuracy on noisy source material. RNNoise (available through FFmpeg's 'arnndn' filter), Krisp, NVIDIA Broadcast, iZotope RX, or Adobe Podcast Enhance reduce HVAC hum, keyboard clatter, and reverb. The accuracy improvement is largest on conference-room recordings, smallest on close-mic'd studio voice.

Do not over-denoise. Aggressive noise reduction creates artifacts that confuse the model. A gentle pass that drops the noise floor by 6-10 dB is usually optimal.
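One way to check whether a denoising pass stayed in that range is to measure the RMS level of the same speech-free stretch before and after processing. This is a minimal sketch assuming 16-bit PCM samples already loaded as Python integers; the function names are hypothetical and the approach is an illustration, not a method prescribed by any of the tools above.

```python
import math

FULL_SCALE_16BIT = 32768  # magnitude of the largest 16-bit sample


def rms_dbfs(samples: list[int]) -> float:
    """RMS level of 16-bit PCM samples in dB relative to full scale."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10 * math.log10(mean_square / FULL_SCALE_16BIT ** 2)


def noise_floor_reduction_db(before: list[int], after: list[int]) -> float:
    """How many dB quieter the noise-only stretch became after denoising."""
    return rms_dbfs(before) - rms_dbfs(after)
```

Measure on a pause where nobody is speaking. A result well above 10 dB suggests the pass may be aggressive enough to introduce artifacts.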

Loudness Targets

Aim for -16 to -23 LUFS integrated loudness with peaks below -1 dBTP. Quiet audio forces the engine to apply gain, which raises the noise floor along with the speech; clipped audio destroys the consonants the model relies on. Use a normalization step in your DAW or FFmpeg's 'loudnorm' filter to hit the target before sending to transcription.
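As a sketch of the FFmpeg route, the helper below builds a single-pass 'loudnorm' command. The helper name, the file names, and the -20 LUFS default are illustrative assumptions; `I` (integrated loudness) and `TP` (true peak) are options of the loudnorm filter. The command assumes `ffmpeg` is installed and on the PATH.

```python
def loudnorm_command(src: str, dst: str,
                     target_lufs: float = -20.0,
                     true_peak_db: float = -1.0,
                     sample_rate: int = 48_000) -> list[str]:
    """Build an ffmpeg argument list that normalizes loudness with loudnorm."""
    if not -23.0 <= target_lufs <= -16.0:
        raise ValueError("target outside the -16 to -23 LUFS range used in this guide")
    audio_filter = f"loudnorm=I={target_lufs}:TP={true_peak_db}"
    # loudnorm resamples internally, so the output rate is set explicitly.
    return ["ffmpeg", "-i", src, "-af", audio_filter, "-ar", str(sample_rate), dst]


if __name__ == "__main__":
    print(" ".join(loudnorm_command("meeting.wav", "meeting_norm.wav")))
```

To execute it, pass the list to `subprocess.run(cmd, check=True)`. Single-pass loudnorm is an approximation; a two-pass measurement run gives a more exact result when precision matters.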

Recommended Conversion Workflow

1. Record at 48 kHz 24-bit WAV in your interface.
2. Edit, denoise, and normalize to -20 LUFS in your DAW.
3. Export mono 16 kHz 16-bit WAV for highest accuracy, or mono 16 kHz 128 kbps MP3 with no measurable accuracy loss. The MP3 is roughly 90% smaller than the 48 kHz 24-bit mono recording and about half the size of the 16 kHz WAV.
4. Use MP3 to WAV, AAC to WAV, or OGG to WAV on AudioUtils to convert any source format into the transcription-friendly target.
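If you prefer the command line for the export step, the same targets can be produced with FFmpeg. This is a minimal sketch: the helper name and file names are hypothetical, and the MP3 branch assumes an FFmpeg build that includes the libmp3lame encoder.

```python
def export_command(src: str, dst_stem: str, lossless: bool = True) -> list[str]:
    """Build an ffmpeg argument list for a transcription-ready export."""
    # Downmix to mono (-ac 1) and resample to 16 kHz (-ar 16000) in both cases.
    common = ["ffmpeg", "-i", src, "-ac", "1", "-ar", "16000"]
    if lossless:
        # 16-bit little-endian PCM in a WAV container.
        return common + ["-c:a", "pcm_s16le", f"{dst_stem}.wav"]
    # 128 kbps MP3 via the LAME encoder.
    return common + ["-c:a", "libmp3lame", "-b:a", "128k", f"{dst_stem}.mp3"]


if __name__ == "__main__":
    print(" ".join(export_command("interview_master.wav", "interview_16k")))
    print(" ".join(export_command("interview_master.wav", "interview_16k", lossless=False)))
```

Run either list with `subprocess.run(cmd, check=True)` once the edited and normalized master is ready.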

For the underlying bitrate concepts, see audio bitrate explained. For sample rate fundamentals, see sample rate explained. For broader format choices, see audio quality settings explained.
