How to Extract Audio from Video Files
Extract audio tracks from video files and save as MP3, WAV, or other formats. Step-by-step guide for any video format.
Every video file with sound already carries its audio as a separate stream, so extracting it means pulling that stream out, not re-recording or re-mixing anything. The right method depends on the source container, what you plan to do with the audio next, and whether you can afford an extra round of lossy re-encoding. This guide covers the common scenarios and source formats, and the decisions that determine the quality of your output.
Why People Extract Audio From Video
The use cases are broader than most people realize:
- Lectures and webinars recorded as MP4 or MOV that students want to re-listen to during commutes, at 1.5x or 2x speed in a podcast app.
- YouTube tutorials saved as offline video that work better as audio in a car or on a run.
- Conference talks and interviews filmed but consumed audio-first.
- Spoken notes recorded with the iPhone camera, which arrive as QuickTime .mov files but are functionally voice memos.
- Podcasts recorded on camera (many modern podcasts also publish a video version) where the host needs to publish an audio-only feed alongside YouTube.
- Music videos and live performances where the official audio release is unavailable.
- Screen recordings from OBS, ScreenFlow, or QuickTime where only the narration matters.
- Transcription prep — services and tools such as Otter, Rev, Descript, and Whisper all accept audio, and an audio file uploads far faster than the video it came from and is much less likely to hit upload size limits.
- Sampling and music production where producers pull short audio segments from films, interviews, or live recordings as source material for tracks. Once extracted, cut the audio down to the exact clip you need before importing into your DAW.
In each case, the underlying operation is the same: separate the audio elementary stream from the video container.
What Is Actually Inside a Video File
A video file is a container — a wrapper format that holds one or more video streams, one or more audio streams, optional subtitle tracks, chapter markers, and metadata. The container and the codecs inside it are independent decisions:
- MP4 (.mp4) — almost always carries AAC audio. Sometimes MP3, occasionally AC-3 for theatrical content.
- MOV (.mov) — Apple's QuickTime container. Typically AAC for iPhone recordings; can carry PCM (uncompressed), ALAC, or AC-3 for professional workflows.
- MKV (.mkv) — Matroska. The flexible one. Audio can be AAC, AC-3, DTS, FLAC, Vorbis, Opus, PCM, or TrueHD. MKV files of films often carry multiple audio tracks (English, foreign dubs, commentary).
- WebM (.webm) — Google's web container. Audio is Vorbis or Opus, never AAC.
- AVI (.avi) — legacy Microsoft container. Usually MP3 or PCM audio.
- MTS / M2TS (.mts, .m2ts) — MPEG-2 transport stream files written by AVCHD camcorders and used on Blu-ray discs. AVCHD audio is AC-3 or LPCM; Blu-ray discs can also carry DTS or Dolby TrueHD.
- FLV (.flv) — Flash Video. Almost always AAC or MP3 audio. Rare in 2026 but still surfaces in old archives.
When you 'extract audio,' you are demuxing — pulling the audio stream out of the container. Whether re-encoding happens depends entirely on whether your target format matches the source codec.
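To see which codec sits inside a file before choosing an output, you can ask ffprobe, FFmpeg's companion inspection tool. The sketch below is a minimal Python 3.9+ example that assumes ffprobe is on your PATH; the function names are hypothetical. It requests a JSON report on the audio streams only and formats one line per stream. The number it prints counts audio streams only, which is the same numbering FFmpeg's '-map 0:a:N' uses.

```python
import json
import subprocess


def probe_audio_streams(path: str) -> str:
    """Run ffprobe on `path` and return its JSON report for audio streams only."""
    cmd = [
        "ffprobe", "-v", "error",
        "-select_streams", "a",  # audio streams only
        "-show_entries", "stream=index,codec_name,channels,sample_rate",
        "-of", "json",
        path,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


def summarize_audio_streams(report: str) -> list[str]:
    """Format each audio stream as 'N: codec, channels ch, rate Hz'.

    N counts audio streams only, so it is the N in FFmpeg's '-map 0:a:N'.
    """
    streams = json.loads(report).get("streams", [])
    return [
        f"{n}: {s.get('codec_name', '?')}, {s.get('channels', '?')} ch, "
        f"{s.get('sample_rate', '?')} Hz"
        for n, s in enumerate(streams)
    ]
```

For example, `summarize_audio_streams(probe_audio_streams("lecture.mp4"))` would typically list a single AAC stream for a phone or YouTube MP4, while a film MKV may list several.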
Three Approaches: Browser, Desktop, Command-Line
There are three categories of tools, with very different trade-offs.
Browser-based (recommended for most users)
Tools like AudioUtils run FFmpeg compiled to WebAssembly directly in your browser tab. The video file is loaded into browser memory, processed locally, and the audio is downloaded back to your device. The file never touches a server. This matters for unreleased lectures, internal company recordings, interviews under NDA, or any video you would not upload to a random web service.
Workflow: drag the video into a converter like /mp4-to-mp3, /mov-to-mp3, /mp4-to-wav, or /mov-to-wav. Pick a bitrate. Download the result.
Strengths: no install, no upload, works on iPhone/Android/Chromebook/locked-down work computers, identical workflow on every OS. Limits: practical file size around 1-2 GB before browser memory pressure becomes an issue; not a great fit for batch processing 200+ files.
Desktop applications
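- HandBrake — a video transcoder. It always writes a video file (MP4, MKV, or WebM) and cannot output audio-only files, so it only helps here when you also want to re-encode the video and choose which audio tracks to keep.
- VLC — Media > Convert / Save can write an audio-only file (for example with the built-in 'Audio - MP3' profile) from any container VLC plays. Free, every OS, but its conversion UI is awkward.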
- Audacity — opens MP4/MOV via its FFmpeg plugin. Best when you also want to edit, denoise, or normalize the audio.
- Permute (macOS) — drag-and-drop, $10. Solid pick for Mac users who do this regularly.
Command-line: FFmpeg
The professional answer. FFmpeg's libraries power many of the other tools in this list, including HandBrake, much of VLC's decoding, Audacity's MP4/MOV import, and the browser-based converters. It is free, scriptable, and handles nearly every codec and container you are likely to encounter.
Step-by-Step: Extracting Audio With AudioUtils
For a typical YouTube-downloaded MP4 or iPhone MOV file:
1. Open /mp4-to-mp3 for MP4 or /mov-to-mp3 for MOV in your browser.
2. Drag the video file into the converter. Files up to about 500 MB load smoothly on most computers; larger files may need a few seconds of patience.
3. Choose a bitrate. 192 kbps is the sweet spot for most extracted speech and music. 128 kbps is fine for voice-only content. Use 320 kbps for music you want to keep at the highest practical MP3 quality.
4. Click Convert. Processing takes 5-30 seconds depending on the source length and your CPU.
5. Download the .mp3.
For lossless intermediate output — for example, if you plan to load the audio into a DAW for editing — use /mp4-to-wav or /mov-to-wav instead. The output WAV is decoded PCM, ready for any editing software.
For Apple-ecosystem use where you want to keep the original AAC quality without an MP3 transcode, use /mp4-to-m4a. When the source is MP4 with AAC inside, this is a stream copy — no re-encoding happens, and the output is byte-identical AAC in an M4A container.
Output Format Decision Tree
Once you have the audio out of the video, the container choice depends on what you are going to do with it:
- MP3 — sharing, sending to friends, uploading to a podcast host, listening on any device. Universal compatibility, manageable file size, lossy.
- WAV — editing in a DAW, transcription with services that prefer PCM, archival for short clips, sampling. Uncompressed, large.
- M4A (AAC) — Apple ecosystem, AirPods, anywhere you want better quality per byte than MP3. Especially efficient when the source video already has AAC audio (stream copy avoids transcoding entirely).
- FLAC — long-term archival of the extracted audio. Lossless compression, typically around half the size of WAV, depending on the material.
- OGG / Opus — web playback, Discord, game development. Opus is among the most efficient lossy codecs available, especially at low bitrates, but support on older hardware players is limited.
Quality Preservation: When Extraction Is Lossless
This is the most important point in the article and the one most tutorials get wrong.
Extracting AAC audio from an MP4 file to an M4A file is lossless. No re-encoding happens. The audio bytes are copied from the MP4 container into an M4A container with their codec parameters preserved. The output is bit-identical to the audio that was inside the source video.
Extracting that same AAC to MP3 is lossy. The audio is decoded from AAC to PCM, then re-encoded to MP3. You incur the artifacts of two lossy codecs in series: once when the original video was created, again when you convert to MP3. A 192 kbps MP3 made from a 128 kbps AAC source is fine for casual listening, but it is strictly worse than the AAC original, and spending more MP3 bits cannot restore detail the AAC encoder already discarded.
The rule: match codecs when possible. If the source has AAC, output AAC (M4A). If the source has Vorbis, output OGG. Only transcode when you need a specific format the recipient or platform requires.
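The rule can be written down as a lookup: if the target extension holds the source codec as-is, copy the stream; otherwise decode and, for lossy targets, re-encode. This Python 3.9+ sketch assumes codec names as ffprobe reports them. Both tables are a hypothetical, deliberately incomplete illustration, and the encoder settings are example values rather than recommendations from any particular tool.

```python
# Codec name (as reported by ffprobe) -> extension that holds it without re-encoding.
# Illustrative and incomplete: a real tool would cover more codecs and containers.
COPY_EXTENSION = {
    "aac": ".m4a",
    "alac": ".m4a",
    "mp3": ".mp3",
    "vorbis": ".ogg",
    "opus": ".opus",
    "flac": ".flac",
    "pcm_s16le": ".wav",
    "pcm_s24le": ".wav",
}

# FFmpeg encoder arguments used when a copy is not possible (example settings).
ENCODE_ARGS = {
    ".mp3": ["-c:a", "libmp3lame", "-b:a", "192k"],
    ".m4a": ["-c:a", "aac", "-b:a", "192k"],
    ".wav": ["-c:a", "pcm_s16le"],
    ".flac": ["-c:a", "flac"],
    ".ogg": ["-c:a", "libvorbis", "-q:a", "5"],
    ".opus": ["-c:a", "libopus", "-b:a", "128k"],
}


def audio_codec_args(source_codec: str, target_ext: str) -> list[str]:
    """Return FFmpeg audio-codec arguments, preferring a lossless stream copy.

    Raises KeyError for target extensions not in ENCODE_ARGS.
    """
    if COPY_EXTENSION.get(source_codec) == target_ext:
        return ["-c:a", "copy"]  # codecs match: copy the bytes untouched
    # Mismatch: FFmpeg must decode the source and encode to the target format.
    return list(ENCODE_ARGS[target_ext])
```

With this table, AAC headed for .m4a and Vorbis headed for .ogg come back as a stream copy, while AAC headed for .mp3 triggers a LAME encode.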
FFmpeg One-Liners (Every Flag Explained)
For users comfortable on the command line, FFmpeg gives you exact control.
Stream copy AAC from MP4 (lossless, fastest): 'ffmpeg -i input.mp4 -vn -c:a copy output.m4a'. The flags: '-i input.mp4' specifies the input. '-vn' means 'no video' — drop the video stream. '-c:a copy' means 'audio codec: copy' — copy the audio bytes without re-encoding. Output is an M4A with the original AAC bytes. Process time: a few seconds for a 1-hour file.
Transcode to MP3 at 192 kbps (lossy but universal): 'ffmpeg -i input.mp4 -vn -c:a libmp3lame -b:a 192k output.mp3'. '-c:a libmp3lame' uses LAME, the gold-standard MP3 encoder. '-b:a 192k' sets bitrate to 192 kbps CBR. Use '-q:a 2' instead of '-b:a' for VBR (better quality at similar average rate).
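Extract to WAV (lossless PCM): 'ffmpeg -i input.mp4 -vn -c:a pcm_s16le output.wav'. 'pcm_s16le' is 16-bit signed little-endian PCM, the same sample format as CD audio and FFmpeg's default for .wav output. The sample rate stays whatever the source uses (often 48 kHz for video) unless you add '-ar 44100'. Use 'pcm_s24le' for 24-bit if your source has higher bit depth, such as 24-bit PCM from a camera or field recorder.
Pick a specific audio track from a multi-track MKV: 'ffmpeg -i input.mkv -map 0:a:0 -c:a copy output.mka'. '-map 0:a:0' means 'from input 0, take the first audio stream.' Use '0:a:1' for the second audio track (commentary, foreign dub). The .mka (Matroska audio) output accepts any codec an MKV can carry, so the copy works whether the track is AAC, AC-3, DTS, or FLAC; use .m4a only when the track is AAC. Run 'ffprobe input.mkv' first to see what tracks exist.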
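The one-liners above share one shape, which makes them easy to generate from a script. This is a minimal Python 3.9+ sketch, assuming ffmpeg is on your PATH; build_extract_cmd, extract, and the mode names are hypothetical labels for the commands in this section. Keeping command construction separate from execution lets you print or test a command before it touches any files. Because every command carries an explicit '-map', the first audio track is used unless you pass another index.

```python
import subprocess

# Audio-codec arguments for each one-liner in this section.
MODES = {
    "copy": ["-c:a", "copy"],                          # stream copy, no re-encode
    "mp3_cbr": ["-c:a", "libmp3lame", "-b:a", "192k"],  # LAME at 192 kbps CBR
    "mp3_vbr": ["-c:a", "libmp3lame", "-q:a", "2"],     # LAME VBR, quality 2
    "wav16": ["-c:a", "pcm_s16le"],                     # 16-bit PCM WAV
    "wav24": ["-c:a", "pcm_s24le"],                     # 24-bit PCM WAV
}


def build_extract_cmd(src: str, dst: str, mode: str = "copy", track: int = 0) -> list[str]:
    """Build an ffmpeg argument list that writes audio track `track` of `src` to `dst`."""
    return [
        "ffmpeg", "-i", src,
        "-map", f"0:a:{track}",  # Nth audio stream of the first input
        "-vn",                   # no video (already implied by -map, kept for clarity)
        *MODES[mode],
        dst,
    ]


def extract(src: str, dst: str, mode: str = "copy", track: int = 0) -> None:
    """Run the command; raises subprocess.CalledProcessError if ffmpeg fails."""
    subprocess.run(build_extract_cmd(src, dst, mode, track), check=True)
```

For example, `extract("film.mkv", "commentary.mka", track=1)` copies the second audio track without re-encoding.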
Batch Conversion
For more than a handful of files, scripting beats clicking. A simple shell loop can run FFmpeg on every .mp4 in a folder and write a same-named .mp3 with the LAME encoder at 192 kbps; swap the codec or bitrate as needed. For batch jobs above a hundred files, native FFmpeg is dramatically faster than browser tools: it runs natively rather than as WebAssembly, you can run several FFmpeg processes in parallel, and there is no browser-memory ceiling.
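The loop described above is easy to write in shell; this Python 3.9+ sketch does the same job and also runs several FFmpeg processes at once. It assumes ffmpeg is on your PATH, a case-sensitive '.mp4' match, and 192 kbps CBR MP3 output; plan_batch, convert, and run_batch are hypothetical names. Files whose .mp3 already exists are skipped, so a rerun picks up where an interrupted batch stopped (delete any half-written output first).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def plan_batch(folder: str, src_ext: str = ".mp4", dst_ext: str = ".mp3") -> list[tuple[Path, Path]]:
    """Pair each source file in `folder` with a same-named output, skipping existing outputs."""
    jobs = []
    for src in sorted(Path(folder).glob(f"*{src_ext}")):
        dst = src.with_suffix(dst_ext)
        if not dst.exists():
            jobs.append((src, dst))
    return jobs


def convert(job: tuple[Path, Path]) -> int:
    """Transcode one file to 192 kbps MP3 with LAME and return ffmpeg's exit code."""
    src, dst = job
    cmd = [
        "ffmpeg", "-nostdin",      # never wait for keyboard input in a batch
        "-loglevel", "error",      # keep parallel output readable
        "-i", str(src), "-vn",
        "-c:a", "libmp3lame", "-b:a", "192k",
        str(dst),
    ]
    return subprocess.run(cmd).returncode


def run_batch(folder: str, workers: int = 4) -> list[int]:
    """Run up to `workers` ffmpeg processes at a time, one file per process."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(convert, plan_batch(folder)))
```

Calling `run_batch("lectures", workers=4)` returns one exit code per converted file, where 0 means success. Threads are enough here because the real work happens in separate ffmpeg processes; a worker count near your CPU core count is a reasonable starting point.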
Multiple Audio Tracks
Films and concerts on MKV often carry several audio streams: original language, dubs, director's commentary, isolated music. Browser tools typically grab the default audio track, which is usually the first one. To pick a specific track:
- VLC: Audio menu > Audio Track > pick before exporting.
- FFmpeg: '-map 0:a:N' where N is the track index. Use 'ffprobe' to list tracks.
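- HandBrake: Audio tab > Track dropdown shows all available streams, but HandBrake's output is always a video file, so you still need another tool for an audio-only result.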
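When the track you want is identified by language rather than position, you can read the language tags from ffprobe and compute the N for '-map 0:a:N'. This is a minimal Python sketch, assuming the file carries ISO 639-2 language tags such as 'eng' or 'jpn' (many MKVs do, but tags can be missing or wrong); the function name is hypothetical.

```python
import json
from typing import Optional


def audio_track_for_language(report: str, language: str) -> Optional[int]:
    """Return N for FFmpeg's '-map 0:a:N' whose language tag equals `language`.

    `report` is JSON from, for example:
      ffprobe -v error -select_streams a -show_entries stream=index:stream_tags=language -of json input.mkv
    Returns None when no audio stream carries that tag.
    """
    streams = json.loads(report).get("streams", [])
    # With -select_streams a, ffprobe lists only audio streams, in file order,
    # so the position in this list is the audio-relative index N.
    for n, stream in enumerate(streams):
        if stream.get("tags", {}).get("language") == language:
            return n
    return None
```

If the function returns 1 for "eng", the FFmpeg argument becomes '-map 0:a:1'.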
Preserving Timestamps for Sync Workflows
If you are extracting audio for transcription that you will sync back to the video later, frame-accurate timing matters. Stream copy ('-c:a copy') preserves the original timing. Transcoding to MP3 introduces a small encoder delay at the start of the file (for LAME, commonly cited as 576 samples, about 13 ms at 44.1 kHz), which can offset transcripts unless the player or tool reads LAME's gapless-playback header and trims it. For transcription that must sync precisely, extract to WAV: PCM has no encoder delay.
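The delay arithmetic is samples divided by sample rate. This Python 3.9+ sketch converts a delay in samples to milliseconds and shifts transcript cue times by a constant offset. It assumes the 576-sample figure quoted above and a fixed offset; verify both against your own encoder and player. The function names are hypothetical.

```python
def delay_ms(samples: int, sample_rate: int) -> float:
    """Convert an encoder delay from samples to milliseconds."""
    return samples * 1000 / sample_rate


def shift_cues(times_s: list[float], offset_s: float) -> list[float]:
    """Add a constant offset (seconds) to each cue time, clamping at 0."""
    return [max(0.0, t + offset_s) for t in times_s]
```

At 44.1 kHz, `delay_ms(576, 44100)` is about 13.06 ms; at 48 kHz it is exactly 12 ms. The leading silence makes speech start later in the MP3 than in the video, so cues from an untrimmed MP3 run late and you would pass a negative offset, for example `shift_cues(cues, -0.013)`.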
File Size Expectations
Rough numbers for a one-hour source:
- Original MP4 video (1080p, 5 Mbps video + 128 kbps audio): about 2.3 GB total
- Extracted M4A (AAC stream copy, 128 kbps): about 58 MB
- Transcoded MP3 at 192 kbps stereo: about 86 MB
- Transcoded MP3 at 128 kbps mono: about 58 MB (size follows bitrate, not channel count)
- Extracted WAV (16-bit / 44.1 kHz stereo): about 635 MB
- Extracted FLAC from the WAV: about 300 MB, depending on the material
Compressed audio is a tiny fraction of the original video file (roughly 2.5 to 4% in this example). Even uncompressed PCM is only about a quarter of a typical 1080p MP4.
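The figures above follow from one formula: size in bytes = bitrate in bits per second × duration in seconds / 8. This Python sketch reproduces them for a one-hour file, assuming decimal megabytes (1 MB = 1,000,000 bytes) and ignoring container overhead, which adds a little in practice. The function names are hypothetical.

```python
HOUR = 3600  # seconds


def size_mb(bitrate_kbps: float, seconds: float) -> float:
    """Size in decimal megabytes for a constant bitrate in kilobits per second."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


def pcm_bitrate_kbps(sample_rate: int, bits: int, channels: int) -> float:
    """Bitrate of uncompressed PCM in kilobits per second."""
    return sample_rate * bits * channels / 1000
```

For example, `size_mb(5000 + 128, HOUR)` gives about 2,308 MB (the 2.3 GB video), `size_mb(128, HOUR)` gives 57.6 MB, and 16-bit / 44.1 kHz stereo PCM runs at 1,411.2 kbps, or about 635 MB per hour.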
Edge Cases
- DRM-protected video (iTunes movie rentals, Netflix downloads, Disney+ offline files) cannot be extracted. The audio stream is encrypted, and only the DRM system (FairPlay, Widevine, or PlayReady) holds the keys to decrypt it. This is an encryption limit, not a tooling limit.
- Corrupted or partially downloaded videos sometimes have a recoverable audio stream even when the video stream is broken. Placing FFmpeg's '-err_detect ignore_err' before '-i' tells the decoder to keep going past errors when you transcode; how much audio survives depends on the damage.
- Variable frame rate (VFR) screen recordings from OBS or game capture can have audio sync drift. Extract to WAV first to bypass any container-level remapping.
- Live-streamed videos saved as .ts (transport stream) or fragmented MP4 may have multiple audio segments concatenated; FFmpeg handles them correctly but some browser tools may stop at the first segment break.
Mobile Workflows
iPhone: open /mov-to-mp3 or /mp4-to-mp3 in Safari, choose the video from Files or Photos, convert it, and save the downloaded audio to Files. Nothing needs installing, because Safari supports WebAssembly and the Files app handles input and output.
Android: identical pattern with Chrome and the Files app or any file manager. Android also has native FFmpeg apps (FFmpeg Media Encoder, Termux + ffmpeg) for users who want command-line control.
For further format decisions, see the lossless vs lossy guide, and the audio bitrate explainer covers what bitrate to pick for the extracted file. If your source is specifically an MP4 from YouTube or another website, the extract audio from MP4 guide walks through the no-software workflow step by step. For background on why MP4 typically holds AAC audio, see what is AAC.