WAV vs FLAC for Archiving: Which Is Best?
Compare WAV and FLAC for long-term audio archiving. Covers losslessness, file size, metadata, compatibility, and the right choice for your archive.
Archiving audio is a different problem from working on it. The mix engineer's choice (WAV) and the streaming listener's choice (AAC) both fall away once the question becomes: in what format will this audio still play in 30 years, on hardware that does not yet exist, with its metadata intact and verifiably uncorrupted? This guide covers WAV and FLAC strictly through the archival lens: bit integrity, file size at library scale, metadata durability, format longevity, and what institutional archivists actually use.
If you came here from FLAC vs WAV for music production, this is the companion post. The production answer is "WAV, with FLAC for backup." The archival answer is more clearly weighted toward FLAC, and the reasons are different.
What Archival Actually Requires
A real archive — personal or institutional — has four hard requirements:
1. Bit-perfect lossless storage. The audio that comes out has to be byte-identical to the audio that went in. Any lossy codec is disqualified on its face.
2. Verifiable integrity over time. Disks corrupt. Backups silently rot. The archive needs a way to detect that a file has been damaged before you discover it on the day you need to restore.
3. Durable metadata. Title, artist, date, project, performer, recording details, rights: all of it has to travel with the audio file. A file that is technically intact but missing its context is half-archived.
4. Format longevity. The format has to be readable in 20-30 years. Proprietary or patent-encumbered formats are riskier than open ones.
WAV and FLAC both pass requirement 1. They differ on the other three, and that is what makes the archival choice interesting.
File Integrity: FLAC Has Built-in Checksums
This is the single biggest archival advantage FLAC has and the part most casual comparisons skip.
A FLAC file stores an MD5 hash of the original unencoded PCM audio inside its STREAMINFO metadata block. The reference encoder writes this hash by default, although some third-party encoders leave it zeroed. When you decode a FLAC, the reference decoder recomputes the hash from the decoded samples, compares it with the stored value, and tells you whether the audio has been corrupted at any point since it was encoded. The 'flac --test' command runs that verification on every file you pass it:
'flac --test *.flac'
If a single bit of the encoded audio has flipped, the test fails and FLAC will tell you. WAV has no equivalent. A WAV file with corrupted samples plays back as glitchy audio, but nothing in the file format itself signals that the corruption occurred. You can layer external integrity checking on top (par2 files, BagIt manifests, ZFS or btrfs checksumming), but you have to set that up yourself.
For a personal archive on consumer storage where silent corruption is a real long-term risk, FLAC's intrinsic checksum is a serious advantage.
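The shell glob in 'flac --test *.flac' only matches files in the current directory, while a real archive is usually a tree of folders. Here is a minimal sketch of a recursive check, assuming the reference 'flac' binary is on your PATH and that it exits with a non-zero status when a file fails the test. The function name 'find_corrupt_flacs' is illustrative, not part of any tool.

```python
import subprocess
from pathlib import Path


def find_corrupt_flacs(root, test_cmd=("flac", "--test", "--silent")):
    """Run the test command on every .flac file under root.

    Returns the paths for which the command exited with a non-zero status.
    Raises FileNotFoundError if the command itself is not installed.
    """
    failed = []
    for path in sorted(Path(root).rglob("*.flac")):
        # Assumption: a non-zero exit status means the decode or the
        # MD5 comparison failed for this file.
        result = subprocess.run([*test_cmd, str(path)], capture_output=True)
        if result.returncode != 0:
            failed.append(path)
    return failed
```

Calling 'find_corrupt_flacs("/path/to/archive")' returns a list you can log or email to yourself. An empty list means every file decoded and passed.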
File Size at Library Scale
Per file the saving is "interesting." Across an archive it becomes decisive.
A typical FLAC compresses to 50-60% of the equivalent WAV. Some example library sizes:
- 1,000-track personal music archive at 16-bit / 44.1 kHz, average track 4 minutes: WAV ~42 GB, FLAC ~22-25 GB.
- 10,000 tracks (a serious personal collection) at the same settings: WAV ~420 GB, FLAC ~220-250 GB.
- A field recording archive of 1,000 hours at 24-bit / 48 kHz mono: WAV ~518 GB, FLAC ~260-310 GB.
- A radio station archive of 10,000 hours at 16-bit / 44.1 kHz stereo: WAV ~6.3 TB, FLAC ~3.2-3.8 TB.
Halving the archive saves money on disks, on cloud storage, and on backup time, and the saving multiplies under the 3-2-1 backup rule (3 copies, 2 different media, 1 offsite). For the same archive, FLAC means roughly half as many disks to buy, replace, transport offsite, and verify. If you have to keep PCM but want to trim bulk on individual sessions, you can also compress a WAV file by lowering the bit depth or sample rate before archiving, but that discards audio data, so the result is no longer a lossless copy of the original.
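The WAV figures in the list above follow directly from the PCM data rate: seconds × sample rate × bytes per sample × channels. The sketch below reproduces three of them. It assumes decimal gigabytes and applies the article's 50-60% FLAC ratio as a rule of thumb; real compression varies with the material.

```python
def pcm_size_bytes(seconds, sample_rate, bit_depth, channels):
    """Uncompressed PCM payload size (the WAV header adds only a few dozen bytes)."""
    return seconds * sample_rate * (bit_depth // 8) * channels


GB = 1000 ** 3  # decimal gigabytes, the way disk vendors count

archives = {
    "1,000 tracks, 4 min, 16/44.1 stereo": pcm_size_bytes(1000 * 4 * 60, 44_100, 16, 2),
    "1,000 hours, 24/48 mono": pcm_size_bytes(1000 * 3600, 48_000, 24, 1),
    "10,000 hours, 16/44.1 stereo": pcm_size_bytes(10_000 * 3600, 44_100, 16, 2),
}

for name, wav_bytes in archives.items():
    # Rule-of-thumb range only: FLAC at 50-60% of the PCM size
    low, high = 0.5 * wav_bytes, 0.6 * wav_bytes
    print(f"{name}: WAV {wav_bytes / GB:.0f} GB, FLAC {low / GB:.0f}-{high / GB:.0f} GB")
```

Swap in your own track count, duration, and format to estimate disk requirements before you buy storage.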
Metadata: Vorbis Comments vs the WAV Tag Mess
Archival metadata has to be reliable. You should be able to write a tag once and trust that every tool that touches the file thereafter will preserve it.
FLAC uses Vorbis Comments. It is a clean, flexible key/value tag system. Every FLAC-aware tool (foobar2000, MusicBee, Mp3tag, Picard, beets, kid3, Roon, Plex, Jellyfin) reads and writes Vorbis Comments correctly. You can store:
- Standard fields: TITLE, ARTIST, ALBUM, ALBUMARTIST, DATE, GENRE, TRACKNUMBER, DISCNUMBER, COMPOSER, PERFORMER.
- Identifiers: ISRC, MUSICBRAINZ_TRACKID, ACOUSTID_FINGERPRINT.
- Loudness: REPLAYGAIN_TRACK_GAIN, REPLAYGAIN_ALBUM_GAIN.
- Custom fields you invent: 'PROJECT', 'CLIENT', 'RIGHTS', 'SOURCE_MEDIA'. Every FLAC tool will preserve them whether or not it understands them.
- Embedded album art at any resolution.
WAV metadata is a historical mess. The original WAV INFO chunk supports a small fixed list of fields (IART, ICMT, ICRD, INAM and a handful more) that nobody uses outside of broadcast tools. ID3 tags can be embedded in WAV via a non-standard 'id3 ' chunk, but support is patchy: iTunes / Apple Music has had broken WAV metadata behaviour for many years, some tools strip ID3 chunks on save, others duplicate them. Broadcast WAV (BWF) extends this with the bext chunk for film and TV post-production timecode and originator data, which is the one place WAV metadata works reliably — but bext is structured for production hand-off, not for general library tagging.
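Before trusting the tags in an existing WAV collection, it helps to see which metadata chunks each file actually carries. The sketch below walks the top-level RIFF chunks and reports their IDs, so you can spot 'LIST' (INFO), 'id3 ' and 'bext' at a glance. It assumes a plain RIFF/WAVE file and does not handle RF64. The function name is illustrative.

```python
import struct


def list_wav_chunks(path):
    """Return (chunk_id, size) for each top-level chunk in a RIFF/WAVE file."""
    chunks = []
    with open(path, "rb") as f:
        riff, _total, form = struct.unpack("<4sI4s", f.read(12))
        if riff != b"RIFF" or form != b"WAVE":
            raise ValueError("not a RIFF/WAVE file")
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, size = struct.unpack("<4sI", header)
            chunks.append((chunk_id.decode("ascii", "replace"), size))
            # Chunk data is padded to an even number of bytes
            f.seek(size + (size & 1), 1)
    return chunks
```

Running this before and after a file passes through an editor or tagger shows whether that tool dropped or duplicated a chunk.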
For an archive where you want every track to carry its full provenance through every move and conversion for the next 30 years, FLAC is meaningfully more reliable.
Where WAV Is Still the Right Archival Choice
There are specific archival contexts where WAV remains correct:
- Broadcast and film post-production archives — the bext chunk in BWF stores timecode, scene/take, originator, originator reference, and is the standard format for delivering finished mixes to broadcasters. AES31 and EBU specifications expect WAV / BWF.
- Sample libraries that need to load directly into hardware samplers and older DAWs — WAV is what they read.
- Mastered album deliveries to aggregators (DistroKid, CD Baby, TuneCore, ADA): WAV is the delivery format they typically ask for.
- Forensic / legal evidence archives where chain-of-custody tooling is built around BWF.
- Library of Congress preservation copies — the LoC's own digital audio preservation guidelines accept both BWF and FLAC, but BWF is named first in the recommended formats.
For a personal music library or field recording archive, none of these constraints apply.
Format Longevity
Both formats are safe long-term bets, but for slightly different reasons.
WAV has been a stable spec since 1991. The format is documented in IBM/Microsoft's RIFF specification. It is supported by virtually every audio tool in use. Risk of obsolescence: essentially zero. Risk of being unable to play a WAV in 2055: also essentially zero.
FLAC is an open, patent-free format, first released in 2001 and maintained under the Xiph.Org Foundation since 2003, with the reference library source available under a BSD-style license. The format itself was published as an IETF standard (RFC 9639) in 2024. It is supported on Windows, macOS (High Sierra+), iOS (11+), Android, Linux, and every major media server (Plex, Jellyfin, Roon, Squeezebox). Risk of obsolescence: very low. The reference codec is small enough that even if Xiph disappeared tomorrow, a competent C programmer could maintain it from the published source.
Neither format is going anywhere. For archival, avoid lossy formats and proprietary lossless ones (WMA Lossless, ATRAC Lossless, RealAudio Lossless). Those are the ones with real long-term risk.
Institutional Archive Practice
For context on what professional archives use:
- Library of Congress Recorded Sound Section uses BWF as the primary preservation format, with FLAC accepted for born-digital materials.
- British Library Sound Archive uses BWF for analog transfers, FLAC for derivative copies.
- IASA (International Association of Sound and Audiovisual Archives), in its IASA-TC 04 guidelines, specifies BWF as the recommended format for "permanent preservation," with FLAC accepted.
- Internet Archive stores audio in both FLAC and WAV depending on collection.
- Bandcamp uses FLAC as the master and derives MP3 / AAC / Opus from it.
The pattern: institutions with film/broadcast lineage use BWF; born-digital and consumer-music focused institutions tend toward FLAC. Both are considered preservation-grade.
The Practical Recommendation for a Personal Archive
For a personal archive — your music library, your CD rips, your home recordings, your field recordings, family voice archives, podcast back-catalogues — FLAC is the better choice in almost every case:
- Bit-perfect lossless ✓
- Built-in MD5 integrity verification ✓
- Vorbis Comment metadata that every tool respects ✓
- Embedded album art ✓
- Roughly 50-60% the size of the equivalent WAV ✓
- Open, patent-free, long-term safe ✓
WAV makes sense as an archival format only when downstream tooling forces it (BWF for broadcast, sample libraries for hardware samplers, mastering deliveries to aggregators). For everything else, the file size, metadata reliability and integrity verification put FLAC clearly ahead.
If you already have a WAV archive and want to migrate, /wav-to-flac does the conversion losslessly. Going the other way for a specific project is also lossless — /flac-to-wav. The audio is bit-identical at every step.
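One way to check the bit-identical claim for yourself is to convert a WAV to FLAC, decode it back to WAV, and compare the audio payloads rather than the files (headers and tag chunks may legitimately differ). This is a sketch using Python's standard 'wave' module. It assumes plain integer PCM files, and older Python versions may reject WAVs written with the extensible header that some tools use for 24-bit audio. The function name is illustrative.

```python
import hashlib
import wave


def pcm_fingerprint(wav_path):
    """Hash the audio parameters and raw PCM frames, ignoring headers and tag chunks."""
    digest = hashlib.sha256()
    with wave.open(str(wav_path), "rb") as w:
        params = (w.getnchannels(), w.getsampwidth(), w.getframerate(), w.getnframes())
        digest.update(repr(params).encode("ascii"))
        while True:
            frames = w.readframes(65536)   # read in blocks to keep memory flat
            if not frames:
                break
            digest.update(frames)
    return digest.hexdigest()
```

If 'pcm_fingerprint("original.wav")' equals 'pcm_fingerprint("roundtrip.wav")', the conversion preserved every sample. The two file names here are placeholders.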
For more on the production-side trade-offs (which lean differently), see FLAC vs WAV for music production. For the FLAC vs ALAC question on Apple devices, see FLAC vs ALAC.
Storage and Backup Strategy for an Archive
Whichever format you choose, a single copy is not an archive. The 3-2-1 rule applies to audio as much as to anything else: three copies, on two different storage media, with one copy offsite. Practical setup for a personal music archive:
- Primary copy on a local NAS or DAS with redundancy (mirrored disks or RAIDZ).
- Secondary copy on a rotating external drive kept in a drawer.
- Offsite copy on cloud storage (Backblaze B2, Amazon S3 Glacier Deep Archive, rsync.net) or in a different physical location.
FLAC's intrinsic MD5 verification lets you run periodic 'flac --test' across the archive to detect silent corruption before it propagates to your backups. WAV needs a separate manifest (sha256sum, par2, BagIt) for the same level of integrity assurance.
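For a WAV archive, the separate manifest can be as small as a text file of SHA-256 hashes. The sketch below writes one line per file in the same "hash, two spaces, relative path" layout that 'sha256sum' produces, then re-checks it later. The manifest file name 'archive.sha256' and the function names are assumptions for illustration, and this is not a BagIt implementation.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 of a file, read in 1 MiB blocks so large WAVs do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def write_manifest(root, manifest_name="archive.sha256"):
    """Record a hash for every .wav under root; returns the number of files listed."""
    root = Path(root)
    lines = [
        f"{file_sha256(p)}  {p.relative_to(root).as_posix()}"
        for p in sorted(root.rglob("*.wav"))
    ]
    (root / manifest_name).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def verify_manifest(root, manifest_name="archive.sha256"):
    """Return relative paths that are missing or whose hash no longer matches."""
    root = Path(root)
    damaged = []
    for line in (root / manifest_name).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue
        expected, _sep, rel = line.partition("  ")
        target = root / rel
        if not target.is_file() or file_sha256(target) != expected:
            damaged.append(rel)
    return damaged
```

Write the manifest once when files enter the archive, then run the verify step on a schedule and before each backup rotation, so a damaged file is caught while a good copy still exists.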