ID3 Tags Explained: MP3 Metadata Standard
How ID3 tags work, the differences between v1, v2.3, and v2.4, frame IDs, album art, and how metadata behaves across MP3, M4A, FLAC, and WAV.
ID3 is the metadata standard that turns a bare MP3 file into a library entry. Without it, every track is just a numbered blob: no title, no artist, no cover art, no way for iTunes, Plex, Sonos, or your car stereo to know what it is. With it, the same bytes become a tagged record that carries title, artist, album, year, track number, genre, lyrics, embedded artwork, replay gain, ratings, and dozens of other standard fields, plus any user-defined ones.
This is the practical guide to ID3 in 2026: what each version actually contains, why version mismatches break album art, how the format compares to the metadata schemes used by M4A, FLAC, OGG, and WAV, and what happens to your tags when you re-encode through any tool — including our audio compressor and audio cutter.
What ID3 Actually Is
ID3 stands for "Identify an MP3." Eric Kemp coined the name in 1996 when he wrote the original spec for tagging MP3s with simple text fields. The standard is informal: there is no ISO document and no IETF RFC. The reference is id3.org, which hosts the ID3v2 specs written by Martin Nilsson, plus the original spec text Kemp circulated and a series of community revisions.
The format has shipped in three major versions and a handful of minor ones. Each one is a different binary structure. A player that supports v2.3 may completely fail to read a v2.4 tag. Most metadata bugs you encounter — missing album art, garbled non-ASCII characters, fields that vanish after a sync — come down to one app writing one version and another app expecting a different one.
ID3v1: The 128-byte Sticker
The original ID3v1, finalized in 1996, is 128 bytes appended to the end of an MP3. The structure is fixed:
- 3 bytes: 'TAG' identifier
- 30 bytes: title
- 30 bytes: artist
- 30 bytes: album
- 4 bytes: year
- 30 bytes: comment
- 1 byte: genre (numeric ID, 0-79 originally, later extended to 0-191 by Winamp)
ID3v1.1, a 1997 tweak, repurposes the last two bytes of the comment field: a zero byte, then a one-byte track number (1-255). That is the entire spec.
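A minimal Python sketch of reading that fixed layout with only the standard library. The names 'parse_id3v1' and '_text' are hypothetical, and decoding as Latin-1 is an assumption, since v1 carries no encoding marker.

```python
import struct
from typing import Optional

# 3 + 30 + 30 + 30 + 4 + 30 + 1 = 128 bytes, matching the layout above.
ID3V1 = struct.Struct(">3s30s30s30s4s30sB")


def _text(raw: bytes) -> str:
    # Fields are padded with NULs or spaces; Latin-1 is an assumed encoding.
    return raw.split(b"\x00", 1)[0].decode("latin-1").rstrip()


def parse_id3v1(tail: bytes) -> Optional[dict]:
    """Parse the final 128 bytes of an MP3. Returns None if there is no 'TAG' marker."""
    if len(tail) != ID3V1.size:
        raise ValueError("expected exactly 128 bytes")
    tag, title, artist, album, year, comment, genre = ID3V1.unpack(tail)
    if tag != b"TAG":
        return None
    track = None
    # ID3v1.1: comment byte 28 is zero and byte 29 holds a non-zero track number.
    if comment[28] == 0 and comment[29] != 0:
        track = comment[29]
        comment = comment[:28]
    return {
        "title": _text(title),
        "artist": _text(artist),
        "album": _text(album),
        "year": _text(year),
        "comment": _text(comment),
        "track": track,
        "genre_id": genre,  # index into the fixed v1 genre table
    }
```

To use it on a file, seek to 128 bytes before the end ('f.seek(-128, 2)') and pass 'f.read(128)'; a 'None' result means the file has no v1 tag.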
The limits are crippling. Title, artist, and album truncate at 30 characters: the Smashing Pumpkins album title "Mellon Collie and the Infinite Sadness" does not fit. Genre is a numeric code into a fixed table, so anything outside Winamp's hardcoded list (Blues, Classic Rock, Country... up to 191) cannot be expressed. There is no encoding byte; readers conventionally assume ISO-8859-1 (Latin-1), which covers Western European accents such as umlauts but has no Cyrillic, Greek, or CJK, and tags written in a local code page turn into garbage on other systems. And because the tag is appended to the end of the file, anything streaming the audio without seeking to EOF never sees it.
ID3v1 still ships in many files for backwards compatibility — older car stereos and budget MP3 players read it and nothing else. But every modern tool writes ID3v2 alongside it.
ID3v2: A Real Container
ID3v2, introduced in 1998, fixed v1's main architectural problems. The tag is prepended to the file (so a streaming player sees it before the audio), tags are variable-length (no truncation), and the format is extensible: readers skip frames they do not recognize, so new field types can be added without breaking old readers.
The container has a 10-byte header:
- 3 bytes: 'ID3' identifier
- 2 bytes: version (major, revision)
- 1 byte: flags
- 4 bytes: synchsafe size of the tag, excluding this 10-byte header
Synchsafe integers are an ID3v2 invention. Each byte uses only its low 7 bits and keeps the high bit zero, so four bytes hold 28 bits (a maximum tag size of 256 MB). Because no byte of the size field can ever be 0xFF, it can never accidentally form the MPEG frame sync pattern (0xFF followed by a byte with its top three bits set, i.e. '0xFF 0xE0' and above), so a player scanning for sync words will not get confused by a large tag.
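Decoding and encoding a synchsafe size takes a few lines of shifting. This sketch also reads the 10-byte header listed above; the names 'decode_synchsafe', 'encode_synchsafe', and 'parse_id3v2_header' are illustrative, and the optional extended header is not handled.

```python
from typing import NamedTuple, Optional


def decode_synchsafe(data: bytes) -> int:
    """Decode a 4-byte synchsafe integer: 7 payload bits per byte, high bit zero."""
    if len(data) != 4 or any(b & 0x80 for b in data):
        raise ValueError("not a valid 4-byte synchsafe integer")
    value = 0
    for b in data:
        value = (value << 7) | b
    return value


def encode_synchsafe(value: int) -> bytes:
    """Encode 0 <= value < 2**28 as four bytes whose high bits are all zero."""
    if not 0 <= value < (1 << 28):
        raise ValueError("value does not fit in 28 bits")
    return bytes((value >> shift) & 0x7F for shift in (21, 14, 7, 0))


class ID3v2Header(NamedTuple):
    major: int     # 2, 3 or 4
    revision: int
    flags: int
    size: int      # tag size in bytes, excluding the 10-byte header


def parse_id3v2_header(head: bytes) -> Optional[ID3v2Header]:
    """Parse the first 10 bytes of a file; None if it does not start with 'ID3'."""
    if len(head) < 10 or head[:3] != b"ID3":
        return None
    return ID3v2Header(head[3], head[4], head[5], decode_synchsafe(head[6:10]))
```

The first audio byte then sits at offset 10 + size, or 20 + size when a v2.4 footer is present (flag bit 0x10).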
After the header comes a sequence of frames, optionally followed by zero-byte padding. Each frame is its own header plus payload.
ID3v2.2 (1998)
Frame IDs are 3 characters. 'TT2' is title, 'TP1' is artist, 'TAL' is album, 'TYE' is year, 'TRK' is track, 'PIC' is picture. Frame size is 3 bytes (24 bits, max 16 MB).
v2.2 was superseded within a year and is rarely written today. Older versions of iTunes support it, so readers should still handle it; new tools should not write it.
ID3v2.3 (1999) — The Default
Frame IDs are 4 characters. Frame size is 4 bytes (synchsafe in v2.4, plain 32-bit in v2.3 — a real, frequently overlooked difference). Character encoding is selectable per frame: 0x00 = ISO-8859-1, 0x01 = UTF-16 with BOM. There is no UTF-8 in v2.3.
This is the most widely supported version. If you have any doubt about what to write, write v2.3. Virtually every car stereo and hardware player from the past two decades reads it, podcast platforms accept it, and Mp3tag writes it by default.
The frames you actually use:
- 'TIT2' — Title
- 'TPE1' — Lead artist
- 'TPE2' — Album artist (sort by this for "Various Artists" albums)
- 'TALB' — Album
- 'TYER' — Year (4-digit string)
- 'TDAT' — Date (DDMM)
- 'TRCK' — Track number, formatted as 'current/total' like '03/12'
- 'TPOS' — Disc number, same 'current/total' format
- 'TCON' — Genre (free text, but '(17)' style numeric refs to v1 list still legal)
- 'COMM' — Comment, with language code and short description
- 'APIC' — Attached picture (album art)
- 'USLT' — Unsynchronized lyrics
- 'PCNT' — Play counter
- 'POPM' — Popularimeter (rating)
- 'TXXX' — User-defined text frame ('TXXX:replaygain_track_gain' is a common one)
- 'PRIV' — Private frame, used by iTunes, Apple Music, Beatport, and others to embed proprietary metadata
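Since 'TRCK' and 'TPOS' are plain strings, tools have to split them themselves. A small sketch, assuming missing or non-numeric parts should come back as 'None'; the helper names are hypothetical, and the two-digit zero padding in 'format_position' is a convention some taggers follow, not a requirement.

```python
from typing import Optional, Tuple


def _to_int(part: str) -> Optional[int]:
    part = part.strip()
    return int(part) if part.isascii() and part.isdigit() else None


def parse_position(value: str) -> Tuple[Optional[int], Optional[int]]:
    """Split a 'TRCK' or 'TPOS' value such as '03/12' into (current, total)."""
    current, _, total = value.partition("/")
    return _to_int(current), _to_int(total)


def format_position(current: int, total: Optional[int] = None, width: int = 2) -> str:
    """Build a 'current/total' string; zero-padding is a convention, not a rule."""
    text = str(current).zfill(width)
    return f"{text}/{str(total).zfill(width)}" if total else text
```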
ID3v2.4 (2000)
v2.4's headline changes were native UTF-8 (encoding byte 0x03, alongside UTF-16BE without a BOM as 0x02), more granular date frames ('TDRC' replaces 'TYER', 'TDAT', and 'TIME', and supports full ISO 8601 timestamps), new fields ('TMOO' for mood, 'TPRO' for produced notice, 'TSOA' for album sort order), and the option to append the tag to the end of the file, as v1 did.
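Converting a v2.3 date to the v2.4 form is mostly string surgery, with one trap: 'TDAT' stores the day first. A sketch, assuming the inputs are the raw frame strings; it also accepts the v2.3 'TIME' frame (HHMM). The function name is hypothetical.

```python
from typing import Optional


def _digits(value: Optional[str], length: int) -> bool:
    return value is not None and len(value) == length and value.isdigit()


def v23_date_to_tdrc(tyer: str, tdat: Optional[str] = None,
                     time: Optional[str] = None) -> str:
    """Merge v2.3 'TYER' (YYYY), 'TDAT' (DDMM), 'TIME' (HHMM) into a v2.4 'TDRC' value."""
    if not _digits(tyer, 4):
        raise ValueError("TYER must be a four-digit year")
    result = tyer
    if _digits(tdat, 4):
        day, month = tdat[:2], tdat[2:]  # TDAT is day first; ISO 8601 is month first
        result += f"-{month}-{day}"
        if _digits(time, 4):
            result += f"T{time[:2]}:{time[2:]}"
    return result
```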
Adoption is partial. Apple's Music app and modern iTunes write v2.4 by default. Mp3tag can write either, and defaults to v2.3 because compatibility wins more battles. Many car stereos, hardware MP3 players manufactured before 2015, and budget Bluetooth speakers do not read v2.4 correctly — they will display blank fields, garbled characters, or fall back to v1 if it is also present. If you are tagging a library and using a mix of devices, write v2.3 with UTF-16 encoding for safety.
How Frames Are Structured
Every v2.3 and v2.4 frame is laid out the same way:
- 4 bytes: frame ID ('TIT2', 'APIC', etc.)
- 4 bytes: frame size (in v2.4, synchsafe; in v2.3, plain big-endian uint32)
- 2 bytes: flags
- N bytes: payload
Text frames start with one encoding byte, then the string. 'TIT2' encoding=0x01, payload='FF FE 53 00 6F 00 6E 00 67 00' is 'Song' in UTF-16-LE with BOM. The lack of a length prefix on the string is intentional — the frame size already bounds it.
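A common pitfall: text frames routinely contain zero bytes in the middle of the data (every ASCII character in UTF-16 has one), which trips naive parsers that treat a single 0x00 as a string terminator. Use the frame size, not null-scanning. Where a frame does contain a terminated sub-field (the description in 'COMM' or 'APIC'), the terminator is one zero byte for ISO-8859-1 and UTF-8, and two zero bytes, aligned to the 2-byte code units, for UTF-16.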
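The size-driven approach looks like this in Python. The sketch builds the 'Song' frame from the example above and walks a run of frames, reading the size as synchsafe for v2.4 and as a plain uint32 for v2.3; mixing the two up is how frames get misread once a payload exceeds 127 bytes. The helper names are made up, and frame flags, compression, and unsynchronisation are ignored.

```python
import struct
from typing import Iterator, Tuple

CODECS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


def _synchsafe(value: int) -> bytes:
    return bytes((value >> s) & 0x7F for s in (21, 14, 7, 0))


def _unsynchsafe(data: bytes) -> int:
    value = 0
    for b in data:
        value = (value << 7) | (b & 0x7F)
    return value


def build_text_frame(frame_id: str, text: str, version: int = 3) -> bytes:
    """Text frame with encoding 0x01: UTF-16 little-endian with a BOM."""
    payload = b"\x01" + b"\xff\xfe" + text.encode("utf-16-le")
    if version == 4:
        size = _synchsafe(len(payload))
    else:
        size = struct.pack(">I", len(payload))
    return frame_id.encode("ascii") + size + b"\x00\x00" + payload


def iter_frames(body: bytes, version: int) -> Iterator[Tuple[str, bytes]]:
    """Walk frames by their declared size; stop at padding (a zero byte)."""
    pos = 0
    while pos + 10 <= len(body) and body[pos] != 0:
        raw_size = body[pos + 4:pos + 8]
        if version == 4:
            size = _unsynchsafe(raw_size)
        else:
            size = struct.unpack(">I", raw_size)[0]
        yield body[pos:pos + 4].decode("ascii"), body[pos + 10:pos + 10 + size]
        pos += 10 + size


def decode_text_frame(payload: bytes) -> str:
    """Decode using the leading encoding byte; trailing NULs are tolerated."""
    return payload[1:].decode(CODECS[payload[0]]).rstrip("\x00")
```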
Album Art: The 'APIC' Frame
Album artwork is the field that breaks most often, so understanding the layout pays off. 'APIC' payload:
- 1 byte: text encoding for the description string
- N bytes: MIME type, null-terminated ASCII ('image/jpeg' or 'image/png')
- 1 byte: picture type
- N bytes: description (encoded per the encoding byte), null-terminated
- N bytes: raw image data (JPEG or PNG bytes, inline)
Picture types are a 21-entry enum. The ones you see in practice:
- 0x00 — Other
- 0x01 — 32x32 file icon (PNG only)
- 0x03 — Cover (front), the type most players display
- 0x04 — Cover (back)
- 0x05 — Leaflet page
- 0x06 — Media (label side of CD)
- 0x08 — Artist
- 0x0E — During recording
- 0x12 — Illustration
If your album art does not show, check the picture type. Some players only display 0x03; if your tagger wrote 0x00, the cover is technically there but ignored.
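Pulling the 'APIC' fields apart can be sketched with the same size-bounded approach. The MIME string ends with one zero byte, while a UTF-16 description ends with two zero bytes on a 2-byte boundary. The 'Picture' type and function names are hypothetical.

```python
from typing import NamedTuple, Tuple

CODECS = {0: "latin-1", 1: "utf-16", 2: "utf-16-be", 3: "utf-8"}


class Picture(NamedTuple):
    mime: str
    picture_type: int   # 0x03 = Cover (front)
    description: str
    data: bytes


def _read_terminated(buf: bytes, start: int, wide: bool) -> Tuple[bytes, int]:
    """Return (string bytes, index just past the terminator)."""
    if not wide:
        end = buf.index(b"\x00", start)
        return buf[start:end], end + 1
    # UTF-16: the terminator is 00 00 on a 2-byte boundary relative to start.
    for i in range(start, len(buf) - 1, 2):
        if buf[i] == 0 and buf[i + 1] == 0:
            return buf[start:i], i + 2
    raise ValueError("unterminated UTF-16 string")


def parse_apic(payload: bytes) -> Picture:
    encoding = payload[0]
    mime, pos = _read_terminated(payload, 1, wide=False)
    picture_type = payload[pos]
    desc, pos = _read_terminated(payload, pos + 1, wide=encoding in (1, 2))
    return Picture(mime.decode("latin-1"), picture_type,
                   desc.decode(CODECS[encoding]), payload[pos:])
```

Checking 'picture_type == 3' after parsing is a quick way to spot covers that some players will ignore.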
Size matters. A 3000x3000 px JPEG at 90% quality is 1-2 MB. Embedded in every track of a 12-track album, that is 12-24 MB of duplicated artwork. Best practice for embedded art is 500x500 to 1000x1000 px JPEG at around 80% quality, which lands at 50-150 KB per file. iTunes and Apple Music display embedded art at this resolution comfortably; keep the high-resolution master cover as a separate file in your library.
The Genre Mess
ID3v1 used a numeric genre ID. The list started with 80 entries (0=Blues, 1=Classic Rock, ... 79=Hard Rock), was extended by Winamp over the years to 191, and is fossilized: you cannot add genres without breaking compatibility.
ID3v2 supports free-text genre. 'TCON' = 'Synthwave' is legal. But for backwards compatibility, the v2 spec also allows '(17)' to mean "look up genre 17 in the v1 table" and '(17)Synthwave' to mean "v1 genre 17, plus the free text 'Synthwave' as a sub-tag."
In practice in 2026, write free-text genres only. The numeric prefix syntax is a relic. If your taxonomy is "EDM > Synthwave > Outrun," store the leaf node as the 'TCON' value.
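Cleaning up legacy values can be sketched as below. The table is only an excerpt (IDs 0 to 20) of the v1 list, and the handling of '(RX)' (Remix), '(CR)' (Cover), and the '((' escape for a literal parenthesis follows the v2.3 text; everything else, including returning the original string for unknown IDs, is one possible policy, and the function name is hypothetical.

```python
import re

# Excerpt of the ID3v1 genre table: IDs 0-20 only. A real tool needs the full list.
V1_GENRES = [
    "Blues", "Classic Rock", "Country", "Dance", "Disco", "Funk", "Grunge",
    "Hip-Hop", "Jazz", "Metal", "New Age", "Oldies", "Other", "Pop", "R&B",
    "Rap", "Reggae", "Rock", "Techno", "Industrial", "Alternative",
]
SPECIAL_REFS = {"RX": "Remix", "CR": "Cover"}
_REF = re.compile(r"\((\d+|RX|CR)\)")


def normalize_tcon(value: str) -> str:
    """Reduce a v2.3 'TCON' value like '(17)' or '(17)Synthwave' to free text."""
    refs, pos = [], 0
    match = _REF.match(value, pos)
    while match:  # collect leading '(NN)' references
        refs.append(match.group(1))
        pos = match.end()
        match = _REF.match(value, pos)
    refinement = value[pos:]
    if refinement.startswith("(("):  # '((' escapes a literal '('
        refinement = refinement[1:]
    if refinement:
        return refinement  # prefer the free-text refinement when present
    for ref in reversed(refs):
        if ref in SPECIAL_REFS:
            return SPECIAL_REFS[ref]
        if int(ref) < len(V1_GENRES):
            return V1_GENRES[int(ref)]
    return value  # unknown ID: leave the original untouched
```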
Other Formats: Same Idea, Different Container
ID3 is native to MP3, though an ID3v2 tag can also ride inside other containers, such as the optional 'id3 ' chunk in WAV. Other audio formats use unrelated metadata schemes that solve the same problem.
M4A and MP4
M4A files use the MP4 atom system. Metadata lives inside the 'moov.udta.meta.ilst' atom path. Atom names are 4 bytes, often starting with the copyright character 0xA9 (rendered as '©' or '\xa9'):
- '©nam' — Title
- '©ART' — Artist
- '©alb' — Album
- '©day' — Year
- 'covr' — Cover art (full image bytes, like 'APIC')
- 'trkn' — Track number (binary uint16 pair, current and total)
This is iTunes's tagging system. Convert M4A to MP3 with m4a-to-mp3 and a good converter will translate '©nam' to 'TIT2', 'covr' to 'APIC', and so on. A bad converter will drop everything. See what is M4A for more on the container.
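The translation a good converter performs can be sketched as a lookup table plus a few special cases. This assumes the atom values have already been extracted from 'moov.udta.meta.ilst' by some MP4 reader (not shown); the mapping covers only the atoms listed above, and the function names are hypothetical.

```python
import struct
from typing import Dict, Tuple, Union

# Mapping for the text atoms listed above (v2.3 frame names).
ATOM_TO_FRAME = {"©nam": "TIT2", "©ART": "TPE1", "©alb": "TALB"}


def decode_trkn(data: bytes) -> Tuple[int, int]:
    """'trkn' data: 2 reserved bytes, then track and total as big-endian uint16."""
    _reserved, track, total = struct.unpack(">HHH", data[:6])
    return track, total


def m4a_to_id3(atoms: Dict[str, Union[str, bytes]]) -> Dict[str, Union[str, bytes]]:
    """Translate already-extracted M4A atom values into ID3v2.3 frame values."""
    frames: Dict[str, Union[str, bytes]] = {}
    for atom, value in atoms.items():
        if atom in ATOM_TO_FRAME:
            frames[ATOM_TO_FRAME[atom]] = value
        elif atom == "©day":
            frames["TYER"] = str(value)[:4]  # '©day' may hold a full date
        elif atom == "trkn":
            track, total = decode_trkn(value)
            frames["TRCK"] = f"{track}/{total}" if total else str(track)
        elif atom == "covr":
            # Raw image bytes; a real APIC frame also needs a MIME type and type 0x03.
            frames["APIC"] = value
        # Unmapped atoms are dropped here, which is how a bad converter loses tags.
    return frames
```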
FLAC and OGG Vorbis
Both use Vorbis Comments — plaintext key=value pairs encoded as UTF-8:
```
TITLE=Song Name
ARTIST=Artist Name
ALBUM=Album Name
DATE=2026
TRACKNUMBER=3
```
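On disk, FLAC's VORBIS_COMMENT block is a vendor string, a count, and then each KEY=value entry as a length-prefixed UTF-8 string, with lengths stored as 32-bit little-endian integers (Ogg Vorbis wraps the same structure in its own packet header). A sketch that parses an already-extracted block body; field names are case-insensitive and may repeat, so values are collected into lists. The function name is hypothetical.

```python
import struct
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_vorbis_comment_block(block: bytes) -> Tuple[str, Dict[str, List[str]]]:
    """Parse a FLAC VORBIS_COMMENT block body into (vendor, {KEY: [values]})."""
    pos = 0

    def read_string() -> str:
        nonlocal pos
        (length,) = struct.unpack_from("<I", block, pos)  # little-endian length
        pos += 4
        text = block[pos:pos + length].decode("utf-8")
        pos += length
        return text

    vendor = read_string()
    (count,) = struct.unpack_from("<I", block, pos)
    pos += 4
    fields: Dict[str, List[str]] = defaultdict(list)
    for _ in range(count):
        key, sep, value = read_string().partition("=")
        if sep:  # skip malformed entries with no '='
            fields[key.upper()].append(value)  # names are case-insensitive
    return vendor, dict(fields)
```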
In FLAC, album art is stored in a dedicated METADATA_BLOCK_PICTURE block, structurally similar to 'APIC'. Ogg Vorbis has no equivalent block, so the same binary structure is base64-encoded and written into a METADATA_BLOCK_PICTURE Vorbis Comment field. See what is FLAC for the full container picture.
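The picture block itself is a run of big-endian 32-bit integers and length-prefixed strings. A minimal sketch of decoding it, including the base64 step used for the Ogg comment form; the function names are illustrative, and no validation is done beyond what 'struct' and the decoders perform.

```python
import base64
import struct


def parse_picture_block(block: bytes) -> dict:
    """Parse a FLAC picture block body (all integers are big-endian uint32)."""
    pos = 0

    def u32() -> int:
        nonlocal pos
        (value,) = struct.unpack_from(">I", block, pos)
        pos += 4
        return value

    def take(n: int) -> bytes:
        nonlocal pos
        chunk = block[pos:pos + n]
        pos += n
        return chunk

    picture_type = u32()  # same enum as the ID3 'APIC' picture type
    mime = take(u32()).decode("ascii")
    description = take(u32()).decode("utf-8")
    width, height, depth, colors = u32(), u32(), u32(), u32()
    data = take(u32())
    return {"type": picture_type, "mime": mime, "description": description,
            "width": width, "height": height, "depth": depth,
            "colors": colors, "data": data}


def parse_metadata_block_picture(value: str) -> dict:
    """Decode a base64 METADATA_BLOCK_PICTURE comment value (as used in Ogg)."""
    return parse_picture_block(base64.b64decode(value))
```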
WAV
WAV's official metadata story is the RIFF INFO chunk: 'INAM' (title), 'IART' (artist), 'ICMT' (comment), 'ICRD' (creation date), 'IGNR' (genre). Support is patchy — Foobar2000 reads it, Windows Explorer reads it, iTunes ignores it.
In practice many taggers instead embed an ID3v2 tag inside a non-standard 'id3 ' chunk in the WAV, and readers such as ffmpeg's WAV demuxer pick it up. It is not in the WAV spec but is increasingly common.
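Finding both kinds of WAV metadata means walking the RIFF chunk list. A sketch over an in-memory file: chunk sizes are little-endian, chunks are padded to an even length, and INFO values are assumed to be NUL-terminated Latin-1, which is common but not guaranteed. The function name is hypothetical.

```python
import struct
from typing import Dict, Optional, Tuple


def read_wav_tags(data: bytes) -> Tuple[Dict[str, str], Optional[bytes]]:
    """Return (INFO fields, raw 'id3 ' chunk or None) for an in-memory WAV file."""
    if data[:4] != b"RIFF" or data[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")
    info: Dict[str, str] = {}
    id3: Optional[bytes] = None
    pos = 12
    while pos + 8 <= len(data):
        chunk_id = data[pos:pos + 4]
        (size,) = struct.unpack_from("<I", data, pos + 4)  # little-endian
        body = data[pos + 8:pos + 8 + size]
        if chunk_id == b"LIST" and body[:4] == b"INFO":
            sub = 4
            while sub + 8 <= len(body):
                sub_id = body[sub:sub + 4].decode("ascii")
                (sub_size,) = struct.unpack_from("<I", body, sub + 4)
                value = body[sub + 8:sub + 8 + sub_size]
                # Assumption: NUL-terminated Latin-1 text.
                info[sub_id] = value.split(b"\x00", 1)[0].decode("latin-1")
                sub += 8 + sub_size + (sub_size & 1)  # skip the pad byte
        elif chunk_id.lower() == b"id3 ":  # match both 'id3 ' and 'ID3 ' spellings
            id3 = body
        pos += 8 + size + (size & 1)  # chunks are padded to an even length
    return info, id3
```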
What Happens When You Re-Encode
This is the question that costs real time when something breaks. ffmpeg's behavior is the reference: by default it copies global metadata from the first input to the output, much as if '-map_metadata 0' had been given. Tags are lost when a pipeline strips them explicitly with '-map_metadata -1', or when the output format has nowhere to store them.
Audio cutting and re-encoding through our tools follow the same rules. The audio cutter re-encodes the trimmed segment, so embedded tags survive only if the underlying ffmpeg pipeline preserves them; the cutter does pass '-map_metadata 0', so common text fields carry over. The audio compressor re-encodes at a lower bitrate, and the same rule applies. Album art ('APIC' frames) is the field most likely to disappear silently, because ffmpeg treats embedded art as a separate video stream (an "attached picture") rather than as a tag: a pipeline that maps only the audio stream, or passes '-vn', drops the cover even while the text tags survive.
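One way to make those rules explicit is to spell them out in the ffmpeg command. This sketch only builds the argument list; it assumes an ffmpeg build with libmp3lame, and it is not the pipeline our tools run. '-map 0:v?' keeps a cover picture if one exists, '-c:v copy' passes it through untouched, and '-id3v2_version 3' overrides the MP3 muxer's default of v2.4. The function name is hypothetical.

```python
from typing import List


def build_reencode_cmd(src: str, dst: str, bitrate: str = "128k") -> List[str]:
    """Build (not run) an ffmpeg command that re-encodes to MP3, keeping tags and art."""
    return [
        "ffmpeg", "-i", src,
        "-map", "0:a",            # the audio stream
        "-map", "0:v?",           # the attached cover picture, if the input has one
        "-map_metadata", "0",     # copy global tags from input 0 (the default, made explicit)
        "-c:a", "libmp3lame", "-b:a", bitrate,
        "-c:v", "copy",           # pass the image through without re-encoding
        "-id3v2_version", "3",    # write ID3v2.3 instead of the muxer's default v2.4
        "-write_id3v1", "1",      # also append a 128-byte ID3v1 tag
        dst,
    ]
```

Running it is then 'subprocess.run(build_reencode_cmd("in.m4a", "out.mp3"), check=True)', with hypothetical file names.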
If a track's metadata is irreplaceable, back up the file before any re-encoding. Or re-tag after the fact with Mp3tag or another editor. Tagging is fast — 200 files in ten minutes. Diagnosing why your library lost its cover art three months later is not.
Bottom Line
For new MP3 files in 2026: write ID3v2.3 with UTF-16 encoding for maximum compatibility, keep embedded album art at 500-1000 px JPEG (under 200 KB per file), use free-text genres, and write 'TPE2' (album artist) on every track of compilation albums so they group correctly. Do not bother writing v2.2; only write v2.4 if you control the playback device. Always include a v1 tag too — it costs 128 bytes and rescues you on legacy hardware.
The format is old and informal, but it is the lingua franca of music libraries. Understanding the frame IDs and version differences saves real debugging time. Compared to the lossy-vs-lossless decision (see lossless vs lossy) or the bitrate question, tagging is where the most common everyday frustrations live.