Audio Formats for Mobile Apps
Choosing the right audio format for a mobile app affects startup time, battery life, file size, and playback quality. iOS and Android support different native codecs, offer different playback APIs, and impose different constraints. This guide covers the main format decisions that mobile developers and sound designers face when preparing audio for apps.
iOS Audio Format Support
iOS natively plays AAC (M4A), MP3, WAV, AIFF, and ALAC (Apple Lossless) through the AVFoundation and Audio Toolbox frameworks. The recommended format for most app audio on iOS is AAC in an M4A container at 128–256 kbps. AAC is hardware-decoded on every iOS device from the iPhone 3GS onwards, so playback consumes little battery or CPU. For short UI sounds (button taps, notifications), use uncompressed CAF (Core Audio Format) or AIFF files instead: decoding compressed audio adds start-up latency, which is unacceptable for interactive sounds that should start within about 100 ms. Core Audio can decode Opus from iOS 11, but which containers it accepts depends on the iOS version, so test your exact files on the oldest version you support. OGG Vorbis has no native iOS support; avoid it for iOS-targeted audio.
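A build step can catch assets that iOS will not play before they ship. The Python sketch below is a minimal, hypothetical helper (`check_ios_assets` and the extension sets are illustrative names, not part of any Apple tool). It assumes file extensions reliably indicate the container, which is only approximate: an .m4a can hold AAC or ALAC, and a .caf can hold several codecs.

```python
from pathlib import PurePath

# Extensions for the formats iOS plays natively (container only, codec not checked).
IOS_NATIVE_EXTENSIONS = {".m4a", ".aac", ".mp3", ".wav", ".aif", ".aiff", ".caf"}

# OGG files should be kept out of iOS-targeted audio.
IOS_UNSUPPORTED_EXTENSIONS = {".ogg", ".oga"}


def check_ios_assets(filenames):
    """Return (filename, problem) pairs for assets unlikely to play natively on iOS."""
    problems = []
    for name in filenames:
        ext = PurePath(name).suffix.lower()
        if ext in IOS_UNSUPPORTED_EXTENSIONS:
            problems.append((name, "OGG Vorbis has no native iOS support"))
        elif ext not in IOS_NATIVE_EXTENSIONS:
            problems.append((name, f"unrecognized extension {ext or '(none)'}"))
    return problems


print(check_ios_assets(["music/theme.m4a", "sfx/tap.caf", "music/loop.ogg"]))
```

A check like this only inspects file names; confirming the codec inside each file would need a real media probe.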
Android Audio Format Support
Android natively supports MP3, AAC (M4A), OGG Vorbis, FLAC, WAV, and Opus on all devices running Android 5.0 or later. In games, play short sound effects through the SoundPool API using WAV or OGG files, and play background music as OGG Vorbis through MediaPlayer (or a library such as ExoPlayer) instead: SoundPool decodes every sample fully into memory and limits how large each one can be, which makes it impractical for long tracks. OGG Vorbis decodes efficiently through Android's MediaCodec pipeline. For streaming audio, Opus at 64–96 kbps offers the best quality per byte on Android. For music playback apps, AAC in M4A is the safe choice because it behaves the same on both platforms. Avoid WMA, AIFF, and ALAC on Android; support for them varies by manufacturer.
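The memory argument against SoundPool for long tracks is simple arithmetic: a decoded clip occupies duration × sample rate × channels × bytes per sample, however small the compressed file was. The Python sketch below compares the two sizes for a hypothetical 3-minute stereo track at 44.1 kHz, 16-bit. The function names are illustrative, and the encoded size ignores container overhead.

```python
def decoded_pcm_bytes(duration_s, sample_rate_hz=44_100, channels=2, bytes_per_sample=2):
    """Size of a clip once fully decoded to 16-bit PCM and held in memory."""
    frames = round(duration_s * sample_rate_hz)
    return frames * channels * bytes_per_sample


def encoded_bytes(duration_s, bitrate_kbps):
    """Approximate compressed file size at a constant bitrate (container overhead ignored)."""
    return round(duration_s * bitrate_kbps * 1000 / 8)


track_s = 180  # hypothetical 3-minute stereo music track
print(f"OGG Vorbis at 128 kbps: {encoded_bytes(track_s, 128) / 1e6:.2f} MB")  # 2.88 MB
print(f"Decoded 16-bit PCM:     {decoded_pcm_bytes(track_s) / 1e6:.2f} MB")  # 31.75 MB
```

Compression shrinks the file by roughly a factor of eleven, but a player that decodes the whole clip up front still has to hold the full PCM buffer in memory.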
UI Sounds and Sound Effects: Format Strategy
UI sounds (taps, swipes, confirmations, error beeps) need to play with minimal latency. On iOS, use uncompressed CAF files at 44.1 kHz, 16-bit, mono. Uncompressed audio needs no decoding at play time, so it starts quickly: System Sound Services (AudioServicesPlaySystemSound) handles simple alerts, while AVAudioPlayer or AVAudioEngine add volume and mixing control. On Android, load OGG or WAV files at 44.1 kHz into SoundPool. SoundPool decodes each file to PCM in memory when it is loaded, so both formats play back with low latency; OGG simply keeps the APK smaller. Keep UI sounds short (under 2 seconds), mono, and small (under 100 KB per file). Longer sound effects can use compressed formats: MP3 or AAC on iOS, OGG on Android. For cross-platform development in Unity or Unreal Engine, provide WAV source files and let the engine re-encode them for each target platform at build time, according to its audio import settings.
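The 100 KB budget and the 44.1 kHz, 16-bit, mono recommendation interact: uncompressed mono PCM at that rate takes 88,200 bytes per second, so an uncompressed file fits the budget only if it is shorter than about 1.13 seconds. The Python sketch below checks a sound against the three guidelines. It is a hypothetical helper that treats 100 KB as 100,000 bytes and ignores file header overhead.

```python
SAMPLE_RATE_HZ = 44_100
BYTES_PER_SAMPLE = 2       # 16-bit PCM
MAX_DURATION_S = 2.0       # "under 2 seconds"
MAX_FILE_BYTES = 100_000   # "under 100 KB", assumed to mean decimal kilobytes


def pcm_bytes(duration_s, channels=1):
    """Uncompressed PCM size for a clip at 44.1 kHz, 16-bit (header not included)."""
    return round(duration_s * SAMPLE_RATE_HZ) * channels * BYTES_PER_SAMPLE


def ui_sound_issues(duration_s, channels, file_bytes):
    """List the UI-sound guidelines a clip breaks; an empty list means it passes."""
    issues = []
    if duration_s >= MAX_DURATION_S:
        issues.append("not under 2 seconds")
    if channels != 1:
        issues.append("not mono")
    if file_bytes >= MAX_FILE_BYTES:
        issues.append("not under 100 KB")
    return issues


# Longest uncompressed mono clip that still fits the size budget.
max_uncompressed_s = MAX_FILE_BYTES / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
print(f"{max_uncompressed_s:.2f} s")  # 1.13 s
print(ui_sound_issues(2.0, 1, pcm_bytes(2.0)))  # ['not under 2 seconds', 'not under 100 KB']
```

A 2-second uncompressed clip (176,400 bytes) already exceeds the size limit, so for uncompressed files the duration guideline effectively tightens to just over a second; compressed OGG files can run longer within the same budget.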
Music and Background Audio in Apps
Background music in apps (games, meditation apps, workout apps) benefits from efficient compression, which keeps the app bundle small. On iOS, 128 kbps AAC (M4A) gives an excellent balance of quality and size. On Android, OGG Vorbis at 128 kbps achieves similar quality; at the same bitrate the file size is essentially the same, because size depends on bitrate and duration rather than on the codec. For cross-platform targets, AAC in M4A plays natively on both iOS and Android, where AAC-LC decoding is available on every Android version. Decide whether music should be bundled with the app (in the app binary) or streamed at runtime. Bundled audio increases the download size: the App Store asks users to confirm cellular downloads of apps larger than 200 MB, and iOS versions before iOS 13 blocked such downloads outright. Streaming uses the user's data but keeps the app small. Music files larger than 50 MB are generally better served from a CDN.
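Whether a soundtrack fits in the bundle follows from size = bitrate × duration. The Python sketch below is a hypothetical planning helper: it estimates each track's size at 128 kbps, marks tracks above the 50 MB guideline for CDN delivery, and reports whether the remaining bundle crosses the 200 MB App Store cellular threshold. The track names, durations, and base app size are made-up inputs; sizes use decimal megabytes and ignore container overhead.

```python
CDN_THRESHOLD_BYTES = 50_000_000         # files above ~50 MB: serve from a CDN
CELLULAR_THRESHOLD_BYTES = 200_000_000   # App Store cellular download threshold


def music_bytes(duration_s, bitrate_kbps=128):
    """Estimated encoded size: bitrate (bits/s) x duration / 8 bits per byte."""
    return round(duration_s * bitrate_kbps * 1000 / 8)


def plan_music(tracks, base_app_bytes):
    """Split tracks (name -> duration in seconds) into bundled and streamed.

    Returns (bundled names, streamed names, estimated bundle bytes,
    whether the bundle exceeds the cellular threshold).
    """
    bundled, streamed = [], []
    total = base_app_bytes
    for name, duration_s in tracks.items():
        size = music_bytes(duration_s)
        if size > CDN_THRESHOLD_BYTES:
            streamed.append(name)
        else:
            bundled.append(name)
            total += size
    return bundled, streamed, total, total > CELLULAR_THRESHOLD_BYTES


# Hypothetical meditation app: 40 MB of code and art plus three tracks.
print(plan_music({"intro": 180, "session": 1200, "sleep_soundscape": 3600}, 40_000_000))
# → (['intro', 'session'], ['sleep_soundscape'], 62080000, False)
```

The 50 MB rule applies per file, so a bundle of many short tracks can still cross the cellular threshold; the last value in the result flags that case separately.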
Voice and Speech in Apps
Voice assistants, audiobook apps, narrated tutorials, and voice prompts call for different format strategies than music. For speech alone, quality requirements are lower: intelligibility and naturalness matter more than full-spectrum fidelity. On iOS and Android, AAC at 64 kbps produces clear, natural-sounding voice audio at roughly 0.5 MB per minute. For interactive voice response (IVR) systems, where audio travels over the phone network, the 8 kHz telephony codecs G.711 and G.729 are standard; higher fidelity would be lost on the call anyway. Audiobook apps commonly use MP3 at 64–128 kbps, in mono or stereo depending on whether the recording includes ambient music. For cached text-to-speech (TTS) output, store the generated audio as Opus at 32–48 kbps for maximum efficiency: Opus was designed for interactive speech as well as music, and its speech-oriented mode gives excellent results at these bitrates.
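Bitrates translate directly into storage: megabytes per minute = bitrate (kbps) × 1000 ÷ 8 × 60 ÷ 1,000,000. The Python sketch below tabulates the voice options discussed here and confirms that 64 kbps AAC comes to about 0.48 MB per minute. The G.711 (64 kbps) and G.729 (8 kbps) figures are those codecs' standard bitrates, added for comparison; sizes are decimal megabytes at a constant bitrate, with container overhead ignored.

```python
VOICE_OPTIONS = [
    ("AAC voice", 64),
    ("MP3 audiobook (low)", 64),
    ("MP3 audiobook (high)", 128),
    ("Opus TTS cache (low)", 32),
    ("Opus TTS cache (high)", 48),
    ("G.711 telephony", 64),
    ("G.729 telephony", 8),
]


def mb_per_minute(bitrate_kbps):
    """Decimal megabytes for one minute of audio at a constant bitrate."""
    return bitrate_kbps * 1000 / 8 * 60 / 1_000_000


# One row per option: label, bitrate, storage per minute.
for label, kbps in VOICE_OPTIONS:
    print(f"{label:<24}{kbps:>4} kbps  {mb_per_minute(kbps):.2f} MB/min")

# A hypothetical 10-hour audiobook at 64 kbps.
print(f"10-hour audiobook at 64 kbps: {mb_per_minute(64) * 600:.0f} MB")
```

The same formula works for any constant-bitrate codec, so it is a quick way to compare candidate bitrates before running an encoder.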