YouTube Background Music Mixing: Volume Levels and Transition Guide
Background music should sit 18-24 dB below voice. This mixing guide covers LUFS, EQ, ducking, and licensing for YouTube.
Background music sets the mood of your video, but only if viewers cannot consciously hear it. The moment music competes with your voice for attention, it becomes a distraction that damages comprehension, retention, and watch time. The W3C accessibility guidelines (WCAG Success Criterion 1.4.7) call for background sounds at least 20 dB lower than foreground speech, and the BBC production standard follows the same principle. Viewers rarely complain about background music being too quiet, but they will click away when music masks dialogue.
This guide covers the specific volume levels, LUFS loudness standards, EQ techniques for voice clarity, ducking workflows in every major editor, music genre selection by video type, and licensing options — everything you need to produce clean, professional audio on YouTube.
For music licensing libraries, see our royalty-free music guide. For microphone selection, see our microphone guide.
The Volume Rule
Voice vs. Music Levels
| Audio Element | Recommended Level | Notes |
|---|---|---|
| Voice (primary) | -6 to -3 dBFS peak, -12 dBFS average | The dominant audio element — everything else is mixed relative to voice |
| Music during speech | 18 to 24 dB below voice (about -24 to -30 dBFS when voice peaks at -6 dBFS) | Barely audible: felt, not consciously heard |
| Music during non-speech | -12 to -6 dBFS | Can be louder during B-roll, transitions, and pauses |
| Sound effects | -14 to -20 dBFS | Short bursts only — should not linger or accumulate |
Epidemic Sound's production guide recommends music at -30 to -35 dBFS during dialogue, quieter still than the -24 to -30 dBFS range in the table above. The extra margin matters because music with vocals or heavy bass competes more aggressively with speech frequencies than ambient instrumental tracks do.
The core principle: Set your voice level first. Voice is the anchor. Everything else is mixed relative to voice — never the other way around.
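The relative and absolute figures in the table are linked by simple subtraction. The sketch below shows that arithmetic, assuming the offset is measured from the voice peak; the function names are illustrative and do not come from any editor or library.

```python
def db_to_gain(db: float) -> float:
    """Convert a level change in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


def music_level_dbfs(voice_peak_dbfs: float, offset_db: float) -> float:
    """Absolute music level when it sits offset_db below the voice peak."""
    return voice_peak_dbfs - offset_db


# Voice peaking at -6 dBFS, music 18 dB and 24 dB below it.
for offset in (18, 24):
    level = music_level_dbfs(-6.0, offset)
    print(f"{offset} dB below voice: {level:.0f} dBFS, "
          f"amplitude multiplier {db_to_gain(-offset):.3f}")
```

A 20 dB reduction corresponds to one tenth of the amplitude, which is why music at these levels reads as texture rather than as a competing signal.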
The Multi-Device Test
After mixing, test your video on three devices:
- Phone speakers — where most viewers watch. Phone speakers have poor bass separation, so music that sounds fine on headphones becomes muddled on phones.
- Laptop speakers — mid-range check. If music is audible during speech on laptop speakers, it is too loud.
- Headphones — detail check for clicks, pops, and abrupt transitions that speakers mask.
If the music is audible during speech on phone speakers, reduce it by 3-6 dB. The phone speaker test catches most mixing problems.
LUFS and YouTube's Loudness Normalization
What LUFS Means for Creators
LUFS (Loudness Units relative to Full Scale) measures perceived loudness over time — unlike peak dB, which only measures instantaneous volume spikes. YouTube normalizes all audio to approximately -14 LUFS (integrated). This means:
- If your video is louder than -14 LUFS, YouTube automatically reduces the volume
- If your video is quieter than -14 LUFS, YouTube does not boost it — it stays quiet
- YouTube only attenuates, never amplifies
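The attenuate-only behaviour described above can be written as a one-line rule. This is a sketch of that rule as stated here, not YouTube's actual implementation; the -14 LUFS reference is the approximate figure quoted above and the function name is hypothetical.

```python
YOUTUBE_REFERENCE_LUFS = -14.0  # approximate reference level quoted in this guide


def playback_adjustment_db(integrated_lufs: float,
                           reference: float = YOUTUBE_REFERENCE_LUFS) -> float:
    """Gain applied at playback: negative for loud uploads, zero for quiet ones."""
    return min(0.0, reference - integrated_lufs)


for measured in (-10.0, -14.0, -18.0):
    print(f"upload at {measured} LUFS: adjusted by "
          f"{playback_adjustment_db(measured):+.1f} dB")
```

An upload at -10 LUFS is turned down by 4 dB, while an upload at -18 LUFS is left 4 LU quieter than the reference.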
Practical LUFS Targets for Upload
| Parameter | Target |
|---|---|
| Integrated loudness | -13 to -14 LUFS |
| Short-term peak | No louder than -9 LUFS |
| True peak ceiling | -1 dBTP (prevents inter-sample clipping during re-encoding) |
| Dialogue average | -12 LUFS momentary |
| Music bed average | -18 to -20 LUFS integrated (6-8 LU quieter than voice) |
The -1 dBTP true peak ceiling is important because YouTube re-encodes your audio during processing. Audio that peaks at 0 dBFS can clip after re-encoding, creating audible distortion that was not present in your original export.
How to check LUFS: Most editors include a loudness meter. In DaVinci Resolve, the Fairlight page has a built-in loudness meter. In Premiere Pro, use the Loudness Meter effect (called Loudness Radar in older versions). Free options include Youlean Loudness Meter, which works with any editor.
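Once you have a meter reading, the targets above translate into a gain change plus a peak check. The sketch below assumes a plain static gain (which shifts integrated loudness and true peak by the same amount) and uses hypothetical names; it does not measure audio, it only does the arithmetic on readings you supply.

```python
from dataclasses import dataclass


@dataclass
class MasterPlan:
    gain_db: float          # static gain needed to reach the loudness target
    peak_after_dbtp: float  # true peak after applying that gain
    needs_limiter: bool     # True if the peak would exceed the ceiling


def plan_master_gain(measured_lufs: float, measured_peak_dbtp: float,
                     target_lufs: float = -14.0,
                     ceiling_dbtp: float = -1.0) -> MasterPlan:
    """Work out the gain to hit target_lufs and whether peaks need limiting."""
    gain = target_lufs - measured_lufs
    peak_after = measured_peak_dbtp + gain
    return MasterPlan(gain, peak_after, peak_after > ceiling_dbtp)


# A quiet mix: -19 LUFS integrated with a -4 dBTP true peak.
print(plan_master_gain(-19.0, -4.0))
```

In this example the mix needs +5 dB to reach -14 LUFS, which would push the peak to +1 dBTP, so a limiter (or more compression on the voice) is needed before export.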
EQ for Voice Clarity Over Music
The most common reason music interferes with speech is frequency masking — voice and music occupy the same frequency range, so they compete. EQ (equalization) solves this by carving space for each element.
Voice EQ Settings
| Frequency | Action | Purpose |
|---|---|---|
| 80 Hz (male) / 100 Hz (female) | High-pass filter (24 dB/oct) | Removes low-end rumble, room noise, handling noise |
| 200-300 Hz | Cut 2-4 dB | Reduces boominess and muddiness |
| 500 Hz | Cut 1-3 dB if needed | Removes muffled, hollow quality |
| 2-6 kHz | Boost 2-4 dB | Increases speech clarity and intelligibility — the most impactful adjustment |
| 8-10 kHz | Gentle boost 1-2 dB | Adds air and presence (optional) |
Music EQ Settings (the Opposite)
Cut the core speech range in the music track so the two elements stop competing:
- Cut 300 Hz to 3 kHz by 2-4 dB in the music track. This carves a "voice pocket" where speech sits without competition
- This technique is called "scooping": you scoop out the frequencies in the music that overlap with voice
The result: voice sits clearly on top of the music, even at slightly higher music volumes than you would otherwise need.
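To see what a "voice pocket" cut looks like numerically, the sketch below builds a single peaking (bell) filter using the widely published RBJ Audio EQ Cookbook formulas and prints its response at a few frequencies. The guide does not specify a filter type, so the bell shape, the 1 kHz center, the Q of 0.5, and the 48 kHz sample rate are all assumptions chosen for illustration.

```python
import cmath
import math


def peaking_eq_coeffs(f0, gain_db, q, fs=48000.0):
    """RBJ cookbook peaking filter; returns (b, a) normalized so a[0] == 1."""
    amp = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * amp, -2 * math.cos(w0), 1 - alpha * amp]
    a = [1 + alpha / amp, -2 * math.cos(w0), 1 - alpha / amp]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, freq, fs=48000.0):
    """Magnitude response of the biquad at freq, in dB."""
    z_inv = cmath.exp(-1j * 2 * math.pi * freq / fs)
    num = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    den = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return 20 * math.log10(abs(num / den))


# A wide 3 dB dip centered at 1 kHz, applied to the music track only.
b, a = peaking_eq_coeffs(f0=1000.0, gain_db=-3.0, q=0.5)
for freq in (100, 300, 1000, 3000, 10000):
    print(f"{freq:>5} Hz: {response_db(b, a, freq):+.2f} dB")
```

The dip is deepest at the center frequency and tapers toward 0 dB on either side, so bass and high-frequency detail in the music are left mostly intact.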
Compression for Consistent Voice Volume
Compression reduces the volume difference between your loudest and quietest words, making speech consistently audible:
| Setting | Recommended Value |
|---|---|
| Ratio | 3:1 (starting point) |
| Threshold | -20 to -30 dBFS |
| Attack | Fast (5-10 ms) |
| Release | Medium (50-100 ms) |
Apply a de-esser before EQ and compression to tame sibilance (harsh "s" and "t" sounds). De-essing prevents the 2-6 kHz voice clarity boost from making sibilance painfully sharp.
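The ratio and threshold in the compression table define a static input/output curve. The sketch below shows that curve only: it ignores attack, release, and make-up gain, and the -24 dBFS threshold is one value picked from the recommended range. The function name is illustrative.

```python
def compressed_level_db(input_db: float, threshold_db: float = -24.0,
                        ratio: float = 3.0) -> float:
    """Output level of a hard-knee compressor for a steady input level."""
    if input_db <= threshold_db:
        return input_db  # below threshold: unchanged
    # Above threshold: every `ratio` dB of input yields 1 dB of output.
    return threshold_db + (input_db - threshold_db) / ratio


for level in (-30.0, -12.0, -6.0):
    print(f"in {level:+.0f} dBFS, out {compressed_level_db(level):+.1f} dBFS")
```

With these settings, a quiet word at -12 dBFS and a loud word at -6 dBFS end up 2 dB apart instead of 6 dB, which is what makes speech sit at a steady level over the music.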
For noise removal before mixing, see our noise removal guide.
Audio Ducking: Tool-by-Tool
Ducking automatically lowers the music whenever speech is detected, and it is the single most effective technique for clean voice-over-music mixing. Set it to reduce music by 6-12 dB during dialogue, fading the music down over 0.5-1 second as speech starts and back up over 1-2 seconds after it ends so the transitions sound natural.
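Conceptually, a ducker watches the voice level and ramps the music gain down or up accordingly. The sketch below is a simplified model of that idea, not any editor's algorithm: it assumes the voice level is sampled every 100 ms, that the shorter fade applies when the music drops and the longer one when it returns, and a -40 dBFS detection threshold. All names and defaults are hypothetical.

```python
def duck_gains_db(voice_levels_db, frame_s=0.1, threshold_db=-40.0,
                  duck_db=-9.0, fade_down_s=0.5, fade_up_s=1.5):
    """Per-frame music gain in dB, ramping toward duck_db while voice is present."""
    step_down = abs(duck_db) * frame_s / fade_down_s  # dB per frame while ducking
    step_up = abs(duck_db) * frame_s / fade_up_s      # dB per frame while recovering
    gain = 0.0
    gains = []
    for level in voice_levels_db:
        if level > threshold_db:
            gain = max(duck_db, gain - step_down)  # speech detected: fade down
        else:
            gain = min(0.0, gain + step_up)        # silence: fade back up
        gains.append(gain)
    return gains


# One second of speech at -20 dBFS followed by two seconds of silence.
voice = [-20.0] * 10 + [-60.0] * 20
print([round(g, 1) for g in duck_gains_db(voice)])
```

Real duckers usually add a hold time so the music does not start rising during short pauses between words; that is left out here to keep the ramp logic visible.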
Premiere Pro
- Select the music clip in the timeline
- Open Essential Sound panel → tag the clip as Music
- Check Duck → set "Duck Against" to your dialogue track
- Adjust sensitivity, duck amount (-6 to -12 dB), and fade times
- Click Generate Keyframes — Premiere creates volume automation automatically
DaVinci Resolve (Fairlight)
- Switch to the Fairlight page
- Select the music track → open Inspector → Audio tab
- Apply the Ducker effect (new in DaVinci Resolve 19, with no manual sidechain setup required)
- Set the dialogue track as the trigger source
- Adjust threshold, range (-6 to -12 dB), and release time
Audacity
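- Place voice and music on separate tracks, with the voice track directly below the music track
- Select the music track → Effect → Auto Duck
- Set duck amount (-12 dB default), inner fade (0 seconds), and outer fade (0.5 seconds)
- Audacity analyzes the voice track (the control track) and lowers the music wherever speech is present. The change is applied directly to the music audio, not as editable automation, so keep a copy of the original track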
CapCut
CapCut does not have automatic ducking. Use manual volume keyframes:
- Select the music clip → tap the volume line
- Add keyframes where speech begins (volume down) and ends (volume up)
- Drag keyframes to create smooth fades
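When ducking by hand, it helps to work out the keyframe positions before touching the timeline. The sketch below turns a list of speech start and end times into keyframe times and gain offsets; it is a planning aid with hypothetical names, not a CapCut feature or file format. Segments separated by a gap too short for a full fade up and down are merged, which is a design choice made here.

```python
def duck_keyframes(speech_segments, duck_db=-9.0, fade_s=0.5):
    """Return (time_s, gain_db) keyframes for manual ducking around speech."""
    merged = []
    for start, end in sorted(speech_segments):
        if merged and start - merged[-1][1] < 2 * fade_s:
            # Gap too short to bring the music back up: keep it ducked.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    keyframes = []
    for start, end in merged:
        keyframes += [
            (max(0.0, start - fade_s), 0.0),  # last point at full level
            (start, duck_db),                 # fully ducked as speech begins
            (end, duck_db),                   # hold through the speech
            (end + fade_s, 0.0),              # back at full level
        ]
    return keyframes


for time_s, gain_db in duck_keyframes([(2.0, 5.0), (5.4, 8.0), (12.0, 14.0)]):
    print(f"{time_s:5.1f} s  {gain_db:+.1f} dB")
```

Gain values are offsets from the music's non-speech level, so 0.0 means "leave the music where it is" and -9.0 means "9 dB quieter".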
For a full editor comparison, see our editing software guide.
The 5-Step Mixing Workflow
Step 1: Set Voice Level First
Normalize your voice audio to peak at -6 to -3 dBFS, averaging -12 dBFS. This is your anchor — all other audio is mixed relative to voice. Apply compression and EQ to voice before adding any other elements.
Step 2: Add Music 24 dB Below Voice
Start with music very quiet — barely perceptible during speech. Increase in 1 dB increments until you can just notice it, then pull back 2 dB. If your voice peaks at -6 dBFS, music during speech should sit around -24 to -30 dBFS.
Step 3: Automate Music Levels for Non-Speech Sections
During B-roll, transitions, or moments without speech, automate the music volume up to -12 to -6 dBFS. This creates dynamic audio that breathes with the content. The volume variation between speech and non-speech sections is what makes professional audio feel alive rather than flat.
Step 4: Add Sound Effects Sparingly
Transition sounds, whooshes, and notification effects at -14 to -20 dBFS. Keep them short (under 1 second) and infrequent. Over-used sound effects are as distracting as over-loud music.
Step 5: Check LUFS and Multi-Device Test
Export and check your integrated LUFS (target: -13 to -14). Then play on phone speakers, laptop speakers, and headphones. Fix any sections where music interferes with speech clarity. The phone speaker test is non-negotiable — skip it and you are mixing for headphone users only (a minority of your audience).
How Background Music Affects Watch Time
Background music is not decorative — it directly influences viewer behavior:
- Musical shifts signal new sections. When the music changes (tempo, key, or track), viewers subconsciously register a new segment starting. This resets attention and reduces mid-video drop-off.
- Musical hooks in the intro reduce early exit. An energetic music cue in the first 5-10 seconds creates a sense that something is happening — keeping viewers past the critical 8-second decision window.
- Silence is a tool. Dropping music entirely before a key point creates tension and emphasis. Viewers lean in when the music stops because it signals "pay attention — this is important."
- Wrong music accelerates drop-off. Upbeat pop music over a serious topic creates cognitive dissonance. The viewer feels something is "off" even if they cannot articulate why — and they leave.
For retention optimization, see our audience retention guide.
Matching Music Genre to Video Type
| Video Type | Recommended Genre | Why |
|---|---|---|
| Tutorials / How-To | Ambient, lo-fi, corporate | Non-distracting; does not compete with instructional speech |
| Vlogs / Day-in-Life | Upbeat acoustic, indie pop | Matches casual energy; complements personality |
| Product Reviews | Neutral electronic, light corporate | Does not bias the viewer's perception of the product |
| Gaming | Electronic, synthwave, hip-hop | Matches high-energy pacing; familiar to the audience |
| Travel | Cinematic, world music, acoustic guitar | Enhances visual storytelling; emotional resonance |
| Commentary / Opinion | Minimal ambient or no music | Voice is the content; music can feel manipulative |
| Montage / B-roll | Cinematic, emotional, dynamic | Music carries the section — can be louder and more prominent |
Rule of thumb: If the video is voice-driven (tutorials, reviews, commentary), use instrumental-only tracks without vocals. Sung vocals occupy the same frequency range as your speech, so they compete with it directly.
Music Licensing for YouTube
Using unlicensed music results in copyright claims, demonetization, or strikes. For copyright strike risks, see our copyright strikes guide.
Library Comparison
| Library | Tracks | Price | License Model |
|---|---|---|---|
| YouTube Audio Library | 1,500+ | Free | Free for all YouTube use; some require attribution |
| Epidemic Sound | 55,000+ | $17.99/mo (creator plan) | Subscription covers YouTube, Instagram, TikTok, podcasts |
| Artlist | 30,000+ | $14.99/mo | "Download once, use forever" — license survives cancellation |
| Storyblocks | 100,000+ | $15/mo | Unlimited downloads; sync license included |
| Uppbeat | 10,000+ | Free (3 downloads/mo) | Free tier with attribution; paid removes attribution |
YouTube Audio Library is sufficient for creators starting out — it is free, pre-cleared for YouTube, and the quality has improved significantly. For professional production, Epidemic Sound and Artlist are the industry standards because they offer consistently high-quality tracks without the attribution requirements that make free libraries cumbersome.
For a detailed library comparison, see our royalty-free music guide.
6 Common Mixing Mistakes
1. Music Too Loud During Speech
The most common mistake. If a viewer has to strain to hear dialogue over music, they leave. Always mix with the voice first, music second.
2. No Headroom (Clipping)
Audio that peaks at 0 dBFS clips after YouTube's re-encoding. Leave at least 1 dB of headroom (-1 dBTP true peak maximum).
3. Jarring Music Cuts
Starting or stopping music abruptly sounds amateur. Every music entry and exit needs a 1-3 second fade. Every track change needs a crossfade with 1-2 seconds of overlap.
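A crossfade is two gain curves running in opposite directions across the overlap. The sketch below uses an equal-power (sine/cosine) curve, which is a common choice but not one this guide prescribes; the 1.5 second overlap sits inside the 1-2 second range above, and the function name is illustrative.

```python
import math


def crossfade_gains(t: float, overlap_s: float = 1.5):
    """Equal-power amplitude gains (outgoing, incoming) at t seconds into the overlap."""
    x = min(max(t / overlap_s, 0.0), 1.0)  # progress through the overlap, 0..1
    return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)


for t in (0.0, 0.5, 0.75, 1.0, 1.5):
    outgoing, incoming = crossfade_gains(t)
    print(f"t={t:.2f} s  outgoing x{outgoing:.3f}  incoming x{incoming:.3f}")
```

With this curve the squared gains always sum to 1, so the combined level of two unrelated tracks stays roughly constant instead of dipping in the middle as it can with a linear fade.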
4. Same Volume Throughout the Entire Video
Flat audio is boring. Professional mixing has dynamic variation — music louder during B-roll, quieter during speech, absent during key moments. The variation keeps audio interesting and signals content structure to the viewer.
5. Choosing Music with Vocals
Sung vocals occupy the same frequency range as your speech and compete with it directly. Use instrumental-only tracks when speaking. Save vocal tracks for intros, outros, and montage sequences.
6. Not Testing on Phone Speakers
Mixing on studio monitors or headphones and never testing on phone speakers means you are optimizing for the wrong playback device. Most viewers watch on phones. The phone speaker test is mandatory.
Key Takeaways
- Music should be 18-24 dB below voice during speech (-24 to -30 dBFS absolute when voice peaks at -6 dBFS). If viewers can consciously hear the music while you talk, it is too loud.
- YouTube normalizes to -14 LUFS. Target -13 to -14 LUFS integrated with -1 dBTP true peak. YouTube only attenuates (reduces), never boosts — so uploading too quiet means it stays quiet.
- EQ carves space for voice. Boost voice at 2-6 kHz for clarity, cut music at 300 Hz-3 kHz to create a "voice pocket." This lets both elements coexist without masking.
- Use ducking wherever your editor supports it. Premiere Pro (Essential Sound → Duck), DaVinci Resolve (Fairlight Ducker), and Audacity (Auto Duck) all automate the most tedious part of mixing; CapCut needs manual volume keyframes.
- Test on phone speakers. Most viewers watch on phones with poor bass separation. If music interferes with speech on phone speakers, it is too loud.
- Music affects retention. Track changes signal new sections (resetting attention), musical hooks in intros reduce early drop-off, and silence before key points creates emphasis.
FAQ
How loud should background music be on YouTube?
18-24 dB below your voice during speech (-24 to -30 dBFS absolute when voice peaks at -6 dBFS). During non-speech moments (B-roll, transitions), music can rise to -12 to -6 dBFS. The W3C accessibility guidelines recommend background sounds at least 20 dB lower than speech. Always test on phone speakers: if music is audible during speech on a phone, reduce it.
What LUFS should I target for YouTube?
-13 to -14 LUFS integrated, with a true peak ceiling of -1 dBTP. YouTube normalizes audio to approximately -14 LUFS by reducing volume on louder content, but never boosts quieter content. Dialogue should average -12 LUFS momentary; music bed should average -18 to -20 LUFS integrated.
How do I set up audio ducking in Premiere Pro?
Select the music clip → open Essential Sound panel → tag as Music → check Duck → set "Duck Against" to your dialogue track → adjust sensitivity and duck amount (-6 to -12 dB) → click Generate Keyframes. Premiere automatically creates volume automation that lowers music when you speak.
What EQ settings improve voice clarity over music?
Apply a high-pass filter at 80 Hz (male) or 100 Hz (female) to remove rumble. Cut 2-4 dB at 200-300 Hz to reduce boominess. Boost 2-4 dB at 2-6 kHz for speech clarity. Then cut 300 Hz-3 kHz by 2-4 dB in the music track to carve a "voice pocket."
Can I use any music on YouTube?
No. Unlicensed music results in copyright claims, demonetization, or channel strikes. Use YouTube's free Audio Library (1,500+ tracks), or subscribe to licensed libraries: Epidemic Sound ($17.99/mo), Artlist ($14.99/mo), or Storyblocks ($15/mo). All provide pre-cleared music for YouTube use.
Sources
- How to Master Audio for YouTube — Sweetwater — accessed 2026-04-03
- YouTube -13 LUFS Reference Level — Sweetwater — accessed 2026-04-03
- Loudness Standards: LUFS and Peaks — Sweetwater — accessed 2026-04-03
- Mastering for Streaming Platforms — iZotope — accessed 2026-04-03
- Background Music Volume — Pure Audio Insight — accessed 2026-04-03
- Audio Mixing for Video — Epidemic Sound — accessed 2026-04-03
- YouTubers Need Better Sound Mixing — How-To Geek — accessed 2026-04-03
- Auto Ducking in Premiere Pro — Adobe — accessed 2026-04-03
- DaVinci Resolve Audio Ducking — Boris FX — accessed 2026-04-03
- Mixing Narration with Background Music — Audacity Manual — accessed 2026-04-03
- YouTube Background Music by Genre — Soundstripe — accessed 2026-04-03
- Background Music Volume: Getting It Perfect — Wistia — accessed 2026-04-03
- Music and Viewer Engagement on YouTube — Soundraw — accessed 2026-04-03
- YouTube Audio Library — YouTube Help — accessed 2026-04-03
- Artlist vs Epidemic Sound 2025 — Photutorial — accessed 2026-04-03
- EQ for Voice Over — Music Guy Mixing — accessed 2026-04-03