YouTube Background Music Mixing: Volume Levels and Transition Guide

Background music sets the mood of your video, but only when it stays below conscious attention. The moment music competes with your voice, it becomes a distraction that damages comprehension, retention, and watch time. W3C's Web Content Accessibility Guidelines (Success Criterion 1.4.7) recommend keeping background sounds at least 20 dB lower than foreground speech, and the BBC production standard follows the same principle. Viewers rarely complain about background music being too quiet, but they will click away the moment music masks dialogue.

This guide covers the specific volume levels, LUFS loudness standards, EQ techniques for voice clarity, ducking workflows in every major editor, music genre selection by video type, and licensing options — everything you need to produce clean, professional audio on YouTube.

For music licensing libraries, see our royalty-free music guide. For microphone selection, see our microphone guide.

The Volume Rule

Voice vs. Music Levels

Audio Element	Recommended Level	Notes
Voice (primary)	-6 to -3 dBFS peak, -12 dBFS average	The dominant audio element — everything else is mixed relative to voice
Music during speech	18 to 24 dB below voice (-24 to -30 dBFS absolute when voice peaks at -6 dBFS)	Barely audible: felt, not consciously heard
Music during non-speech	-12 to -6 dBFS	Can be louder during B-roll, transitions, and pauses
Sound effects	-14 to -20 dBFS	Short bursts only — should not linger or accumulate

Epidemic Sound's production guide recommends music at -30 to -35 dBFS during dialogue, quieter still than the general -24 to -30 dBFS range. The extra margin matters most for music with vocals or heavy bass, which competes more aggressively with speech than ambient instrumental tracks do.

The core principle: Set your voice level first. Voice is the anchor. Everything else is mixed relative to voice — never the other way around.
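
Because decibels are logarithmic, turning a "dB below voice" offset into an absolute level is plain subtraction. A minimal Python sketch of that arithmetic follows; the function names are illustrative and do not come from any editor's API.

```python
def music_level_dbfs(voice_dbfs: float, db_below_voice: float) -> float:
    """Absolute music level for a given voice level and relative offset."""
    return voice_dbfs - db_below_voice


def db_to_gain(db: float) -> float:
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (db / 20)


# Voice peaking at -6 dBFS, music 18 to 24 dB below it
print(music_level_dbfs(-6, 18))   # -24
print(music_level_dbfs(-6, 24))   # -30
print(round(db_to_gain(-20), 3))  # 0.1
```

A 20 dB reduction therefore leaves the music at one tenth of its original amplitude, which is why small-looking dB changes are clearly audible.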

The Multi-Device Test

After mixing, test your video on three devices:

Phone speakers — where most viewers watch. Phone speakers have poor bass separation, so music that sounds fine on headphones becomes muddled on phones.
Laptop speakers — mid-range check. If music is audible during speech on laptop speakers, it is too loud.
Headphones — detail check for clicks, pops, and abrupt transitions that speakers mask.

If the music is audible during speech on phone speakers, reduce it by 3-6 dB. The phone speaker test catches most mixing problems.

LUFS and YouTube's Loudness Normalization

What LUFS Means for Creators

LUFS (Loudness Units relative to Full Scale) measures perceived loudness over time — unlike peak dB, which only measures instantaneous volume spikes. YouTube normalizes all audio to approximately -14 LUFS (integrated). This means:

If your video is louder than -14 LUFS, YouTube automatically reduces the volume
If your video is quieter than -14 LUFS, YouTube does not boost it — it stays quiet
YouTube only attenuates, never amplifies
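
The attenuate-only behavior described above can be expressed as a one-line rule. This is a sketch of the rule as stated in this guide, not of YouTube's actual implementation; the -14 LUFS constant is the approximate target quoted above and the function name is hypothetical.

```python
YOUTUBE_TARGET_LUFS = -14.0  # approximate target quoted in this guide


def playback_gain_db(integrated_lufs: float,
                     target: float = YOUTUBE_TARGET_LUFS) -> float:
    """Gain applied at playback: negative for loud uploads, zero otherwise."""
    return min(0.0, target - integrated_lufs)


print(playback_gain_db(-9.0))   # -5.0  (loud upload is turned down 5 dB)
print(playback_gain_db(-20.0))  # 0.0   (quiet upload is left as-is)
```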

Practical LUFS Targets for Upload

Parameter	Target
Integrated loudness	-13 to -14 LUFS
Short-term peak	No louder than -9 LUFS
True peak ceiling	-1 dBTP (prevents inter-sample clipping during re-encoding)
Dialogue average	-12 LUFS momentary
Music bed average	-18 to -20 LUFS integrated (6-8 LU quieter than voice)

The -1 dBTP true peak ceiling is important because YouTube re-encodes your audio during processing. Audio that peaks at 0 dBFS can clip after re-encoding, creating audible distortion that was not present in your original export.
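
As a rough illustration of the headroom check, the sketch below measures the sample peak of a block of audio and reports how much gain reduction would bring it under the ceiling. It assumes samples are floats between -1.0 and 1.0. Note that a sample peak can read lower than the true peak (dBTP), which is measured on an oversampled signal, so treat this as an approximation and rely on a true-peak meter for the final export.

```python
import math


def peak_dbfs(samples):
    """Sample peak in dBFS for floats in the range -1.0 to 1.0."""
    peak = max(abs(s) for s in samples)
    return -math.inf if peak == 0 else 20 * math.log10(peak)


def gain_to_ceiling_db(samples, ceiling_db=-1.0):
    """Gain change in dB (never positive) that brings the peak under the ceiling."""
    return min(0.0, ceiling_db - peak_dbfs(samples))


print(gain_to_ceiling_db([0.0, 0.5, -1.0]))  # -1.0 (peak at 0 dBFS needs 1 dB of reduction)
print(gain_to_ceiling_db([0.25, -0.5]))      # 0.0  (already below the ceiling)
```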

How to check LUFS: Most editors display LUFS meters. In DaVinci Resolve, the Fairlight page has a built-in loudness meter. In Premiere Pro, use the Loudness Radar effect. Free standalone options include Youlean Loudness Meter (works with any editor).

EQ for Voice Clarity Over Music

The most common reason music interferes with speech is frequency masking — voice and music occupy the same frequency range, so they compete. EQ (equalization) solves this by carving space for each element.

Voice EQ Settings

Frequency	Action	Purpose
80 Hz (male) / 100 Hz (female)	High-pass filter (24 dB/oct)	Removes low-end rumble, room noise, handling noise
200-300 Hz	Cut 2-4 dB	Reduces boominess and muddiness
500 Hz	Cut 1-3 dB if needed	Removes muffled, hollow quality
2-6 kHz	Boost 2-4 dB	Increases speech clarity and intelligibility — the most impactful adjustment
8-10 kHz	Gentle boost 1-2 dB	Adds air and presence (optional)

Music EQ Settings (the Opposite)

Cut the music where speech carries most of its energy, so the two tracks stop competing:

Cut 300 Hz to 3 kHz by 2-4 dB in the music track. This carves a "voice pocket" where speech sits without competition
This technique is called "scooping": you scoop out the frequencies in the music that overlap with voice

The result: voice sits clearly on top of the music, even at slightly higher music volumes than you would otherwise need.
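
To make the boost and cut concrete, here is a sketch of a single peaking (bell) filter using the widely published RBJ Audio EQ Cookbook formulas, plus a helper that reports its gain at a given frequency. The center frequencies, Q values, and 48 kHz sample rate are assumptions chosen for illustration; your editor's EQ may use different filter shapes.

```python
import cmath
import math


def peaking_biquad(f0, gain_db, q, fs=48000):
    """RBJ cookbook peaking-EQ coefficients (b, a), normalized so a[0] == 1."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def response_db(b, a, freq, fs=48000):
    """Filter gain in dB at one frequency."""
    z = cmath.exp(-1j * 2 * math.pi * freq / fs)  # this is z^-1
    num = b[0] + b[1] * z + b[2] * z * z
    den = a[0] + a[1] * z + a[2] * z * z
    return 20 * math.log10(abs(num / den))


# Wide 3 dB cut in the music, centered inside the 300 Hz to 3 kHz pocket
music_b, music_a = peaking_biquad(950, -3.0, q=0.5)
# 3 dB clarity boost on the voice, centered inside 2-6 kHz
voice_b, voice_a = peaking_biquad(3500, 3.0, q=0.7)

print(round(response_db(music_b, music_a, 950), 2))   # -3.0
print(round(response_db(voice_b, voice_a, 3500), 2))  # 3.0
```

Evaluating the response at other frequencies shows how far the cut reaches; a low Q gives the broad, gentle scoop described above.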

Compression for Consistent Voice Volume

Compression reduces the volume difference between your loudest and quietest words, making speech consistently audible:

Setting	Recommended Value
Ratio	3:1 (starting point)
Threshold	-20 to -30 dBFS
Attack	Fast (5-10 ms)
Release	Medium (50-100 ms)
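
The ratio and threshold settings define a simple input-to-output curve. The sketch below shows that static curve only, assuming a hard knee and no makeup gain, and ignoring attack and release timing; real compressors add those on top.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=3.0):
    """Output level for a given input level on a hard-knee compressor."""
    if level_db <= threshold_db:
        return level_db  # below threshold: unchanged
    # above threshold: every `ratio` dB of input yields 1 dB of output
    return threshold_db + (level_db - threshold_db) / ratio


print(compress_db(-8.0))   # -16.0
print(compress_db(-26.0))  # -26.0
```

In this example a loud word at -8 dBFS and a quiet word at -26 dBFS start 18 dB apart and end up 10 dB apart, which is the evening-out effect described above.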

Apply a de-esser before EQ and compression to tame sibilance (harsh "s" and "t" sounds). De-essing prevents the 2-6 kHz voice clarity boost from making sibilance painfully sharp.

For noise removal before mixing, see our noise removal guide.

Audio Ducking: Tool-by-Tool

Ducking automatically reduces music volume when speech is detected — the single most effective technique for clean voice-over-music mixing. Set ducking to reduce music by 6-12 dB during dialogue, with a 0.5-1 second fade-in and 1-2 second fade-out for natural transitions.
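
Conceptually, a ducker first decides where speech is present. The sketch below is one simple way to do that, not how any particular editor works: it takes per-frame voice levels in dB, keeps frames above a threshold, and bridges short pauses so the music does not pump up between words. The threshold, frame length, and pause length are assumed values.

```python
def speech_intervals(levels_db, frame_s=0.5, threshold_db=-30.0, max_pause_s=1.0):
    """Return (start_s, end_s) spans where the voice track is above the threshold.

    Pauses no longer than max_pause_s are bridged into one span.
    """
    intervals = []
    for i, level in enumerate(levels_db):
        if level < threshold_db:
            continue  # silent frame
        start, end = i * frame_s, (i + 1) * frame_s
        if intervals and start - intervals[-1][1] <= max_pause_s:
            intervals[-1] = (intervals[-1][0], end)  # extend the current span
        else:
            intervals.append((start, end))
    return intervals


voice = [-60, -18, -20, -60, -16, -60, -60, -60, -60, -15]
print(speech_intervals(voice))  # [(0.5, 2.5), (4.5, 5.0)]
```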

Premiere Pro

Select the music clip in the timeline
Open Essential Sound panel → tag the clip as Music
Check Duck → set "Duck Against" to your dialogue track
Adjust sensitivity, duck amount (-6 to -12 dB), and fade times
Click Generate Keyframes — Premiere creates volume automation automatically

DaVinci Resolve (Fairlight)

Switch to the Fairlight page
Select the music track → open Inspector → Audio tab
Apply the Ducker effect (new in DaVinci Resolve 19; no manual sidechain setup required)
Set the dialogue track as the trigger source
Adjust threshold, range (-6 to -12 dB), and release time

Audacity

Place voice and music on separate tracks, with the voice (control) track directly below the music track
Select the music track → Effect → Auto Duck
Set duck amount (-12 dB default), inner fade (0 seconds), and outer fade (0.5 seconds)
Audacity analyzes the voice track and lowers the music track's audio wherever speech is present

CapCut

CapCut does not have automatic ducking. Use manual volume keyframes:

Select the music clip → tap the volume line
Add keyframes where speech begins (volume down) and ends (volume up)
Drag keyframes to create smooth fades
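
The manual steps above amount to placing four keyframes around each stretch of speech. The sketch below plans those keyframes from a list of speech start and end times. It is a planning aid under stated assumptions, not a CapCut feature: the function name is hypothetical, the -12 dB amount and fade lengths are example values, and reading the shorter fade as the fade down is an interpretation.

```python
def duck_keyframes(speech, duck_db=-12.0, fade_down_s=0.5, fade_up_s=1.0):
    """Return (time_s, gain_db) keyframes for sorted (start_s, end_s) speech spans."""
    # Merge spans too close together for a full fade up and back down
    merged = []
    for start, end in speech:
        if merged and start - merged[-1][1] < fade_down_s + fade_up_s:
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))

    keys = []
    for start, end in merged:
        fade_start = max(0.0, start - fade_down_s)
        if fade_start < start:
            keys.append((fade_start, 0.0))   # music at full level, fade begins
        keys.append((start, duck_db))        # fully ducked as speech starts
        keys.append((end, duck_db))          # hold until speech ends
        keys.append((end + fade_up_s, 0.0))  # back to full level
    return keys


for time_s, gain_db in duck_keyframes([(2.0, 5.0), (5.5, 8.0), (12.0, 14.0)]):
    print(f"{time_s:5.1f} s  {gain_db:6.1f} dB")
```

The first two spans are only half a second apart, so they are treated as one stretch of speech and the music stays down between them.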

For a full editor comparison, see our editing software guide.

The 5-Step Mixing Workflow

Step 1: Set Voice Level First

Normalize your voice audio to peak at -6 to -3 dBFS, averaging -12 dBFS. This is your anchor — all other audio is mixed relative to voice. Apply compression and EQ to voice before adding any other elements.

Step 2: Add Music 18 to 24 dB Below Voice

Start with music very quiet — barely perceptible during speech. Increase in 1 dB increments until you can just notice it, then pull back 2 dB. If your voice peaks at -6 dBFS, music during speech should sit around -24 to -30 dBFS.

Step 3: Automate Music Levels for Non-Speech Sections

During B-roll, transitions, or moments without speech, automate the music volume up to -12 to -6 dBFS. This creates dynamic audio that breathes with the content. The volume variation between speech and non-speech sections is what makes professional audio feel alive rather than flat.

Step 4: Add Sound Effects Sparingly

Mix transition sounds, whooshes, and notification effects at -14 to -20 dBFS. Keep them short (under 1 second) and infrequent. Over-used sound effects are as distracting as over-loud music.

Step 5: Check LUFS and Multi-Device Test

Export and check your integrated LUFS (target: -13 to -14). Then play on phone speakers, laptop speakers, and headphones. Fix any sections where music interferes with speech clarity. The phone speaker test is non-negotiable — skip it and you are mixing for headphone users only (a minority of your audience).

How Background Music Affects Watch Time

Background music is not decorative — it directly influences viewer behavior:

Musical shifts signal new sections. When the music changes (tempo, key, or track), viewers subconsciously register a new segment starting. This resets attention and reduces mid-video drop-off.
Musical hooks in the intro reduce early exit. An energetic music cue in the first 5-10 seconds creates a sense that something is happening — keeping viewers past the critical 8-second decision window.
Silence is a tool. Dropping music entirely before a key point creates tension and emphasis. Viewers lean in when the music stops because it signals "pay attention — this is important."
Wrong music accelerates drop-off. Upbeat pop music over a serious topic creates cognitive dissonance. The viewer feels something is "off" even if they cannot articulate why — and they leave.

For retention optimization, see our audience retention guide.

Matching Music Genre to Video Type

Video Type	Recommended Genre	Why
Tutorials / How-To	Ambient, lo-fi, corporate	Non-distracting; does not compete with instructional speech
Vlogs / Day-in-Life	Upbeat acoustic, indie pop	Matches casual energy; complements personality
Product Reviews	Neutral electronic, light corporate	Does not bias the viewer's perception of the product
Gaming	Electronic, synthwave, hip-hop	Matches high-energy pacing; familiar to the audience
Travel	Cinematic, world music, acoustic guitar	Enhances visual storytelling; emotional resonance
Commentary / Opinion	Minimal ambient or no music	Voice is the content; music can feel manipulative
Montage / B-roll	Cinematic, emotional, dynamic	Music carries the section — can be louder and more prominent

Rule of thumb: If the video is voice-driven (tutorials, reviews, commentary), use instrumental-only tracks without vocals. Sung vocals occupy the same frequency range as your speech, so no amount of EQ fully separates them.

Music Licensing for YouTube

Using unlicensed music results in copyright claims, demonetization, or strikes. For copyright strike risks, see our copyright strikes guide.

Library Comparison

Library	Tracks	Price	License Model
YouTube Audio Library	1,500+	Free	Free for all YouTube use; some require attribution
Epidemic Sound	55,000+	$17.99/mo (creator plan)	Subscription covers YouTube, Instagram, TikTok, podcasts
Artlist	30,000+	$14.99/mo	"Download once, use forever" — license survives cancellation
Storyblocks	100,000+	$15/mo	Unlimited downloads; sync license included
Uppbeat	10,000+	Free (3 downloads/mo)	Free tier with attribution; paid removes attribution

YouTube Audio Library is sufficient for creators starting out — it is free, pre-cleared for YouTube, and the quality has improved significantly. For professional production, Epidemic Sound and Artlist are the industry standards because they offer consistently high-quality tracks without the attribution requirements that make free libraries cumbersome.

For a detailed library comparison, see our royalty-free music guide.

6 Common Mixing Mistakes

1. Music Too Loud During Speech

The most common mistake. If a viewer has to strain to hear dialogue over music, they leave. Always mix with the voice first, music second.

2. No Headroom (Clipping)

Audio that peaks at 0 dBFS can clip after YouTube's re-encoding. Leave at least 1 dB of headroom (-1 dBTP true peak maximum).

3. Jarring Music Cuts

Starting or stopping music abruptly sounds amateur. Every music entry and exit needs a 1-3 second fade. Every track change needs a crossfade with 1-2 seconds of overlap.
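
For track changes, one common way to shape the overlap is an equal-power crossfade, which keeps perceived loudness roughly steady when the two tracks are unrelated material. The guide does not prescribe a curve, so treat this choice and the function name as assumptions.

```python
import math


def equal_power_gains(position):
    """Linear gains (outgoing, incoming) for a crossfade position from 0.0 to 1.0."""
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


# Five evenly spaced points across a crossfade
for step in range(5):
    out_gain, in_gain = equal_power_gains(step / 4)
    print(f"{step / 4:.2f}  out={out_gain:.3f}  in={in_gain:.3f}")
```

At the midpoint both tracks sit at about 0.707 (roughly -3 dB), so their combined power stays constant instead of dipping as it would with straight linear fades.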

4. Same Volume Throughout the Entire Video

Flat audio is boring. Professional mixing has dynamic variation — music louder during B-roll, quieter during speech, absent during key moments. The variation keeps audio interesting and signals content structure to the viewer.

5. Choosing Music with Vocals

Vocals in background music occupy the same frequency range as your speech and compete with it directly. Use instrumental-only tracks when speaking. Save vocal tracks for intros, outros, and montage sequences.

6. Not Testing on Phone Speakers

Mixing on studio monitors or headphones and never testing on phone speakers means you are optimizing for the wrong playback device. Most viewers watch on phones. The phone speaker test is mandatory.

Key Takeaways

Music should be 18-24 dB below voice during speech (-24 to -30 dBFS absolute when voice peaks at -6 dBFS). If viewers can consciously hear the music while you talk, it is too loud.
YouTube normalizes to -14 LUFS. Target -13 to -14 LUFS integrated with -1 dBTP true peak. YouTube only attenuates (reduces), never boosts — so uploading too quiet means it stays quiet.
EQ carves space for voice. Boost voice at 2-6 kHz for clarity, cut music at 300 Hz-3 kHz to create a "voice pocket." This lets both elements coexist without masking.
Use ducking wherever your editor supports it. Premiere Pro (Essential Sound → Duck), DaVinci Resolve (Fairlight Ducker), and Audacity (Auto Duck) automate the most tedious part of mixing; in CapCut, set volume keyframes by hand.
Test on phone speakers. Most viewers watch on phones with poor bass separation. If music interferes with speech on phone speakers, it is too loud.
Music affects retention. Track changes signal new sections (resetting attention), musical hooks in intros reduce early drop-off, and silence before key points creates emphasis.

FAQ

How loud should background music be on YouTube?

18-24 dB below your voice during speech (-24 to -30 dBFS absolute when voice peaks at -6 dBFS). During non-speech moments (B-roll, transitions), music can rise to -12 to -6 dBFS. The W3C accessibility guideline recommends non-speech sounds at least 20 dB lower than speech. Always test on phone speakers: if music is audible during speech on a phone, reduce it.

What LUFS should I target for YouTube?

-13 to -14 LUFS integrated, with a true peak ceiling of -1 dBTP. YouTube normalizes audio to approximately -14 LUFS by reducing volume on louder content, but never boosts quieter content. Dialogue should average -12 LUFS momentary; music bed should average -18 to -20 LUFS integrated.

How do I set up audio ducking in Premiere Pro?

Select the music clip → open Essential Sound panel → tag as Music → check Duck → set "Duck Against" to your dialogue track → adjust sensitivity and duck amount (-6 to -12 dB) → click Generate Keyframes. Premiere automatically creates volume automation that lowers music when you speak.

What EQ settings improve voice clarity over music?

Apply a high-pass filter at 80 Hz (male) or 100 Hz (female) to remove rumble. Cut 2-4 dB at 200-300 Hz to reduce boominess. Boost 2-4 dB at 2-6 kHz for speech clarity. Then cut the speech-heavy mid frequencies (300 Hz-3 kHz) by 2-4 dB in the music track to carve a "voice pocket."

Can I use any music on YouTube?

No. Unlicensed music results in copyright claims, demonetization, or channel strikes. Use YouTube's free Audio Library (1,500+ tracks), or subscribe to licensed libraries: Epidemic Sound ($17.99/mo), Artlist ($14.99/mo), or Storyblocks ($15/mo). All provide pre-cleared music for YouTube use.

Sources

How to Master Audio for YouTube — Sweetwater — accessed 2026-04-03
YouTube -13 LUFS Reference Level — Sweetwater — accessed 2026-04-03
Loudness Standards: LUFS and Peaks — Sweetwater — accessed 2026-04-03
Mastering for Streaming Platforms — iZotope — accessed 2026-04-03
Background Music Volume — Pure Audio Insight — accessed 2026-04-03
Audio Mixing for Video — Epidemic Sound — accessed 2026-04-03
YouTubers Need Better Sound Mixing — How-To Geek — accessed 2026-04-03
Auto Ducking in Premiere Pro — Adobe — accessed 2026-04-03
DaVinci Resolve Audio Ducking — Boris FX — accessed 2026-04-03
Mixing Narration with Background Music — Audacity Manual — accessed 2026-04-03
YouTube Background Music by Genre — Soundstripe — accessed 2026-04-03
Background Music Volume: Getting It Perfect — Wistia — accessed 2026-04-03
Music and Viewer Engagement on YouTube — Soundraw — accessed 2026-04-03
YouTube Audio Library — YouTube Help — accessed 2026-04-03
Artlist vs Epidemic Sound 2025 — Photutorial — accessed 2026-04-03
EQ for Voice Over — Music Guy Mixing — accessed 2026-04-03