B-Roll vs. A-Roll: What's the Difference?

Quick Answer

Term	What It Is	Example
A-roll	Primary footage that carries the narrative—usually a speaker on camera, an interview, or a direct-to-camera presentation.	A subject-matter expert explaining a new compliance policy.
B-roll	Supplemental footage edited over the A-roll to illustrate, emphasize, or add visual variety.	Shots of employees working at their desks while the expert's voiceover continues.

In short: A-roll tells the story; B-roll shows it.

A-Roll in Detail

A-roll is the backbone of any video. It is the part your audience could still follow with their eyes closed: the narration, dialogue, or on-camera presentation that drives the content forward.

Common A-roll sources include:

Talking-head interviews — A subject speaks directly to camera or to an off-screen interviewer.
Presenter or narrator footage — A host walks the viewer through a topic.
Voiceover recordings — Audio narration recorded separately and laid over visuals.

A-roll establishes authority and keeps the viewer oriented. Without it, a video lacks a clear narrative thread.

B-Roll in Detail

B-roll is everything that isn't your primary speaker or narrator on screen. It is layered on top of the A-roll audio to add context, prevent visual monotony, and reinforce key points.

Common B-roll sources include:

Workplace footage — Employees using a product, collaborating in a meeting room, or operating equipment.
Screen recordings — Software demos, dashboards, or UI walkthroughs.
Stock footage — Licensed clips that illustrate abstract concepts (e.g., a rocket launch for "growth").
Motion graphics — Animated charts, diagrams, or text overlays.
Product shots — Close-ups of physical products, packaging, or hardware.

Why B-Roll Matters

Watching a single speaker for several minutes without visual breaks is fatiguing. Research on video engagement and viewer preference supports using visual variety to maintain attention. B-roll solves this by:

Maintaining attention — Visual variety keeps viewers engaged longer.
Covering edits — When you cut a sentence or rearrange an interview, B-roll hides the jump cut.
Adding proof — Showing the product in action is more persuasive than describing it.
Setting context — Establishing shots of a factory floor, hospital, or classroom ground the narrative in a real environment.

How AI Video Tools Handle A-Roll and B-Roll

AI video platforms like Knowlify generate both layers automatically. The AI creates a narrated voiceover (serving as A-roll) and pairs it with contextually relevant visuals, animations, and on-screen text (serving as B-roll)—all derived from your source documents. This eliminates the need for separate film shoots or stock-footage licensing for most training and explainer use cases.

A-Roll vs. B-Roll: How They Work Together in Practice

A-roll and B-roll are not separate pieces stitched end to end. In a finished video, B-roll is layered over the A-roll audio so the narration or interview continues uninterrupted while the visuals shift to something more illustrative. Editors call this a "cutaway" — the audio track stays anchored to the speaker while the picture cuts away to supporting footage.
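One way to picture a cutaway is as a picture track whose gaps are filled with A-roll while the audio never leaves the speaker. The sketch below is only an illustration under that assumption: the `Segment` type, the `picture_track` function, and the clip name `desk_shot` are hypothetical and do not come from any editing tool.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds on the timeline
    end: float    # seconds on the timeline
    source: str   # "A-roll" or the name of a B-roll clip


def picture_track(runtime: float, cutaways: list[Segment]) -> list[Segment]:
    """Build the picture track: B-roll cutaways with A-roll filling the gaps.

    The audio track is not modelled. It is assumed to stay on the A-roll
    for the whole runtime, which is what makes each B-roll clip a cutaway.
    """
    track: list[Segment] = []
    cursor = 0.0
    for cut in sorted(cutaways, key=lambda s: s.start):
        if cut.end <= cut.start:
            raise ValueError("a cutaway must end after it starts")
        if cut.start < cursor or cut.end > runtime:
            raise ValueError("cutaways must not overlap or run past the end")
        if cut.start > cursor:
            # Speaker stays on screen until the cutaway begins.
            track.append(Segment(cursor, cut.start, "A-roll"))
        track.append(cut)
        cursor = cut.end
    if cursor < runtime:
        # Return to the speaker after the last cutaway.
        track.append(Segment(cursor, runtime, "A-roll"))
    return track


# A 20-second interview with one cutaway from 0:05 to 0:09.
for seg in picture_track(20, [Segment(5, 9, "desk_shot")]):
    print(f"{seg.start:>4.0f}s to {seg.end:>4.0f}s  {seg.source}")
```

The picture alternates between speaker and supporting footage, but nothing in the function touches the audio, which mirrors how the narration continues uninterrupted under the cutaway.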

The balance between the two layers depends on the format:

Training and explainer videos — Typically 40–60% B-roll coverage. The speaker establishes credibility on camera, then B-roll steps in to demonstrate processes, highlight key data, or show real-world context.
Executive communications — Heavier on A-roll (70%+). Employees want to see the leader speaking directly; B-roll is used sparingly for charts, product shots, or location footage.
Product demos — B-roll dominant (70–80%). The voiceover narrates while screen recordings, close-up product shots, and motion graphics do the heavy lifting.
Customer testimonials — Roughly even. The interviewee's face builds trust, but cutting to footage of the product in their environment reinforces their claims.
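The percentages above can be turned into a rough B-roll time budget for a given runtime. The snippet below is a minimal sketch, not a production tool: the `BROLL_SHARE` table and `broll_seconds` function are hypothetical names, "roughly even" is assumed to mean 50%, and the executive range is assumed to be 0–30% because A-roll takes 70% or more.

```python
from __future__ import annotations

# B-roll share of runtime per format, in whole percent (low, high).
# Values follow the list above; the executive and testimonial entries
# are interpretations, as noted in the lead-in.
BROLL_SHARE = {
    "training": (40, 60),
    "executive": (0, 30),
    "product_demo": (70, 80),
    "testimonial": (50, 50),
}


def broll_seconds(video_format: str, runtime_seconds: float) -> tuple[float, float]:
    """Return the (low, high) seconds of B-roll suggested for a format."""
    low, high = BROLL_SHARE[video_format]
    return (runtime_seconds * low / 100, runtime_seconds * high / 100)


print(broll_seconds("training", 5 * 60))      # (120.0, 180.0)
print(broll_seconds("product_demo", 2 * 60))  # (84.0, 96.0)
```

For a 5-minute training video this gives a target of two to three minutes of B-roll on the timeline.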

A useful rule of thumb: if a sentence in your script describes something visual—an action, a place, a data trend—that sentence is a B-roll opportunity. If it expresses opinion, emotion, or direct instruction, keep the speaker on screen.

Common Mistakes to Avoid

Even experienced editors fall into B-roll traps. Here are five of the most common:

Using B-roll that contradicts the narration. If the speaker says "our team works on-site," don't cut to footage of someone on a video call from home. Mismatched visuals undermine credibility faster than no B-roll at all.
Over-relying on generic stock footage. A few well-chosen stock clips are fine, but stringing together stock after stock makes a video feel impersonal. Whenever possible, mix in original footage, screen recordings, or custom motion graphics to keep the video grounded in your actual product or workplace.
Treating B-roll as filler. Every B-roll cut should serve one of three purposes: illustrate a point, provide proof, or give the viewer a visual break at a natural pause in narration. If a clip does none of those, it is dead weight.
Ignoring audio continuity. When you cut from A-roll to B-roll that carries its own sound, the ambient audio can change: room tone drops out, background noise shifts, or the microphone character changes. Use consistent background music or a subtle room-tone bed under the entire timeline to smooth these transitions.
Mismatching B-roll pacing to narration speed. Fast narration paired with slow, sweeping B-roll (or vice versa) creates a jarring disconnect. Match the energy: quick cuts for energetic voiceover segments, longer holds for contemplative or detail-heavy explanations.

How Much B-Roll Do You Need?

The answer depends on video length and format, but here are practical benchmarks:

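5-minute training video — Plan for 15–25 individual B-roll clips. Each clip typically runs 3–6 seconds on the timeline, so 25 clips at an average of 4 seconds gives you roughly 100 seconds of B-roll coverage, about a third of the 300-second runtime. Holding each clip for 5–6 seconds instead brings the same 25 clips to 125–150 seconds, which falls within the 40–60% coverage typical of training videos.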
2-minute product demo — You may need 12–18 clips, since B-roll dominates the visual track and clips rotate quickly.
10-minute onboarding video — Budget 30–50 clips. Longer videos demand more visual variety to prevent fatigue.
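The coverage arithmetic behind these benchmarks is simple enough to check with a small helper. This is a sketch only; `broll_coverage` is a hypothetical name, and it assumes every clip runs for the same average length.

```python
def broll_coverage(clip_count: int, avg_clip_seconds: float, runtime_seconds: float) -> float:
    """Return B-roll screen time as a fraction of total runtime."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return (clip_count * avg_clip_seconds) / runtime_seconds


# The 5-minute training example: 25 clips averaging 4 seconds each.
share = broll_coverage(25, 4, 5 * 60)
print(f"{share:.0%}")  # 33%
```

Running the same function with your own clip count and average hold time shows quickly whether a plan lands near the coverage you want for the format.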

If you are shooting original footage, a 3:1 shooting ratio is a safe starting point: capture three times as much B-roll as you expect to use. This gives editors enough options to find the right shot for each narration beat without re-shooting.
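As a worked example of the 3:1 ratio, the sketch below estimates how much raw B-roll to capture for a planned amount of on-screen B-roll. The function name `footage_to_capture` and its default are illustrative assumptions, not an industry formula.

```python
def footage_to_capture(planned_clips: int, avg_clip_seconds: float, shooting_ratio: float = 3.0) -> float:
    """Return seconds of raw B-roll to shoot for the planned on-screen amount."""
    on_screen_seconds = planned_clips * avg_clip_seconds
    return on_screen_seconds * shooting_ratio


# 25 planned clips averaging 4 seconds each, shot at 3:1.
raw = footage_to_capture(25, 4)
print(raw / 60)  # 5.0 (minutes of raw B-roll)
```

So the 100 seconds of B-roll in the 5-minute training example implies roughly five minutes of captured footage.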

For teams that use AI-generated visuals instead of live shoots, the calculus is different. AI tools can produce targeted visuals on demand, so you don't need to over-shoot. Instead, focus your effort on writing precise visual descriptions in your script so the generated B-roll matches the narration accurately.

Key Takeaways

A-roll is the primary footage (speaker, narration, interview) that carries the narrative; B-roll is the supplemental footage layered over it
B-roll maintains viewer attention, covers edits, and adds visual proof to support the narration
Plan for a 3:1 shooting ratio when capturing original B-roll: shoot three times as much as you expect to use
AI video tools can generate both layers automatically from your documents, removing the need for separate film shoots in most training and explainer use cases
Match B-roll pacing to narration energy: quick cuts for fast delivery, longer holds for detailed explanations

FAQ

What is the difference between A-roll and B-roll?

A-roll is the primary footage that carries the narrative—usually a speaker on camera, an interview, or direct-to-camera presentation. B-roll is supplemental footage (workplace shots, screen recordings, graphics) layered over the A-roll audio. A-roll tells the story; B-roll shows it and adds visual variety.

When should I use B-roll?

Use B-roll to maintain attention (visual variety), cover edits (hide jump cuts when you trim or rearrange), add proof (show the product or process in action), and set context (establishing shots of the environment). Every B-roll cut should illustrate a point, provide proof, or give a natural visual break—not just fill time.

How much B-roll do I need?

It depends on format and length. For a 5-minute training video, plan for roughly 15–25 B-roll clips (about a third of runtime at 25 clips averaging 4 seconds each). For product demos, B-roll often dominates (70–80% of the visual track). If shooting original footage, a 3:1 shooting ratio, meaning three times as much B-roll as you expect to use, gives editors enough options. AI video tools generate B-roll from your script, so you can focus on writing clear visual descriptions instead.

Do AI video tools use A-roll and B-roll?

Yes. AI platforms generate narration (A-roll) and pair it with contextual visuals, animations, and on-screen text (B-roll) from your documents. You don't need separate film shoots or stock licensing for most training and explainer use cases—the AI produces both layers automatically.

What are common B-roll mistakes to avoid?

Mismatched visuals (B-roll that contradicts the narration), overusing generic stock (makes the video feel impersonal), treating B-roll as filler (every clip should serve a purpose), ignoring audio continuity (use consistent music or room tone), and mismatching pacing (fast narration with slow B-roll or vice versa) all weaken the final video.