
Talk Your Demo Into Existence: Introducing 'show-n-tell'

[TL;DR: show-n-tell — open-source AgentSkill, GitHub repo + quick start]


Every product launch needs a walkthrough video. Nobody wants to make one. You sit down to "just record a quick demo," and three hours later you're still re-taking the third click, the narration sounds like a hostage video, and the brand badge is the wrong shade of green. We built "show-n-tell" to short-circuit that loop. It's an open-source AgentSkill that turns any website — yours, a customer's, a demo environment — into a narrated, branded video in about 20 minutes. You describe the story in plain English, review the storyboard like a numbered list, and ship a finished video with logo, captions, and optional background music. When the UI changes next week, you tweak a line and re-render in a minute. Nothing about the process feels like "recording."

Want to see what it produces? Here's the launch demo for show-n-tell, made by show-n-tell itself.


Why this exists (and why now)

Demo videos are a tax that every product team pays and nobody enjoys:

Every launch, every new feature, every release notes post needs a walkthrough — and the format keeps growing (Twitter, LinkedIn, embedded in docs, in the onboarding flow, in sales decks).

Screen recordings "rot". The UI moves, a button color changes, an onboarding flow gets reshuffled, and your 6-week-old "evergreen" video is suddenly misleading. The only fix is to re-record from scratch — script, narration, edits, the whole pipeline.

Recording tools (Loom, ScreenStudio, the OS recorder) leave you stuck with the "take". There's no way to tweak narration tone, swap the music, or relocate a brand badge without redoing the recording.

Generic AI voiceover tools (Synthesia, ElevenLabs et al.) hand you narration but no site-walking, no brand awareness, no captions matched to your beats. You still have to glue everything together yourself.

"Just have someone make it" trades one cost for another — turnaround time. The launch ships before the video does.


The result is that the videos that do get made are usually one-take screen captures with the founder's voice over them, and the videos that should get made — the second one for the technical audience, the shorter one for social, the one with the corrected stat — never happen.

The problem isn't recording technology. It's the "operational loop" around producing and updating a video.

show-n-tell addresses the operational loop.


Design goals

Conversational, not configured. You describe the demo in English, in a chat window. No video editor, no timeline, no YAML to hand-edit.
Site-aware. The skill drives a real browser through your site. It sees the same pages your users do — no mocks, no static screenshots stitched together.
Brand-aware by default. Logo, color palette, intro/outro slides, captions, optional background music — wired in from a single source you provide once and reuse across demos.
Cheap to iterate. Most edits — narration tone, music swap, color tweaks, captions on/off — don't re-record anything. They re-finalize in a minute.
Deterministic and inspectable. The whole pipeline is three YAML files and eight small Python scripts. You can read every step. Nothing is locked behind an editor's UI.
Local-first. Recording, encoding, captioning, brand overlay, music mixing all run on your machine (or in your Cowork sandbox). The only thing that leaves is the narration text → OpenAI's TTS endpoint.
Two runtimes, one skill. Works in Claude Code (the CLI) for developers, and in Claude Cowork (the desktop app) for everyone else. Same prompts, same review loop, same output.


What the skill does

show-n-tell replaces the ad-hoc "open OBS, click around, talk over it" ritual with a structured pipeline you stay in conversation with:

Interview (plain English)

→ Explore (walks the site, takes screenshots)

→ Storyboard (drafts beats, you review like a list)

→ Generate (TTS + record + mux + brand + finalize)

→ Verify (frames spot-checked against narration)

→ Hand off (mp4 + working dir for cheap re-runs)

Each stage hands work back to you at the right moment, and only at the right moment. You're not editing video; you're approving a story.


Interview

The first thing Claude asks for is the story you want to tell — one to three sentences, in English. What features matter? What should the viewer walk away knowing? Then four more inputs: target URL, audience and tone, logo (file path or URL), login flow if any. The skill "refuses to start" without intent and a logo. Generic demos are bad demos, and bad demos are worse than no demos.
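As a rough illustration of that gate, here is a minimal Python sketch. The `InterviewInputs` class and its field names are hypothetical and not taken from the skill's code. The only behavior drawn from the text above is that intent and logo are required, while the login flow is optional.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InterviewInputs:
    """Hypothetical container for the five interview answers."""
    intent: str = ""                    # the one-to-three-sentence story
    target_url: str = ""
    audience_and_tone: str = ""
    logo: str = ""                      # file path or URL
    login_flow: Optional[str] = None    # optional: not every site needs auth


def missing_required(inputs: InterviewInputs) -> List[str]:
    """Return the names of required answers that are still blank."""
    required = {"intent": inputs.intent, "logo": inputs.logo}
    return [name for name, value in required.items() if not value.strip()]


def can_start(inputs: InterviewInputs) -> bool:
    """The pipeline should only start once nothing required is missing."""
    return not missing_required(inputs)
```

A draft with only a URL and a tone would report `["intent", "logo"]` as missing, which is the point at which the conversation goes back to you instead of forward to the browser.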


Explore

Before drafting anything, the skill walks the site — Playwright loads each page, takes screenshots, reads the DOM for stable selectors. The point isn't to crawl exhaustively. It's to ground the storyboard in what's *actually* on screen so the narration doesn't claim things that aren't visible.
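To make "stable selectors" concrete, here is a small sketch that uses only Python's standard-library HTML parser rather than Playwright. The preference order (`data-testid`, then `id`, then `aria-label`) and the helper names are assumptions for illustration; the skill's own heuristics may differ.

```python
from html.parser import HTMLParser
from typing import List, Optional

# Assumed preference order: attributes that tend to survive redesigns first.
PREFERRED_ATTRS = ("data-testid", "id", "aria-label")
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


def selector_for(tag: str, attrs: dict) -> Optional[str]:
    """Build a CSS selector from the most stable attribute present, if any.

    Assumes simple attribute values that need no CSS escaping.
    """
    for name in PREFERRED_ATTRS:
        value = attrs.get(name)
        if value:
            if name == "id":
                return f"#{value}"
            return f'{tag}[{name}="{value}"]'
    return None


class SelectorCollector(HTMLParser):
    """Collects one selector per interactive element that has a stable hook."""

    def __init__(self) -> None:
        super().__init__()
        self.selectors: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in INTERACTIVE_TAGS:
            selector = selector_for(tag, dict(attrs))
            if selector:
                self.selectors.append(selector)


def stable_selectors(html: str) -> List[str]:
    """Return selectors for interactive elements, in document order."""
    collector = SelectorCollector()
    collector.feed(html)
    return collector.selectors
```

Elements with no stable hook are skipped rather than addressed by position, since positional selectors are the ones that break when a layout is reshuffled.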


Storyboard

You get a numbered list of beats — usually 10 for a 2-minute demo, 25–30 for a 5-minute one. Each beat is one camera move (click, scroll, hover, navigate) plus 1–3 sentences of narration. The whole storyboard fits on a chat-window screen. You change things by saying "skip beat 7," "make the tone less formal," "mention the pricing somewhere." The skill rewrites and re-presents. You iterate until you say "build it." No YAML, no timeline.
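One way to picture the storyboard is as a plain list of beats that gets rewritten and re-presented after each request. The sketch below is illustrative only: the `Beat` fields and the `skip_beat` and `render` helpers are hypothetical names, not the skill's actual schema.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Beat:
    """One storyboard beat: a single camera move plus its narration."""
    number: int
    action: str       # e.g. "click", "scroll", "hover", "navigate"
    narration: str


def skip_beat(beats: List[Beat], number: int) -> List[Beat]:
    """Drop one beat and renumber the rest so the list stays 1..N."""
    kept = [b for b in beats if b.number != number]
    return [replace(b, number=i) for i, b in enumerate(kept, start=1)]


def render(beats: List[Beat]) -> str:
    """Format the storyboard as the numbered list shown in chat."""
    return "\n".join(f"{b.number}. [{b.action}] {b.narration}" for b in beats)
```

Saying "skip beat 7" then amounts to `render(skip_beat(beats, 7))`: the edit is applied to data, and the numbered list you review is regenerated from it.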


Generate

Once approved, the pipeline runs end to end. OpenAI TTS renders the narration for each beat (support for more TTS providers is coming soon), and Playwright drives the browser through the storyboard while the recording runs. ffmpeg then muxes the audio in sync, applies an optional speed-up, overlays the brand badge with a live waveform, generates intro/outro slides, burns in or sidecars captions, and mixes in optional background music with sidechain ducking under the narration.
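For readers unfamiliar with sidechain ducking, the sketch below builds (but does not run) an ffmpeg command that lowers the music whenever the narration is speaking. It relies on ffmpeg's `asplit`, `volume`, `sidechaincompress`, and `amix` filters. The function name, file names, and numeric settings are assumptions chosen for illustration, not the values the skill ships with.

```python
from typing import List


def ducked_mix_command(video_with_narration: str, music: str, output: str,
                       music_volume: float = 0.3) -> List[str]:
    """Build an ffmpeg command that ducks music under narration.

    Input 0 carries the video and the narration track; input 1 is the music.
    """
    filter_graph = ";".join([
        # Narration is needed twice: once in the mix, once as the sidechain key.
        "[0:a]asplit=2[narr][key]",
        # Lower the music bed before compression (volume is an assumed value).
        f"[1:a]volume={music_volume}[bed]",
        # Compress the music whenever the narration (the key signal) is loud.
        "[bed][key]sidechaincompress=threshold=0.05:ratio=8:attack=20:release=300[ducked]",
        # Mix narration and ducked music; stop when the narration input ends.
        "[narr][ducked]amix=inputs=2:duration=first[mix]",
    ])
    return [
        "ffmpeg", "-y",
        "-i", video_with_narration,
        "-i", music,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-map", "[mix]",
        "-c:v", "copy",   # the video stream is copied; only audio is re-encoded
        output,
    ]
```

Because the video stream is copied rather than re-encoded, a change to the music affects only the audio mix, which is consistent with music swaps being a fast edit.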


Verify

This is the step that separates a real demo from a hallucination. After the video is built, the skill extracts frames at multiple timestamps, reads each one, and confirms that what the narrator says matches what's on screen at that beat. If the narrator says "five releases" but the screen shows "3," the skill surfaces the mismatch so the affected beat can be fixed before the video ships.
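The "five releases" versus "3" case can be sketched as a simple comparison once the frame has been read. The code below assumes the on-screen text has already been extracted by the vision step, which is not shown. The function names and the small number-word map are hypothetical, and the check is deliberately narrow: it only looks at whole numbers.

```python
import re
from typing import List, Set

# Small word-to-number map; an assumption that covers typical narration.
NUMBER_WORDS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def numbers_in(text: str) -> Set[int]:
    """Collect integers written as digits or as simple number words."""
    found = {int(d) for d in re.findall(r"\d+", text)}
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in NUMBER_WORDS:
            found.add(NUMBER_WORDS[word])
    return found


def unsupported_numbers(narration: str, on_screen_text: str) -> List[int]:
    """Numbers the narrator states that do not appear in the frame text."""
    return sorted(numbers_in(narration) - numbers_in(on_screen_text))
```

A sketch like this will raise false alarms (for example, "one" used as a pronoun, or digits split by thousands separators), so its output is best treated as a list of beats to look at again, not as a verdict.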

Spoiler: the skill caught two of these in the demo above before we shipped it.


Hand off

You get an mp4 video, a working directory with everything intermediate preserved, and a one-line iteration guide. Want the music quieter? Edit one line, re-finalize in 60 seconds. Want a casual rephrasing of the third beat? Tweak the narration, re-render only the changed beats, mux back together.


Quick start


Claude Code (CLI)

git clone https://github.com/carveragents/show-n-tell \
  ~/.claude/skills/show-n-tell
cd ~/.claude/skills/show-n-tell
uv sync
uv run playwright install chromium

That's it. Open Claude Code in any folder, type:

Let's create a demo video for the site I'm working on.


Claude Cowork

git clone https://github.com/carveragents/show-n-tell /tmp/show-n-tell
bash /tmp/show-n-tell/tools/make-plugin.sh ~/Downloads

Drop the `show-n-tell.plugin` file into a Cowork chat, click "Install". Then type the same prompt. The Cowork sandbox already has `ffmpeg`, `uv`, and Chromium.


Example: how this very blog post's video got made

The 2-minute walkthrough embedded at the top of this post is itself a show-n-tell output. Here's the entire conversation, abbreviated:

1. Intent. "Demo the show-n-tell GitHub README for a 2-minute launch video, non-technical audience, casual tone." The skill asked for the logo (we pointed it at Carver Agents' wordmark) and confirmed no auth was needed for a public GitHub page.

2. Storyboard draft. The skill walked the rendered README, drafted 10 beats — hero, problem framing, "where you can run this," install, "your first demo," required inputs, the YAML behind the scenes, the iteration recipes, the bundled music library, and an outro CTA — and presented them as a numbered list.

3. Plain-English iteration. We made a few edits. "Combine the 'where you can run it' beat with the install beat, around 11 seconds." "Don't mention the cost in dollars." "Add a stinger at the end: 'this video was generated by the skill itself, from the README you just saw.'" The skill rewrote the storyboard in place.

4. Render. The full pipeline took about 4 minutes — TTS for 10 beats (~$0.09 on the OpenAI bill), recording, mux, badge overlay, intro/outro slides, background music (`warm` mood — "Casa Noir" by Quantum Jazz, CC-BY-SA from the bundled library), finalize.

5. Two more rounds of polish. "The corner badge overlaps the README content. Move it to the right, where GitHub leaves white space." (The skill re-ran two scripts, ~90 seconds.) "The section headings are landing under GitHub's sticky tab bar. Add a scroll margin." (The skill added four lines of CSS to `recording_css` and re-recorded the affected chunks, ~5 minutes total.)

Total wall-clock time from "let's make a video" to a shipped mp4: under 30 minutes, including the polish rounds. If the README changes tomorrow, the same conversation re-runs in 10 minutes.


Why conversational beats traditional demo tools

| Task | | | show-n-tell |
| --- | --- | --- | --- |
| Update for a UI change | Re-record from scratch | N/A | Skill re-records only the affected beats |
| Verify narration matches what's on screen | Eyeball it | N/A | Vision LLM frame-check before ship |

The difference isn't the recording tech — it's that "everything downstream" of the recording is parameterized, deterministic, and re-runnable.


Under the hood

show-n-tell/
├── README.md
├── SKILL.md                  # the playbook Claude follows
├── pyproject.toml
├── .claude-plugin/
│   └── plugin.json           # Cowork plugin manifest
├── scripts/                  # 8 small Python scripts, one per pipeline stage
│   ├── make_overlay.py
│   ├── render_voiceover.py
│   ├── record_demo.py
│   ├── mux_demo.py
│   ├── speed_video.py
│   ├── brand_video.py
│   ├── make_intro_outro.py
│   ├── make_captions.py
│   └── finalize_video.py
├── helpers/                  # auth capture, page exploration, PDF wrappers
├── recipes/                  # HTML templates (intro/outro slides, PDF viewer)
├── templates/                # storyboard / branding / demo_config starters
├── examples/                 # canonical complete demos (halyard-spme, oauth, login)
├── docs/                     # SCHEMAS.md, GOTCHAS.md, CONTEXT.md
├── tools/make-plugin.sh      # builds the .plugin for Cowork
└── _assets/bg_music/         # bundled royalty-free library, 6 moods
    ├── library.json
    ├── upbeat_morning.mp3
    ├── warm_acoustic_loop.mp3
    ├── calm_ambient_piano.mp3
    ├── playful_uke_strum.mp3
    ├── cinematic_build.mp3
    └── tech_modern_pulse.mp3

Key implementation details

Three YAML files drive everything. `storyboard.yaml` (the beats + actions, auto-written by the skill from your conversation), `branding.yaml` (logo, colors, voice, overlay positioning — reusable across demos from the same brand), `demo_config.yaml` (per-demo: URL, output filename, feature flags like captions and intro/outro). You never need to mess with these files, but you can if you want to.

Eight small Python scripts. Each is under 500 lines, runnable on its own with `uv run scripts/<name>.py`. You can hack on one stage without breaking the others.

Playwright for site exploration and screen recording. Headless Chromium, real browser, real DOM — what you record is what your users see.

OpenAI TTS for narration. Two voices ship by default: `cedar` (calm, authoritative — product/B2B) and `marin` (brighter, warmer — consumer/casual). About $0.10 per 3–5 minute demo.

ffmpeg for everything downstream — mux, speed-up, badge overlay with live waveform, intro/outro slide generation, caption burn-in, music mixing with sidechain ducking.

Bundled music library, six moods (`upbeat`, `warm`, `calm`, `playful`, `cinematic`, `tech`), all curated CC-BY licensed tracks from Jamendo. Or point at your own audio files.

Diff-aware regeneration. Change one beat's narration, only that beat's TTS regenerates. Change the brand color, only the badge re-renders. Change the music mood, only the final mix re-runs. The pipeline knows what's stale.
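A common way to get this behavior is to fingerprint each stage's inputs and compare against the fingerprints saved from the last run. The sketch below shows that general technique under stated assumptions: the stage names (`tts:beat-1`, `badge`, `music_mix`) and the cache layout are hypothetical, and the skill's own staleness logic may work differently.

```python
import hashlib
import json
from typing import Dict, List


def fingerprint(inputs: dict) -> str:
    """Stable hash of a stage's inputs (key order does not matter)."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def stale_stages(current: Dict[str, dict], cache: Dict[str, str]) -> List[str]:
    """Names of stages whose inputs changed since the cached fingerprints."""
    return [name for name, inputs in current.items()
            if cache.get(name) != fingerprint(inputs)]


# Example: only the badge inputs change between two runs.
previous = {
    "tts:beat-1": {"text": "Hello", "voice": "cedar"},
    "badge": {"color": "#0a7"},
    "music_mix": {"mood": "warm"},
}
cache = {name: fingerprint(inputs) for name, inputs in previous.items()}
updated = dict(previous, badge={"color": "#094"})
needs_rerun = stale_stages(updated, cache)   # only "badge" is reported
```

This version checks each stage in isolation. A fuller implementation would also mark stages downstream of a stale one (for example, the final mix after a badge change) for re-running.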


Cowork or Claude Code — same skill, two runtimes

The same skill works in two places:

Claude Code (CLI). Developers `git clone` into `~/.claude/skills/`, `uv sync`, run. Best when you're already in a terminal and want demos to land in `~/demo-videos/`.

Claude Cowork. Anyone — marketing, PMs, support, designers — drops `show-n-tell.plugin` into a chat, clicks Install. The Cowork sandbox provides the runtime; output lands in the directory Cowork mounts. No terminal involved.

The same `SKILL.md` drives both. For Cowork, the repo carries a `skills/show-n-tell/` symlink tree that `tools/make-plugin.sh` dereferences at build time, producing a single `.plugin` file that's just a zip of the same content laid out as Cowork expects. One source of truth; two install paths.
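The repo does this step in a shell script, `tools/make-plugin.sh`. Purely to illustrate the idea of dereferencing symlinks into a zip, here is a Python sketch; the `build_plugin` function is hypothetical, and the archive layout it produces is not claimed to match what Cowork requires.

```python
import os
import zipfile


def build_plugin(source_dir: str, plugin_path: str) -> None:
    """Zip source_dir into plugin_path, storing symlink targets as real files.

    Assumes plugin_path lies outside source_dir and the links form no cycles.
    """
    with zipfile.ZipFile(plugin_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # followlinks=True makes os.walk descend into symlinked directories.
        for root, _dirs, files in os.walk(source_dir, followlinks=True):
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # ZipFile.write reads the file's content, so the archive holds
                # a regular copy stored under its path relative to source_dir.
                archive.write(full_path, os.path.relpath(full_path, source_dir))
```

The result is an ordinary zip with no links inside it, which is what lets one source tree serve both install paths.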


Next steps

Mobile-portrait output. Today the default is 1440×900 (desktop / YouTube / Loom). 9:16 for mobile-first social is on the roadmap.

Multi-language captions. Narration in non-English works (the OpenAI voices have decent multilingual support); caption auto-generation hasn't been tuned for non-Latin scripts yet.

Voice cloning. Letting brands ship the founder's voice instead of `cedar` / `marin`.

Custom intro/outro templates. The slide renderer is currently template-locked; opening it up to brand-specific layouts is a natural extension.

Multi-page demos with rich login flows. Form-based and OAuth flows already work; we want to add templates for common SaaS patterns (Cognito, Auth0, Clerk, Okta).


Get started

If your next launch is going to need a demo video — or if you've already shipped one and dread updating it — try show-n-tell on it. Twenty minutes from now you'll have a finished mp4 video with your brand on it, and a working directory you can re-run from in a minute when something changes.

GitHub Repo + Quick Start: [github.com/carveragents/show-n-tell]


References & inspiration

[Loom](https://www.loom.com) — the format we're imitating (and the bar for "feels like a real demo video").

[Playwright](https://playwright.dev) — the browser-automation engine that does the site-walking and screen recording.

[OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech) — the narration voices (`cedar`, `marin`).

[ffmpeg](https://ffmpeg.org) — does almost everything else.

[Claude Code](https://claude.com/claude-code) — the agentic CLI that hosts the skill.

[Claude Cowork](https://claude.com) — the desktop runtime where the same skill works for non-developers.

[Jamendo](https://www.jamendo.com) — the source for the bundled CC-BY music library.

