Caller generation overhaul, Devon intern, frontend redesign

Caller system: structured JSON backgrounds, voice-personality matching (68 profiles),
thematic inter-caller awareness, adaptive call shapes, show pacing, returning caller
memory with relationships/arcs, post-call quality signals, 95 comedy writer entries.

Devon the Intern: persistent show character with tool-calling LLM (web search, Wikipedia,
headlines, webpage fetch), auto-monitoring, 6 API endpoints, full frontend UI.

Frontend: wrap-up nudge button, caller info panel with shape/energy/emotion badges,
keyboard shortcuts (1-0/H/W/M/D), pinned SFX, visual polish, Devon panel.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-14 01:54:08 -06:00
parent d3490e1521
commit 6d4e490283
10 changed files with 2776 additions and 179 deletions

@@ -4,7 +4,7 @@
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>How It Works — Luke at the Roost</title>
-<meta name="description" content="How Luke at the Roost works: AI-generated callers with unique personalities, real phone calls, voice synthesis, multi-stem recording, and automated post-production.">
+<meta name="description" content="How Luke at the Roost works: AI-generated callers with structured personalities, comedy-tuned call shapes, a live research intern, voice-personality matching, multi-stem recording, and automated post-production.">
<meta name="theme-color" content="#1a1209">
<link rel="canonical" href="https://lukeattheroost.com/how-it-works">
@@ -79,6 +79,7 @@
<div class="hiw-step-content">
<h3>A Person Is Born</h3>
<p>Every caller starts as a blank slate. The system generates a complete identity: name, age, job, hometown, and personality. Each caller gets a unique speaking style — some ramble, some are blunt, some deflect with humor. They have relationships, vehicles, strong food opinions, nostalgic memories, and reasons for being up this late. They know what they were watching on TV, what errand they ran today, and what song was on the radio before they called.</p>
+<p>But it goes deeper than backstory. Every caller is built with a structured call shape — maybe an escalating reveal where they start casual and drop a bombshell halfway through, a bait-and-switch where the real issue isn't what they said at first, or a slow burn that builds to an emotional peak. They have energy levels, emotional states, and signature details — a phrase they keep coming back to, a nervous tic in how they talk, a specific detail that makes the whole thing feel real. And each caller is matched to a voice that fits their personality. A 60-year-old trucker from Lordsburg doesn't sound like a 23-year-old barista from Tucson.</p>
<p>Some callers become regulars. The system tracks returning callers across episodes — they remember past conversations, reference things they talked about before, and their stories evolve over time. You'll hear Leon check in about going back to school, or Shaniqua update you on her situation at work. They're not reset between shows.</p>
<p>And some callers are drunk, high, or flat-out unhinged. They'll call with conspiracy theories about pigeons being government drones, existential crises about whether fish know they're wet, or to confess they accidentally set their kitchen on fire trying to make grilled cheese at 3 AM.</p>
<div class="hiw-detail-grid">
@@ -87,12 +88,12 @@
<span class="hiw-detail-value">160</span>
</div>
<div class="hiw-detail">
-<span class="hiw-detail-label">Personality Layers</span>
-<span class="hiw-detail-value">300+</span>
+<span class="hiw-detail-label">Voice Profiles</span>
+<span class="hiw-detail-value">68</span>
</div>
<div class="hiw-detail">
-<span class="hiw-detail-label">Towns with Real Knowledge</span>
-<span class="hiw-detail-value">55</span>
+<span class="hiw-detail-label">Call Shapes</span>
+<span class="hiw-detail-value">8 types</span>
</div>
<div class="hiw-detail">
<span class="hiw-detail-label">Returning Regulars</span>
@@ -115,6 +116,7 @@
<div class="hiw-step-content">
<h3>They Have a Reason to Call</h3>
<p>Some callers have a problem — a fight with a neighbor, a situation at work, something weighing on them at 2 AM. Others call to geek out about Severance, argue about poker strategy, or share something they read about quantum physics. The system draws from over 1,000 unique calling reasons across dozens of categories — problems, stories, advice-seeking, gossip, and deep-dive topics. Every caller has a purpose, not just a script.</p>
+<p>The whole thing is tuned for comedy. Not "AI tries to be funny" comedy — more like the energy of late-night call-in radio meets stand-up meets the kind of confessions you only hear at 2 AM. Some calls are genuinely heartfelt. Some are absurd. Some start serious and go completely sideways. The system knows how to build a call for comedic timing — when to hold back a detail, when to escalate, when to let the awkward silence do the work. It's not random chaos; it's structured chaos.</p>
<div class="hiw-split-stat">
<div class="hiw-stat">
<span class="hiw-stat-number">70%</span>
@@ -132,7 +134,9 @@
<div class="hiw-step-number">4</div>
<div class="hiw-step-content">
<h3>The Conversation Is Real</h3>
-<p>Luke talks to each caller using push-to-talk, just like a real radio show. His voice is transcribed in real time, sent to an AI that responds in character, and then converted to speech using a voice engine — all in a few seconds. The AI doesn't just answer questions; it reacts, gets emotional, goes on tangents, and remembers what was said earlier in the show. Callers even react to previous callers — "Hey Luke, I heard that guy Tony earlier and I got to say, he's full of it." It makes the show feel like a living community, not isolated calls.</p>
+<p>Luke talks to each caller using push-to-talk, just like a real radio show. His voice is transcribed in real time, sent to an AI that responds in character, and then converted to speech using a voice engine — all in a few seconds. The AI doesn't just answer questions; it reacts, gets emotional, goes on tangents, and remembers what was said earlier in the show.</p>
+<p>Callers don't just exist in isolation — the show tracks what's been discussed and matches callers thematically. If someone just called about a messy divorce, the next caller who references marriage didn't pick that topic randomly. The system scores previous callers by topic overlap and decides whether the new caller should reference them, disagree with them, or build on what they said. It tracks the show's overall energy so the pacing doesn't flatline — a heavy emotional call might be followed by something lighter, and vice versa.</p>
+<p>And when a call has run its course, Luke can hit "Wrap It Up" — a signal that tells the caller to wind things down gracefully. Instead of an abrupt hang-up, the caller gets the hint and starts wrapping up their thought, says their goodbyes, and exits naturally. Just like a real radio host giving the "time's up" hand signal through the glass.</p>
</div>
</div>
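For readers of this diff, the thematic matching and pacing described in the new step 4 copy could be sketched roughly as follows. This is an illustrative Python sketch only, not the code from this commit: `PastCall`, `topic_overlap`, `pick_reference`, `wants_lighter_call`, the Jaccard measure, and both thresholds are hypothetical names and choices made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PastCall:
    caller: str
    topics: set[str]
    energy: float  # hypothetical scale: 0.0 (light) to 1.0 (heavy)


def topic_overlap(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: shared topics divided by all topics."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def pick_reference(new_topics, history, threshold=0.25):
    """Return the best-matching earlier call, or None if nothing overlaps enough."""
    best = max(history, key=lambda c: topic_overlap(new_topics, c.topics), default=None)
    if best is None or topic_overlap(new_topics, best.topics) < threshold:
        return None
    return best


def wants_lighter_call(history, window=2, heavy=0.7):
    """True when the most recent calls averaged above the assumed 'heavy' mark."""
    recent = history[-window:]
    if not recent:
        return False
    return sum(c.energy for c in recent) / len(recent) > heavy


history = [
    PastCall("Tony", {"divorce", "money"}, 0.9),
    PastCall("Carla", {"poker"}, 0.3),
]
match = pick_reference({"marriage", "divorce"}, history)
print(match.caller if match else "no reference")
```

A real system would presumably derive topics from transcripts and decide separately whether the new caller agrees, disagrees, or builds on the earlier call; the sketch covers only the scoring step.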
@@ -154,6 +158,15 @@
<div class="hiw-step">
<div class="hiw-step-number">7</div>
<div class="hiw-step-content">
+<h3>Devon the Intern</h3>
+<p>Every show needs someone to yell at. Devon is the show's intern — a 23-year-old NMSU grad who's way too eager, occasionally useful, and frequently wrong. He's not a caller; he's a permanent fixture of the show. When Luke needs a fact checked, a topic researched, or someone to blame for a technical issue, Devon's there.</p>
+<p>Devon has real tools. He can search the web, pull up news headlines, look things up on Wikipedia, and read articles — all live during the show. When a caller claims that octopuses have three hearts, Devon's already looking it up. Sometimes he interjects on his own when he thinks he has something useful to add. Sometimes he's right. Sometimes Luke tells him to shut up. He monitors conversations in the background and pipes up with suggestions that the host can play or dismiss. He's the kind of intern who tries really hard and occasionally nails it.</p>
+</div>
+</div>
+<div class="hiw-step">
+<div class="hiw-step-number">8</div>
+<div class="hiw-step-content">
<h3>The Control Room</h3>
<p>The entire show runs through a custom-built control panel. Luke manages callers, plays music and sound effects, runs ads and station idents, monitors the call queue, and controls everything from one screen. Audio is routed across seven independent channels simultaneously — host mic, AI caller voices, live phone audio, music, sound effects, ads, and station idents all on separate tracks. The website shows a live on-air indicator so listeners know when to call in.</p>
@@ -428,7 +441,7 @@
<div class="hiw-steps">
<div class="hiw-step">
-<div class="hiw-step-number">8</div>
+<div class="hiw-step-number">9</div>
<div class="hiw-step-content">
<h3>Multi-Stem Recording</h3>
<p>During every show, the system records six separate audio stems simultaneously: host microphone, AI caller voices, music, sound effects, ads, and station idents. Each stem is captured as an independent WAV file with sample-accurate alignment. This gives full control over the final mix — like having a recording studio's multitrack session, not just a flat recording.</p>
@@ -454,7 +467,7 @@
</div>
<div class="hiw-step">
-<div class="hiw-step-number">9</div>
+<div class="hiw-step-number">10</div>
<div class="hiw-step-content">
<h3>Dialog Editing in REAPER</h3>
<p>Before the automated pipeline runs, the raw stems are loaded into REAPER for dialog editing. A custom Lua script analyzes voice tracks to detect silence gaps — the dead air between caller responses, TTS latency pauses, and gaps where Luke is reading the control room. The script strips these silences and ripple-edits all tracks in sync so ads, idents, and music shift with the dialog cuts. Protected regions marked as ads or idents are preserved — the script knows not to remove silence during an ad break even if the voice tracks are quiet. This tightens a raw two-hour session into a focused episode without cutting any content.</p>
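The silence-stripping pass described in the unchanged copy above can be illustrated with a short sketch. The page says the real tool is a Lua script inside REAPER; the Python below is a stand-in under stated assumptions, not that script. `levels` is assumed to be a per-frame loudness value taken across the voice tracks, and the function names, threshold, and minimum gap length are all hypothetical.

```python
def find_silence_gaps(levels, threshold=0.02, min_len=3):
    """Return (start, end) frame pairs where the level stays below the
    threshold for at least min_len consecutive frames; end is exclusive."""
    gaps, start = [], None
    for i, level in enumerate(levels):
        if level < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                gaps.append((start, i))
            start = None
    if start is not None and len(levels) - start >= min_len:
        gaps.append((start, len(levels)))
    return gaps


def drop_protected(gaps, protected):
    """Discard any gap that overlaps a protected (start, end) region,
    such as an ad or ident break."""
    return [
        (s, e) for s, e in gaps
        if not any(s < p_end and p_start < e for p_start, p_end in protected)
    ]


def ripple_cut(track, gaps):
    """Remove the gap frames from one track. Applying the same gaps to
    every track keeps them aligned after the cut."""
    cut = set()
    for s, e in gaps:
        cut.update(range(s, e))
    return [x for i, x in enumerate(track) if i not in cut]


voice = [0.5, 0.0, 0.0, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0, 0.3]
gaps = drop_protected(find_silence_gaps(voice), protected=[(6, 8)])
music = ripple_cut(list(range(10)), gaps)
```

Dropping a whole gap when it touches a protected region is the simplest policy; trimming the gap around the region would be an alternative.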
@@ -462,7 +475,7 @@
</div>
<div class="hiw-step">
-<div class="hiw-step-number">10</div>
+<div class="hiw-step-number">11</div>
<div class="hiw-step-content">
<h3>Post-Production Pipeline</h3>
<p>Once the show ends, a 15-step automated pipeline processes the raw stems into a broadcast-ready episode. Ads and sound effects are hard-limited to prevent clipping. The host mic gets a high-pass filter, de-essing, and breath reduction. Voice tracks are compressed — the host gets aggressive spoken-word compression for consistent levels, callers get telephone EQ to sound like real phone calls. All stems are level-matched, music is ducked under dialog and muted during ads, then everything is mixed to stereo with panning and width. A bus compressor glues the final mix together before silence trimming, fades, and EBU R128 loudness normalization.</p>
@@ -488,7 +501,7 @@
</div>
<div class="hiw-step">
-<div class="hiw-step-number">11</div>
+<div class="hiw-step-number">12</div>
<div class="hiw-step-content">
<h3>Automated Publishing</h3>
<p>A single command takes a finished episode and handles everything: the audio is transcribed using MLX Whisper running on Apple Silicon GPU to generate full-text transcripts, then an LLM analyzes the transcript to write the episode title, description, and chapter markers with timestamps. The episode is uploaded to the podcast server and directly to YouTube with chapters baked into the description. Chapters and transcripts are attached to the RSS metadata, all media is synced to a global CDN, and social posts are pushed to eight platforms — all from one command.</p>
@@ -514,7 +527,7 @@
</div>
<div class="hiw-step">
-<div class="hiw-step-number">12</div>
+<div class="hiw-step-number">13</div>
<div class="hiw-step-content">
<h3>Automated Social Clips</h3>
<p>No manual editing, no scheduling tools. After each episode, an LLM reads the full transcript and picks the best moments — funny exchanges, wild confessions, heated debates. Each clip is automatically extracted, transcribed with word-level timestamps, then polished by a second LLM pass that fixes punctuation, capitalization, and misheard words while preserving timing. The clips are rendered as vertical video with speaker-labeled captions and the show's branding. A third LLM writes platform-specific descriptions and hashtags. Then clips are uploaded directly to YouTube Shorts and Bluesky via their APIs, and pushed to Instagram Reels, Facebook Reels, Mastodon, Nostr, LinkedIn, Threads, and TikTok — nine platforms, zero manual work.</p>
@@ -540,7 +553,7 @@
</div>
<div class="hiw-step">
-<div class="hiw-step-number">13</div>
+<div class="hiw-step-number">14</div>
<div class="hiw-step-content">
<h3>Global Distribution</h3>
<p>Episodes are served through a CDN edge network for fast, reliable playback worldwide. The RSS feed is automatically updated and picked up by Spotify, Apple Podcasts, YouTube, and every other podcast app. The website pulls the live feed to show episodes with embedded playback, full transcripts, and chapter navigation — all served through Cloudflare with edge caching. From recording to available on every platform, the whole pipeline is automated end-to-end.</p>
@@ -597,7 +610,7 @@
<svg viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2"><path d="M17 21v-2a4 4 0 0 0-4-4H5a4 4 0 0 0-4 4v2"/><circle cx="9" cy="7" r="4"/><path d="M23 21v-2a4 4 0 0 0-3-3.87"/><path d="M16 3.13a4 4 0 0 1 0 7.75"/></svg>
</div>
<h3>They Listen to Each Other</h3>
-<p>Callers aren't isolated — they hear what happened earlier in the show. A caller might disagree with the last guy, back someone up, or call in specifically because of something another caller said. The show builds on itself.</p>
+<p>Callers aren't isolated — the system matches callers thematically to what's already been discussed. A caller might disagree with the last guy, back someone up, or call in because something another caller said hit close to home. The show tracks energy and pacing so conversations build naturally, not randomly.</p>
</div>
<div class="hiw-feature">
<div class="hiw-feature-icon">