Every caller on the show is a one-of-a-kind character — generated in real time by a custom-built AI system. Here's a peek behind the curtain.
Every caller starts as a blank slate. The system generates a complete identity: name, age, job, hometown, and personality. Each caller gets a unique speaking style — some ramble, some are blunt, some deflect with humor. They have relationships, vehicles, strong food opinions, nostalgic memories, and reasons for being up this late. They know what they were watching on TV, what errand they ran today, and what song was on the radio before they called.
But it goes deeper than backstory. Every caller is built with a structured call shape — maybe an escalating reveal where they start casual and drop a bombshell halfway through, a bait-and-switch where the real issue isn't what they said at first, or a slow burn that builds to an emotional peak. They have energy levels, emotional states, and signature details — a phrase they keep coming back to, a nervous tic in how they talk, a specific detail that makes the whole thing feel real. And each caller is matched to a voice that fits their personality. A 60-year-old trucker from Lordsburg doesn't sound like a 23-year-old barista from Tucson.
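A caller profile of this kind can be sketched as a simple data structure. Everything below (field names, pools, ranges) is illustrative, not the show's actual schema:

```python
import random
from dataclasses import dataclass

# Hypothetical field pools; the real system's data is far richer.
CALL_SHAPES = ["escalating reveal", "bait and switch", "slow burn"]
STYLES = ["rambling", "blunt", "deflects with humor"]

@dataclass
class Caller:
    name: str
    age: int
    hometown: str
    style: str       # how they talk
    shape: str       # the structural arc of the call
    energy: int      # 1 (flat) .. 10 (wired)
    signature: str   # a phrase or tic they keep returning to

def generate_caller(rng: random.Random) -> Caller:
    """Roll a fresh one-of-a-kind caller from the pools above."""
    return Caller(
        name=rng.choice(["Leon", "Marge", "Dale", "Yolanda"]),
        age=rng.randint(21, 75),
        hometown=rng.choice(["Lordsburg", "Tucson", "Deming"]),
        style=rng.choice(STYLES),
        shape=rng.choice(CALL_SHAPES),
        energy=rng.randint(1, 10),
        signature=rng.choice(["'I'm just sayin'", "nervous laugh", "trails off"]),
    )

caller = generate_caller(random.Random())
```

The voice-matching step would then key off fields like `age` and `hometown` when picking a synthetic voice.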
Some callers become regulars. The system tracks returning callers across episodes — they remember past conversations, reference things they talked about before, and their stories evolve over time. You'll hear Leon check in about going back to school, or Shaniqua update you on her situation at work. They're not reset between shows.
And some callers are drunk, high, or flat-out unhinged. They'll call with conspiracy theories about pigeons being government drones, existential crises about whether fish know they're wet, or to confess they accidentally set their kitchen on fire trying to make grilled cheese at 3 AM.
Callers know real facts about where they live — the restaurants, the highways, the local gossip. The system has deep knowledge of 55 real towns across New Mexico and Arizona. When a caller says they're from Lordsburg, they actually know about the Shakespeare ghost town and the drive to Deming. They know the current weather outside their window, what day of the week it is, whether it's monsoon season or chile harvest. They have strong opinions about where to get the best green chile and get nostalgic about how their town used to be. The system also pulls in real-time news so callers can reference things that actually happened today.
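The seasonal awareness reduces to a date check. A minimal sketch, assuming the usual regional windows (the Southwest monsoon officially runs June 15 to September 30; green chile harvest lands roughly August through October):

```python
from datetime import date

def seasonal_notes(today: date) -> list[str]:
    """Season flags a caller's context might include; windows are approximate."""
    notes = []
    # Southwest monsoon season officially runs June 15 - September 30.
    if (6, 15) <= (today.month, today.day) <= (9, 30):
        notes.append("monsoon season")
    # Green chile harvest in NM runs roughly August through October.
    if 8 <= today.month <= 10:
        notes.append("chile harvest")
    return notes

print(seasonal_notes(date(2024, 7, 20)))  # ['monsoon season']
```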
Some callers have a problem — a fight with a neighbor, a situation at work, something weighing on them at 2 AM. Others call to geek out about Severance, argue about poker strategy, or share something they read about quantum physics. The system draws from over 1,000 unique calling reasons across dozens of categories — problems, stories, advice-seeking, gossip, and deep-dive topics. Every caller has a purpose, not just a script.
The whole thing is tuned for comedy. Not "AI tries to be funny" comedy — more like the energy of late-night call-in radio meets stand-up meets the kind of confessions you only hear at 2 AM. Some calls are genuinely heartfelt. Some are absurd. Some start serious and go completely sideways. The system knows how to build a call for comedic timing — when to hold back a detail, when to escalate, when to let the awkward silence do the work. It's not random chaos; it's structured chaos.
Luke talks to each caller using push-to-talk, just like a real radio show. His voice is transcribed in real time, sent to an AI that responds in character, and then converted to speech using a voice engine — all in a few seconds. The AI doesn't just answer questions; it reacts, gets emotional, goes on tangents, and remembers what was said earlier in the show.
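The round trip can be sketched as a small pipeline. The three engines (speech-to-text, in-character LLM, voice synthesis) are injected here as stand-in callables, since the actual services aren't specified:

```python
def handle_push_to_talk(audio_chunk, transcribe, respond, synthesize, history):
    """One push-to-talk round trip: host audio in, caller audio out.

    transcribe / respond / synthesize are placeholders for the real
    speech-to-text, in-character LLM, and voice engine.
    """
    host_text = transcribe(audio_chunk)
    history.append({"role": "host", "text": host_text})
    reply = respond(history)  # sees the whole show so far, not just this turn
    history.append({"role": "caller", "text": reply})
    return synthesize(reply)

# Stub engines, just to show the flow:
history = []
audio_out = handle_push_to_talk(
    b"...",  # raw mic audio
    transcribe=lambda a: "How's it going out there?",
    respond=lambda h: "Honestly, Luke? Weird night.",
    synthesize=lambda text: f"<audio:{text}>",
    history=history,
)
```

Keeping `history` outside the function is what lets the caller remember what was said earlier in the show.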
Callers don't just exist in isolation — the show tracks what's been discussed and matches callers thematically. If someone just called about a messy divorce, the next caller who references marriage didn't pick that topic randomly. The system scores previous callers by topic overlap and decides whether the new caller should reference them, disagree with them, or build on what they said. It tracks the show's overall energy so the pacing doesn't flatline — a heavy emotional call might be followed by something lighter, and vice versa.
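A minimal version of that scoring could use set overlap between topic tags. The threshold and the tag format are assumptions:

```python
def topic_overlap(a, b):
    """Jaccard overlap between two topic-tag sets (0.0 .. 1.0)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a or b else 0.0

def match_previous_caller(new_topics, previous_callers, threshold=0.3):
    """Pick the earlier caller the new one should play off, if any."""
    best = max(previous_callers,
               key=lambda c: topic_overlap(new_topics, c["topics"]),
               default=None)
    if best is None or topic_overlap(new_topics, best["topics"]) < threshold:
        return None  # no strong thematic link; the caller stands alone
    return best

earlier = [
    {"name": "Marge", "topics": {"divorce", "lawyers", "money"}},
    {"name": "Dale", "topics": {"poker", "bluffing"}},
]
hit = match_previous_caller({"marriage", "divorce", "money"}, earlier)
```

A second decision (reference, disagree, or build on) would then be made for whichever caller `hit` points at.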
And when a call has run its course, Luke can hit "Wrap It Up" — a signal that tells the caller to wind things down gracefully. Instead of an abrupt hang-up, the caller gets the hint and starts wrapping up their thought, says their goodbyes, and exits naturally. Just like a real radio host giving the "time's up" hand signal through the glass.
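One plausible way to implement that signal is a system-level hint appended to the caller's conversation before their next turn. The message wording and chat format below are assumptions:

```python
WRAP_HINT = (
    "The host has signaled that time is up. Finish your current thought in a "
    "sentence or two, say goodbye in character, and end the call naturally."
)

def signal_wrap_up(messages):
    """Append the wind-down hint so the next reply exits gracefully."""
    return messages + [{"role": "system", "content": WRAP_HINT}]

convo = [{"role": "caller", "content": "...and that's when the pigeons came back."}]
convo = signal_wrap_up(convo)
```

Because the hint arrives as context rather than a hard cutoff, the caller can land the ending in their own voice.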
When you dial 208-439-LUKE, your call goes into a live queue. Luke sees you waiting and can take your call right from the control room. Your voice streams in real time — no pre-recording, no delay. You're live on the show, talking to Luke, and the AI callers might even react to what you said. And if Luke isn't live, you can leave a voicemail — it gets transcribed and may get played on a future episode.
Listeners can send emails to submissions@lukeattheroost.com and have them read on the show. A background poller checks for new messages every 30 seconds — they show up in the control room as soon as they arrive. Luke can read them himself on the mic, or hit a button to have an AI voice read them aloud on the caller channel. It's like a call-in show meets a letters segment — listeners who can't call in can still be part of the conversation.
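The poller reduces to a fetch, dedupe, handle loop. A sketch, with a stub fetcher standing in for the real IMAP client:

```python
import time

def new_messages(fetched_ids, seen):
    """Return ids we haven't surfaced yet, updating the seen set."""
    fresh = [i for i in fetched_ids if i not in seen]
    seen.update(fresh)
    return fresh

def poll_mailbox(fetch, handle, interval=30, rounds=None):
    """Every `interval` seconds, push unseen emails to the control room.

    fetch() -> {message_id: email}; handle(email) surfaces it in the UI.
    `rounds` caps iterations for testing (None = run for the whole show).
    """
    seen = set()
    n = 0
    while rounds is None or n < rounds:
        batch = fetch()
        for mid in new_messages(list(batch), seen):
            handle(batch[mid])
        n += 1
        if rounds is None or n < rounds:
            time.sleep(interval)

handled = []
poll_mailbox(fetch=lambda: {"m1": "Long-time listener, first-time emailer."},
             handle=handled.append, interval=0, rounds=2)
```

Tracking seen message ids is what keeps the same email from popping up in the control room twice.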
Every show needs someone to yell at. Devon is the show's intern — a 23-year-old NMSU grad who's way too eager, occasionally useful, and frequently wrong. He's not a caller; he's a permanent fixture of the show. When Luke needs a fact checked, a topic researched, or someone to blame for a technical issue, Devon's there.
Devon has real tools. He can search the web, pull up news headlines, look things up on Wikipedia, and read articles — all live during the show. When a caller claims that octopuses have three hearts, Devon's already looking it up. Sometimes he interjects on his own when he thinks he has something useful to add. Sometimes he's right. Sometimes Luke tells him to shut up. He monitors conversations in the background and pipes up with suggestions that the host can play or dismiss. He's the kind of intern who tries really hard and occasionally nails it.
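Devon's toolbox can be modeled as a registry the LLM dispatches into. The tool name and the stub lookup below are hypothetical stand-ins for the real search and Wikipedia tools:

```python
TOOLS = {}

def tool(name):
    """Register a function as something Devon can call mid-show."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("fact_check")
def fact_check(claim: str) -> str:
    # Stand-in: the real tool would hit a search API or Wikipedia.
    known = {"octopuses have three hearts": True}
    verdict = known.get(claim.lower())
    if verdict is None:
        return f"Couldn't verify: {claim!r}"
    return f"{claim} -- {'checks out' if verdict else 'false'}"

def dispatch(name, **kwargs):
    """How the host (or the LLM) invokes one of Devon's tools."""
    return TOOLS[name](**kwargs)

answer = dispatch("fact_check", claim="Octopuses have three hearts")
```

The "suggestion the host can play or dismiss" behavior would sit on top of this: Devon's results queue up rather than going straight to air.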
The entire show runs through a custom-built control panel. Luke manages callers, plays music and sound effects, runs ads and station idents, monitors the call queue, and controls everything from one screen. Audio is routed across seven independent channels simultaneously — host mic, AI caller voices, live phone audio, music, sound effects, ads, and station idents all on separate tracks. The website shows a live on-air indicator so listeners know when to call in.
During every show, the system records six separate audio stems simultaneously: host microphone, AI caller voices, music, sound effects, ads, and station idents. Each stem is captured as an independent WAV file with sample-accurate alignment. This gives full control over the final mix — like having a recording studio's multitrack session, not just a flat recording.
Before the automated pipeline runs, the raw stems are loaded into REAPER for dialog editing. A custom Lua script analyzes voice tracks to detect silence gaps — the dead air between caller responses, TTS latency pauses, and gaps where Luke is reading the control room screen. The script strips these silences and ripple-edits all tracks in sync so ads, idents, and music shift with the dialog cuts. Protected regions marked as ads or idents are preserved — the script knows not to remove silence during an ad break even if the voice tracks are quiet. This tightens a raw two-hour session into a focused episode without cutting any content.
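The script's core is two pieces of timeline math: find silences that are safe to cut, then ripple later positions back by the removed time. The real implementation is Lua inside REAPER; this Python sketch shows only the logic:

```python
def plan_cuts(silences, protected, min_gap=0.8):
    """Pick silence regions safe to remove.

    silences / protected: lists of (start, end) in seconds.
    Gaps shorter than min_gap are kept for natural pacing, and any
    silence overlapping an ad or ident region is left alone.
    """
    cuts = []
    for start, end in silences:
        if end - start < min_gap:
            continue
        if any(start < p_end and end > p_start for p_start, p_end in protected):
            continue  # never cut inside an ad break
        cuts.append((start, end))
    return cuts

def ripple(position, cuts):
    """Where a timeline position lands after earlier cuts are removed."""
    return position - sum(e - s for s, e in cuts if e <= position)

cuts = plan_cuts(
    silences=[(10.0, 13.5), (40.0, 40.5), (60.0, 64.0)],
    protected=[(58.0, 90.0)],  # an ad break
)
```

Applying the same `ripple` shift to every track is what keeps ads, idents, and music aligned with the dialog after the cuts.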
Once the show ends, a 15-step automated pipeline processes the raw stems into a broadcast-ready episode. Ads and sound effects are hard-limited to prevent clipping. The host mic gets a high-pass filter, de-essing, and breath reduction. Voice tracks are compressed — the host gets aggressive spoken-word compression for consistent levels, callers get telephone EQ to sound like real phone calls. All stems are level-matched, music is ducked under dialog and muted during ads, then everything is mixed to stereo with panning and width. A bus compressor glues the final mix together before silence trimming, fades, and EBU R128 loudness normalization.
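The final step maps directly onto ffmpeg's `loudnorm` filter. A sketch of building that command, where the targets (-16 LUFS integrated, -1.5 dBTP true peak, 11 LU loudness range) are common podcast values assumed here, not the show's actual settings:

```python
def loudnorm_cmd(src: str, dst: str,
                 i: float = -16.0, tp: float = -1.5, lra: float = 11.0):
    """Build an ffmpeg command for EBU R128 loudness normalization."""
    return [
        "ffmpeg", "-i", src,
        "-af", f"loudnorm=I={i}:TP={tp}:LRA={lra}",
        dst,
    ]

cmd = loudnorm_cmd("episode_mix.wav", "episode_final.wav")
```

Running it via `subprocess.run(cmd, check=True)` at the end of the chain would leave every episode at a consistent loudness across players.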
A single command takes a finished episode and handles everything: the audio is transcribed with MLX Whisper on the Apple Silicon GPU to generate full-text transcripts, then an LLM analyzes the transcript to write the episode title, description, and chapter markers with timestamps. The episode is uploaded to the podcast server and directly to YouTube with chapters baked into the description. Chapters and transcripts are attached to the RSS metadata, all media is synced to a global CDN, and social posts are pushed to eight platforms — all from one command.
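One concrete piece of that step is rendering the chapter markers in the format YouTube parses from a description: ascending timestamps starting at 0:00. A sketch, assuming the LLM's chapters arrive as (seconds, title) pairs:

```python
def fmt_ts(seconds: int) -> str:
    """Seconds -> YouTube-style timestamp (0:00, 12:34, 1:02:03)."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}:{m:02d}:{s:02d}" if h else f"{m}:{s:02d}"

def chapters_to_description(chapters):
    """Render (seconds, title) pairs as lines YouTube turns into chapters."""
    return "\n".join(f"{fmt_ts(t)} {title}" for t, title in chapters)

desc = chapters_to_description([
    (0, "Cold open"),
    (754, "Pigeon drone theory"),
    (3723, "Devon fact-checks the octopus"),
])
```

The same (seconds, title) pairs can feed the RSS chapter metadata, so one LLM pass serves both destinations.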
No manual editing, no scheduling tools. After each episode, an LLM reads the full transcript and picks the best moments — funny exchanges, wild confessions, heated debates. Each clip is automatically extracted, transcribed with word-level timestamps, then polished by a second LLM pass that fixes punctuation, capitalization, and misheard words while preserving timing. The clips are rendered as vertical video with speaker-labeled captions and the show's branding. A third LLM writes platform-specific descriptions and hashtags. Then clips are uploaded directly to YouTube Shorts and Bluesky via their APIs, and pushed to Instagram Reels, Facebook Reels, Mastodon, Nostr, LinkedIn, Threads, and TikTok — nine platforms, zero manual work.
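With word-level timestamps, caption rendering becomes a grouping problem: bundle words into short cue lines without disturbing the timing. A sketch, where the line-length cap is an assumption:

```python
def group_captions(words, max_chars=28):
    """Group (word, start, end) tuples into caption cues.

    Returns (text, start, end) cues where each line stays under
    max_chars so it fits a vertical-video frame.
    """
    cues, line = [], []
    for word, start, end in words:
        candidate = " ".join(w for w, _, _ in line + [(word, start, end)])
        if line and len(candidate) > max_chars:
            cues.append((" ".join(w for w, _, _ in line), line[0][1], line[-1][2]))
            line = []
        line.append((word, start, end))
    if line:
        cues.append((" ".join(w for w, _, _ in line), line[0][1], line[-1][2]))
    return cues

cues = group_captions([
    ("I", 0.0, 0.1), ("set", 0.1, 0.3), ("the", 0.3, 0.4),
    ("kitchen", 0.4, 0.8), ("on", 0.8, 0.9), ("fire", 0.9, 1.3),
    ("making", 1.5, 1.8), ("grilled", 1.8, 2.1), ("cheese", 2.1, 2.5),
])
```

Because each cue keeps the first word's start and the last word's end, the second LLM pass can fix spelling inside a cue without shifting when it appears on screen.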
Episodes are served through a CDN edge network for fast, reliable playback worldwide. The RSS feed is automatically updated and picked up by Spotify, Apple Podcasts, YouTube, and every other podcast app. The website pulls the live feed to show episodes with embedded playback, full transcripts, and chapter navigation — all served through Cloudflare with edge caching. From recording to available on every platform, the whole pipeline is automated end-to-end.
Every conversation is improvised. Luke doesn't know what the caller is going to say. The AI doesn't follow a script. It's a real conversation between a human and an AI character who has a life, opinions, and something on their mind.
This isn't an app with a plugin. Every piece — the caller generator, the voice engine, the control room, the phone system, the post-production pipeline, the publishing automation — was built specifically for this show.
Everything happens live. Caller generation, voice synthesis, news lookups, weather checks, phone routing — all in real time during the show. There's no post-production trickery on the caller side. What you hear is what happened.
Callers aren't isolated — the system matches callers thematically to what's already been discussed. A caller might disagree with the last guy, back someone up, or call in because something another caller said hit close to home. The show tracks energy and pacing so conversations build naturally, not randomly.
Every episode runs through a 15-step post-production pipeline, including stem limiting, high-pass filtering, de-essing, breath reduction, spoken-word compression, telephone EQ, level matching, music ducking with ad muting, stereo imaging, bus compression, and EBU R128 loudness normalization.
From recording to your podcast app, the entire pipeline is automated. Post-production kicks off when the show ends, then a publish script handles transcription, AI-generated metadata, chapter detection, CDN sync, and RSS distribution — all with a single command.
The post-production pipeline runs automatically, from REAPER scripting for silence removal through ad ducking and EBU R128 loudness normalization — all triggered with a single command when the show ends.
Want to hear it for yourself?
Listen to Episodes