The Anatomy of an AI Caller

1

A Person Is Born

Every caller starts as a blank slate. The system generates a complete identity: name, age, job, hometown, and personality. Each caller gets a unique speaking style — some ramble, some are blunt, some deflect with humor. They have relationships, vehicles, strong food opinions, nostalgic memories, and reasons for being up this late. They know what they were watching on TV, what errand they ran today, and what song was on the radio before they called.

Some callers become regulars. The system tracks returning callers across episodes — they remember past conversations, reference things they talked about before, and their stories evolve over time. You'll hear Carla update you on her divorce, or Carl check in about his gambling recovery. They're not reset between shows.

And some callers are drunk, high, or flat-out unhinged. They'll call with conspiracy theories about pigeons being government drones, existential crises about whether fish know they're wet, or to confess they accidentally set their kitchen on fire trying to make grilled cheese at 3 AM.

Unique Names 320
Personality Layers 189+
Towns with Real Knowledge 20
Returning Regulars 12 callers
2

They Know Their World

Callers know real facts about where they live — the restaurants, the highways, the local gossip. When a caller says they're from Lordsburg, they actually know about the Shakespeare ghost town and the drive to Deming. They know the current weather outside their window, what day of the week it is, whether it's monsoon season or chile harvest. They have strong opinions about where to get the best green chile and get nostalgic about how their town used to be. The system also pulls in real-time news so callers can reference things that actually happened today.

3

They Have a Reason to Call

Some callers have a problem — a fight with a neighbor, a situation at work, something weighing on them at 2 AM. Others call to geek out about Severance, argue about poker strategy, or share something they read about quantum physics. The system draws from over 570 discussion topics across dozens of categories and more than 1,400 life scenarios. Every caller has a purpose, not just a script.

70% Need advice
30% Want to talk about something
4

The Conversation Is Real

Luke talks to each caller using push-to-talk, just like a real radio show. His voice is transcribed in real time, sent to an AI that responds in character, and then converted to speech using a voice engine — all in a few seconds. The AI doesn't just answer questions; it reacts, gets emotional, goes on tangents, and remembers what was said earlier in the show. Callers even react to previous callers — "Hey Luke, I heard that guy Tony earlier and I got to say, he's full of it." It makes the show feel like a living community, not isolated calls.

5

Real Callers Call In Too

When you dial 208-439-LUKE, your call goes into a live queue. Luke sees you waiting and can take your call right from the control room. Your voice streams in real time — no pre-recording, no delay. You're live on the show, talking to Luke, and the AI callers might even react to what you said. And if Luke isn't live, you can leave a voicemail — it gets transcribed and may get played on a future episode.

6

Listener Emails

Listeners can send emails to submissions@lukeattheroost.com and have them read on the show. A background poller checks for new messages every 30 seconds — they show up in the control room as soon as they arrive. Luke can read them himself on the mic, or hit a button to have an AI voice read them aloud on the caller channel. It's like a call-in show meets a letters segment — listeners who can't call in can still be part of the conversation.

7

The Control Room

The entire show runs through a custom-built control panel. Luke manages callers, plays music and sound effects, runs ads, monitors the call queue, and controls everything from one screen. Audio is routed across multiple channels simultaneously — caller voices, music, sound effects, and live phone audio all on separate tracks. The website shows a live on-air indicator so listeners know when to call in.

Audio Channels 5 independent
Caller Slots 10 per session
Phone System VoIP + WebSocket
Live Status Real-time CDN
Live Show
Luke (Host)
AI Callers
Real Callers
Voicemails
Listener Emails
Control Room
LLM Dialog
Voice Synthesis
Live Data
Audio Router
Phone System
Ad Engine
Multi-Stem Recorder
Post-Production
Compression & Ducking
Loudness Normalization
Transcription
Publishing
Podcast Server
CDN Edge Network
Website
Social Clips
Monitoring
Distribution
Spotify
Apple
YouTube
RSS
Instagram
Facebook
Bluesky
Mastodon
Nostr
Analytics

From Live Show to Podcast

8

Multi-Stem Recording

During every show, the system records five separate audio stems simultaneously: host microphone, AI caller voices, music, sound effects, and ads. Each stem is captured as an independent WAV file with sample-accurate alignment. This gives full control over the final mix — like having a recording studio's multitrack session, not just a flat recording.

Stems Captured 5 parallel
Format 48kHz WAV
Sync Method Time-aligned
Architecture Lock-free I/O
9

Post-Production Pipeline

Once the show ends, a 15-step automated pipeline processes the raw stems into a broadcast-ready episode. Ads and sound effects are hard-limited to prevent clipping. The host mic gets a high-pass filter, de-essing, and breath reduction. Voice tracks are compressed — the host gets aggressive spoken-word compression for consistent levels, callers get telephone EQ to sound like real phone calls. All stems are level-matched, music is ducked under dialog and muted during ads, then everything is mixed to stereo with panning and width. A bus compressor glues the final mix together before silence trimming, fades, and EBU R128 loudness normalization.

Pipeline Steps 15
Loudness Target -16 LUFS
Loudness Range ~5.5 LU
Output Stereo MP3
10

Automated Publishing

A single command takes a finished episode and handles everything: the audio is transcribed using MLX Whisper running on Apple Silicon GPU to generate full-text transcripts, then an LLM analyzes the transcript to write the episode title, description, and chapter markers with timestamps. The episode is uploaded to the podcast server, chapters and transcripts are attached to the metadata, and all media is synced to a global CDN so listeners everywhere get fast downloads.

Transcription MLX Whisper (GPU)
Metadata LLM-generated
Chapters Auto-detected
Deploy Time ~2 min
11

Automated Social Clips

No manual editing, no scheduling tools. After each episode, an LLM reads the full transcript and picks the best moments — funny exchanges, wild confessions, heated debates. Each clip is automatically extracted, transcribed with word-level timestamps, then polished by a second LLM pass that fixes punctuation, capitalization, and misheard words while preserving timing. The clips are rendered as vertical video with speaker-labeled captions and the show's branding. A third LLM writes platform-specific descriptions and hashtags. Then clips are uploaded directly to YouTube Shorts and Bluesky via their APIs, and pushed to Instagram Reels, Facebook, and Mastodon — six platforms, zero manual work.

Human Effort Zero
Video Format 1080x1920 MP4
Captions LLM-polished
Simultaneous Push 6 platforms
12

Global Distribution

Episodes are served through a CDN edge network for fast, reliable playback worldwide. The RSS feed is automatically updated and picked up by Spotify, Apple Podcasts, YouTube, and every other podcast app. The website pulls the live feed to show episodes with embedded playback, full transcripts, and chapter navigation — all served through Cloudflare with edge caching. From recording to available on every platform, the whole pipeline is automated end-to-end.

Audio Delivery Global CDN
Website Cloudflare Edge
Platforms 5+ directories
Feed Format RSS + Podcast 2.0

What Makes This Different

Not Scripted

Every conversation is improvised. Luke doesn't know what the caller is going to say. The AI doesn't follow a script. It's a real conversation between a human and an AI character who has a life, opinions, and something on their mind.

Built From Scratch

This isn't an app with a plugin. Every piece — the caller generator, the voice engine, the control room, the phone system, the post-production pipeline, the publishing automation — was built specifically for this show.

Real Time

Everything happens live. Caller generation, voice synthesis, news lookups, weather checks, phone routing — all in real time during the show. There's no post-production trickery on the caller side. What you hear is what happened.

They Listen to Each Other

Callers aren't isolated — they hear what happened earlier in the show. A caller might disagree with the last guy, back someone up, or call in specifically because of something another caller said. The show builds on itself.

Broadcast-Grade Audio

Every episode runs through a 15-step post-production pipeline: stem limiting, high-pass filtering, de-essing, breath reduction, spoken-word compression, telephone EQ, level matching, music ducking with ad muting, stereo imaging, bus compression, and EBU R128 loudness normalization.

Fully Automated Pipeline

From recording to your podcast app, the entire pipeline is automated. Post-production kicks off when the show ends, then a publish script handles transcription, AI-generated metadata, chapter detection, CDN sync, and RSS distribution — all with a single command.

Want to hear it for yourself?

Listen to Episodes
Or call in live: 208-439-LUKE
Support the Show