Luke at the Roost — Architecture
System Overview
┌─────────────────────────────────────────────────────────────────────────┐
│ BROWSER (Control Panel) │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌───────────────┐ │
│ │ Caller │ │ Chat │ │ Music/ │ │Settings│ │ Server Log │ │
│ │ Buttons │ │ Window │ │ Ads/SFX │ │ Modal │ │ (live tail) │ │
│ │ (0-9) │ │ │ │ │ │ │ │ │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └───┬────┘ └───────┬───────┘ │
│ │ │ │ │ │ │
│ ┌────┴────────────┴────────────┴────────────┴──────────────┴───────┐ │
│ │ frontend/js/app.js │ │
│ │ Polling: queue (3s), chat updates (real-time), logs (1s) │ │
│ │ Push-to-talk: record/stop → transcribe → chat → TTS → play │ │
│ └──────────────────────────┬───────────────────────────────────────┘ │
└─────────────────────────────┼───────────────────────────────────────────┘
│ REST API + WebSocket
▼
┌─────────────────────────────────────────────────────────────────────────┐
│ FastAPI Backend (main.py) │
│ uvicorn :8000 │
└─────────────────────────────────────────────────────────────────────────┘
Caller Generation Pipeline
Session Reset / First Access to Caller Slot
│
▼
_randomize_callers()
│ Assigns unique names (from 24M/24F pool) and voices (Inworld: 14M/11F, ElevenLabs: 14M/8F) to 10 slots
│
▼
generate_caller_background(base)
│
├─ Demographics: age (from range), job (gendered pool), location
│ │
│ ┌─────────────────────────┘
│ ▼
│ pick_location()
│ 80% LOCATIONS_LOCAL (weighted: Animas, Lordsburg)
│ 20% LOCATIONS_OUT_OF_STATE
│ │
│ ▼
│ _get_town_from_location()
│ └─ TOWN_KNOWLEDGE[town]
│ 32 towns with real facts
│ "Only reference real places..."
│
├─ 70% → PROBLEMS (100+ templates)
│ Fill {affair_person}, {fantasy_subject}, etc. from PROBLEM_FILLS
│
├─ 30% → TOPIC_CALLIN (61 entries)
│ Prestige TV, science, poker, photography, physics, US news
│
├─ 2x random INTERESTS (86 entries: TV shows, science, tech, poker, etc.)
│
└─ 2x random QUIRKS (conversational style traits)
│
▼
Result: "43, works IT for the city in Lordsburg. Just finished Severance
season 2... Follows JWST discoveries... Deflects with humor...
ABOUT WHERE THEY LIVE (Lordsburg): Small town on I-10, about 2,500
people... Only reference real places..."
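The branching above (70/30 hook split, two interests, two quirks) can be sketched as follows. The pools here are tiny illustrative stand-ins; the real ones (100+ PROBLEMS, 61 TOPIC_CALLIN entries, 86 INTERESTS) live in backend/main.py, and the real function also picks age, job, and location.

```python
import random

# Stand-in pools for illustration only; see backend/main.py for the real lists.
PROBLEMS = [
    "can't sleep since the {pet} ran off",
    "suspects the HOA is reading their mail",
]
PROBLEM_FILLS = {"pet": ["dog", "parrot"]}
TOPIC_CALLIN = ["just finished Severance season 2", "follows JWST discoveries"]
INTERESTS = ["prestige TV", "poker", "astrophotography", "quantum physics"]
QUIRKS = ["deflects with humor", "trails off mid-sentence", "over-explains"]

def generate_caller_background(rng: random.Random) -> str:
    # 70% personal problem, 30% topic call-in, as in the pipeline above.
    if rng.random() < 0.7:
        hook = rng.choice(PROBLEMS).format(
            **{k: rng.choice(v) for k, v in PROBLEM_FILLS.items()}
        )
    else:
        hook = rng.choice(TOPIC_CALLIN)
    interests = rng.sample(INTERESTS, 2)  # 2x random interests
    quirks = rng.sample(QUIRKS, 2)        # 2x random quirks
    return f"{hook}. Into {interests[0]} and {interests[1]}. {quirks[0]}; {quirks[1]}."
```

Seeding the RNG per session slot keeps a caller's background stable until the next session reset.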
News Enrichment (at pickup time)
POST /api/call/{key}
│
▼
enrich_caller_background(background) ← 5s timeout, fails silently
│
├─ _extract_search_query(background)
│ ├─ Check _TOPIC_SEARCH_MAP (50+ keyword→query mappings)
│ │ "severance" → "Severance TV show"
│ │ "quantum" → "quantum physics research"
│ │ "poker" → "poker tournament"
│ │
│ └─ Fallback: extract keywords from problem sentence
│
▼
SearXNG (localhost:8888)
│ /search?q=...&format=json&categories=news
│
▼
LLM summarizes headline+snippet → natural one-liner
│ "Recently read about how Severance ties up the Lumon mystery"
│
▼
Appended to background: "..., and it's been on their mind."
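A minimal sketch of the fail-silent enrichment step, using only the stdlib. The SearXNG query parameters match the diagram; splicing the headline in directly stands in for the LLM summarization pass, and the function name mirrors the one above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARXNG_URL = "http://localhost:8888/search"  # local SearXNG instance from the diagram

def enrich_caller_background(background: str, query: str) -> str:
    """Append a one-line news hook; on any failure, return background unchanged."""
    try:
        params = urlencode({"q": query, "format": "json", "categories": "news"})
        with urlopen(f"{SEARXNG_URL}?{params}", timeout=5) as resp:
            results = json.load(resp).get("results", [])
        if not results:
            return background
        headline = results[0].get("title", "")
        # The real pipeline hands headline+snippet to the LLM for a natural
        # one-liner; using the raw headline stands in for that here.
        return f'{background} Recently read about "{headline}", and it\'s been on their mind.'
    except Exception:
        return background  # silent failure: this caller just lacks news context
```

The bare `except Exception` is deliberate: a missing search backend should never block a call pickup.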
AI Caller Conversation Flow
Host speaks (push-to-talk or type)
│
▼
POST /api/record/start → record from input device
POST /api/record/stop → transcribe (Whisper @ 16kHz)
│
▼
POST /api/chat { text }
│
├─ session.add_message("user", text)
│
├─ Build system prompt: get_caller_prompt()
│ ├─ Caller identity + background + town knowledge
│ ├─ Show history (summaries of previous callers)
│ ├─ Conversation summary (last 6 messages)
│ └─ HOW TO TALK rules (varied length, no rehashing, etc.)
│
├─ Last 10 messages → _normalize_messages_for_llm()
│
▼
LLMService.generate(messages, system_prompt)
│
├─ OpenRouter: primary model (15s timeout)
├─ Fallback 1: gemini-flash-1.5 (10s)
├─ Fallback 2: gpt-4o-mini (10s)
├─ Fallback 3: llama-3.1-8b (10s)
└─ Last resort: "Sorry, I totally blanked out..."
│
▼
clean_for_tts() → strip (actions), *gestures*, fix phonetics
ensure_complete_thought() → trim to last complete sentence
│
▼
Response returned to frontend
│
▼
POST /api/tts { text, voice_id }
│
▼
generate_speech(text, voice_id)
│
├─ Inworld (default cloud) ─┐
├─ ElevenLabs (cloud) │
├─ F5-TTS (local, cloned) ├─→ PCM audio bytes (24kHz)
├─ Kokoro MLX (local, fast) │
├─ ChatTTS / StyleTTS2 / etc. ─┘
│
▼
AudioService.play_caller_audio(bytes, 24000)
│
└─→ Output Device Channel 1 (caller TTS)
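The two post-LLM cleanup helpers named in the flow might look like this minimal sketch; the real versions in the backend also fix phonetics, which is skipped here.

```python
import re

def clean_for_tts(text: str) -> str:
    """Strip stage directions so the TTS doesn't read them aloud."""
    text = re.sub(r"\([^)]*\)", "", text)   # (parenthetical actions)
    text = re.sub(r"\*[^*]*\*", "", text)   # *asterisk gestures*
    return re.sub(r"\s{2,}", " ", text).strip()

def ensure_complete_thought(text: str) -> str:
    """Trim a cut-off LLM reply back to its last complete sentence."""
    text = text.strip()
    last = max(text.rfind(ch) for ch in ".!?")
    return text[: last + 1] if last >= 0 else text
```

Running both before TTS means a mid-sentence timeout from the fallback chain never reaches the air as a dangling fragment.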
Real Caller (Phone) Flow
Caller dials 208-439-LUKE
│
▼
SignalWire routes to webhook
│
▼
POST /api/signalwire/voice
│
├─ If OFF AIR → play message + hangup
│
└─ If ON AIR → return BXML:
<Stream url="wss://.../api/signalwire/stream" codec="L16@16000h">
│
▼
WebSocket /api/signalwire/stream connects
│
├─ "start" event → add to queue, play ring SFX
│ broadcast_event("caller_queued")
│
│ [Caller waits in queue until host takes them]
│
├─ Host clicks "Take Call" in UI
│ POST /api/queue/take/{caller_id}
│ └─ CallerService.take_call() → allocate channel
│ └─ Start host mic streaming → _host_audio_sender()
│
├─ "media" events (continuous) ← caller's voice
│ │
│ ├─ route_real_caller_audio(pcm) → Ch 9 (host monitoring)
│ │
│ └─ Buffer 3s chunks → transcribe (Whisper)
│ │
│ └─ broadcast_chat() → appears in chat window
│
│ Host mic audio → _host_audio_sync_callback()
│ │
│ └─ _host_audio_sender() → CallerService.send_audio_to_caller()
│ └─ base64 encode → WebSocket → SignalWire → caller's phone
│
│ If AI caller also active (auto-respond mode):
│ │
│ └─ _debounced_auto_respond() (4s silence)
│ └─ LLM → TTS → play on Ch 1 + stream to real caller
│
├─ Host hangs up
│ POST /api/hangup/real
│ └─ _signalwire_end_call(call_sid) → end phone call
│ └─ _summarize_real_call() → LLM summary → call_history
│ └─ Optional: _auto_followup() → pick AI caller to continue
│
└─ "stop" event or disconnect → cleanup
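The host-to-caller leg above ends in a base64-encode step before the WebSocket send. A sketch of that framing is below; the envelope field names (streamSid, media.payload) are assumptions modeled on the flow, so check SignalWire's stream message schema before relying on them.

```python
import base64
import json

def media_frame(pcm: bytes, stream_sid: str) -> str:
    """Wrap raw L16 PCM in a JSON 'media' message for the stream WebSocket.

    Field names here are illustrative, not a confirmed SignalWire schema."""
    return json.dumps({
        "event": "media",
        "streamSid": stream_sid,
        "media": {"payload": base64.b64encode(pcm).decode("ascii")},
    })
```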
Audio Routing (Multi-Channel Output)
All audio goes to ONE physical output device (Loopback/interface)
Each content type on a separate channel for mixing in DAW/OBS
┌─────────────────────────────────────────────────────────────┐
│ Output Device (e.g. Loopback 16ch) │
│ │
│ Ch 1 ◄── Caller TTS (AI voices) play_caller_audio()
│ Ch 2 ◄── Music (loops) play_music()
│ Ch 3 ◄── Sound Effects (one-shots) play_sfx()
│ Ch 9 ◄── Live Caller Audio (monitoring) route_real_caller_audio()
│ Ch 11 ◄── Ads (one-shots, no loop) play_ad()
│ │
│ All channels configurable via Settings panel │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Input Device (mic/interface) │
│ │
│ Ch N ──► Host mic recording (push-to-talk) │
│ ──► Host mic streaming (to real callers via WS) │
└─────────────────────────────────────────────────────────────┘
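The per-content routing above boils down to writing mono samples into one lane of an interleaved multichannel buffer. A dependency-free sketch (the real code uses sounddevice; channel numbers match the diagram, 16 output channels as on a 16-channel Loopback device):

```python
CHANNELS = {"caller_tts": 1, "music": 2, "sfx": 3, "live_caller": 9, "ads": 11}
NUM_OUT_CHANNELS = 16

def route(mono_samples: list, content: str) -> list:
    """Interleave mono samples into the content type's lane of a
    multichannel buffer; every other channel stays silent."""
    ch = CHANNELS[content] - 1  # 0-based slot within each interleaved frame
    out = [0] * (len(mono_samples) * NUM_OUT_CHANNELS)
    for i, sample in enumerate(mono_samples):
        out[i * NUM_OUT_CHANNELS + ch] = sample
    return out
```

Because each content type owns a lane, the DAW/OBS side can fade music under a caller without the backend knowing anything about mixing.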
External Services
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ SignalWire │ │ OpenRouter │ │ SearXNG │
│ │ │ │ │ (local) │
│ Phone calls │ │ LLM API │ │ News search │
│ REST + WS │ │ Claude,GPT │ │ :8888 │
│ Bidirectional│ │ Gemini,Llama│ │ │
│ audio stream│ │ Fallback │ │ │
└──────────────┘ └──────────────┘ └──────────────┘
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Inworld │ │ ElevenLabs │ │ Local TTS │
│ │ │ │ │ │
│ TTS (cloud) │ │ TTS (cloud) │ │ Kokoro MLX │
│ Default │ │ Premium │ │ F5-TTS │
│ provider │ │ │ │ ChatTTS │
│ │ │ │ │ + others │
└──────────────┘ └──────────────┘ └──────────────┘
┌──────────────┐
│ Castopod │
│ │
│ Podcast │
│ publishing │
│ (NAS) │
└──────────────┘
Session Lifecycle
New Session (reset)
│
├─ Randomize all 10 caller names + voices
├─ Clear conversation, call history, research
├─ New session ID
│
▼
Show goes ON AIR (toggle)
│
├─ SignalWire starts accepting calls
│
▼
Caller interactions (loop)
│
├─ Pick AI caller (click button 0-9)
│ ├─ Generate background (if first time this session)
│ ├─ Enrich with news (SearXNG → LLM summary)
│ ├─ Conversation loop (chat/respond/auto-respond)
│ └─ Hangup → summarize → add to call_history
│
├─ Take real caller from queue
│ ├─ Route audio both directions
│ ├─ Transcribe caller speech in real-time
│ ├─ Optional: AI caller auto-responds to real caller
│ └─ Hangup → summarize → add to call_history
│
├─ Play music / ads / SFX between calls
│
└─ Each new caller sees show_history (summaries of all previous calls)
"EARLIER IN THE SHOW: Tony talked about... Carmen discussed..."
│
▼
Show goes OFF AIR
│
└─ Incoming calls get off-air message + hangup
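The reset step at the top of the lifecycle is what makes _session_epoch work: bumping the epoch on reset (or hangup) lets in-flight LLM/TTS results from the old session be detected and dropped. A sketch, with illustrative class and method names:

```python
class ShowSession:
    """Epoch guard sketch: slow async work captures the epoch when it starts,
    and its result is discarded if the epoch has moved on by the time it lands."""
    def __init__(self):
        self.epoch = 0
        self.callers = {}        # slot -> generated background
        self.call_history = []   # summaries shown to later callers

    def reset(self):
        self.epoch += 1           # everything tagged with the old epoch is now stale
        self.callers.clear()      # forces re-randomized names/voices on next access
        self.call_history.clear()

    def is_stale(self, tagged_epoch: int) -> bool:
        return tagged_epoch != self.epoch
```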
Key Design Patterns
| Pattern | Where | Why |
|---|---|---|
| Epoch-based staleness | _session_epoch in main.py | Prevents stale LLM/TTS responses from playing after hangup |
| Fallback chain | LLMService | Guarantees a response even if primary model times out |
| Debounced auto-respond | _debounced_auto_respond() | Waits 4s for real caller to stop talking before AI jumps in |
| Silent failure | News enrichment | If search/LLM fails, caller just doesn't have news context |
| Threading for audio | play_caller_audio() | Audio playback can't block the async event loop |
| Ring buffer | route_real_caller_audio() | Absorbs jitter in real caller audio stream |
| Lock contention guard | _ai_response_lock | Only one AI response generates at a time |
| Town knowledge injection | TOWN_KNOWLEDGE dict | Prevents LLM from inventing fake local businesses |
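The debounced auto-respond row can be sketched with asyncio: each transcript chunk from the real caller cancels the pending timer, so the AI only responds after a quiet gap (4s in the real app). Names here are illustrative.

```python
import asyncio

class DebouncedResponder:
    def __init__(self, delay: float, respond):
        self.delay = delay
        self.respond = respond  # async callable: LLM -> TTS -> play
        self._task = None

    def caller_spoke(self):
        """Called for each transcript chunk; restarts the silence clock."""
        if self._task is not None:
            self._task.cancel()
        self._task = asyncio.create_task(self._fire_after_silence())

    async def _fire_after_silence(self):
        try:
            await asyncio.sleep(self.delay)
            await self.respond()
        except asyncio.CancelledError:
            pass  # superseded by a newer chunk: the caller was still talking
```

Cancellation, rather than a flag check, is what keeps at most one pending response alive at any moment.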
File Map
ai-podcast/
├── backend/
│ ├── main.py ← FastAPI app, all endpoints, caller generation, session
│ ├── config.py ← Settings (env vars, paths)
│ └── services/
│ ├── audio.py ← Multi-channel audio I/O (sounddevice)
│ ├── caller_service.py← Phone queue, WebSocket registry, audio routing
│ ├── llm.py ← OpenRouter/Ollama with fallback chain
│ ├── news.py ← SearXNG search + caching
│ ├── tts.py ← 8 TTS providers (cloud + local)
│ └── transcription.py ← Whisper speech-to-text
├── frontend/
│ ├── index.html ← Control panel layout
│ ├── js/app.js ← UI logic, polling, event handlers
│ └── css/style.css ← Dark theme styling
├── sounds/ ← SFX files (ring, hangup, busy, etc.)
├── music/ ← Background music tracks
├── ads/ ← Ad audio files
├── website/ ← Landing page (lukeattheroost.com)
├── publish_episode.py ← Castopod episode publisher
└── run.sh ← Server launcher with restart support