From 6447edd0aec1434e1cbd414d1f42d36a260b1b71 Mon Sep 17 00:00:00 2001
From: tcpsyn <tcpsyn@gmail.com>
Date: Sat, 7 Feb 2026 00:36:17 -0700
Subject: [PATCH] Add architecture diagram and system documentation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 docs/architecture.md | 361 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 361 insertions(+)
 create mode 100644 docs/architecture.md

diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..7379587
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,361 @@
+# Luke at the Roost — Architecture
+
+## System Overview
+
+```
+┌─────────────────────────────────────────────────────────────────────────┐
+│                        BROWSER (Control Panel)                          │
+│                                                                         │
+│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌────────┐ ┌───────────────┐  │
+│  │ Caller   │ │  Chat    │ │  Music/  │ │Settings│ │  Server Log   │  │
+│  │ Buttons  │ │  Window  │ │  Ads/SFX │ │ Modal  │ │  (live tail)  │  │
+│  │ (0-9)    │ │          │ │          │ │        │ │               │  │
+│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └───┬────┘ └───────┬───────┘  │
+│       │            │            │            │              │           │
+│  ┌────┴────────────┴────────────┴────────────┴──────────────┴───────┐  │
+│  │                    frontend/js/app.js                             │  │
+│  │  Polling: queue (3s), chat updates (real-time), logs (1s)        │  │
+│  │  Push-to-talk: record/stop → transcribe → chat → TTS → play     │  │
+│  └──────────────────────────┬───────────────────────────────────────┘  │
+└─────────────────────────────┼───────────────────────────────────────────┘
+                              │ REST API + WebSocket
+                              ▼
+┌─────────────────────────────────────────────────────────────────────────┐
+│                     FastAPI Backend (main.py)                            │
+│                     uvicorn :8000                                        │
+└─────────────────────────────────────────────────────────────────────────┘
+```
+
+---
+
+## Caller Generation Pipeline
+
+```
+Session Reset / First Access to Caller Slot
+    │
+    ▼
+_randomize_callers()
+    │  Assigns unique names (from 24M/24F pool) and voices (5M/5F) to 10 slots
+    │
+    ▼
+generate_caller_background(base)
+    │
+    ├─ Demographics: age (from range), job (gendered pool), location
+    │                                                        │
+    │                              ┌─────────────────────────┘
+    │                              ▼
+    │                     pick_location()
+    │                     80% LOCATIONS_LOCAL (weighted: Animas, Lordsburg)
+    │                     20% LOCATIONS_OUT_OF_STATE
+    │                              │
+    │                              ▼
+    │                     _get_town_from_location()
+    │                     └─ TOWN_KNOWLEDGE[town]
+    │                        32 towns with real facts
+    │                        "Only reference real places..."
+    │
+    ├─ 70% → PROBLEMS (100+ templates)
+    │        Fill {affair_person}, {fantasy_subject}, etc. from PROBLEM_FILLS
+    │
+    ├─ 30% → TOPIC_CALLIN (61 entries)
+    │        Prestige TV, science, poker, photography, physics, US news
+    │
+    ├─ 2x random INTERESTS (86 entries: TV shows, science, tech, poker, etc.)
+    │
+    ├─ 2x random QUIRKS (conversational style traits)
+    │
+    ▼
+Result: "43, works IT for the city in Lordsburg. Just finished Severance
+        season 2... Follows JWST discoveries... Deflects with humor...
+        ABOUT WHERE THEY LIVE (Lordsburg): Small town on I-10, about 2,500
+        people... Only reference real places..."
+```
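+
+The branching above can be sketched in a few lines of Python. This is a minimal sketch, not the code in `main.py`. The trimmed pools, the `LOCAL_WEIGHTS` values, and the `generate_background()` name are hypothetical, and job selection and the `TOWN_KNOWLEDGE` lookup are left out. It shows the 80/20 location split, the 70/30 problem-vs-topic split, placeholder filling, and the two interests and two quirks.
+
+```python
+import random
+
+# Hypothetical, heavily trimmed pools; the real ones live in main.py.
+LOCATIONS_LOCAL = ["Animas", "Lordsburg", "Deming", "Silver City"]
+LOCAL_WEIGHTS = [3, 3, 1, 1]  # assumed weights favoring Animas and Lordsburg
+LOCATIONS_OUT_OF_STATE = ["Tucson, AZ", "El Paso, TX"]
+PROBLEMS = ["Thinks {affair_person} is hiding something"]
+PROBLEM_FILLS = {"affair_person": ["their brother-in-law", "a coworker"]}
+TOPIC_CALLIN = ["Wants to talk about the latest JWST images"]
+INTERESTS = ["Severance", "poker", "photography", "quantum physics"]
+QUIRKS = ["Deflects with humor", "Trails off mid-sentence"]
+
+
+def pick_location(rng: random.Random) -> str:
+    """80% local (weighted), 20% out of state."""
+    if rng.random() < 0.8:
+        return rng.choices(LOCATIONS_LOCAL, weights=LOCAL_WEIGHTS, k=1)[0]
+    return rng.choice(LOCATIONS_OUT_OF_STATE)
+
+
+def generate_background(age_range: tuple[int, int], rng: random.Random) -> str:
+    age = rng.randint(*age_range)
+    location = pick_location(rng)
+    if rng.random() < 0.7:
+        # 70%: a personal problem; fill {placeholders} from PROBLEM_FILLS.
+        # str.format ignores fill keys the template doesn't use.
+        fills = {key: rng.choice(options) for key, options in PROBLEM_FILLS.items()}
+        reason = rng.choice(PROBLEMS).format(**fills)
+    else:
+        # 30%: a topic they want to call in about
+        reason = rng.choice(TOPIC_CALLIN)
+    interests = rng.sample(INTERESTS, 2)  # two distinct interests
+    quirks = rng.sample(QUIRKS, 2)        # two distinct quirks
+    return (
+        f"{age}, lives in {location}. {reason}. "
+        f"Into {interests[0]} and {interests[1]}. "
+        f"{quirks[0]}; {quirks[1].lower()}."
+    )
+```
+
+Passing an explicit `random.Random` lets a session be replayed from a seed, which can help when debugging one particular caller.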
+
+### News Enrichment (at pickup time)
+
+```
+POST /api/call/{key}
+    │
+    ▼
+enrich_caller_background(background)     ← 5s timeout, fails silently
+    │
+    ├─ _extract_search_query(background)
+    │   ├─ Check _TOPIC_SEARCH_MAP (50+ keyword→query mappings)
+    │   │   "severance" → "Severance TV show"
+    │   │   "quantum"   → "quantum physics research"
+    │   │   "poker"     → "poker tournament"
+    │   │
+    │   └─ Fallback: extract keywords from problem sentence
+    │
+    ▼
+SearXNG (localhost:8888)
+    │  /search?q=...&format=json&categories=news
+    │
+    ▼
+LLM summarizes headline+snippet → natural one-liner
+    │  "Recently read about how Severance ties up the Lumon mystery"
+    │
+    ▼
+Appended to background: "..., and it's been on their mind."
+```
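+
+A minimal sketch of the query extraction and the silent-failure wrapper, assuming an async `fetch_headline(query)` callable that performs the SearXNG request and the LLM summary (hypothetical, not shown). The keyword map is a three-entry subset, and the fallback (first few non-stopword words of the first sentence) is an assumed heuristic, not necessarily what `_extract_search_query()` does.
+
+```python
+import asyncio
+import re
+from urllib.parse import urlencode
+
+# Hypothetical three-entry subset of _TOPIC_SEARCH_MAP.
+TOPIC_SEARCH_MAP = {
+    "severance": "Severance TV show",
+    "quantum": "quantum physics research",
+    "poker": "poker tournament",
+}
+STOPWORDS = {"a", "an", "the", "and", "about", "in", "on", "of", "to", "for", "with", "their"}
+SEARXNG_SEARCH = "http://localhost:8888/search"
+
+
+def extract_search_query(background: str) -> str:
+    lowered = background.lower()
+    for keyword, query in TOPIC_SEARCH_MAP.items():
+        if keyword in lowered:
+            return query
+    # Assumed fallback: first four non-stopword words of the first sentence.
+    first_sentence = background.split(".")[0]
+    words = [w for w in re.findall(r"[A-Za-z]+", first_sentence) if w.lower() not in STOPWORDS]
+    return " ".join(words[:4])
+
+
+def searxng_url(query: str) -> str:
+    params = {"q": query, "format": "json", "categories": "news"}
+    return f"{SEARXNG_SEARCH}?{urlencode(params)}"
+
+
+async def enrich(background: str, fetch_headline, timeout: float = 5.0) -> str:
+    """Append a news line, or return the background unchanged on any failure."""
+    try:
+        line = await asyncio.wait_for(fetch_headline(extract_search_query(background)), timeout)
+    except Exception:  # timeout, HTTP error, bad JSON: fail silently
+        return background
+    if not line:
+        return background
+    return f"{background} {line.rstrip('.')}, and it's been on their mind."
+```
+
+SearXNG only answers `format=json` when `json` is listed under `search.formats` in its `settings.yml`, so that setting has to be enabled on the local instance.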
+
+---
+
+## AI Caller Conversation Flow
+
+```
+    Host speaks (push-to-talk or type)
+        │
+        ▼
+POST /api/record/start → record from input device
+POST /api/record/stop  → transcribe (Whisper @ 16kHz)
+        │
+        ▼
+POST /api/chat { text }
+        │
+        ├─ session.add_message("user", text)
+        │
+        ├─ Build system prompt: get_caller_prompt()
+        │   ├─ Caller identity + background + town knowledge
+        │   ├─ Show history (summaries of previous callers)
+        │   ├─ Conversation summary (last 6 messages)
+        │   └─ HOW TO TALK rules (varied length, no rehashing, etc.)
+        │
+        ├─ Last 10 messages → _normalize_messages_for_llm()
+        │
+        ▼
+LLMService.generate(messages, system_prompt)
+        │
+        ├─ OpenRouter: primary model (15s timeout)
+        ├─ Fallback 1: gemini-flash-1.5 (10s)
+        ├─ Fallback 2: gpt-4o-mini (10s)
+        ├─ Fallback 3: llama-3.1-8b (10s)
+        └─ Last resort: "Sorry, I totally blanked out..."
+        │
+        ▼
+clean_for_tts()              → strip (actions), *gestures*, fix phonetics
+ensure_complete_thought()    → trim to last complete sentence
+        │
+        ▼
+Response returned to frontend
+        │
+        ▼
+POST /api/tts { text, voice_id }
+        │
+        ▼
+generate_speech(text, voice_id)
+        │
+        ├─ Inworld (default cloud)     ─┐
+        ├─ ElevenLabs (cloud)           │
+        ├─ F5-TTS (local, cloned)       ├─→ PCM audio bytes (24kHz)
+        ├─ Kokoro MLX (local, fast)     │
+        ├─ ChatTTS / StyleTTS2 / etc.  ─┘
+        │
+        ▼
+AudioService.play_caller_audio(bytes, 24000)
+        │
+        └─→ Output Device Channel 1 (caller TTS)
+```
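+
+The fallback chain and the two text clean-up steps can be sketched as below. Assumptions: `call_model(model, messages, system_prompt)` is a hypothetical async wrapper around one OpenRouter request, `"primary"` stands in for whatever primary model is configured, the regexes are illustrative, and the phonetic fixes done by `clean_for_tts()` are omitted.
+
+```python
+import asyncio
+import re
+
+LAST_RESORT = "Sorry, I totally blanked out..."
+
+# (model, timeout in seconds), mirroring the chain above; "primary" is a placeholder.
+FALLBACK_CHAIN = [
+    ("primary", 15.0),
+    ("gemini-flash-1.5", 10.0),
+    ("gpt-4o-mini", 10.0),
+    ("llama-3.1-8b", 10.0),
+]
+
+
+async def generate_with_fallback(call_model, messages, system_prompt, chain=FALLBACK_CHAIN):
+    """Return the first non-empty reply from the chain, else the canned line."""
+    for model, timeout in chain:
+        try:
+            reply = await asyncio.wait_for(call_model(model, messages, system_prompt), timeout)
+        except Exception:  # timeout or API error: try the next model
+            continue
+        if reply and reply.strip():
+            return reply
+    return LAST_RESORT
+
+
+def clean_for_tts(text: str) -> str:
+    text = re.sub(r"\([^)]*\)", "", text)  # stage directions like (laughs)
+    text = re.sub(r"\*[^*]*\*", "", text)  # gestures like *shrugs*
+    return re.sub(r"\s+", " ", text).strip()
+
+
+def ensure_complete_thought(text: str) -> str:
+    """Trim a reply that was cut off mid-sentence back to its last . ! or ?"""
+    last = max(text.rfind(mark) for mark in ".!?")
+    return text[: last + 1] if last != -1 else text
+```
+
+With these timeouts the worst case is 15 + 3 × 10 = 45 s before the canned line plays: the chain trades latency for always having something to say.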
+
+---
+
+## Real Caller (Phone) Flow
+
+```
+Caller dials 208-439-LUKE
+        │
+        ▼
+SignalWire routes to webhook
+        │
+        ▼
+POST /api/signalwire/voice
+        │
+        ├─ If OFF AIR → play message + hang up
+        │
+        └─ If ON AIR → return cXML:
+           <Stream url="wss://.../api/signalwire/stream" codec="L16@16000h">
+        │
+        ▼
+WebSocket /api/signalwire/stream connects
+        │
+        ├─ "start" event → add to queue, play ring SFX
+        │                   broadcast_event("caller_queued")
+        │
+        │   [Caller waits in queue until host takes them]
+        │
+        ├─ Host clicks "Take Call" in UI
+        │   POST /api/queue/take/{caller_id}
+        │   └─ CallerService.take_call() → allocate channel
+        │   └─ Start host mic streaming → _host_audio_sender()
+        │
+        ├─ "media" events (continuous) ← caller's voice
+        │   │
+        │   ├─ route_real_caller_audio(pcm) → Ch 9 (host monitoring)
+        │   │
+        │   └─ Buffer 3s chunks → transcribe (Whisper)
+        │       │
+        │       └─ broadcast_chat() → appears in chat window
+        │
+        │   Host mic audio → _host_audio_sync_callback()
+        │   │
+        │   └─ _host_audio_sender() → CallerService.send_audio_to_caller()
+        │       └─ base64 encode → WebSocket → SignalWire → caller's phone
+        │
+        │   If AI caller also active (auto-respond mode):
+        │   │
+        │   └─ _debounced_auto_respond() (4s silence)
+        │       └─ LLM → TTS → play on Ch 1 + stream to real caller
+        │
+        ├─ Host hangs up
+        │   POST /api/hangup/real
+        │   └─ _signalwire_end_call(call_sid) → end phone call
+        │   └─ _summarize_real_call() → LLM summary → call_history
+        │   └─ Optional: _auto_followup() → pick AI caller to continue
+        │
+        └─ "stop" event or disconnect → cleanup
+```
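+
+With `codec="L16@16000h"`, each second of caller audio is 16,000 samples × 2 bytes = 32,000 bytes of 16-bit PCM, so a 3 s transcription chunk is 96,000 bytes. A minimal sketch of the buffering step, assuming mono audio arriving as base64 `media` payloads; `TranscriptionBuffer` and `encode_for_caller()` are hypothetical names, not the project's.
+
+```python
+import base64
+
+SAMPLE_RATE = 16_000   # from codec="L16@16000h"
+BYTES_PER_SAMPLE = 2   # L16: 16-bit linear PCM, assumed mono
+CHUNK_SECONDS = 3
+CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS  # 96,000
+
+
+class TranscriptionBuffer:
+    """Collects decoded 'media' payloads and releases fixed 3 s chunks."""
+
+    def __init__(self, chunk_bytes: int = CHUNK_BYTES):
+        self.chunk_bytes = chunk_bytes
+        self._buf = bytearray()
+
+    def feed(self, payload_b64: str) -> list[bytes]:
+        """Add one payload; return any complete chunks now ready for Whisper."""
+        self._buf.extend(base64.b64decode(payload_b64))
+        chunks = []
+        while len(self._buf) >= self.chunk_bytes:
+            chunks.append(bytes(self._buf[: self.chunk_bytes]))
+            del self._buf[: self.chunk_bytes]
+        return chunks
+
+    def flush(self) -> bytes:
+        """Return the leftover partial chunk (e.g. on hangup) and clear the buffer."""
+        rest, self._buf = bytes(self._buf), bytearray()
+        return rest
+
+
+def encode_for_caller(pcm: bytes) -> str:
+    """Host-mic direction: raw PCM -> base64 text for the outgoing WebSocket message."""
+    return base64.b64encode(pcm).decode("ascii")
+```
+
+Fixed 3 s windows can split a word across two chunks; that is the price of a bounded transcription delay.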
+
+---
+
+## Audio Routing (Multi-Channel Output)
+
+```
+All audio goes to ONE physical output device (Loopback/interface)
+Each content type on a separate channel for mixing in DAW/OBS
+
+┌─────────────────────────────────────────────────────────────┐
+│                   Output Device (e.g. Loopback 16ch)        │
+│                                                             │
+│   Ch 1  ◄── Caller TTS (AI voices)          play_caller_audio()
+│   Ch 2  ◄── Music (loops)                   play_music()
+│   Ch 3  ◄── Sound Effects (one-shots)       play_sfx()
+│   Ch 9  ◄── Live Caller Audio (monitoring)  route_real_caller_audio()
+│   Ch 11 ◄── Ads (one-shots, no loop)        play_ad()
+│                                                             │
+│   All channels configurable via Settings panel              │
+└─────────────────────────────────────────────────────────────┘
+
+┌─────────────────────────────────────────────────────────────┐
+│                   Input Device (mic/interface)              │
+│                                                             │
+│   Ch N  ──► Host mic recording (push-to-talk)               │
+│         ──► Host mic streaming (to real callers via WS)     │
+└─────────────────────────────────────────────────────────────┘
+```
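+
+Putting each content type on its own channel amounts to writing a mono signal into one column of a frames × channels buffer and leaving every other column silent. A stdlib-only sketch, assuming a 16-channel device and the default channel numbers above; the `CHANNELS` names and both helpers are hypothetical. `audio.py` uses sounddevice, where the same buffer would be a NumPy array of shape `(frames, channels)`, and summing buffers is just one way several sources could share the device.
+
+```python
+# Default channel numbers from the diagram (1-based, as in the Settings panel).
+CHANNELS = {"caller_tts": 1, "music": 2, "sfx": 3, "live_caller": 9, "ads": 11}
+
+
+def route_to_channel(mono: list[float], channel: int, num_channels: int = 16) -> list[list[float]]:
+    """Return frames x num_channels with `mono` on 1-based `channel`, silence elsewhere."""
+    if not 1 <= channel <= num_channels:
+        raise ValueError(f"channel {channel} outside 1..{num_channels}")
+    col = channel - 1
+    frames = []
+    for sample in mono:
+        frame = [0.0] * num_channels
+        frame[col] = sample
+        frames.append(frame)
+    return frames
+
+
+def mix(*buffers: list[list[float]], num_channels: int = 16) -> list[list[float]]:
+    """Sum routed buffers sample by sample, padding shorter ones with silence."""
+    length = max((len(b) for b in buffers), default=0)
+    out = [[0.0] * num_channels for _ in range(length)]
+    for buf in buffers:
+        for i, frame in enumerate(buf):
+            for ch, value in enumerate(frame):
+                out[i][ch] += value
+    return out
+```
+
+For one-shot playback, `sounddevice.play()` also accepts a `mapping` list of 1-based output channels, which does the same routing without building the full-width buffer by hand.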
+
+---
+
+## External Services
+
+```
+┌──────────────┐     ┌──────────────┐     ┌──────────────┐
+│  SignalWire  │     │  OpenRouter  │     │   SearXNG    │
+│              │     │              │     │  (local)     │
+│  Phone calls │     │  LLM API     │     │  News search │
+│  REST + WS   │     │  Claude,GPT  │     │  :8888       │
+│ Bidirectional│     │  Gemini,Llama│     │              │
+│  audio stream│     │  Fallback    │     │              │
+└──────────────┘     └──────────────┘     └──────────────┘
+
+┌──────────────┐     ┌──────────────┐     ┌──────────────┐
+│   Inworld    │     │  ElevenLabs  │     │  Local TTS   │
+│              │     │              │     │              │
+│  TTS (cloud) │     │  TTS (cloud) │     │  Kokoro MLX  │
+│  Default     │     │  Premium     │     │  F5-TTS      │
+│  provider    │     │              │     │  ChatTTS     │
+│              │     │              │     │  + others    │
+└──────────────┘     └──────────────┘     └──────────────┘
+
+┌──────────────┐
+│  Castopod    │
+│              │
+│  Podcast     │
+│  publishing  │
+│  (NAS)       │
+└──────────────┘
+```
+
+---
+
+## Session Lifecycle
+
+```
+New Session (reset)
+    │
+    ├─ Randomize all 10 caller names + voices
+    ├─ Clear conversation, call history, research
+    ├─ New session ID
+    │
+    ▼
+Show goes ON AIR (toggle)
+    │
+    ├─ Incoming SignalWire calls now stream in and join the queue
+    │
+    ▼
+Caller interactions (loop)
+    │
+    ├─ Pick AI caller (click button 0-9)
+    │   ├─ Generate background (if first time this session)
+    │   ├─ Enrich with news (SearXNG → LLM summary)
+    │   ├─ Conversation loop (chat/respond/auto-respond)
+    │   └─ Hangup → summarize → add to call_history
+    │
+    ├─ Take real caller from queue
+    │   ├─ Route audio both directions
+    │   ├─ Transcribe caller speech in real-time
+    │   ├─ Optional: AI caller auto-responds to real caller
+    │   └─ Hangup → summarize → add to call_history
+    │
+    ├─ Play music / ads / SFX between calls
+    │
+    ├─ Each new AI caller's prompt includes show_history (summaries of all previous calls)
+    │   "EARLIER IN THE SHOW: Tony talked about... Carmen discussed..."
+    │
+    ▼
+Show goes OFF AIR
+    │
+    └─ Incoming calls get off-air message + hangup
+```
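+
+A minimal sketch of the session state behind this lifecycle. Assumptions: the `Session` and `CallSummary` names, the dict shape of each slot, and pairing five male and five female name/voice slots are illustrative, not taken from `main.py`. Backgrounds are generated lazily the first time a slot is picked, and `show_history()` builds the "EARLIER IN THE SHOW" line.
+
+```python
+import random
+import uuid
+from dataclasses import dataclass, field
+
+
+@dataclass
+class CallSummary:
+    caller_name: str
+    summary: str  # e.g. "talked about his truck."
+
+
+@dataclass
+class Session:
+    on_air: bool = False
+    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
+    callers: dict[int, dict] = field(default_factory=dict)
+    conversation: list[dict] = field(default_factory=list)
+    call_history: list[CallSummary] = field(default_factory=list)
+
+    def reset(self, male_names, female_names, male_voices, female_voices, rng: random.Random):
+        """New session: fresh ID, cleared history, 10 slots with unique names and voices."""
+        self.session_id = uuid.uuid4().hex
+        self.conversation.clear()
+        self.call_history.clear()
+        # Assumed pairing: 5 male name/voice pairs + 5 female, shuffled across slots 0-9.
+        pairs = list(zip(rng.sample(male_names, 5), rng.sample(male_voices, 5)))
+        pairs += list(zip(rng.sample(female_names, 5), rng.sample(female_voices, 5)))
+        rng.shuffle(pairs)
+        self.callers = {
+            slot: {"name": name, "voice": voice, "background": None}
+            for slot, (name, voice) in enumerate(pairs)
+        }
+
+    def background_for(self, slot: int, generate) -> str:
+        """Generate a caller's background the first time they are picked, then reuse it."""
+        caller = self.callers[slot]
+        if caller["background"] is None:
+            caller["background"] = generate(slot)
+        return caller["background"]
+
+    def show_history(self) -> str:
+        """Prompt line summarizing earlier calls; empty before the first call."""
+        if not self.call_history:
+            return ""
+        return "EARLIER IN THE SHOW: " + " ".join(
+            f"{c.caller_name} {c.summary}" for c in self.call_history
+        )
+```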
+
+---
+
+## Key Design Patterns
+
+| Pattern | Where | Why |
+|---------|-------|-----|
+| **Epoch-based staleness** | `_session_epoch` in main.py | Prevents stale LLM/TTS responses from playing after hangup |
+| **Fallback chain** | LLMService | Guarantees a response even if primary model times out |
+| **Debounced auto-respond** | `_debounced_auto_respond()` | Waits for 4s of silence from the real caller before the AI jumps in |
+| **Silent failure** | News enrichment | If search or the LLM fails, the caller simply gets no news context |
+| **Threading for audio** | `play_caller_audio()` | Keeps blocking audio playback off the async event loop |
+| **Ring buffer** | `route_real_caller_audio()` | Absorbs jitter in real caller audio stream |
+| **Lock contention guard** | `_ai_response_lock` | Ensures only one AI response is generated at a time |
+| **Town knowledge injection** | `TOWN_KNOWLEDGE` dict | Prevents LLM from inventing fake local businesses |
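+
+Three of these patterns (epoch-based staleness, the single-response lock, and the debounced auto-respond) guard the same thing, so a combined sketch may help. Assumptions: `AIResponder` and its methods are hypothetical stand-ins for `_session_epoch`, `_ai_response_lock`, and `_debounced_auto_respond()`; `generate` is an async callable producing the reply and `play` a plain callable; and a request that arrives while the lock is held is skipped rather than queued, which the table does not specify. Create the object inside a running event loop.
+
+```python
+import asyncio
+
+
+class AIResponder:
+    def __init__(self, generate, play, silence_s: float = 4.0):
+        self._generate = generate  # async () -> reply (LLM + TTS in the real flow)
+        self._play = play          # plain callable that plays or queues the reply
+        self._silence_s = silence_s
+        self._epoch = 0            # bumped on every hangup
+        self._lock = asyncio.Lock()
+        self._timer = None         # pending debounce task, if any
+
+    def hang_up(self) -> None:
+        """Invalidate any in-flight response and drop a pending timer."""
+        self._epoch += 1
+        if self._timer is not None:
+            self._timer.cancel()
+            self._timer = None
+
+    def on_caller_speech(self) -> None:
+        """Restart the silence timer every time the real caller says something."""
+        if self._timer is not None:
+            self._timer.cancel()
+        self._timer = asyncio.create_task(self._respond_after_silence())
+
+    async def _respond_after_silence(self) -> None:
+        try:
+            await asyncio.sleep(self._silence_s)
+        except asyncio.CancelledError:
+            return  # more speech arrived (or hangup); a newer timer takes over
+        self._timer = None  # fired: later speech must not cancel the response itself
+        await self.respond()
+
+    async def respond(self) -> None:
+        if self._lock.locked():
+            return  # another response is already being generated
+        async with self._lock:
+            epoch = self._epoch
+            reply = await self._generate()
+            if epoch != self._epoch:
+                return  # hung up while generating: the reply is stale, drop it
+            self._play(reply)
+```
+
+The epoch is compared after the `await`, because generation is the window in which a hangup can slip in; a check made only before generating would miss it.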
+
+---
+
+## File Map
+
+```
+ai-podcast/
+├── backend/
+│   ├── main.py              ← FastAPI app, all endpoints, caller generation, session
+│   ├── config.py            ← Settings (env vars, paths)
+│   └── services/
+│       ├── audio.py         ← Multi-channel audio I/O (sounddevice)
+│       ├── caller_service.py ← Phone queue, WebSocket registry, audio routing
+│       ├── llm.py           ← OpenRouter/Ollama with fallback chain
+│       ├── news.py          ← SearXNG search + caching
+│       ├── tts.py           ← 8 TTS providers (cloud + local)
+│       └── transcription.py ← Whisper speech-to-text
+├── frontend/
+│   ├── index.html           ← Control panel layout
+│   ├── js/app.js            ← UI logic, polling, event handlers
+│   └── css/style.css        ← Dark theme styling
+├── sounds/                  ← SFX files (ring, hangup, busy, etc.)
+├── music/                   ← Background music tracks
+├── ads/                     ← Ad audio files
+├── website/                 ← Landing page (lukeattheroost.com)
+├── publish_episode.py       ← Castopod episode publisher
+└── run.sh                   ← Server launcher with restart support
+```