Add show improvement features: crossfade, emotions, returning callers, transcripts, screening
- Music crossfade: smooth 3-second blend between tracks instead of hard stop/start
- Emotional detection: analyze host mood from recent messages so callers adapt tone
- AI caller summaries: generate call summaries with timestamps for show history
- Returning callers: persist regular callers across sessions with call history
- Session export: generate transcripts with speaker labels and chapter markers
- Caller screening: AI pre-screens phone callers to get name and topic while queued

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
189
docs/plans/2026-02-05-real-callers-design.md
Normal file
@@ -0,0 +1,189 @@
# Real Callers + AI Follow-Up Design

## Overview

Add real phone callers to the AI Radio Show via Twilio, alongside existing AI callers. Real callers dial a phone number, wait in a hold queue, and get taken on air by the host. Three-way conversations between host, real caller, and AI caller are supported. AI follow-up callers automatically reference what real callers said.

## Requirements

- Real callers connect via Twilio phone number
- Full-duplex audio: host and caller can talk simultaneously and over each other
- Each real caller gets their own dedicated audio channel for recording
- Three-way calls: host + real caller + AI caller all live at once
- AI caller can respond manually (host-triggered) or automatically (listens and decides when to jump in)
- AI follow-up callers reference real caller conversations via show history
- Auto follow-up mode: system picks an AI caller and connects them after a real call
- Simple hold queue: callers wait with hold music; the host sees the list and picks who goes on air
- Twilio webhooks exposed via Cloudflare tunnel

## Architecture

### Audio Routing (Loopback Channels)

```
Ch 1:    Host mic (existing)
Ch 2:    AI callers / TTS (existing)
Ch 3+:   Real callers (dynamically assigned per call)
Ch N-1:  Music (existing)
Ch N:    SFX (existing)
```

### Call Flow: Real Caller

```
Caller dials Twilio number
  → Twilio POST /api/twilio/voice
  → TwiML response: greeting + enqueue with hold music
  → Caller waits in hold queue
  → Host sees caller in dashboard queue panel
  → Host clicks "Take Call"
  → POST /api/queue/take/{call_sid}
  → Twilio opens WebSocket to /api/twilio/stream
  → Bidirectional audio:
      Caller audio → decode mulaw → dedicated Loopback channel
      Host audio + AI TTS → encode mulaw → Twilio → caller hears both
  → Real-time Whisper transcription of caller audio
  → Host hangs up → call summarized → stored in show history
```
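
The first two steps of this flow (greeting, then enqueue with hold music) could be sketched as follows. This is a minimal illustration that builds the TwiML with the standard library so it runs without the `twilio` package; the real handler would more likely use the `twilio` helper library. The function name `build_voice_twiml`, the queue name `show_queue`, and the greeting text are assumptions, not part of the design.

```python
import xml.etree.ElementTree as ET


def build_voice_twiml(base_url: str, queue_name: str = "show_queue") -> str:
    """Return TwiML that greets the caller and places them in a hold queue.

    base_url is the public tunnel URL (TWILIO_WEBHOOK_BASE_URL in the config).
    queue_name is a hypothetical queue identifier chosen for this sketch.
    """
    response = ET.Element("Response")

    # Spoken greeting before the caller is queued.
    say = ET.SubElement(response, "Say")
    say.text = "Thanks for calling. Please hold and the host will be with you shortly."

    # <Enqueue> parks the call; waitUrl points at the hold-music endpoint.
    enqueue = ET.SubElement(
        response, "Enqueue", {"waitUrl": f"{base_url}/api/twilio/hold-music"}
    )
    enqueue.text = queue_name

    body = ET.tostring(response, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>' + body
```

The `/api/twilio/voice` handler would return this string with an XML content type.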

### Three-Way Call Flow

```
Host mic ──────→ Ch 1 (recording)
               → Twilio outbound (real caller hears you)
               → Whisper transcription (AI gets your words)

Real caller ──→ Ch 3+ (recording, dedicated channel)
              → Whisper transcription (AI gets their words)
              → Host headphones

AI TTS ───────→ Ch 2 (recording)
              → Twilio outbound (real caller hears AI)
              → Host headphones (already works)
```

Conversation history becomes three-party with role labels: `host`, `real_caller`, `ai_caller`.
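
Because the real caller hears both the host and the AI TTS, those two streams have to be combined before being sent to Twilio. A minimal sketch of that mixing step, assuming both inputs are 16-bit signed mono PCM at the same sample rate and in native byte order (the function name `mix_pcm16` is hypothetical):

```python
from array import array


def mix_pcm16(a: bytes, b: bytes) -> bytes:
    """Sum two 16-bit signed mono PCM buffers, clamping to the int16 range.

    The shorter buffer is treated as if padded with silence.
    """
    samples_a = array("h")
    samples_a.frombytes(a)
    samples_b = array("h")
    samples_b.frombytes(b)

    length = max(len(samples_a), len(samples_b))
    mixed = array("h", bytes(2 * length))  # zero-filled output buffer

    for i in range(length):
        x = samples_a[i] if i < len(samples_a) else 0
        y = samples_b[i] if i < len(samples_b) else 0
        # Saturate instead of wrapping so loud overlaps clip rather than crackle.
        mixed[i] = max(-32768, min(32767, x + y))

    return mixed.tobytes()
```

Plain summation with clamping is the simplest choice; a real mixer might attenuate each source first to leave headroom.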

### AI Auto-Respond Mode

When toggled on, after each real caller transcription chunk:

1. Lightweight LLM call ("should I respond?") using a fast model such as Haiku
2. If YES → full response generated → TTS → plays on AI channel + streams to Twilio
3. Cooldown (~10s) prevents rapid-fire responses
4. Host can override with mute button
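
The gating logic in the steps above could look roughly like this. It is a sketch only: the class name `AutoRespondGate`, the `should_respond` callback (standing in for the lightweight LLM call), and the injectable clock are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class AutoRespondGate:
    """Decides whether the AI caller may respond to a transcription chunk."""

    def __init__(
        self,
        should_respond: Callable[[str], bool],  # stand-in for the fast LLM check
        cooldown_s: float = 10.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._should_respond = should_respond
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._last_response_at: Optional[float] = None
        self.muted = False  # host override

    def on_transcript_chunk(self, text: str) -> bool:
        """Return True if a full AI response should be generated now."""
        if self.muted:
            return False
        now = self._clock()
        in_cooldown = (
            self._last_response_at is not None
            and now - self._last_response_at < self._cooldown_s
        )
        if in_cooldown:
            return False  # skip the LLM call entirely while cooling down
        if not self._should_respond(text):
            return False
        self._last_response_at = now
        return True
```

Checking the mute flag and cooldown before calling `should_respond` avoids paying for an LLM call whose answer would be ignored.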

### AI Follow-Up System

After a real caller hangs up:

1. Full transcript (host + real caller + any AI) summarized by LLM
2. Summary stored in `session.call_history`
3. Next AI caller's system prompt includes show history:
   ```
   EARLIER IN THE SHOW:
   - Dave (real caller) called about his wife leaving after 12 years.
     He got emotional about his kids.
   - Jasmine called about her boss hitting on her at work.
   You can reference these if it feels natural. Don't force it.
   ```

**Host-triggered (default):** Click any AI caller as normal. They already have show context.

**Auto mode:** After the real caller hangs up, the system waits ~5-10s, picks a fitting AI caller via a short LLM call, biases their background generation toward the topic, and auto-connects them.

## Backend Changes

### New Module: `backend/services/twilio_service.py`

Manages Twilio integration:
- WebSocket handler for Media Streams (decode/encode mulaw 8kHz ↔ PCM)
- Call queue state (waiting callers, SIDs, timestamps, assigned channels)
- Channel pool management (allocate/release Loopback channels for real callers)
- Outbound audio mixing (host + AI TTS → mulaw → Twilio)
- Methods: `take_call()`, `hangup_real_caller()`, `get_queue()`, `send_audio_to_caller()`
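
As an illustration of the channel pool responsibility listed above, here is a small sketch that hands out Loopback channel numbers per call SID. The class name `ChannelPool`, its method names, and the choice to hand out the lowest free channel are assumptions; the design only states that channels are allocated and released.

```python
from typing import Dict, List, Optional


class ChannelPool:
    """Tracks which Loopback channels are assigned to which real callers."""

    def __init__(self, first_channel: int, count: int) -> None:
        # e.g. first_channel=3 for "Ch 3+" in the audio routing layout
        self._free: List[int] = list(range(first_channel, first_channel + count))
        self._assigned: Dict[str, int] = {}

    def allocate(self, call_sid: str) -> Optional[int]:
        """Assign the lowest free channel to call_sid, or None if exhausted."""
        if call_sid in self._assigned:
            return self._assigned[call_sid]  # repeated calls are idempotent
        if not self._free:
            return None
        channel = min(self._free)
        self._free.remove(channel)
        self._assigned[call_sid] = channel
        return channel

    def release(self, call_sid: str) -> None:
        """Return the caller's channel to the pool; unknown SIDs are ignored."""
        channel = self._assigned.pop(call_sid, None)
        if channel is not None:
            self._free.append(channel)
```

`take_call()` would call `allocate()` and `hangup_real_caller()` would call `release()`.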

### New Endpoints

```python
# Twilio webhooks
POST /api/twilio/voice          # Incoming call → TwiML (greet + enqueue)
POST /api/twilio/hold-music     # Hold music TwiML for waiting callers
WS   /api/twilio/stream         # Media Streams WebSocket (bidirectional audio)

# Host controls
GET  /api/queue                 # List waiting callers (number, wait time)
POST /api/queue/take/{call_sid} # Dequeue caller → start media stream
POST /api/queue/drop/{call_sid} # Drop caller from queue

# AI follow-up
POST /api/followup/generate     # Summarize last real call, trigger AI follow-up
```
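
For the `/api/twilio/stream` WebSocket, Twilio Media Streams exchanges JSON text frames whose `media` events carry base64-encoded mulaw audio. A minimal sketch of parsing inbound frames and building outbound ones follows; the function names and the shape of the returned tuples are assumptions for illustration, and the WebSocket plumbing itself is omitted.

```python
import base64
import json
from typing import Any, Tuple


def parse_stream_message(raw: str) -> Tuple[str, Any]:
    """Parse one inbound Media Streams frame into (event, data).

    "start" yields the stream and call SIDs, "media" yields raw mulaw bytes,
    and any other event (e.g. "connected", "stop") yields None as data.
    """
    msg = json.loads(raw)
    event = msg.get("event", "")
    if event == "start":
        start = msg["start"]
        return event, {"stream_sid": start["streamSid"], "call_sid": start["callSid"]}
    if event == "media":
        return event, base64.b64decode(msg["media"]["payload"])
    return event, None


def build_media_message(stream_sid: str, mulaw_audio: bytes) -> str:
    """Build an outbound frame that plays mulaw audio to the caller."""
    return json.dumps(
        {
            "event": "media",
            "streamSid": stream_sid,
            "media": {"payload": base64.b64encode(mulaw_audio).decode("ascii")},
        }
    )
```

The decoded bytes would then go through the mulaw decoder and onto the caller's dedicated Loopback channel.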

### Session Model Changes

```python
class CallRecord:
    caller_type: str          # "ai" or "real"
    caller_name: str          # "Tony" or "Caller #3"
    summary: str              # LLM-generated summary after hangup
    transcript: list[dict]    # Full conversation [{role, content}]

class Session:
    # Existing fields...
    call_history: list[CallRecord]    # All calls this episode
    active_real_caller: dict | None   # {call_sid, phone, channel, name}
    active_ai_caller: str | None      # Caller key
    ai_respond_mode: str              # "manual" or "auto"
    auto_followup: bool               # Auto-generate AI follow-up after real calls
```

Three-party conversation history uses roles: `host`, `real_caller:{name}`, `ai_caller:{name}`.
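
A small sketch of how the `kind:{name}` role strings might be built and split again, for example when the chat log needs the speaker's name. Both helper names are hypothetical, and the rule that `host` carries no name is inferred from the role list above.

```python
from typing import Optional, Tuple


def make_role(kind: str, name: Optional[str] = None) -> str:
    """Build a role label such as "host" or "real_caller:Dave"."""
    if kind not in ("host", "real_caller", "ai_caller"):
        raise ValueError(f"unknown role kind: {kind}")
    if kind == "host" or not name:
        return kind
    return f"{kind}:{name}"


def split_role(role: str) -> Tuple[str, Optional[str]]:
    """Split a role label back into (kind, name); name is None if absent."""
    kind, _, name = role.partition(":")
    return kind, name or None
```

Using `partition` keeps any further colons inside the caller name intact.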

### AI Caller Prompt Changes

`get_caller_prompt()` extended to include:
- Show history from `session.call_history`
- Current real caller context (if three-way call active)
- Instructions for referencing real callers naturally
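
One way the show-history portion of the prompt might be assembled from call records is sketched below. It uses plain dicts with the `caller_type`, `caller_name`, and `summary` fields from the session model; the function name `build_show_history` and the exact line format are assumptions, loosely following the "EARLIER IN THE SHOW" example in this design.

```python
from typing import Dict, List


def build_show_history(call_history: List[Dict[str, str]]) -> str:
    """Render earlier calls as a prompt section, or "" if there are none."""
    if not call_history:
        return ""

    lines = ["EARLIER IN THE SHOW:"]
    for record in call_history:
        label = record["caller_name"]
        if record["caller_type"] == "real":
            label += " (real caller)"  # lets the AI know who was a live human
        lines.append(f"- {label}: {record['summary']}")

    lines.append("You can reference these if it feels natural. Don't force it.")
    return "\n".join(lines)
```

Returning an empty string for an empty history lets `get_caller_prompt()` append the result unconditionally.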

## Frontend Changes

### New: Call Queue Panel

Sits between the callers section and the chat. Shows waiting real callers with phone number and wait time, with "Take Call" and "Drop" buttons per caller. Polls `/api/queue` every few seconds.

### Modified: Active Call Indicator

Shows real caller and AI caller simultaneously when both are active:
- Real caller: name, channel number, call duration, hang up button
- AI caller: name, Manual/Auto toggle, "Let [name] respond" button (manual mode)
- Auto Follow-Up checkbox

### Modified: Chat Log

Three-party with visual distinction:
- Host messages: existing style
- Real caller: labeled "Dave (caller)", distinct color
- AI caller: labeled "Tony (AI)", distinct color

### Modified: Caller Grid

When a real caller is active, clicking an AI caller adds them as a third party instead of starting a fresh call. An indicator shows which AI callers have been on the show this session.

## Dependencies

- `twilio` Python package (for TwiML generation, REST API)
- Twilio account with phone number (~$1.15/mo + per-minute)
- Cloudflare tunnel for exposing webhook endpoints
- `audioop` or equivalent for mulaw encode/decode (`audioop` is in the stdlib through Python 3.12, but it has been deprecated since 3.11 and is removed in 3.13, so newer interpreters need a replacement)
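
Since `audioop` was removed from the standard library in Python 3.13, an "equivalent" may be needed. Below is a pure-Python sketch of standard G.711 mu-law companding for 16-bit little-endian PCM. It is not guaranteed to match `audioop` output byte for byte, and all function names are hypothetical.

```python
import struct

BIAS = 0x84   # 132, added to the magnitude before locating the segment
CLIP = 32635  # largest magnitude that stays within 15 bits after adding BIAS


def mulaw_decode_byte(b: int) -> int:
    """Expand one mu-law byte to a signed 16-bit linear sample."""
    u = ~b & 0xFF  # mu-law bytes are stored bit-inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + BIAS) << exponent) - BIAS
    return -magnitude if u & 0x80 else magnitude


def mulaw_encode_sample(sample: int) -> int:
    """Compress one signed 16-bit linear sample to a mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment: position of the highest set bit between bit 14 and bit 7.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


_DECODE_TABLE = [mulaw_decode_byte(b) for b in range(256)]


def mulaw_to_pcm16(data: bytes) -> bytes:
    """Decode mu-law bytes to 16-bit little-endian PCM."""
    return struct.pack(f"<{len(data)}h", *(_DECODE_TABLE[b] for b in data))


def pcm16_to_mulaw(pcm: bytes) -> bytes:
    """Encode 16-bit little-endian PCM to mu-law bytes (odd trailing byte dropped)."""
    count = len(pcm) // 2
    samples = struct.unpack(f"<{count}h", pcm[: count * 2])
    return bytes(mulaw_encode_sample(s) for s in samples)
```

A per-sample Python loop is slow but workable at 8 kHz mono; a lookup table for encoding or a NumPy version would be the next step if it becomes a bottleneck.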

## Configuration

New env vars in `.env`:
```
TWILIO_ACCOUNT_SID=...
TWILIO_AUTH_TOKEN=...
TWILIO_PHONE_NUMBER=+1...
TWILIO_WEBHOOK_BASE_URL=https://your-tunnel.cloudflare.com
```
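
A possible way to read and validate these variables at startup is sketched here. The function name `load_twilio_config` and the fail-fast behavior are assumptions; it presumes the `.env` file has already been loaded into the process environment by whatever mechanism the backend uses.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_PHONE_NUMBER",
    "TWILIO_WEBHOOK_BASE_URL",
)


def load_twilio_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect the Twilio settings, raising if any are missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Twilio settings: " + ", ".join(missing))

    config = {name: env[name] for name in REQUIRED_VARS}
    # Normalize so webhook paths can be appended as f"{base}/api/twilio/voice".
    config["TWILIO_WEBHOOK_BASE_URL"] = config["TWILIO_WEBHOOK_BASE_URL"].rstrip("/")
    return config
```

Passing the environment as a parameter keeps the function easy to exercise with a plain dict.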