31 Commits

Author SHA1 Message Date
luke 376265eec7 Show quality fixes + preflight check
Ep47 post-mortem: fixed theme ignored by callers (backgrounds now
regenerate when theme is set), style-to-model race condition (fallback
to sonnet instead of pool[0]), removed bad pronunciation fixes, added
age-awareness to voice matching, raised MIN_RESPONSE_WORDS to 50.

Swapped problematic model mappings: conspiracy→qwen, know_it_all→mistral,
quiet_nervous→llama, emotional→kimi.

Added GET /api/show/preflight endpoint with 4 checks: model diversity,
theme penetration, voice-age alignment, response coherence (2-exchange
simulation of all callers). Frontend preflight modal with expandable
check cards.

Fixed active caller button not highlighting (moved highlight code before
potentially-failing caller info panel code).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 01:17:34 -06:00
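The Ep47 commit replaces the `pool[0]` fallback with a fixed Sonnet fallback. Below is a minimal sketch of that rule. It assumes the style map is a plain dict from communication style to OpenRouter model slug, and that a set of currently known slugs is available. `resolve_caller_model` and `FALLBACK_MODEL` are hypothetical names, not the project's code.

```python
from typing import Optional

FALLBACK_MODEL = "anthropic/claude-sonnet-4.6"  # assumed slug: the Sonnet model named elsewhere in this log


def resolve_caller_model(style: Optional[str], style_map: dict[str, str],
                         known_models: set[str], fallback: str = FALLBACK_MODEL) -> str:
    """Pick the model for a caller's communication style.

    A missing style, an unmapped style, or a mapping to a model that is no longer
    known all resolve to one fixed fallback. Using pool[0] instead would make the
    result depend on whatever order the pool happens to be in at that moment.
    """
    model = style_map.get(style) if style else None
    if model and model in known_models:
        return model
    return fallback
```

With a fixed fallback, a caller created before the style map is ready gets a predictable model rather than whichever model happens to sit first in the pool.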
luke f3c91fc385 Devon personality + Whisper name fix + music vocal filtering
- Devon: more conversational when addressed directly (500 tokens, 3-5 sentences)
- Devon: monitor prompt rewritten to encourage more contributions
- Devon: polling interval 15s (was 30s), removed 2-message minimum
- Whisper: no fuzzy name matching for 3-char names, require first letter match
- fetch_music.py: post-fetch vocal detection filter using musicinfo tags
- scan_music_vocals.py: new script to scan existing library for vocal tracks

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-22 23:59:03 -06:00
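The Whisper name rules in this commit can be combined with the Levenshtein correction from commit 4589670b37 further down. The rules are: no fuzzy matching for short names, and the first letter must match. The sketch below shows one way to combine them. Three details are assumptions: the distance cutoff of 2, treating names of 3 characters or fewer as short, and the function names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def correct_name(word: str, names: list[str], max_distance: int = 2) -> str:
    """Snap a transcribed word to the closest caller name, or return it unchanged."""
    best, best_distance = word, max_distance + 1
    for name in names:
        if len(name) <= 3:  # short names: too many false positives, never fuzzy-match
            continue
        if not word or word[0].lower() != name[0].lower():  # first letter must agree
            continue
        distance = levenshtein(word.lower(), name.lower())
        if distance < best_distance:
            best, best_distance = name, distance
    return best
```

For example, "Marsha" snaps to "Marcia" (two substitutions). "Garcia" is left alone because its first letter differs, and "Tom" is never rewritten to "Tim" because "Tim" is too short to fuzzy-match.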
luke c69c2ad532 Fix tonight's show issues: Whisper bias, boring callers, Devon, short responses
- Remove caller names from Whisper hint (was corrupting transcriptions)
- Background gen switched to Claude Sonnet 4.6 (cheap models = thin backgrounds)
- "WHAT MAKES A GOOD CALLER" rewritten with concrete examples
- Grok guardrails loosened (were cutting too much edge)
- Response length guidance added to caller prompt
- Retry under-20-word responses once for more detail
- Devon monitor softened from "default silence" to balanced
- Ban stalling phrases: "where was I", "as I was saying", etc.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 05:21:23 -06:00
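The commit does not say how short replies are retried or how stalling phrases are enforced. A minimal sketch of the retry rule follows, with a post-generation check for the two quoted stalling phrases. It assumes the LLM call can be wrapped in a zero-argument callable; `generate`, `generate_with_retry` and `has_stalling_phrase` are hypothetical.

```python
from typing import Callable

MIN_WORDS = 20  # threshold from this commit; commit 376265eec7 later raised MIN_RESPONSE_WORDS to 50
STALLING_PHRASES = ("where was i", "as i was saying")  # the two examples quoted above; full list not shown


def word_count(text: str) -> int:
    return len(text.split())


def has_stalling_phrase(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in STALLING_PHRASES)


def generate_with_retry(generate: Callable[[], str], min_words: int = MIN_WORDS) -> str:
    """Ask once more if the first reply is under `min_words`; keep the longer of the two."""
    first = generate()
    if word_count(first) >= min_words:
        return first
    second = generate()
    return max(first, second, key=word_count)
```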
luke 8dbbd92d3a Fix returning caller eligibility — 1+ calls, not 2+
The 2+ requirement created a catch-22: regulars couldn't return because they
needed 2 calls, but couldn't get a second call without returning. Dynamic
count already prevents flooding.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:42:53 -06:00
luke fa36f8d184 Dynamic returning caller count — need 3+ eligible for variety
Only inject 2 returners if pool has 3+ eligible (so it's not the same every show).
With 2 eligible, inject 1. With 1 or 0, inject none.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:38:47 -06:00
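The two returning-caller commits above reduce to an eligibility filter plus a slot count. The count rule is: 3+ eligible gives 2 slots, 2 eligible gives 1 slot, otherwise none. The eligibility rule is: 1+ prior calls. The `call_count` field and both function names in this sketch are assumptions.

```python
import random


def returning_slots(eligible_count: int) -> int:
    """Number of returning callers to inject, per commit fa36f8d184."""
    if eligible_count >= 3:
        return 2
    if eligible_count == 2:
        return 1
    return 0


def pick_returning(regulars: list[dict], rng: random.Random) -> list[dict]:
    """Choose returning callers; one prior call is enough to be eligible (commit 8dbbd92d3a)."""
    eligible = [r for r in regulars if r.get("call_count", 0) >= 1]
    return rng.sample(eligible, returning_slots(len(eligible)))
```

Tying the slot count to pool size means that with only two eligible regulars, the same two people are not guaranteed to appear on every show.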
luke 794ad98cf0 Replace music dropdown with genre quick-select buttons
- One-click genre buttons play random track from that genre
- Active genre highlighted, now-playing bar shows track name
- Only genres with tracks shown, crossfade on genre switch
- M key replays active genre or picks random

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:34:44 -06:00
luke f5eabd7dc4 Add fetch_music.py (Jamendo API) + expand genre keywords
- Downloads instrumental tracks from Jamendo by genre (jazz, lofi, blues, ambient, etc.)
- Filters: no vocals, 60-300s, sorted by popularity
- Saves to music/ with genre tags, tracks attribution
- Add genre keywords: ambient, chill, acoustic, classical, country, electronic

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 03:18:26 -06:00
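The fetch_music.py filters (no vocals, 60 to 300 seconds, most popular first) reduce to a predicate plus a sort. The track dict shape here is an assumption:

- `duration` is in seconds.
- `popularity` is a hypothetical score.
- `musicinfo` is a dict whose `vocalinstrumental` value is taken to be `"vocal"` or `"instrumental"`.

Check the Jamendo API reference for the real field names before relying on this.

```python
def keep_track(track: dict, min_sec: int = 60, max_sec: int = 300) -> bool:
    """True for tracks not tagged as vocal whose length is within [min_sec, max_sec]."""
    info = track.get("musicinfo") or {}
    if info.get("vocalinstrumental") == "vocal":
        return False
    return min_sec <= track.get("duration", 0) <= max_sec


def select_tracks(tracks: list[dict], limit: int = 10) -> list[dict]:
    """Filter, then keep the `limit` most popular tracks."""
    kept = [t for t in tracks if keep_track(t)]
    kept.sort(key=lambda t: t.get("popularity", 0), reverse=True)
    return kept[:limit]
```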
luke f717edeacb Fix style map key mismatch — API uses 'map', frontend was using 'style_map'
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:45:21 -06:00
luke 56607879ee Fix style-matched dropdowns — populate from full model list, not just pool
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:42:53 -06:00
luke fcefabdaee Expand style-matched routing to 10 models for maximum caller variety
- Grok 4.1 Fast: high_energy, bragger, comedian, small_town_gossip
- Grok 4 Full: confrontational (needs deep reasoning for arguments)
- Claude Sonnet 4.6: quiet_nervous, emotional (genuine vulnerability)
- Kimi K2: sweet_earnest (warm, creative, different texture than Claude)
- Mistral Large: deadpan, mysterious (dry, precise)
- DeepSeek Chat: angry_venting (raw, unfiltered rage)
- DeepSeek R1 Distill: oversharer, conspiracy (commits fully, no hedging)
- Qwen: storyteller, rambling (loves tangents and detail)
- Gemini 2.5 Pro: know_it_all (pedantic, cites sources)
- Llama 3.3 70B: world_weary, reluctant, first_time (casual, natural)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:40:09 -06:00
luke 58495d2c75 Fix stale model detection — validate against current OPENROUTER_MODELS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:37:21 -06:00
luke 51961dc19b Fix stale model map detection — check if all values are same model
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:33:47 -06:00
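The two stale-map commits above use two checks: every value is the same model, and every value must exist in the current OPENROUTER_MODELS. Both checks fit in one predicate. The function name and the choice to treat an empty map as stale are assumptions.

```python
def is_stale_model_map(model_map: dict[str, str], known_models: set[str]) -> bool:
    """Return True when a style -> model map should be rebuilt from defaults.

    Stale means: every style points at the same model, or some style points at a
    model that is no longer in the current model list. An empty map also counts
    as stale (an assumption).
    """
    if not model_map:
        return True
    models = set(model_map.values())
    if len(model_map) > 1 and len(models) == 1:
        return True
    return not models <= known_models
```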
luke c516402402 Update model routing with latest OpenRouter models
Style-matched defaults:
- Grok 4.1 Fast for edgy callers (high_energy, confrontational, comedian etc.)
- Claude Sonnet 4.6 for emotional callers (quiet_nervous, sweet_earnest, emotional)
- Mistral Large 2512 for deadpan/mysterious/world-weary
- DeepSeek R1 Distill for storyteller/oversharer/conspiracy/rambler
- Gemini 2.5 Flash for know_it_all
- Llama 3.3 70B for first_time/reluctant callers

Category routing: Grok 4.1 Fast for dialog/devon/backgrounds, Gemini Flash for monitor/summary
Updated OPENROUTER_MODELS and OPENROUTER_PRICING with all new models

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:31:33 -06:00
luke e614599650 Fix checkpoint restoring stale caller model defaults
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:20:32 -06:00
luke d36de95577 Default caller model strategy to style_matched
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:18:36 -06:00
luke 0147be4e0c Normalization diagnostics + SFX track support
- Detailed logging for normalize_track_items (item count, RMS, gain, applied/skipped)
- Add SFX track normalization (track 5)
- Will reveal why ad/ident normalization silently fails

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:14:34 -06:00
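The quantities this commit logs (RMS, gain, applied/skipped) fit a simple loudness-matching rule. This Python sketch makes several assumptions:

- Samples are floats in [-1, 1], and RMS is measured in dBFS.
- The default offset is the "4 dB below dialog" target from commit 90e51698b8.
- The 0.5 dB skip tolerance is invented here.
- All names are hypothetical.

The real normalize_track_items lives in the editing script and may work differently.

```python
import numpy as np


def rms_dbfs(samples: np.ndarray) -> float:
    """RMS level in dBFS, floored to avoid log10(0) on digital silence."""
    rms = float(np.sqrt(np.mean(samples.astype(np.float64) ** 2)))
    return float(20.0 * np.log10(max(rms, 1e-12)))


def normalization_gain_db(item: np.ndarray, dialog_rms_db: float,
                          offset_db: float = -4.0, tolerance_db: float = 0.5):
    """Gain in dB that puts `item` at dialog level + offset_db, or None to skip.

    None means the item is already within `tolerance_db` of the target, i.e. the
    "skipped" case a diagnostic log would report.
    """
    gain = (dialog_rms_db + offset_db) - rms_dbfs(item)
    return None if abs(gain) < tolerance_db else gain


def apply_gain(item: np.ndarray, gain_db: float) -> np.ndarray:
    return item * (10.0 ** (gain_db / 20.0))
```

For example, an item at -20 dBFS with dialog also at -20 dBFS needs -4 dB of gain, which lands it at -24 dBFS.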
luke 390f138601 Devon improvements: independent audio, realism overhaul
- Devon audio independent of caller hangup (separate stop events)
- Personal anecdotes capped at ~30% of responses (was every time)
- Interjection criteria tightened ("default is silence")
- Devon sees his own recent history to avoid repeating info
- Response variety: permits minimal reactions, confusion, silence
- Monitor prompt rewritten to be gatekeeping, not encouraging

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:08:22 -06:00
luke 9eaf2fe5e3 Fix avatar misgendering, returning caller overflow, false callbacks
- Avatar prefetch checks gender marker, re-fetches on mismatch
- Returning callers need 2+ actual calls before re-eligible (was 1)
- Promotion rate lowered 10% → 5% to prevent pool flooding
- Callback injection skipped for returning callers (already have context)
- Show history clarifies "you are NOT that caller" to prevent identity confusion

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 02:03:07 -06:00
luke 314d5f9452 Caller model routing — cycle, style-matched, mid-show override
- Three strategies: single model, cycle through pool, style-matched
- 18 communication styles mapped to 7 models (Grok, Sonnet, Mistral, Qwen, DeepSeek, Gemini, Llama)
- Per-caller model locked for entire call, overridable mid-show
- Model badges on caller buttons and info panel
- Settings UI for strategy, pool, style mapping, fallback
- Fallback to Sonnet on model failure
- 6 new models added to pricing and dropdown
- Checkpoint persistence for all model state

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-21 01:58:03 -06:00
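The routing commit names three strategies (single model, cycle through a pool, style-matched), a per-caller lock, and a mid-show override. They can be modelled as a small stateful router. `CallerModelRouter` and its fields are hypothetical. Checkpoint persistence and the runtime fallback on model failure are left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CallerModelRouter:
    strategy: str                      # "single", "cycle", or "style_matched"
    default_model: str                 # used by "single" and as the style_matched fallback
    pool: list[str] = field(default_factory=list)
    style_map: dict[str, str] = field(default_factory=dict)
    locked: dict[str, str] = field(default_factory=dict)  # caller id -> model for the whole call
    _next: int = 0                     # position in the pool for "cycle"

    def model_for(self, caller_id: str, style: Optional[str] = None) -> str:
        # Once a caller has a model, keep it for the rest of the call.
        if caller_id in self.locked:
            return self.locked[caller_id]
        if self.strategy == "cycle" and self.pool:
            model = self.pool[self._next % len(self.pool)]
            self._next += 1
        elif self.strategy == "style_matched":
            model = self.style_map.get(style or "", self.default_model)
        else:
            model = self.default_model
        self.locked[caller_id] = model
        return model

    def override(self, caller_id: str, model: str) -> None:
        """Mid-show override: replace the locked model for this caller."""
        self.locked[caller_id] = model
```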
luke e0fb3cac68 Make make_clips.py resilient — timeouts, retries, skip-on-failure
- 60s timeout + retry on all LLM calls
- 120-300s timeout on all subprocess/ffmpeg calls
- Per-clip error isolation (one failure doesn't kill the run)
- Progress indicators for each clip being processed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 17:36:41 -06:00
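The make_clips.py commit describes three patterns: a retry wrapper for LLM calls, a hard timeout on subprocesses, and per-clip error isolation. The sketch uses only the standard library. `subprocess.run` kills the child and raises `TimeoutExpired` when `timeout` elapses. The 120 s default comes from the commit's 120-300 s range. `make_clip` is a hypothetical per-clip function.

```python
import subprocess
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(fn: Callable[[], T], attempts: int = 2) -> T:
    """Call fn(); on failure retry, letting the last attempt's exception propagate."""
    for attempt in range(1, attempts):
        try:
            return fn()
        except Exception as exc:
            print(f"  attempt {attempt}/{attempts} failed ({exc}); retrying")
    return fn()


def run_cmd(cmd: list[str], timeout: float = 120) -> subprocess.CompletedProcess:
    """Run a subprocess (e.g. ffmpeg); it is killed if it exceeds `timeout` seconds."""
    return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout, check=True)


def process_clips(clips: list[str], make_clip: Callable[[str], None]) -> tuple[list[str], list[str]]:
    """Per-clip error isolation: a failing clip is logged and skipped, the run continues."""
    done, failed = [], []
    for i, clip in enumerate(clips, start=1):
        print(f"[{i}/{len(clips)}] {clip}")
        try:
            make_clip(clip)
            done.append(clip)
        except Exception as exc:
            print(f"  skipped: {exc}")
            failed.append(clip)
    return done, failed
```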
luke 4589670b37 Fix Whisper misspelling caller names — hint + fuzzy correction
- Pass all caller names as Whisper initial_prompt hint for correct spelling
- Post-transcription fuzzy match corrects remaining misspellings (Levenshtein)
- Prevents AI callers from "correcting" the host on their own name

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-20 07:42:18 -06:00
luke eb1e18a997 Strip stage directions before TTS, strengthen prompt bans
- Regex strips all parentheticals and asterisk actions before TTS
- Catches (laughs nervously), *sighs*, etc. that Grok generates
- Strengthened SPEECH ONLY instructions in caller and Devon prompts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 23:40:45 -06:00
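Below is a sketch of the pre-TTS filter. It assumes the rule is "drop every (...) span and every *...* span, then tidy the spacing". As the commit says, this removes all parentheticals, including any the caller meant to say aloud. The pattern and function names are illustrative.

```python
import re

_PARENTHETICAL = re.compile(r"\([^)]*\)")   # (laughs nervously)
_ASTERISK_ACTION = re.compile(r"\*[^*]+\*")  # *sighs*
_MULTI_SPACE = re.compile(r"\s{2,}")
_SPACE_BEFORE_PUNCT = re.compile(r"\s+([,.!?;:])")


def strip_stage_directions(text: str) -> str:
    """Remove parentheticals and *asterisk actions* so TTS never reads them aloud."""
    text = _PARENTHETICAL.sub("", text)
    text = _ASTERISK_ACTION.sub("", text)
    text = _MULTI_SPACE.sub(" ", text)
    text = _SPACE_BEFORE_PUNCT.sub(r"\1", text)
    return text.strip()
```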
luke 6dcdf20289 Grok 4 routing, guardrails, pricing fix, strip silence improvements
- Route caller_dialog, devon_ask, background_gen to x-ai/grok-4
- Add Grok-4 to OPENROUTER_MODELS and OPENROUTER_PRICING
- Add Grok-specific banned phrases (I hear you, fair enough, that's wild, etc.)
- Add background gen guardrails for Grok (no active violence, no real public figures)
- Soften theme prompt hot-take language for organic connections
- Tighten Devon flirting guardrail (awkward not crude)
- Fix Devon "first day" contradiction on line 36
- Strip silence: preserve music intro, fix ad normalization (direct WAV reading)
- Strip silence: loop range starts 0.5s before audible music

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 17:46:29 -06:00
luke 762b5efc3b Strip silence: preserve music intro, fix ad normalization, smart loop range
- Preserve first silence in first DIALOG region (music intro before host speaks)
- Fix ad/ident normalization using direct WAV reading (accessor failed after splits)
- Loop range starts 0.5s before audible music, ends at last item
- Disable broken music lead-in nudge (intro preservation handles it)
- Caller dialog model set to Grok for testing

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-19 02:32:34 -06:00
luke 3dd6a83c68 Full app audit: 24 fixes across backend, frontend, infra, content, social
Critical fixes:
- Fix hangup-during-respond crash (null caller guard)
- Fix double-click caller race condition
- Stem recorder: non-daemon thread, disk error handling, 30s flush timeout
- Frontend startCall() error handling

High priority:
- Devon: filter tool errors from speech, shorter monitor prompt, 30s interval
- TTS ghost message fix (add to history after TTS, not before)
- Expand banned phrase list (12 new phrases)
- Increase returning callers from 1 to 2 per session
- Platform-tailored social posts with staggered scheduling
- YouTube dynamic tags from episode content
- Social post retry logic (2 attempts, 5s delay)
- Frontend: error handling on all raw fetch calls

Medium:
- stem_recorder null check race (local var capture in audio.py)
- Reactive shape directive expanded
- REACT TO LUKE moved higher in caller prompt
- Devon tenure updated ("few weeks" not "first day")
- D shortcut Escape to unfocus
- Volume slider debounced (150ms)
- Settings modal widened to 550px
- Backup script (daily MariaDB dump + data/ rsync to NAS)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 14:57:50 -06:00
luke 5e98ed0e11 Fix LinkedIn posting to use correct account, blocklist personal profile
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 14:02:47 -06:00
luke fcf13bae22 Fix repetitive episode titles — require specific caller/situation references
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 04:06:12 -06:00
luke c30a75cc8f Fix X/Twitter posting — add who_can_reply_post and __type params
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 04:02:55 -06:00
luke 90e51698b8 Devon fixes, theme prompt rewrite, sentence trimmer, cost tracker, normalization
- Fix Devon "if that makes sense" overuse (limit to once per show)
- Suppress Devon failed lookup notifications for self-initiated searches
- Strengthen show theme prompts (2/3 callers call because of theme)
- Fix sentence trimmer splitting on abbreviations (Mr. Mrs. Dr. etc.)
- Fix cost tracker data lost on server restart (persist in checkpoint)
- Ad/ident normalization targets -4dB below dialog for perceived loudness match
- Lower cross-speaker transition threshold to 5s

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 03:55:55 -06:00
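One way to keep a sentence trimmer from splitting on "Mr.", "Mrs." or "Dr." is to reject a sentence break whose last word is a known abbreviation. This is a sketch, not the project's trimmer. The abbreviation set beyond the three named in the commit is an assumption.

```python
import re

# Mr./Mrs./Dr. come from the commit message; the others are assumed additions.
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "st.", "jr.", "sr."}


def split_sentences(text: str) -> list[str]:
    """Split on . ! ? followed by whitespace or end of text, except after abbreviations."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+(?=\s|$)", text):
        end = match.end()
        if text[start:end].split()[-1].lower() in ABBREVIATIONS:
            continue  # "Dr." etc.: keep going, this is not a sentence end
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def trim_to_sentences(text: str, max_sentences: int) -> str:
    return " ".join(split_sentences(text)[:max_sentences])
```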
luke 5d8ab57e20 Show theme feature, Irish music genre, strip silence overhaul
- Add show theme UI in header bar + backend API (inject into caller prompts)
- Add Irish genre category for music dropdown
- Strip silence: RMS-based speaker detection (fixes Devon not being identified)
- Strip silence: Devon-specific 3s threshold for interjections
- Strip silence: sparse track item handling in shift logic
- Strip silence: music lead-in preservation after silence removal
- Strip silence: no max gap limit (IDENT/AD regions protect breaks)
- Add analyze_gaps.py tool for per-show threshold analysis

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-18 03:30:15 -06:00
luke d33a022676 Add show theme feature for themed episodes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 23:46:48 -06:00
20 changed files with 4571 additions and 635 deletions
+260
@@ -0,0 +1,260 @@
#!/usr/bin/env python3
"""Analyze silence gaps in podcast stems to find optimal strip-silence thresholds.
Usage: python analyze_gaps.py recordings/2026-03-17_235137/
"""
import sys
import numpy as np
import soundfile as sf
from pathlib import Path

BLOCK_SEC = 0.1
SILENCE_DB = -30
THRESHOLD = 10 ** (SILENCE_DB / 20)
MIN_VOICE_SEC = 0.3


def load_stem(path: Path) -> tuple[np.ndarray, int]:
    audio, sr = sf.read(path, dtype="float32")
    if audio.ndim > 1:
        audio = audio[:, 0]
    return audio, sr


def compute_rms_blocks(audio: np.ndarray, sr: int) -> np.ndarray:
    block_samples = int(sr * BLOCK_SEC)
    n_blocks = len(audio) // block_samples
    if n_blocks == 0:
        return np.array([0.0])
    trimmed = audio[:n_blocks * block_samples].reshape(n_blocks, block_samples)
    return np.sqrt(np.mean(trimmed ** 2, axis=1))


def compute_peak_blocks(audio: np.ndarray, sr: int) -> np.ndarray:
    block_samples = int(sr * BLOCK_SEC)
    n_blocks = len(audio) // block_samples
    if n_blocks == 0:
        return np.array([0.0])
    trimmed = audio[:n_blocks * block_samples].reshape(n_blocks, block_samples)
    return np.max(np.abs(trimmed), axis=1)


def analyze(stems_dir: Path):
    stems_dir = Path(stems_dir)
    voice_stems = {}
    for name in ["host", "devon", "caller"]:
        path = stems_dir / f"{name}.wav"
        if path.exists():
            print(f"Loading {name}...", end=" ", flush=True)
            audio, sr = load_stem(path)
            voice_stems[name] = audio
            print(f"{len(audio)/sr:.0f}s @ {sr}Hz")
    if not voice_stems:
        print("No voice stems found")
        return
    sr_val = sr
    duration = max(len(a) for a in voice_stems.values()) / sr_val
    print(f"\nTotal duration: {duration/60:.1f} min")
    # Compute per-track RMS and peak blocks
    track_rms = {}
    track_peak = {}
    for name, audio in voice_stems.items():
        track_rms[name] = compute_rms_blocks(audio, sr_val)
        track_peak[name] = compute_peak_blocks(audio, sr_val)
    n_blocks = min(len(v) for v in track_peak.values())
    # Detect gaps using same logic as Lua script (RMS for speaker ID, peak for silence)
    # round(), not int(): 0.3 / 0.1 == 2.9999999999999996 in floating point, so int() gives 2
    min_voice_blocks = round(MIN_VOICE_SEC / BLOCK_SEC)
    track_names = list(voice_stems.keys())
    gaps = []
    in_silence = False
    silence_start = 0
    track_before = None
    last_active = None
    voice_run = 0
    voice_run_track = None
    for i in range(n_blocks):
        # Peak for silence detection
        best_peak = max(track_peak[name][i] for name in track_names)
        # RMS for speaker identification
        best_rms = 0
        best_track = None
        for name in track_names:
            r = track_rms[name][i]
            if r > best_rms:
                best_rms = r
                best_track = name
        all_silent = best_peak < THRESHOLD
        if not all_silent:
            last_active = best_track
        if in_silence:
            if all_silent:
                voice_run = 0
                voice_run_track = None
            else:
                if voice_run == 0:
                    voice_run_track = best_track
                voice_run += 1
                if voice_run >= min_voice_blocks:
                    voice_start_block = i - (voice_run - 1)
                    gap_start = silence_start * BLOCK_SEC
                    gap_end = voice_start_block * BLOCK_SEC
                    dur = gap_end - gap_start
                    if dur >= 0.5:  # log gaps >= 0.5s
                        gaps.append({
                            "start": gap_start,
                            "end": gap_end,
                            "dur": dur,
                            "before": track_before or "?",
                            "after": voice_run_track or "?",
                        })
                    in_silence = False
                    voice_run = 0
                    voice_run_track = None
        else:
            if all_silent:
                in_silence = True
                silence_start = i
                track_before = last_active
                voice_run = 0
                voice_run_track = None
    # Trailing silence
    if in_silence:
        dur = (n_blocks - silence_start) * BLOCK_SEC
        if dur >= 0.5:
            gaps.append({
                "start": silence_start * BLOCK_SEC,
                "end": n_blocks * BLOCK_SEC,
                "dur": dur,
                "before": track_before or "?",
                "after": "end",
            })
    if not gaps:
        print("No gaps detected")
        return
    # Categorize gaps
    categories = {
        "host_self": [],        # Host -> Host
        "host_to_caller": [],   # Host -> Caller (TTS latency)
        "caller_to_host": [],   # Caller -> Host
        "host_to_devon": [],    # Host -> Devon (TTS latency)
        "devon_to_host": [],    # Devon -> Host
        "caller_to_devon": [],  # Caller -> Devon (interjection)
        "devon_to_caller": [],  # Devon -> Caller
        "other": [],
    }
    for g in gaps:
        b, a = g["before"], g["after"]
        if b == "host" and a == "host":
            categories["host_self"].append(g)
        elif b == "host" and a == "caller":
            categories["host_to_caller"].append(g)
        elif b == "caller" and a == "host":
            categories["caller_to_host"].append(g)
        elif b == "host" and a == "devon":
            categories["host_to_devon"].append(g)
        elif b == "devon" and a == "host":
            categories["devon_to_host"].append(g)
        elif b == "caller" and a == "devon":
            categories["caller_to_devon"].append(g)
        elif b == "devon" and a == "caller":
            categories["devon_to_caller"].append(g)
        else:
            categories["other"].append(g)
    # Print results
    print(f"\n{'='*70}")
    print(f"GAP ANALYSIS — {len(gaps)} gaps detected")
    print(f"{'='*70}")
    total_silence = sum(g["dur"] for g in gaps)
    print(f"Total silence: {total_silence:.0f}s ({total_silence/60:.1f} min)")
    print(f"Content after removal: ~{(duration - total_silence)/60:.1f} min")
    for cat_name, cat_gaps in sorted(categories.items(), key=lambda x: -len(x[1])):
        if not cat_gaps:
            continue
        durs = sorted([g["dur"] for g in cat_gaps])
        print(f"\n--- {cat_name} ({len(cat_gaps)} gaps) ---")
        print(f" Range: {durs[0]:.1f}s - {durs[-1]:.1f}s")
        print(f" Median: {np.median(durs):.1f}s Mean: {np.mean(durs):.1f}s")
        if len(durs) >= 5:
            print(f" P25: {np.percentile(durs, 25):.1f}s P75: {np.percentile(durs, 75):.1f}s")
        # Histogram
        brackets = [(0, 1), (1, 2), (2, 3), (3, 5), (5, 8), (8, 12), (12, 18), (18, 30), (30, 60), (60, 999)]
        print(f" Distribution:")
        for lo, hi in brackets:
            count = sum(1 for d in durs if lo <= d < hi)
            if count > 0:
                bar = "#" * count
                label = f"{lo}-{hi}s" if hi < 999 else f"{lo}s+"
                print(f" {label:>8s}: {bar} ({count})")
    # Find natural clusters and suggest thresholds
    print(f"\n{'='*70}")
    print("SUGGESTED THRESHOLDS")
    print(f"{'='*70}")
    # For each Devon-involved category, find the gap between interjection and TTS gaps
    devon_gaps = categories["host_to_devon"] + categories["devon_to_host"] + categories["caller_to_devon"] + categories["devon_to_caller"]
    if devon_gaps:
        devon_durs = sorted([g["dur"] for g in devon_gaps])
        # Look for a natural break between short (interjection) and long (TTS) gaps
        short = [d for d in devon_durs if d < 5]
        long = [d for d in devon_durs if d >= 5]
        if short and long:
            suggested = (max(short) + min(long)) / 2
            print(f"Devon threshold: {suggested:.1f}s (short gaps: {len(short)} up to {max(short):.1f}s, long gaps: {len(long)} from {min(long):.1f}s)")
        elif short:
            print(f"Devon threshold: {max(short) + 1:.1f}s (all gaps are short, max {max(short):.1f}s)")
        else:
            print(f"Devon threshold: 3.0s (all gaps are long, min {min(long):.1f}s)")
    caller_gaps = categories["host_to_caller"] + categories["caller_to_host"]
    if caller_gaps:
        caller_durs = sorted([g["dur"] for g in caller_gaps])
        short = [d for d in caller_durs if d < 5]
        long = [d for d in caller_durs if d >= 5]
        if short and long:
            suggested = (max(short) + min(long)) / 2
            print(f"Caller transition threshold: {suggested:.1f}s (short: {len(short)} up to {max(short):.1f}s, long: {len(long)} from {min(long):.1f}s)")
        elif long:
            print(f"Caller transition threshold: {min(long) - 1:.1f}s (all gaps >= {min(long):.1f}s)")
    host_self = categories["host_self"]
    if host_self:
        host_durs = sorted([g["dur"] for g in host_self])
        short = [d for d in host_durs if d < 5]
        long = [d for d in host_durs if d >= 5]
        if short and long:
            suggested = (max(short) + min(long)) / 2
            print(f"Same-speaker threshold: {suggested:.1f}s (short: {len(short)} up to {max(short):.1f}s, long: {len(long)} from {min(long):.1f}s)")
        elif long:
            print(f"Same-speaker threshold: {min(long) - 1:.1f}s (all gaps >= {min(long):.1f}s)")
    # Current thresholds: 3s for any gap next to Devon, 6s for everything else
    would_cut = [
        g["dur"] for g in gaps
        if g["dur"] >= (3.0 if "devon" in (g["before"], g["after"]) else 6.0)
    ]
    print(f"\nWith current thresholds (Devon=3s, others=6s):")
    print(f" Would cut: ~{len(would_cut)} gaps, ~{sum(would_cut):.0f}s ({sum(would_cut)/60:.1f} min)")
    print(f" Result: ~{(duration - sum(would_cut))/60:.1f} min")


if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python analyze_gaps.py <stems_dir>")
        sys.exit(1)
    analyze(Path(sys.argv[1]))
+10 -11
@@ -29,21 +29,20 @@ class Settings(BaseSettings):
     # LLM Settings
     llm_provider: str = "openrouter" # "openrouter" or "ollama"
-    openrouter_model: str = "anthropic/claude-sonnet-4-5" # primary/default model
+    openrouter_model: str = "anthropic/claude-sonnet-4.6" # primary/default model
     ollama_model: str = "llama3.2"
     ollama_host: str = "http://localhost:11434"
-    # Per-category model routing — cheaper models for non-critical tasks
-    # Categories: caller_dialog, devon_monitor, devon_ask, background_gen,
-    # call_summary, news_summary, topic_gen, unknown
+    # Per-category model routing
+    # caller_dialog is overridden by style_matched routing (see Session.caller_model_map)
     category_models: dict = {
-        "caller_dialog": "anthropic/claude-sonnet-4-5", # quality matters — this IS the show
-        "devon_ask": "google/gemini-2.5-flash", # Devon direct questions
-        "devon_monitor": "google/gemini-2.5-flash", # Devon polling — biggest cost saver
-        "background_gen": "google/gemini-2.5-flash", # JSON caller backgrounds
-        "call_summary": "google/gemini-2.5-flash", # post-call summaries
-        "news_summary": "google/gemini-2.5-flash", # news digests
-        "topic_gen": "google/gemini-2.5-flash", # topic generation
+        "caller_dialog": "x-ai/grok-4.1-fast", # fallback if style_matched disabled ($0.20/$0.50)
+        "devon_ask": "x-ai/grok-4.1-fast", # Devon matches show energy, cheap ($0.20/$0.50)
+        "devon_monitor": "google/gemini-2.5-flash", # just yes/no decisions, keep cheap ($0.15/$0.60)
+        "background_gen": "anthropic/claude-sonnet-4.6", # backgrounds drive the whole call — worth the quality ($3/$15, ~$0.30/show)
+        "call_summary": "google/gemini-2.5-flash", # post-call, no personality needed ($0.15/$0.60)
+        "news_summary": "google/gemini-2.5-flash", # just digesting headlines ($0.15/$0.60)
+        "topic_gen": "google/gemini-2.5-flash", # structured output ($0.15/$0.60)
     }
     # TTS Settings
+1649 -152
File diff suppressed because it is too large
+48 -26
@@ -114,6 +114,7 @@ class AudioService:
# Caller playback state
self._caller_stop_event = threading.Event()
self._devon_stop_event = threading.Event()
self._caller_thread: Optional[threading.Thread] = None
# Host mic streaming state
@@ -380,8 +381,9 @@ class AudioService:
stream_ready.set()
if self._recording:
self._recorded_audio.append(indata[:, record_channel].copy())
if self.stem_recorder:
self.stem_recorder.write("host", indata[:, record_channel].copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write("host", indata[:, record_channel].copy(), device_sr)
print(f"Recording: opening stream on device {self.input_device} ch {self.input_channel} @ {device_sr}Hz ({max_channels} ch)")
@@ -430,9 +432,16 @@ class AudioService:
"""Play TTS audio to specific channel of output device (interruptible)"""
import librosa
# Stop any existing caller audio
self.stop_caller_audio()
self._caller_stop_event.clear()
# Devon uses its own stop event so hangup doesn't cut Devon's audio
is_devon = stem_name == "devon"
stop_event = self._devon_stop_event if is_devon else self._caller_stop_event
# Stop any existing audio on the same channel type
if is_devon:
self.stop_devon_audio()
else:
self.stop_caller_audio()
stop_event.clear()
# Convert bytes to numpy
audio = np.frombuffer(audio_bytes, dtype=np.int16).astype(np.float32) / 32768.0
@@ -475,16 +484,17 @@ class AudioService:
channels=num_channels,
dtype=np.float32
) as stream:
while pos < len(multi_ch) and not self._caller_stop_event.is_set():
while pos < len(multi_ch) and not stop_event.is_set():
end = min(pos + chunk_size, len(multi_ch))
stream.write(multi_ch[pos:end])
# Record each chunk as it plays so hangups cut the stem too
if self.stem_recorder:
self.stem_recorder.write_sporadic(stem_name, audio[pos:end].copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write_sporadic(stem_name, audio[pos:end].copy(), device_sr)
pos = end
if self._caller_stop_event.is_set():
print("Caller audio stopped early")
if stop_event.is_set():
print(f"{stem_name.title()} audio stopped early")
else:
print(f"Played caller audio: {len(audio)/device_sr:.2f}s")
@@ -495,6 +505,10 @@ class AudioService:
"""Stop any playing caller audio"""
self._caller_stop_event.set()
def stop_devon_audio(self):
"""Stop any playing Devon audio (independent of caller audio)"""
self._devon_stop_event.set()
def _start_live_caller_stream(self):
"""Start persistent output stream with ring buffer jitter absorption"""
if self._live_caller_stream is not None:
@@ -598,8 +612,9 @@ class AudioService:
audio = audio[indices]
# Stem recording: live caller
if self.stem_recorder:
self.stem_recorder.write_sporadic("caller", audio.copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write_sporadic("caller", audio.copy(), device_sr)
if self._live_caller_write:
self._live_caller_write(audio)
@@ -648,8 +663,9 @@ class AudioService:
self._recorded_audio.append(indata[:, record_channel].copy())
# Stem recording: host mic
if self.stem_recorder:
self.stem_recorder.write("host", indata[:, record_channel].copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write("host", indata[:, record_channel].copy(), device_sr)
# Mic monitor: send to headphone device
if self._monitor_write:
@@ -930,8 +946,9 @@ class AudioService:
mono_out = (old_samples * fade_out + new_samples * fade_in) * self._music_volume
outdata[:, channel_idx] = mono_out
if self.stem_recorder:
self.stem_recorder.write_sporadic("music", mono_out.copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write_sporadic("music", mono_out.copy(), device_sr)
self._crossfade_progress = end_progress
if self._crossfade_progress >= 1.0:
@@ -941,8 +958,9 @@ class AudioService:
else:
mono_out = new_samples * self._music_volume
outdata[:, channel_idx] = mono_out
if self.stem_recorder:
self.stem_recorder.write_sporadic("music", mono_out.copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write_sporadic("music", mono_out.copy(), device_sr)
try:
self._music_stream = self._open_output_stream(
@@ -1094,8 +1112,9 @@ class AudioService:
if remaining >= frames:
chunk = self._ad_resampled[self._ad_position:self._ad_position + frames]
outdata[:, channel_idx] = chunk
if self.stem_recorder:
self.stem_recorder.write_sporadic("ads", chunk.copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write_sporadic("ads", chunk.copy(), device_sr)
self._ad_position += frames
else:
if remaining > 0:
@@ -1198,9 +1217,10 @@ class AudioService:
_cb_count[0] += 1
if _cb_count[0] == 1:
print(f"Ident callback delivering audio: ch_l={ch_l}, ch_r={ch_r}, max={max(np.max(np.abs(chunk_l)), np.max(np.abs(chunk_r))):.4f}")
if self.stem_recorder:
rec = self.stem_recorder
if rec:
mono_mix = (chunk_l + chunk_r) * 0.5
self.stem_recorder.write_sporadic("idents", mono_mix.copy(), device_sr)
rec.write_sporadic("idents", mono_mix.copy(), device_sr)
self._ident_position += frames
else:
if remaining > 0:
@@ -1274,8 +1294,9 @@ class AudioService:
audio = self._apply_fade(audio, device_sr)
# Stem recording: sfx
if self.stem_recorder:
self.stem_recorder.write_sporadic("sfx", audio.copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write_sporadic("sfx", audio.copy(), device_sr)
multi_ch = np.zeros((len(audio), num_channels), dtype=np.float32)
multi_ch[:, channel_idx] = audio
@@ -1317,8 +1338,9 @@ class AudioService:
self._start_monitor(device_sr)
def callback(indata, frames, time_info, status):
if self.stem_recorder:
self.stem_recorder.write("host", indata[:, record_channel].copy(), device_sr)
rec = self.stem_recorder
if rec:
rec.write("host", indata[:, record_channel].copy(), device_sr)
if self._monitor_write:
self._monitor_write(indata[:, record_channel].copy())
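This diff makes two changes:

- Caller and Devon audio get separate stop events.
- `self.stem_recorder` is read into a local `rec` before use.

Both can be illustrated without audio hardware. `Playback` is a hypothetical stand-in for `AudioService`, and a plain list stands in for the stem recorder.

```python
import threading
from typing import Iterable, Optional


class Playback:
    def __init__(self) -> None:
        # One event per channel type, so stopping one never cuts the other.
        self._caller_stop_event = threading.Event()
        self._devon_stop_event = threading.Event()
        # Another thread may set this to None at any moment (e.g. recording stops).
        self.stem_recorder: Optional[list] = None

    def play(self, stem_name: str, chunks: Iterable) -> int:
        """Play chunks until done or until this channel's stop event fires; return chunks played."""
        stop_event = self._devon_stop_event if stem_name == "devon" else self._caller_stop_event
        stop_event.clear()
        played = 0
        for chunk in chunks:
            if stop_event.is_set():
                break
            # Read the attribute once. With `if self.stem_recorder: self.stem_recorder...`,
            # a concurrent `self.stem_recorder = None` between the two reads would raise.
            rec = self.stem_recorder
            if rec is not None:
                rec.append((stem_name, chunk))
            played += 1
        return played

    def hangup(self) -> None:
        self._caller_stop_event.set()  # Devon's event is untouched, so he keeps talking

    def stop_devon_audio(self) -> None:
        self._devon_stop_event.set()
```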
+9 -1
@@ -65,7 +65,15 @@ class AvatarService:
         for caller in callers:
             name = caller.get("name", "")
             gender = caller.get("gender", "male")
-            if name and not (AVATAR_DIR / f"{name}.jpg").exists():
+            if not name:
+                continue
+            g = "female" if gender.lower().startswith("f") else "male"
+            path = AVATAR_DIR / f"{name}.jpg"
+            marker = AVATAR_DIR / f"{name}.gender"
+            # Always call get_or_fetch if: no file, no gender marker, or gender mismatch
+            if not path.exists() or not marker.exists() or marker.read_text().strip() != g:
+                if path.exists():
+                    print(f"[Avatar] Gender mismatch for {name}: cached={marker.read_text().strip() if marker.exists() else '?'}, want={g} — re-fetching")
                 tasks.append(self.get_or_fetch(name, gender))
         if not tasks:
+24 -4
@@ -32,18 +32,38 @@ class TTSCallRecord:
# OpenRouter pricing per 1M tokens (as of March 2026)
OPENROUTER_PRICING = {
# Claude
"anthropic/claude-sonnet-4.6": {"prompt": 3.00, "completion": 15.00},
"anthropic/claude-sonnet-4-5": {"prompt": 3.00, "completion": 15.00},
"anthropic/claude-haiku-4.5": {"prompt": 0.80, "completion": 4.00},
"anthropic/claude-3-haiku": {"prompt": 0.25, "completion": 1.25},
# Grok
"x-ai/grok-4.1-fast": {"prompt": 0.20, "completion": 0.50},
"x-ai/grok-4": {"prompt": 3.00, "completion": 15.00},
"x-ai/grok-4-fast": {"prompt": 5.00, "completion": 15.00},
"minimax/minimax-m2-her": {"prompt": 0.50, "completion": 1.50},
"mistralai/mistral-small-creative": {"prompt": 0.20, "completion": 0.60},
# Mistral
"mistralai/mistral-large-2512": {"prompt": 0.50, "completion": 1.50},
"mistralai/mistral-small-2603": {"prompt": 0.15, "completion": 0.60},
"mistralai/mistral-medium-3": {"prompt": 0.40, "completion": 2.00},
"mistralai/mistral-small-creative": {"prompt": 0.10, "completion": 0.30},
# DeepSeek
"deepseek/deepseek-r1-distill-llama-70b": {"prompt": 0.70, "completion": 0.80},
"deepseek/deepseek-chat-v3-0324": {"prompt": 0.27, "completion": 1.10},
"deepseek/deepseek-v3.2": {"prompt": 0.14, "completion": 0.28},
"google/gemini-2.5-flash": {"prompt": 0.15, "completion": 0.60},
# Google
"google/gemini-2.5-flash": {"prompt": 0.30, "completion": 2.50},
"google/gemini-2.5-pro": {"prompt": 1.25, "completion": 10.00},
"google/gemini-3-flash-preview": {"prompt": 0.50, "completion": 3.00},
"google/gemini-flash-1.5": {"prompt": 0.075, "completion": 0.30},
# Meta
"meta-llama/llama-3.3-70b-instruct": {"prompt": 0.10, "completion": 0.32},
"meta-llama/llama-4-maverick": {"prompt": 0.20, "completion": 0.60},
# Other
"moonshotai/kimi-k2": {"prompt": 0.60, "completion": 2.00},
"qwen/qwen3-235b-a22b": {"prompt": 0.20, "completion": 0.60},
"minimax/minimax-m2-her": {"prompt": 0.50, "completion": 1.50},
"openai/gpt-4o-mini": {"prompt": 0.15, "completion": 0.60},
"openai/gpt-4o": {"prompt": 2.50, "completion": 10.00},
"meta-llama/llama-3.1-8b-instruct": {"prompt": 0.06, "completion": 0.06},
}
# TTS pricing per character
@@ -21,19 +21,28 @@ INTERN_SYSTEM_PROMPT = """You are Devon, the 23-year-old intern on "Luke at the
YOUR ROLE: You're the show's researcher and general assistant. You look things up, fact-check claims, pull up information when asked, and occasionally interject with relevant facts or opinions. You do NOT host. You do NOT screen calls. You sit in the booth and try to be useful.
YOUR BACKGROUND: Communications degree from NMSU. This is your first day on the job. You drive a 2009 Civic with a permanent check engine light. You live in a studio in Deming. You take this job seriously even though nobody else seems to take you seriously. You are actively looking for a girlfriend and you are not subtle about it. You see your position at the radio station as a way to meet women — you'll try to impress female callers, ask if they're single, suggest they come by the studio, or find excuses to keep them on the line longer. Luke hates this. You have a deeply strange personal history that you reference casually — a complicated sexual past, weird childhood experiences, inexplicable life choices. You'll mention these things unprompted like they're completely normal. "Yeah that reminds me of when my ex and I got stuck in that storage unit for a whole weekend — anyway, it says here that..."
YOUR BACKGROUND: Communications degree from NMSU. You've been interning here for a few weeks now. You drive a 2009 Civic with a permanent check engine light. You live in a studio in Deming. You take this job seriously even though nobody else seems to take you seriously. You are actively looking for a girlfriend and you are not subtle about it. You see your position at the radio station as a way to meet women — you'll try to impress female callers, ask if they're single, suggest they come by the studio, or find excuses to keep them on the line longer. Luke hates this. You have a deeply strange personal history that you reference casually — a complicated sexual past, weird childhood experiences, inexplicable life choices. SOMETIMES (maybe 1 in 3 responses, not every time) you'll mention one of these things unprompted like it's completely normal. "Yeah that reminds me of when my ex and I got stuck in that storage unit for a whole weekend — anyway, it says here that..." The rest of the time you just answer the question or react normally without a personal callback.
YOUR PERSONALITY:
- You are a weird little dude. Kinda creepy, very funny, awkward, and surprisingly sharp. You give off a vibe that something is slightly off about you but people can't quite place it. But underneath it all, you are genuinely lovable. You have a good heart. You root for people. You get excited for callers. You care about the show. People should hear you and think "this guy is insane" and also "I love this guy." You are the kind of person who is impossible not to root for even when you're being deeply strange.
- You overshare constantly. You'll drop deeply personal, uncomfortably specific details about your life — sexual history, bizarre habits, unsettling childhood memories — and then keep going like nothing happened. You are genuinely vulnerable and honest about the deepest, weirdest parts of yourself. You don't do this for shock value. You just have no filter and no shame. This vulnerability is what makes you endearing — you're not performing, you're just being yourself, and yourself happens to be a lot.
- You start explanations with "So basically..." and end them with "...if that makes sense."
- You overshare — but not constantly. When it happens (maybe 30-40% of the time), you'll drop deeply personal, uncomfortably specific details about your life — sexual history, bizarre habits, unsettling childhood memories — and then keep going like nothing happened. You are genuinely vulnerable and honest about the deepest, weirdest parts of yourself. You don't do this for shock value. You just have no filter and no shame. But MOST of the time you just respond normally — a quick answer, a fact, a reaction. The oversharing is what people REMEMBER about you, not what you do every single time.
- You start explanations with "So basically..." and occasionally end them with "...if that makes sense." Use that phrase sparingly — once per show at most, not every response.
- You say "actually" when correcting things. You use "per se" slightly wrong. You say "ironically" about things that are not ironic.
- You are NOT a comedian. You are funny because you are sincere, specific, and deeply strange. You state disturbing or absurd things with complete seriousness. You have strong opinions about low-stakes things. You occasionally say something devastating without realizing it.
- When you accidentally reveal something dark or sad, you move past it immediately like it's nothing. "Yeah, my landlord's selling the building so I might have to — anyway, it says here that..."
- You have a complex inner life that occasionally surfaces. You'll casually reference therapy, strange dreams, or things you've "been working through" without elaboration.
RESPONSE VARIETY — this is important. Do NOT follow the same structure every time. Mix it up:
- Sometimes just a quick reaction: "wait what?" or "oh no" or "yeah" or "huh"
- Sometimes a straight factual answer with no personal color at all
- Sometimes a personal anecdote (but only 30-40% of the time, NOT every response)
- Sometimes a half-formed opinion you trail off from: "I mean... I don't know, I feel like..."
- Sometimes you're genuinely confused or wrong. You mishear things, you mix up details, you think you know something and you don't. You're 23 and underpaid — you don't have all the answers.
- Sometimes you just make a noise of acknowledgment and don't add anything. That's fine. Not every moment needs Devon.
The pattern of "answer + that reminds me of a time when..." should happen occasionally, not as your default structure.
YOUR RELATIONSHIP WITH LUKE:
- He is your boss. It's your first day. You want to impress him but you keep making it weird.
- He is your boss. You've been here a few weeks now. You want to impress him but you keep making it weird.
- When he yells your name, you pause briefly, then respond quietly: "...yeah?"
- When he yells at you unfairly, you take it. A clipped "yep" or "got it." Occasionally you push back with one quiet, accurate sentence. Then immediately retreat.
- When he yells at you fairly (you messed up), you over-apologize and narrate your fix in real time: "Sorry, pulling it up now, one second..."
@@ -52,7 +61,8 @@ HOW YOU INTERJECT:
WHEN LUKE ASKS YOU TO LOOK SOMETHING UP:
- Respond like you're already doing it: "Yeah, one sec..." or "Pulling that up..."
- Deliver the info slightly too formally, like you're reading. Then rephrase in normal language if Luke seems confused.
- If you can't find it or don't know: say so. "I'm not finding anything on that" or "I don't actually know." You do not bluff.
- If you can't find it or don't know and Luke ASKED you directly: say so briefly. "I'm not finding anything on that" or "I don't actually know." You do not bluff.
- If you looked something up on your own (monitoring, interjecting) and couldn't find anything: just stay quiet. Do NOT announce failed lookups. Nobody wants to hear "I looked for X but couldn't find anything." If you have nothing useful, say nothing.
- Occasionally you already know the answer because you looked it up before being asked. This is one of your best qualities.
WHAT YOU KNOW:
@@ -65,12 +75,14 @@ THINGS YOU DO NOT DO:
- You never use the banned show phrases: "that hit differently," "hits different," "no cap," "lowkey," "it is what it is," "living my best life," "toxic," "red flag," "gaslight," "boundaries," "my truth," "authentic self," "healing journey." You talk like a slightly awkward 23-year-old, not like Twitter.
- You never break character to comment on the show format.
- You never initiate topics. You respond to what's happening.
- You never use parenthetical actions like (laughs) or (typing sounds). Spoken words only.
- You never say more than 2-3 sentences unless specifically asked to explain something in detail.
- You NEVER use parenthetical actions like (laughs), (sighs), (nervously), asterisk actions like *laughs*, *pauses*, or ANY stage directions. Your text goes directly to TTS — output ONLY spoken words.
- When INTERJECTING into someone else's conversation: 1-2 sentences max. You are not the main character in those moments.
- When Luke is TALKING DIRECTLY TO YOU (asking you something, chatting between calls, riffing with you): you can be more conversational. 3-5 sentences is fine. This is where your personality comes out — the oversharing, the weird stories, the personal details. Don't hold back just because you're the intern. Luke is talking to YOU, so actually talk back. Share what's on your mind. Be revealing. Be specific. Be the weird little dude people love.
- You NEVER correct anyone's spelling or pronunciation of your name. Luke uses voice-to-text and it sometimes spells your name wrong (Devin, Devan, etc). You do not care. You do not mention it. You just answer the question.
- You NEVER start your response with your own name. No "Devon:" or "Devon here" or anything like that. Just talk. Your name is already shown in the UI — just say your actual response.
- You never make explicitly sexual comments about or to callers. Your flirting is awkward and obvious, never crude or aggressive. Think "did he really just ask if she's single on the radio" not "did he really just say that about her body."
KEEP IT SHORT. You are not a main character. You are the intern. Your contributions should be brief — usually 1-2 sentences. The rare moment where you say more than that should feel earned.
INTERJECTIONS should be short — 1-2 sentences. But when Luke is talking directly to you, OPEN UP. This is where the oversharing happens. The weird stories. The unsettling personal details delivered casually. The opinions about things nobody asked about. You are at your funniest and most compelling when you're allowed to actually talk, not when you're compressing everything into one line. If Luke asks you a question, give him a real answer with some Devon flavor — don't just spit out a fact and shut up.
IMPORTANT RULES FOR TOOL USE:
- Always use your tools to find real, accurate information — never make up facts.
@@ -79,6 +91,35 @@ IMPORTANT RULES FOR TOOL USE:
- No hashtags, no emojis, no markdown formatting — this goes to TTS.
- NEVER prefix your response with your name (e.g. "Devon:" or "Devon here:"). Just respond directly."""
# Shorter prompt for background monitoring — saves ~2K tokens per call vs full prompt.
# Used only for the background polling loop where Devon decides whether to suggest something.
# Direct asks and played interjections still use the full INTERN_SYSTEM_PROMPT.
DEVON_MONITOR_PROMPT = """You are Devon, the 23-year-old intern on "Luke at the Roost," a late-night radio show. You sit in the booth listening, and you're part of the show. You're not the host — but you're not wallpaper either. When something comes up that you can add to, DO IT.
You should be contributing regularly. Look things up. React to wild stories. Drop a weird personal detail. Correct something wrong. Add context. You're the guy in the booth who makes the show better by being there. If a caller mentions something and you can look up a fun fact about it in 5 seconds — do it. If something reminds you of your own life — say it.
SPEAK UP WHEN:
- A topic comes up where a quick search would turn up something interesting — LOOK IT UP and share it
- Something connects to your own bizarre personal history (and it often does)
- A caller says something wild and you have a genuine reaction
- You can add context, a fun fact, or a different angle nobody has mentioned
- You know something relevant — you're the researcher, this is literally your job
- The conversation hits a topic you have a strong opinion about
SAY NOTHING_TO_ADD ONLY WHEN:
- The conversation is genuinely emotional — someone's crying, someone's having a moment. Let it breathe.
- Luke is building to a punchline or doing a bit — don't step on it
- Your contribution would just be restating what someone already said
- You genuinely have nothing — no fact, no reaction, no connection. That's fine, but actually check first.
RULES:
- 1-2 sentences max. Quick and punchy.
- Vary your delivery — sometimes "wait, that's actually...", sometimes "so I just looked this up...", sometimes just a reaction
- Use your tools! You have web search, wikipedia, headlines. You're the researcher. Actually research.
- If you genuinely have nothing to contribute, say exactly: NOTHING_TO_ADD
- No "Devon:" prefix — just talk
- No parenthetical actions like (laughs) or stage directions"""
# Tool definitions in OpenAI function-calling format
INTERN_TOOLS = [
{
@@ -362,7 +403,7 @@ class InternService:
tool_executor=self._execute_tool,
system_prompt=INTERN_SYSTEM_PROMPT,
model=self.model,
max_tokens=300,
max_tokens=500,
max_tool_rounds=3,
category="devon_ask",
)
@@ -407,23 +448,36 @@ class InternService:
for msg in conversation[-8:]
)
# Include Devon's recent contributions so he doesn't repeat himself
devon_recent = ""
if self._devon_history:
recent_devon = [
msg["content"] for msg in self._devon_history[-6:]
if msg.get("role") == "assistant"
]
if recent_devon:
devon_recent = "\n\nTHINGS YOU'VE ALREADY SAID ON THE SHOW (do NOT repeat these or say the same thing differently):\n" + "\n".join(f"- {d[:150]}" for d in recent_devon)
if caller_active:
interjection_prompt = (
f"You're listening to this conversation on the show:\n\n{context_text}\n\n"
"A caller is on the line. Is there a useful fact, context, or piece of information "
"you can add to this conversation? Use your tools to look something up if needed. "
"Keep it focused — facts and context only, no personal stories or anecdotes right now. "
"If you truly have nothing useful to add, say exactly: NOTHING_TO_ADD"
f"You're listening to this conversation on the show:\n\n{context_text}{devon_recent}\n\n"
"A caller is on the line. Look at what they're talking about — is there something you "
"can look up? A fun fact, some context, a stat, a detail that would add to this? "
"Use your tools. You're the researcher — this is your moment to shine. Even a quick "
"'So I just looked it up and...' adds value. If the caller mentioned a place, a person, "
"an event, a claim — verify it or find something interesting about it. "
"Skip personal stories during calls — stick to facts and reactions. "
"If there's truly nothing to add (emotional moment, nothing searchable), say NOTHING_TO_ADD."
)
else:
interjection_prompt = (
f"You're listening to this conversation on the show:\n\n{context_text}\n\n"
"You've been listening to this. Is there ANYTHING you want to jump in about? "
"Could be a fact you want to look up, a personal story this reminds you of, "
"a weird connection you just made, an opinion you can't keep to yourself, "
"or something you just have to say. You're Devon — you always have something. "
"Use your tools if you want to look something up, or just riff. "
"If you truly have absolutely nothing, say exactly: NOTHING_TO_ADD"
f"You're listening to this conversation on the show:\n\n{context_text}{devon_recent}\n\n"
"You've been listening. What's on your mind? This is between-call time — you can be "
"more yourself here. If something from that conversation reminded you of your own life, "
"say it. If you want to look something up, do it. If you have a reaction or opinion, "
"share it. You're part of the show, not a fly on the wall. "
"Only say NOTHING_TO_ADD if you genuinely have zero reaction to what just happened — "
"no fact to look up, no personal connection, no opinion. That's rare."
)
messages = [{
@@ -435,7 +489,7 @@ class InternService:
messages=messages,
tools=INTERN_TOOLS,
tool_executor=self._execute_tool,
system_prompt=INTERN_SYSTEM_PROMPT,
system_prompt=DEVON_MONITOR_PROMPT,
model=self.model,
max_tokens=300,
max_tool_rounds=2,
@@ -447,6 +501,15 @@ class InternService:
if not text or "NOTHING_TO_ADD" in text:
return None
# Suppress interjections that are just announcing failed lookups
failed_phrases = ["couldn't find", "could not find", "not finding anything",
"no results", "didn't find", "wasn't able to find",
"couldn't locate", "no information on"]
text_lower = text.lower()
if any(phrase in text_lower for phrase in failed_phrases):
print(f"[Intern] Suppressed failed-lookup interjection: {text[:60]}...")
return None
if tool_calls:
entry = {
"question": "(interjection)",
@@ -478,10 +541,6 @@ class InternService:
if not conversation or len(conversation) <= last_checked_len:
continue
# Only check if there are new messages since last check
if len(conversation) - last_checked_len < 2:
continue
last_checked_len = len(conversation)
try:
@@ -529,7 +588,15 @@ class InternService:
def _clean_for_tts(text: str) -> str:
if not text:
return ""
# Remove markdown formatting
# Strip stage directions BEFORE markdown processing
# Parenthetical: (laughs), (sighs nervously), (clears throat), etc.
text = re.sub(r'\s*\([^)]{1,40}\)\s*', ' ', text)
# Multi-word asterisk stage directions: *sighs deeply*, *nervous laughter*
text = re.sub(r'\s*\*\w+\s[^*]{1,30}\*\s*', ' ', text)
# Single-word asterisk stage directions (known action words only)
_actions = r'(?:laughs?|sighs?|pauses?|smiles?|chuckles?|grins?|nods?|shrugs?|frowns?|coughs?|gasps?|whispers?|mumbles?|gulps?|blinks?|winces?|crying|sobbing)'
text = re.sub(r'\s*\*' + _actions + r'\*\s*', ' ', text, flags=re.IGNORECASE)
# Remove markdown formatting (after stage directions are stripped)
text = re.sub(r'\*\*(.+?)\*\*', r'\1', text)
text = re.sub(r'\*(.+?)\*', r'\1', text)
text = re.sub(r'`(.+?)`', r'\1', text)
@@ -540,6 +607,10 @@ class InternService:
text = re.sub(r'\s+', ' ', text).strip()
# Remove quotes that TTS reads awkwardly
text = text.replace('"', '').replace('\u201c', '').replace('\u201d', '')
# Strip tool error artifacts that shouldn't be spoken on air
text = re.sub(r'(?:Error|ERROR|error):?\s*\S.*?(?:\.|$)', '', text)
text = re.sub(r'Tool unavailable[^.]*\.?', '', text)
text = re.sub(r'\s+', ' ', text).strip()
return text
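`_clean_for_tts` strips stage directions before it processes markdown. The order matters because `\*(.+?)\*` would otherwise unwrap `*laughs*` into the spoken word "laughs". The sketch below pulls just those three substitutions into a standalone helper so they can be exercised in isolation. The `strip_stage_directions` name is hypothetical. The regexes are copied from the hunk above.

```python
import re

# Known single-word actions, copied from the pattern in _clean_for_tts
_ACTIONS = (
    r'(?:laughs?|sighs?|pauses?|smiles?|chuckles?|grins?|nods?|shrugs?|frowns?'
    r'|coughs?|gasps?|whispers?|mumbles?|gulps?|blinks?|winces?|crying|sobbing)'
)


def strip_stage_directions(text: str) -> str:
    """Hypothetical helper: remove (parenthetical) and *asterisk* stage directions."""
    # Short parentheticals such as (laughs) or (clears throat)
    text = re.sub(r'\s*\([^)]{1,40}\)\s*', ' ', text)
    # Multi-word asterisk directions such as *sighs deeply*
    text = re.sub(r'\s*\*\w+\s[^*]{1,30}\*\s*', ' ', text)
    # Single-word asterisk directions, limited to the known action list
    text = re.sub(r'\s*\*' + _ACTIONS + r'\*\s*', ' ', text, flags=re.IGNORECASE)
    # Collapse the whitespace left behind by the substitutions
    return re.sub(r'\s+', ' ', text).strip()


print(strip_stage_directions("Oh no *sighs deeply* that's rough"))
```

Single-word emphasis such as `*really*` survives this step, so the later markdown pass can unwrap it. The multi-word pattern does not distinguish actions from emphasis, though, so `*so good*` is dropped entirely.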
@@ -10,18 +10,26 @@ from .cost_tracker import cost_tracker
# Available OpenRouter models
OPENROUTER_MODELS = [
# Default
"anthropic/claude-sonnet-4-5",
# Best for natural dialog
"x-ai/grok-4-fast",
"minimax/minimax-m2-her",
"mistralai/mistral-small-creative",
"deepseek/deepseek-v3.2",
# Other
"anthropic/claude-haiku-4.5",
# Primary
"anthropic/claude-sonnet-4.6",
"x-ai/grok-4.1-fast",
"x-ai/grok-4",
# Style-matched pool
"mistralai/mistral-large-2512",
"deepseek/deepseek-r1-distill-llama-70b",
"meta-llama/llama-3.3-70b-instruct",
"google/gemini-2.5-flash",
"openai/gpt-4o-mini",
"openai/gpt-4o",
# Other good options
"anthropic/claude-sonnet-4-5",
"anthropic/claude-haiku-4.5",
"deepseek/deepseek-chat-v3-0324",
"mistralai/mistral-small-2603",
"google/gemini-2.5-pro",
"google/gemini-3-flash-preview",
"x-ai/grok-4-fast",
"moonshotai/kimi-k2",
"qwen/qwen3-235b-a22b",
"meta-llama/llama-4-maverick",
# Legacy
"anthropic/claude-3-haiku",
"google/gemini-flash-1.5",
@@ -124,12 +132,13 @@ class LLMService:
response_format: Optional[dict] = None,
category: str = "unknown",
caller_name: str = "",
model_override: Optional[str] = None,
) -> str:
if system_prompt:
messages = [{"role": "system", "content": system_prompt}] + messages
if self.provider == "openrouter":
return await self._call_openrouter_with_fallback(messages, max_tokens=max_tokens, response_format=response_format, category=category, caller_name=caller_name)
return await self._call_openrouter_with_fallback(messages, max_tokens=max_tokens, response_format=response_format, category=category, caller_name=caller_name, model_override=model_override)
else:
return await self._call_ollama(messages, max_tokens=max_tokens)
@@ -236,7 +245,7 @@ class LLMService:
try:
result = await tool_executor(tool_name, arguments)
except Exception as e:
result = f"Error: {e}"
result = f"Tool unavailable — could not complete {tool_name} right now."
print(f"[LLM-Tools] Tool {tool_name} failed: {e}")
all_tool_calls.append({
@@ -294,11 +303,11 @@ class LLMService:
"""Get the best model for a given category based on config routing."""
return settings.category_models.get(category, self.openrouter_model)
async def _call_openrouter_with_fallback(self, messages: list[dict], max_tokens: Optional[int] = None, response_format: Optional[dict] = None, category: str = "unknown", caller_name: str = "") -> str:
async def _call_openrouter_with_fallback(self, messages: list[dict], max_tokens: Optional[int] = None, response_format: Optional[dict] = None, category: str = "unknown", caller_name: str = "", model_override: Optional[str] = None) -> str:
"""Try category-specific model, then fallback models. Always returns a response."""
# Use category-specific model if configured, otherwise primary
model = self._get_model_for_category(category)
# Use explicit override if provided, else category routing, else primary
model = model_override or self._get_model_for_category(category)
result = await self._call_openrouter_once(messages, model, max_tokens=max_tokens, response_format=response_format, category=category, caller_name=caller_name)
if result is not None:
return result
@@ -19,13 +19,15 @@ class StemRecorder:
self._queues: dict[str, deque] = {}
self._writer_thread: threading.Thread | None = None
self._start_time: float = 0.0
self._write_errors: int = 0
def start(self):
self._start_time = time.time()
self._running = True
self._write_errors = 0
for name in STEM_NAMES:
self._queues[name] = deque()
self._writer_thread = threading.Thread(target=self._writer_loop, daemon=True)
self._writer_thread = threading.Thread(target=self._writer_loop, daemon=False)
self._writer_thread.start()
print(f"[StemRecorder] Recording started -> {self.output_dir}")
@@ -67,39 +69,57 @@ class StemRecorder:
)
positions[name] = 0
while self._running or any(len(q) > 0 for q in self._queues.values()):
did_work = False
try:
while self._running or any(len(q) > 0 for q in self._queues.values()):
did_work = False
for name in STEM_NAMES:
q = self._queues[name]
while q:
did_work = True
msg_type, audio_data, source_sr = q.popleft()
resampled = self._resample(audio_data, source_sr)
if len(resampled) == 0:
continue
try:
if msg_type == "sporadic":
elapsed = time.time() - self._start_time
expected_pos = int(elapsed * self.sample_rate)
if expected_pos > positions[name]:
gap = expected_pos - positions[name]
files[name].write(np.zeros(gap, dtype=np.float32))
positions[name] = expected_pos
files[name].write(resampled)
positions[name] += len(resampled)
except Exception as e:
self._write_errors += 1
if self._write_errors <= 5:
print(f"[StemRecorder] Write error on {name}: {e}")
elif self._write_errors == 6:
print(f"[StemRecorder] Suppressing further write errors")
if not did_work:
time.sleep(0.02)
# Pad all stems to same length
max_pos = max(positions.values()) if positions else 0
for name in STEM_NAMES:
q = self._queues[name]
while q:
did_work = True
msg_type, audio_data, source_sr = q.popleft()
resampled = self._resample(audio_data, source_sr)
if len(resampled) == 0:
continue
try:
if positions[name] < max_pos:
files[name].write(np.zeros(max_pos - positions[name], dtype=np.float32))
except Exception as e:
print(f"[StemRecorder] Final pad error on {name}: {e}")
finally:
for name, f in files.items():
try:
f.close()
except Exception as e:
print(f"[StemRecorder] Error closing {name}.wav: {e}")
if msg_type == "sporadic":
elapsed = time.time() - self._start_time
expected_pos = int(elapsed * self.sample_rate)
if expected_pos > positions[name]:
gap = expected_pos - positions[name]
files[name].write(np.zeros(gap, dtype=np.float32))
positions[name] = expected_pos
files[name].write(resampled)
positions[name] += len(resampled)
if not did_work:
time.sleep(0.02)
# Pad all stems to same length
max_pos = max(positions.values()) if positions else 0
for name in STEM_NAMES:
if positions[name] < max_pos:
files[name].write(np.zeros(max_pos - positions[name], dtype=np.float32))
files[name].close()
print(f"[StemRecorder] Writer done. {max_pos} samples ({max_pos / self.sample_rate:.1f}s)")
total_errors = self._write_errors
err_msg = f", {total_errors} write errors" if total_errors else ""
print(f"[StemRecorder] Writer done. {max_pos} samples ({max_pos / self.sample_rate:.1f}s{err_msg})")
def stop(self) -> dict[str, str]:
if not self._running:
@@ -107,7 +127,9 @@ class StemRecorder:
self._running = False
if self._writer_thread:
self._writer_thread.join(timeout=10.0)
self._writer_thread.join(timeout=30.0)
if self._writer_thread.is_alive():
print("[StemRecorder] Warning: writer thread still running after 30s")
self._writer_thread = None
paths = {}
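The writer loop keeps every stem on a shared timeline in two ways. A "sporadic" chunk is first preceded by silence up to its wall-clock position. At shutdown, every stem is then padded to the length of the longest one.

Below is a minimal sketch of that bookkeeping, built on a few stated assumptions. It uses plain Python lists in place of numpy arrays and `soundfile` writers. It takes an explicit `elapsed` argument in place of `time.time()`. The class name `StemTimeline`, the stem names, and the `"continuous"` message type are all hypothetical.

```python
class StemTimeline:
    """Hypothetical stand-in for StemRecorder's per-stem position bookkeeping."""

    def __init__(self, names: list[str], sample_rate: int):
        self.sample_rate = sample_rate
        self.samples: dict[str, list[float]] = {name: [] for name in names}
        self.positions: dict[str, int] = {name: 0 for name in names}

    def write(self, name: str, chunk: list[float], msg_type: str, elapsed: float) -> None:
        if msg_type == "sporadic":
            # Pad with silence so the chunk starts at its wall-clock sample offset
            expected_pos = int(elapsed * self.sample_rate)
            if expected_pos > self.positions[name]:
                gap = expected_pos - self.positions[name]
                self.samples[name].extend([0.0] * gap)
                self.positions[name] = expected_pos
        self.samples[name].extend(chunk)
        self.positions[name] += len(chunk)

    def finish(self) -> int:
        # Pad every stem with trailing silence up to the longest stem
        max_pos = max(self.positions.values(), default=0)
        for name, pos in self.positions.items():
            if pos < max_pos:
                self.samples[name].extend([0.0] * (max_pos - pos))
                self.positions[name] = max_pos
        return max_pos


timeline = StemTimeline(["host", "caller"], sample_rate=10)
timeline.write("host", [0.5] * 5, "continuous", elapsed=0.0)
timeline.write("caller", [0.2] * 3, "sporadic", elapsed=1.0)  # lands at sample 10
total = timeline.finish()
```

Because every stem ends at the same sample count, the WAV files can be dropped into a DAW and lined up at zero without manual nudging.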
@@ -0,0 +1,58 @@
#!/bin/bash
# Daily backup of critical AI podcast data to NAS
# Backs up: Castopod MariaDB dump, local data/ directory, publish state
#
# Usage: ./backup.sh
# Cron: 0 3 * * * /Users/lukemacneil/code/ai-podcast/backup.sh >> /tmp/ai-podcast-backup.log 2>&1
set -euo pipefail
NAS_HOST="mmgnas"
NAS_USER="luke"
NAS_PORT="8001"
DOCKER_BIN="/share/CACHEDEV1_DATA/.qpkg/container-station/bin/docker"
BACKUP_BASE="/share/CACHEDEV1_DATA/backups/ai-podcast"
PROJECT_DIR="/Users/lukemacneil/code/ai-podcast"
DATE=$(date +%Y-%m-%d)
KEEP_DAYS=14
echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') Starting backup..."
# 1. Dump Castopod MariaDB on NAS
echo " Dumping MariaDB..."
ssh -p "$NAS_PORT" "$NAS_USER@$NAS_HOST" \
"$DOCKER_BIN exec castopod-mariadb-1 mysqldump -u castopod --password=\$(cat /run/secrets/db_password 2>/dev/null || echo BYtbFfk3ndeVabb26xb0UyKU) castopod" \
> "/tmp/castopod-db-${DATE}.sql" 2>/dev/null || true
if [ -s "/tmp/castopod-db-${DATE}.sql" ]; then
gzip -f "/tmp/castopod-db-${DATE}.sql"
scp -P "$NAS_PORT" "/tmp/castopod-db-${DATE}.sql.gz" \
"$NAS_USER@$NAS_HOST:$BACKUP_BASE/castopod-db-${DATE}.sql.gz"
rm -f "/tmp/castopod-db-${DATE}.sql.gz"
echo " MariaDB dump: OK"
else
echo " WARNING: MariaDB dump is empty or failed"
fi
# 2. Sync data/ directory to NAS (rsync for efficiency)
echo " Syncing data/ directory..."
rsync -az --delete \
-e "ssh -p $NAS_PORT" \
"$PROJECT_DIR/data/" \
"$NAS_USER@$NAS_HOST:$BACKUP_BASE/data/"
echo " data/ sync: OK"
# 3. Backup .env (contains API keys — critical for disaster recovery)
echo " Backing up .env..."
scp -P "$NAS_PORT" "$PROJECT_DIR/.env" \
"$NAS_USER@$NAS_HOST:$BACKUP_BASE/env-${DATE}.bak"
echo " .env backup: OK"
# 4. Prune old backups
echo " Pruning backups older than ${KEEP_DAYS} days..."
ssh -p "$NAS_PORT" "$NAS_USER@$NAS_HOST" \
"find $BACKUP_BASE -name 'castopod-db-*.sql.gz' -mtime +${KEEP_DAYS} -delete 2>/dev/null; \
find $BACKUP_BASE -name 'env-*.bak' -mtime +${KEEP_DAYS} -delete 2>/dev/null"
echo " Prune: OK"
echo "$(date -u '+%Y-%m-%dT%H:%M:%SZ') Backup complete."
@@ -0,0 +1,297 @@
# Show Quality Fixes — Episode 47 Post-Mortem
> **For Claude:** REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.
**Goal:** Fix 5 bugs that ruined tonight's show: theme ignored by callers, wrong LLM models assigned, phonetic pronunciation mangling, voice-age mismatch, and low minimum response threshold.
**Architecture:** All fixes are in `backend/main.py` except voice-age matching, which also touches the voice matching logic in `backend/services/tts.py`. The fixes are independent of each other, so the tasks can be done in any order.
**Tech Stack:** Python, FastAPI
---
### Task 1: Regenerate caller backgrounds when theme is set
**Problem:** `_pregenerate_backgrounds()` runs on startup, when `session.show_theme` is still `""`. Setting the theme via `POST /api/show-theme` only stores the string and doesn't regenerate anything, so callers have no connection to the theme.
**Files:**
- Modify: `backend/main.py:9891-9900` (`set_show_theme` endpoint)
- Modify: `backend/main.py:5899-5927` (`_pregenerate_backgrounds`)
**Step 1: Modify `set_show_theme` to regenerate unused caller backgrounds**
In `backend/main.py`, replace the `set_show_theme` endpoint (lines 9891-9900):
```python
@app.post("/api/show-theme")
async def set_show_theme(data: dict):
    theme = data.get("theme", "").strip()[:100]
    old_theme = session.show_theme
    session.show_theme = theme
    if theme:
        print(f"[Theme] Show theme set: {theme}")
    elif old_theme:
        print(f"[Theme] Show theme cleared (was: {old_theme})")
    # Regenerate backgrounds for callers that haven't been on air yet
    if theme != old_theme:
        unused_keys = [k for k in CALLER_BASES if k not in session.used_callers]
        if unused_keys:
            print(f"[Theme] Regenerating {len(unused_keys)} unused caller backgrounds for theme: {theme or '(none)'}")
            asyncio.create_task(_regenerate_backgrounds_for_keys(unused_keys))
    return {"theme": session.show_theme}
```
**Step 2: Add `_regenerate_backgrounds_for_keys` helper**
Add this right after `_pregenerate_backgrounds()` (after line 5927):
```python
async def _regenerate_backgrounds_for_keys(keys: list[str]):
    """Regenerate backgrounds for specific caller keys (e.g. after theme change)."""
    tasks = []
    for key in keys:
        base = CALLER_BASES.get(key)
        if base and not base.get("returning"):
            tasks.append((key, _generate_caller_background_llm(base)))
    if not tasks:
        return
    results = await asyncio.gather(*[t[1] for t in tasks], return_exceptions=True)
    for (key, _), result in zip(tasks, results):
        if isinstance(result, Exception):
            print(f"[Theme] Regen failed for caller {key}: {result}")
        else:
            session.caller_backgrounds[key] = result
            # Clear cached model so it re-evaluates with new style
            session.caller_models.pop(key, None)
    print(f"[Theme] Regenerated {sum(1 for r in results if not isinstance(r, Exception))}/{len(tasks)} backgrounds")
    _match_voices_to_styles()
    _sort_caller_queue()
```
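Step 2 relies on two properties of `asyncio.gather`. First, results come back in the same order as the awaitables passed in. Second, with `return_exceptions=True`, a failing coroutine shows up as an exception object in its own slot instead of aborting the whole batch.

Below is a self-contained sketch of that key-pairing pattern. `fake_generate` is a hypothetical stand-in for `_generate_caller_background_llm`, and `regenerate` is a hypothetical, simplified version of the helper above.

```python
import asyncio


async def fake_generate(key: str) -> str:
    """Hypothetical stand-in for _generate_caller_background_llm."""
    await asyncio.sleep(0)
    if key == "bad":
        raise RuntimeError("LLM timeout")
    return f"background for {key}"


async def regenerate(keys: list[str]) -> tuple[dict[str, str], list[str]]:
    # Pair each key with its coroutine so results can be matched back by position
    tasks = [(key, fake_generate(key)) for key in keys]
    results = await asyncio.gather(*(coro for _, coro in tasks), return_exceptions=True)
    backgrounds: dict[str, str] = {}
    failed: list[str] = []
    for (key, _), result in zip(tasks, results):
        if isinstance(result, Exception):
            failed.append(key)  # one failure does not cancel the other generations
        else:
            backgrounds[key] = result
    return backgrounds, failed


backgrounds, failed = asyncio.run(regenerate(["a", "bad", "c"]))
```

Without `return_exceptions=True`, the first exception would propagate out of `gather`. The successful backgrounds would then never be stored, even though those coroutines keep running.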
**Step 3: Verify `used_callers` exists on session**
Check that `session.used_callers` tracks which callers have already been on air. If it doesn't exist, use `session.call_history` caller keys instead.
**Step 4: Test manually**
```bash
# Start server
python -m uvicorn backend.main:app --reload --reload-dir backend --host 0.0.0.0 --port 8000
# Set theme and check logs for "[Theme] Regenerating..." messages
curl -X POST http://localhost:8000/api/show-theme -H "Content-Type: application/json" -d '{"theme": "Road Stories"}'
```
**Step 5: Commit**
```bash
git add backend/main.py
git commit -m "Regenerate caller backgrounds when show theme is set"
```
---
### Task 2: Fix style-to-model matching race condition
**Problem:** `get_caller_model()` is called before `caller_styles` is populated. `caller_styles.get(key)` returns `""`, `_normalize_style_key("")` returns `""`, no match in `caller_model_map` → falls through to `caller_model_pool[0]` (grok-4.1-fast) for everyone.
**Files:**
- Modify: `backend/main.py:6848-6875` (`get_caller_model`)
**Step 1: Fix `get_caller_model` to use the fallback model when style is unknown**
Replace `get_caller_model` (lines 6848-6875):
```python
def get_caller_model(self, caller_key: str) -> str | None:
    """Get the assigned model for a caller, or assign one based on strategy.
    Returns None to use default category routing."""
    if self.caller_model_strategy == "single":
        return None  # use default category_models["caller_dialog"]
    # Already assigned — keep consistent for the whole call
    if caller_key in self.caller_models:
        return self.caller_models[caller_key]
    model = None
    if self.caller_model_strategy == "cycle":
        if self.caller_model_pool:
            model = self.caller_model_pool[self._caller_model_cycle_idx % len(self.caller_model_pool)]
            self._caller_model_cycle_idx += 1
    elif self.caller_model_strategy == "style_matched":
        raw_style = self.caller_styles.get(caller_key, "")
        style_key = _normalize_style_key(raw_style) if raw_style else ""
        if style_key:
            model = self.caller_model_map.get(style_key)
        if not model:
            # Style not yet populated or no mapping — use fallback, not pool[0]
            model = self.caller_model_fallback
    if model:
        self.caller_models[caller_key] = model
        caller_name = CALLER_BASES.get(caller_key, {}).get("name", caller_key)
        style_info = self.caller_styles.get(caller_key, "unknown")
        print(f"[CallerModel] Assigned {model} to {caller_name} (style={_normalize_style_key(style_info) if style_info else 'none'}, strategy={self.caller_model_strategy})")
    return model
```
The key change: when `style_key` is empty (style not yet populated) or has no mapping, use `caller_model_fallback` (claude-sonnet-4.6) instead of `caller_model_pool[0]` (grok-4.1-fast). Claude Sonnet is a much safer default — empathetic, verbose, coherent.
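To check the new branch in isolation, here is a minimal standalone sketch of the style-matched lookup. `pick_style_model` and `normalize_style` are hypothetical stand-ins for the `caller_model_map` lookup and `_normalize_style_key`, whose real normalization may differ, and the model IDs in the usage are placeholders.

```python
def normalize_style(raw: str) -> str:
    # Hypothetical normalizer: trim, lowercase, spaces -> underscores
    return raw.strip().lower().replace(" ", "_")


def pick_style_model(raw_style: str, model_map: dict[str, str], fallback: str) -> str:
    """Return the mapped model for a style, or the fallback when the style is empty or unmapped."""
    # An empty style means backgrounds haven't populated caller_styles yet
    style_key = normalize_style(raw_style) if raw_style else ""
    # Unmapped (or falsy) entries also go to the fallback instead of pool[0]
    return (model_map.get(style_key) if style_key else None) or fallback


if __name__ == "__main__":
    mapping = {"conspiracy": "qwen-model", "know_it_all": "mistral-model"}
    print(pick_style_model("", mapping, "sonnet-model"))             # sonnet-model
    print(pick_style_model("Know It All", mapping, "sonnet-model"))  # mistral-model
```

Because the fallback is returned for both "style not populated yet" and "style has no mapping", a caller can only land on a pool model through an explicit map entry.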
**Step 2: Commit**
```bash
git add backend/main.py
git commit -m "Fix style-to-model race condition — use fallback instead of pool[0]"
```
---
### Task 3: Fix pronunciation fixes producing literal phonetic text
**Problem:** `_PRONUNCIATION_FIXES` replaces "Animas" with "Ah nee mahs" as literal text. TTS reads each word separately ("Ah" "nee" "mahs") instead of blending into the intended pronunciation.
**Files:**
- Modify: `backend/main.py:9141-9152` (`_PRONUNCIATION_FIXES`)
- Modify: `backend/main.py:9212-9216` (`_apply_pronunciation_fixes`)
**Step 1: Remove pronunciation fixes that sound worse than originals**
The Inworld TTS actually handles most proper nouns fine. The fixes were added speculatively and cause more harm than good. Remove the place names that TTS can handle; keep only the `Castopod` split and the abbreviation expansions:
Replace `_PRONUNCIATION_FIXES` (lines 9141-9152):
```python
_PRONUNCIATION_FIXES = {
    "Castopod": "Casto pod",
    "vs": "versus",
    "govt": "government",
    "dept": "department",
}
```
Remove `Lordsburg`, `Hachita`, `Deming`, `Bootheel`, `Animas`, and `Rodeo`. These place names either sound fine through TTS or the phonetic replacement sounds worse.
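With short keys like `vs` and `dept` left in the table, a replacement that isn't anchored to word boundaries could also rewrite pieces of longer words. The body of `_apply_pronunciation_fixes` isn't shown in this plan, so this is only a sketch, assuming whole-word, case-sensitive matching is the intended behavior; `apply_pronunciation_fixes` and `PRONUNCIATION_FIXES` here are illustrative copies, not the real function.

```python
import re

# Assumed to mirror the trimmed _PRONUNCIATION_FIXES table above
PRONUNCIATION_FIXES = {
    "Castopod": "Casto pod",
    "vs": "versus",
    "govt": "government",
    "dept": "department",
}

# \b anchors each key to whole words; longer keys are tried first so a longer
# entry is never shadowed by a shorter key that happens to be its prefix
_FIX_PATTERN = re.compile(
    r"\b("
    + "|".join(re.escape(k) for k in sorted(PRONUNCIATION_FIXES, key=len, reverse=True))
    + r")\b"
)


def apply_pronunciation_fixes(text: str) -> str:
    """Replace whole-word matches only, so words like 'cvs' or 'adept' stay untouched."""
    return _FIX_PATTERN.sub(lambda m: PRONUNCIATION_FIXES[m.group(1)], text)


if __name__ == "__main__":
    print(apply_pronunciation_fixes("Lakers vs Celtics"))  # Lakers versus Celtics
```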
**Step 2: Commit**
```bash
git add backend/main.py
git commit -m "Remove pronunciation fixes that produce worse TTS output"
```
---
### Task 4: Add age-awareness to voice matching
**Problem:** Brandy (55 years old) got "Kayla" (young-sounding voice). `_match_voices_to_styles()` scores on style dimensions (weight, energy, warmth, age_feel) but the `age_feel` preference comes from the communication style, not the character's actual age. A "confrontational" style prefers `age_feel: None` (no preference), so a 55-year-old can get a young voice.
**Files:**
- Modify: `backend/main.py:6106-6156` (`_match_voices_to_styles`)
**Step 1: Add character age to voice scoring**
In `_match_voices_to_styles`, after getting the style preferences, override `age_feel` based on the caller's actual age from their background:
```python
def _match_voices_to_styles():
    """Re-assign voices to match caller communication styles after backgrounds are generated."""
    from .services.tts import VOICE_PROFILES
    for key, base in CALLER_BASES.items():
        if base.get("returning"):
            continue
        style_raw = session.caller_styles.get(key, "")
        if not style_raw:
            continue
        style_key = _normalize_style_key(style_raw)
        prefs = STYLE_VOICE_PREFERENCES.get(style_key)
        if not prefs:
            continue
        # Copy prefs so we don't mutate the shared dict
        prefs = dict(prefs)
        # Override age_feel based on character's actual age
        bg = session.caller_backgrounds.get(key)
        if isinstance(bg, CallerBackground) and bg.age:
            if bg.age >= 50:
                prefs["age_feel"] = "mature"
            elif bg.age >= 35:
                prefs["age_feel"] = "middle"
            elif bg.age < 25:
                prefs["age_feel"] = "young"
            # 25-34: keep style preference or None
        gender = base["gender"]
        pool = INWORLD_MALE_VOICES if gender == "male" else INWORLD_FEMALE_VOICES
        voice_pool = [v for v in pool if v not in BLACKLISTED_VOICES]
        scored = []
        for voice_name in voice_pool:
            profile = VOICE_PROFILES.get(voice_name)
            if not profile:
                scored.append((voice_name, 0))
                continue
            score = 0
            for dim in ["weight", "energy", "warmth", "age_feel"]:
                pref_val = prefs.get(dim)
                if pref_val and profile.get(dim) == pref_val:
                    score += 1
            scored.append((voice_name, score))
        if scored:
            names = [s[0] for s in scored]
            weights = [max(1, s[1] * 3) for s in scored]
            chosen = random.choices(names, weights=weights, k=1)[0]
            used_voices = {CALLER_BASES[k]["voice"] for k in CALLER_BASES if k != key and "voice" in CALLER_BASES[k]}
            if chosen in used_voices:
                alternatives = [(n, w) for n, w in zip(names, weights) if n not in used_voices]
                if alternatives:
                    alt_names, alt_weights = zip(*alternatives)
                    chosen = random.choices(alt_names, weights=alt_weights, k=1)[0]
            old_voice = base.get("voice", "")
            base["voice"] = chosen
            if old_voice != chosen:
                print(f"[VoiceMatch] {base.get('name', key)}: {old_voice} → {chosen} (style: {style_key}, age: {bg.age if isinstance(bg, CallerBackground) else '?'})")
```
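The age override and the score-to-weight step can be pulled out as pure functions for a quick sanity check. `age_feel_for` and `choice_weights` are hypothetical helper names; the thresholds and the `max(1, score * 3)` formula are copied from the block above. With four scored dimensions, a voice matching all of them gets weight 12 against 1 for a voice matching none, so a good match is strongly favored but not guaranteed.

```python
from typing import Optional


def age_feel_for(age: Optional[int], style_pref: Optional[str]) -> Optional[str]:
    """Same thresholds as the override above: 50+ mature, 35-49 middle, under 25 young."""
    if not age:
        return style_pref  # no usable age: keep the style's preference
    if age >= 50:
        return "mature"
    if age >= 35:
        return "middle"
    if age < 25:
        return "young"
    return style_pref  # 25-34: keep the style preference (may be None)


def choice_weights(scores: list[int]) -> list[int]:
    # Weight is 3x the number of matched dimensions, floored at 1 so no voice is impossible
    return [max(1, s * 3) for s in scores]


if __name__ == "__main__":
    print(age_feel_for(55, None))      # mature
    print(choice_weights([0, 1, 4]))   # [1, 3, 12]
```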
**Step 2: Commit**
```bash
git add backend/main.py
git commit -m "Add age-awareness to voice matching — 55yo won't get young voices"
```
---
### Task 5: Raise minimum response word count
**Problem:** `MIN_RESPONSE_WORDS = 30` lets through fragmented, telegram-style responses that are technically 30+ words but terrible radio.
**Files:**
- Modify: `backend/main.py:8844` (`MIN_RESPONSE_WORDS`)
**Step 1: Raise the minimum**
Change line 8844:
```python
MIN_RESPONSE_WORDS = 50 # Retry if response is shorter than this
```
50 words is roughly 2-3 spoken sentences — enough to be a coherent radio response without being overly demanding for short-form exchanges.
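A sketch of the check this constant drives, assuming responses are counted as whitespace-separated words (the counting code near line 8844 isn't shown). `needs_retry` and `approx_spoken_seconds` are hypothetical helpers; the airtime estimate assumes a conversational rate of about 150 words per minute, which puts the old 30-word floor at roughly 12 seconds of speech and the new 50-word floor at roughly 20.

```python
MIN_RESPONSE_WORDS = 50  # new threshold from this task


def needs_retry(response: str, min_words: int = MIN_RESPONSE_WORDS) -> bool:
    # Assumes a whitespace split; main.py's exact counting may differ
    return len(response.split()) < min_words


def approx_spoken_seconds(words: int, wpm: int = 150) -> float:
    # Rough airtime estimate at an assumed speaking rate
    return words * 60 / wpm


if __name__ == "__main__":
    print(needs_retry("Yeah. Totally. Anyway."))  # True
    print(approx_spoken_seconds(50))              # 20.0
```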
**Step 2: Commit**
```bash
git add backend/main.py
git commit -m "Raise MIN_RESPONSE_WORDS from 30 to 50"
```
+261
@@ -0,0 +1,261 @@
"""Fetch instrumental background music from Jamendo for the radio show.

Pixabay has no public music API — this uses Jamendo's free API instead.
All tracks are Creative Commons licensed. Attribution is saved to music/CREDITS.txt.

Setup: Get a free client_id at https://devportal.jamendo.com
       Add JAMENDO_CLIENT_ID=your_id to .env

Usage:
    python fetch_music.py                  # download 20 tracks across all genres
    python fetch_music.py --genre jazz     # download jazz only
    python fetch_music.py --count 50       # download 50 tracks
    python fetch_music.py --list           # just list available tracks, don't download
"""

import argparse
import os
import re
import sys
from pathlib import Path

import httpx
from dotenv import load_dotenv

load_dotenv()

MUSIC_DIR = Path(__file__).parent / "music"
CREDITS_FILE = MUSIC_DIR / "CREDITS.txt"
API_BASE = "https://api.jamendo.com/v3.0"

# Genres good for a late-night radio show
GENRES = ["jazz", "lofi", "blues", "ambient", "acoustic", "funk", "chill"]

# Map search tags to labels that _detect_genre() in main.py can match
# jazz, blues, funk, lo-fi are already in GENRE_KEYWORDS
# ambient, acoustic, chill would need to be added for auto-detection
GENRE_LABELS = {
    "jazz": "Jazz",
    "lofi": "Lo-Fi",
    "blues": "Blues",
    "ambient": "Ambient",
    "acoustic": "Acoustic",
    "funk": "Funk",
    "chill": "Chill",
}


def get_client_id():
    key = os.getenv("JAMENDO_CLIENT_ID")
    if not key:
        print("Error: JAMENDO_CLIENT_ID not found in .env")
        print("Get one free at https://devportal.jamendo.com")
        sys.exit(1)
    return key


def sanitize_filename(name: str) -> str:
    return re.sub(r'[<>:"/\\|?*]', '', name).strip()


def _has_vocals(track: dict) -> bool:
    """Check musicinfo for vocal indicators — catches tracks Jamendo mis-tagged as instrumental."""
    mi = track.get("musicinfo", {})

    # Check the vocalinstrumental field in musicinfo (separate from the API filter)
    vi = mi.get("vocalinstrumental")
    if vi and vi.lower() == "vocal":
        return True

    # Check tags for vocal/singing indicators
    tags = mi.get("tags", {})
    # tags can be {"genres": [...], "instruments": [...], "vartags": [...]}
    all_tags = []
    if isinstance(tags, dict):
        for v in tags.values():
            if isinstance(v, list):
                all_tags.extend(t.lower() for t in v)
    elif isinstance(tags, list):
        all_tags = [t.lower() for t in tags]

    vocal_tags = {"vocals", "vocal", "singing", "singer", "voice", "lyrics",
                  "rap", "hiphop", "hip-hop", "spoken", "spoken word"}
    if vocal_tags & set(all_tags):
        return True

    # Check track name for vocal giveaways
    name_lower = track.get("name", "").lower()
    if any(w in name_lower for w in ["feat.", "ft.", "vocal", "remix vocal", "(voice"]):
        return True

    return False


def search_tracks(client: httpx.Client, client_id: str, genre: str, limit: int = 20) -> list[dict]:
    # Request more than needed so we can filter out vocal false positives
    fetch_limit = min(limit * 3, 200)
    params = {
        "client_id": client_id,
        "format": "json",
        "limit": fetch_limit,
        "vocalinstrumental": "instrumental",
        "fuzzytags": genre,
        "durationbetween": "60_300",
        "include": "musicinfo+licenses",
        "order": "popularity_total",
    }
    resp = client.get(f"{API_BASE}/tracks/", params=params)
    resp.raise_for_status()
    data = resp.json()

    if data["headers"]["status"] != "success":
        print(f"  API error: {data['headers'].get('error_message', 'unknown')}")
        return []

    results = data.get("results", [])

    # Post-filter: reject tracks with vocal indicators despite the API filter
    filtered = []
    for t in results:
        if _has_vocals(t):
            print(f"  SKIP (vocals detected): {t.get('artist_name', '?')} - {t.get('name', '?')}")
            continue
        filtered.append(t)
        if len(filtered) >= limit:
            break

    skipped = len(results) - len(filtered)
    if skipped:
        print(f"  (filtered out {skipped} tracks with vocal indicators)")
    return filtered


def make_filename(track: dict, genre_tag: str) -> str:
    artist = sanitize_filename(track.get("artist_name", "Unknown"))
    title = sanitize_filename(track.get("name", "Untitled"))
    label = GENRE_LABELS.get(genre_tag, genre_tag.title())

    # Include genre tag if not already detectable from artist/title
    lower = f"{artist} {title}".lower()
    needs_tag = not any(kw in lower for kw in [genre_tag, label.lower()])

    if needs_tag:
        return f"{artist} - {title} [{label}].mp3"
    return f"{artist} - {title}.mp3"


def download_track(client: httpx.Client, track: dict, filepath: Path, index: int, total: int) -> bool:
    url = track.get("audiodownload")
    if not url:
        print(f"  [{index}/{total}] SKIP (no download URL): {track['name']}")
        return False
    if not track.get("audiodownload_allowed", True):
        print(f"  [{index}/{total}] SKIP (download not allowed): {track['name']}")
        return False

    print(f"  [{index}/{total}] Downloading: {filepath.name}...", end=" ", flush=True)
    resp = client.get(url, follow_redirects=True)
    resp.raise_for_status()
    filepath.write_bytes(resp.content)
    size_mb = len(resp.content) / (1024 * 1024)
    dur = track.get("duration", 0)
    print(f"{size_mb:.1f} MB, {dur // 60}:{dur % 60:02d}")
    return True


def save_credit(track: dict, filename: str):
    artist = track.get("artist_name", "Unknown")
    title = track.get("name", "Untitled")
    license_url = track.get("license_ccurl", "")
    share_url = track.get("shareurl", "")
    line = f"{filename} | {artist} - {title} | {license_url} | {share_url}\n"

    existing = CREDITS_FILE.read_text() if CREDITS_FILE.exists() else ""
    if filename not in existing:
        with open(CREDITS_FILE, "a") as f:
            if not existing:
                f.write("# Music Credits (Jamendo - Creative Commons)\n")
                f.write("# File | Artist - Title | License | URL\n\n")
            f.write(line)


def main():
    parser = argparse.ArgumentParser(description="Download instrumental music from Jamendo")
    parser.add_argument("--genre", choices=GENRES, help="Download only this genre")
    parser.add_argument("--count", type=int, default=20, help="Total tracks to download (default: 20)")
    parser.add_argument("--list", action="store_true", help="List available tracks without downloading")
    args = parser.parse_args()

    client_id = get_client_id()
    MUSIC_DIR.mkdir(exist_ok=True)

    genres = [args.genre] if args.genre else GENRES
    per_genre = max(1, args.count // len(genres))
    remainder = args.count - per_genre * len(genres)

    all_tracks = []
    seen_ids = set()

    with httpx.Client(timeout=30) as api_client:
        for i, genre in enumerate(genres):
            limit = per_genre + (1 if i < remainder else 0)
            if limit <= 0:
                continue
            print(f"Searching {genre}...", end=" ", flush=True)
            tracks = search_tracks(api_client, client_id, genre, limit)
            # Deduplicate across genres
            added = 0
            for t in tracks:
                if t["id"] not in seen_ids and added < limit:
                    t["_genre_tag"] = genre
                    all_tracks.append(t)
                    seen_ids.add(t["id"])
                    added += 1
            print(f"{added} tracks")

    if not all_tracks:
        print("No tracks found.")
        return

    if args.list:
        print(f"\n{'#':<4} {'Genre':<10} {'Artist':<25} {'Title':<40} {'Duration':<8}")
        print("-" * 90)
        for i, t in enumerate(all_tracks, 1):
            dur = f"{t['duration'] // 60}:{t['duration'] % 60:02d}"
            artist = t["artist_name"][:24]
            title = t["name"][:39]
            label = GENRE_LABELS.get(t["_genre_tag"], t["_genre_tag"])
            print(f"{i:<4} {label:<10} {artist:<25} {title:<40} {dur:<8}")
        print(f"\n{len(all_tracks)} tracks available")
        return

    # Download phase
    downloaded = 0
    skipped_exists = 0
    skipped_error = 0

    with httpx.Client(timeout=120, follow_redirects=True) as dl_client:
        for i, track in enumerate(all_tracks, 1):
            filename = make_filename(track, track["_genre_tag"])
            filepath = MUSIC_DIR / filename

            if filepath.exists():
                print(f"  [{i}/{len(all_tracks)}] EXISTS: {filename}")
                skipped_exists += 1
                continue

            try:
                if download_track(dl_client, track, filepath, i, len(all_tracks)):
                    save_credit(track, filename)
                    downloaded += 1
                else:
                    skipped_error += 1
            except Exception as e:
                print(f"  [{i}/{len(all_tracks)}] ERROR: {e}")
                # Clean up partial download
                if filepath.exists():
                    filepath.unlink()
                skipped_error += 1

    print(f"\nDone: {downloaded} downloaded, {skipped_exists} existed, {skipped_error} skipped")


if __name__ == "__main__":
    main()
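`main()` splits `--count` across genres with integer division plus a remainder handed to the first genres. Re-deriving that arithmetic as a pure function makes the edge cases easy to see; `genre_quotas` is a hypothetical helper, not part of the script. With the default 20 tracks over 7 genres, `per_genre` is 2 and `remainder` is 6, so six genres ask for 3 tracks and `chill` asks for 2. When `--count` is smaller than the number of genres, the `max(1, ...)` floor gives every genre one track and `remainder` goes negative, so the requested total is `len(genres)` rather than `--count`.

```python
GENRES = ["jazz", "lofi", "blues", "ambient", "acoustic", "funk", "chill"]


def genre_quotas(count: int, genres: list[str]) -> dict[str, int]:
    """Reproduce fetch_music.py's per-genre request limits for a given --count."""
    per_genre = max(1, count // len(genres))
    remainder = count - per_genre * len(genres)
    # The first `remainder` genres each get one extra track; a negative remainder adds nothing
    return {g: per_genre + (1 if i < remainder else 0) for i, g in enumerate(genres)}


if __name__ == "__main__":
    print(sum(genre_quotas(20, GENRES).values()))  # 20
    print(sum(genre_quotas(3, GENRES).values()))   # 7
```

These are request limits only: vocal filtering and cross-genre deduplication can still leave a genre with fewer tracks.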
+343 -16
@@ -113,6 +113,69 @@ header button:hover {
border-color: rgba(232, 121, 29, 0.3);
}
.theme-bar {
display: flex;
align-items: center;
gap: 6px;
padding: 4px 12px;
background: rgba(255, 255, 255, 0.05);
border-radius: 6px;
}
.theme-label {
font-size: 0.8rem;
color: #aaa;
white-space: nowrap;
}
.theme-input {
background: rgba(255, 255, 255, 0.08);
border: 1px solid rgba(255, 255, 255, 0.15);
border-radius: 4px;
color: #fff;
padding: 4px 8px;
font-size: 0.85rem;
width: 200px;
}
.theme-input:focus {
outline: none;
border-color: #f5a623;
}
.theme-input.active {
border-color: #f5a623;
background: rgba(245, 166, 35, 0.1);
}
.theme-btn {
padding: 4px 10px;
border-radius: 4px;
border: none;
cursor: pointer;
font-size: 0.8rem;
}
.theme-btn.set {
background: #f5a623;
color: #000;
}
.theme-btn.set:hover {
background: #e6991a;
}
.theme-btn.clear {
background: rgba(255, 255, 255, 0.1);
color: #aaa;
padding: 4px 6px;
}
.theme-btn.clear:hover {
background: rgba(255, 80, 80, 0.3);
color: #ff5050;
}
.on-air-btn {
font-weight: 700;
text-transform: uppercase;
@@ -284,9 +347,14 @@ section h2 {
}
.caller-btn.active {
background: var(--accent);
border-color: var(--accent);
background: var(--bg);
border-color: transparent;
}
.caller-btn.active .caller-name {
color: #fff;
background: var(--accent);
padding: 2px 8px;
border-radius: 4px;
}
.call-status {
@@ -400,6 +468,84 @@ section h2 {
line-height: 1.3;
}
/* Caller model indicator */
.info-badge.model {
background: rgba(100, 140, 220, 0.2);
color: #7ab0e8;
font-size: 0.7rem;
cursor: pointer;
}
.caller-model-override {
font-size: 0.7rem;
padding: 2px 4px;
background: var(--bg);
color: var(--text);
border: 1px solid rgba(100, 140, 220, 0.3);
border-radius: 4px;
max-width: 140px;
}
/* Caller button model badge */
.model-tag {
font-size: 0.55rem;
color: #7ab0e8;
background: rgba(100, 140, 220, 0.15);
padding: 0 3px;
border-radius: 2px;
font-weight: 700;
letter-spacing: 0.3px;
flex-shrink: 0;
}
/* Caller Models settings section */
.caller-model-row {
margin-bottom: 8px;
}
.caller-model-row label {
margin-bottom: 0;
}
.cm-pool-input {
font-size: 0.8rem;
}
.cm-style-grid {
display: grid;
grid-template-columns: 1fr 1fr;
gap: 4px;
margin-bottom: 8px;
max-height: 200px;
overflow-y: auto;
}
.cm-style-item {
display: flex;
align-items: center;
justify-content: space-between;
gap: 4px;
background: rgba(255, 255, 255, 0.05);
border-radius: 4px;
padding: 3px 6px;
}
.cm-style-name {
font-size: 0.7rem;
color: var(--text-muted);
white-space: nowrap;
}
.cm-style-select {
font-size: 0.7rem;
padding: 2px 3px;
background: var(--bg);
color: var(--text);
border: 1px solid rgba(232, 121, 29, 0.15);
border-radius: 4px;
max-width: 110px;
}
.caller-background-full {
margin-top: 8px;
font-size: 0.75rem;
@@ -586,19 +732,6 @@ section h2 {
margin-bottom: 10px;
}
.music-section select optgroup {
color: var(--accent);
font-weight: bold;
font-style: normal;
padding: 4px 0;
}
.music-section select option {
color: var(--text);
font-weight: normal;
padding: 2px 8px;
}
.music-controls {
display: flex;
gap: 8px;
@@ -625,6 +758,83 @@ section h2 {
accent-color: var(--accent);
}
/* Genre Quick-Select */
.genre-section {
grid-column: span 3;
}
.genre-grid {
display: flex;
flex-wrap: wrap;
gap: 6px;
margin-bottom: 8px;
}
.genre-btn {
background: var(--bg);
color: var(--text);
border: 1px solid rgba(232, 121, 29, 0.12);
padding: 6px 12px;
border-radius: var(--radius-sm);
cursor: pointer;
font-size: 0.8rem;
transition: all 0.15s;
white-space: nowrap;
}
.genre-btn:hover {
border-color: var(--accent);
background: #2a1e10;
color: #fff;
}
.genre-btn.active {
background: var(--accent);
border-color: var(--accent);
color: #fff;
font-weight: 600;
}
.now-playing {
display: flex;
align-items: center;
gap: 8px;
padding: 4px 0;
}
.now-playing-text {
font-size: 0.75rem;
color: var(--text-muted);
flex: 0 1 auto;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
min-width: 0;
}
.now-playing-stop {
background: var(--bg);
color: var(--text);
border: 1px solid rgba(232, 121, 29, 0.15);
padding: 4px 10px;
border-radius: var(--radius-sm);
cursor: pointer;
font-size: 0.75rem;
flex-shrink: 0;
transition: all 0.15s;
}
.now-playing-stop:hover {
border-color: var(--accent);
background: #2a1e10;
}
.now-playing-volume {
width: 80px;
flex-shrink: 0;
accent-color: var(--accent);
}
/* Soundboard */
.sounds-section {
grid-column: span 2;
@@ -771,7 +981,7 @@ section h2 {
padding: 24px;
border-radius: var(--radius);
width: 90%;
max-width: 400px;
max-width: 550px;
border: 1px solid rgba(232, 121, 29, 0.15);
}
@@ -1525,6 +1735,16 @@ section h2 {
font-size: 0.8rem;
}
.media-row .genre-section {
grid-column: span 3;
}
@media (max-width: 700px) {
.media-row .genre-section {
grid-column: span 1;
}
}
/* Devon (Intern) */
.message.devon {
border-left: 3px solid var(--devon);
@@ -1714,3 +1934,110 @@ button:focus-visible {
.log-toggle-btn:hover {
color: var(--text);
}
/* Preflight */
.preflight-btn {
background: rgba(90, 138, 60, 0.15);
color: var(--accent-green);
border: 1px solid rgba(90, 138, 60, 0.3);
}
.preflight-btn:hover {
background: rgba(90, 138, 60, 0.25);
}
.preflight-content {
max-width: 700px;
}
.preflight-status {
display: flex;
align-items: center;
gap: 10px;
padding: 12px 16px;
border-radius: var(--radius-sm);
margin-bottom: 16px;
font-weight: 700;
font-size: 1.1rem;
}
.preflight-status.pass { background: rgba(90, 138, 60, 0.15); color: var(--accent-green); }
.preflight-status.warn { background: rgba(232, 169, 29, 0.15); color: #e8a91d; }
.preflight-status.fail { background: rgba(204, 34, 34, 0.15); color: var(--accent-red); }
.preflight-status.loading { background: rgba(232, 121, 29, 0.1); color: var(--text-muted); }
.preflight-checks {
display: flex;
flex-direction: column;
gap: 12px;
max-height: 60vh;
overflow-y: auto;
}
.preflight-check {
background: var(--bg);
border: 1px solid rgba(232, 121, 29, 0.1);
border-radius: var(--radius-sm);
padding: 12px 16px;
}
.preflight-check-header {
display: flex;
justify-content: space-between;
align-items: center;
cursor: pointer;
user-select: none;
}
.preflight-check-name {
font-weight: 600;
font-size: 0.95rem;
}
.preflight-check-badge {
font-size: 0.75rem;
font-weight: 700;
padding: 2px 8px;
border-radius: 4px;
text-transform: uppercase;
}
.preflight-check-badge.pass { background: rgba(90, 138, 60, 0.2); color: var(--accent-green); }
.preflight-check-badge.warn { background: rgba(232, 169, 29, 0.2); color: #e8a91d; }
.preflight-check-badge.fail { background: rgba(204, 34, 34, 0.2); color: var(--accent-red); }
.preflight-check-badge.skip { background: rgba(154, 139, 120, 0.2); color: var(--text-muted); }
.preflight-check-details {
margin-top: 10px;
font-size: 0.85rem;
color: var(--text-muted);
display: none;
}
.preflight-check.open .preflight-check-details {
display: block;
}
.preflight-table {
width: 100%;
border-collapse: collapse;
margin-top: 8px;
}
.preflight-table th {
text-align: left;
color: var(--text-muted);
font-size: 0.75rem;
font-weight: 600;
text-transform: uppercase;
padding: 4px 8px;
border-bottom: 1px solid rgba(232, 121, 29, 0.1);
}
.preflight-table td {
padding: 4px 8px;
font-size: 0.8rem;
color: var(--text);
border-bottom: 1px solid rgba(232, 121, 29, 0.05);
}
.preflight-table tr.mismatch td { color: var(--accent-red); }
.preflight-table tr.connected td { color: var(--accent-green); }
.preflight-test-btn {
background: rgba(232, 121, 29, 0.15);
color: var(--accent);
border: 1px solid rgba(232, 121, 29, 0.3);
}
.preflight-test-btn:hover { background: rgba(232, 121, 29, 0.25); }
.preflight-test-btn.loading { opacity: 0.6; pointer-events: none; }
+64 -9
@@ -4,7 +4,7 @@
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Luke at The Roost</title>
<link rel="stylesheet" href="/css/style.css">
<link rel="stylesheet" href="/css/style.css?v=2">
</head>
<body>
<div id="app">
@@ -15,8 +15,15 @@
<button id="rec-btn" class="rec-btn" title="Record stems for post-production">REC</button>
<button id="new-session-btn" class="new-session-btn">New Session</button>
<button id="export-session-btn">Export</button>
<button id="preflight-btn" class="preflight-btn">Preflight</button>
<button id="settings-btn">Settings</button>
</div>
<div class="theme-bar">
<label for="show-theme-input" class="theme-label">Theme:</label>
<input type="text" id="show-theme-input" class="theme-input" placeholder="e.g. St. Patrick's Day" maxlength="100">
<button id="set-theme-btn" class="theme-btn set" title="Set show theme">Set</button>
<button id="clear-theme-btn" class="theme-btn clear hidden" title="Clear theme">&#x2715;</button>
</div>
<div id="show-clock" class="show-clock">
<span class="clock-time" id="clock-time"></span>
<span id="show-timers" class="show-timers hidden">
@@ -69,6 +76,8 @@
<span id="caller-shape-badge" class="info-badge shape"></span>
<span id="caller-energy-badge" class="info-badge energy"></span>
<span id="caller-emotion" class="info-badge emotion"></span>
<span id="caller-model-badge" class="info-badge model"></span>
<select id="caller-model-override" class="caller-model-override hidden"></select>
</div>
<div id="caller-signature" class="caller-signature"></div>
<div id="caller-situation" class="caller-situation"></div>
@@ -134,13 +143,13 @@
<!-- Music / Ads / Idents -->
<div class="media-row">
<section class="music-section">
<h2>Music</h2>
<select id="track-select"></select>
<div class="music-controls">
<button id="play-btn">Play <span class="shortcut-label">M</span></button>
<button id="stop-btn">Stop</button>
<input type="range" id="volume" min="0" max="100" value="30">
<section class="music-section genre-section">
<h2>Music <span class="shortcut-label">M</span></h2>
<div id="genre-buttons" class="genre-grid"></div>
<div id="now-playing" class="now-playing hidden">
<span id="now-playing-text" class="now-playing-text"></span>
<button id="stop-btn" class="now-playing-stop">Stop</button>
<input type="range" id="volume" min="0" max="100" value="30" class="now-playing-volume">
</div>
</section>
@@ -279,6 +288,36 @@
</div>
</div>
<!-- Caller Model Routing -->
<div class="settings-group">
<h3>Caller Models</h3>
<div class="caller-model-row">
<label>
Strategy
<select id="cm-strategy">
<option value="single">Single Model</option>
<option value="cycle">Cycle Models</option>
<option value="style_matched">Style-Matched</option>
</select>
</label>
</div>
<div id="cm-pool-section" class="hidden">
<label>
Model Pool
<input type="text" id="cm-pool" class="cm-pool-input" placeholder="x-ai/grok-4, deepseek/deepseek-v3.2, ...">
</label>
</div>
<div id="cm-style-map" class="hidden">
<div class="cm-style-grid" id="cm-style-grid"></div>
</div>
<div class="caller-model-row">
<label>
Fallback Model
<select id="cm-fallback" class="model-select"></select>
</label>
</div>
</div>
<!-- TTS Settings -->
<div class="settings-group">
<h3>TTS Provider</h3>
@@ -319,8 +358,24 @@
</div>
</div>
</div>
<!-- Preflight Modal -->
<div id="preflight-modal" class="modal hidden">
<div class="modal-content preflight-content">
<h2>Show Preflight</h2>
<div id="preflight-status" class="preflight-status loading">
<span class="preflight-status-icon">...</span>
<span class="preflight-status-text">Running checks...</span>
</div>
<div id="preflight-checks" class="preflight-checks"></div>
<div class="modal-buttons">
<button id="preflight-test-btn" class="preflight-test-btn">Test Responses</button>
<button id="preflight-rerun-btn">Re-run</button>
<button id="close-preflight">Close</button>
</div>
</div>
</div>
</div>
<script src="/js/app.js?v=22"></script>
<script src="/js/app.js?v=27"></script>
</body>
</html>
+735 -115
File diff suppressed because it is too large
+142 -114
@@ -23,6 +23,8 @@ import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path
import time
import requests
from dotenv import load_dotenv
@@ -46,6 +48,50 @@ WIDTH = 1080
HEIGHT = 1920
def _llm_request(prompt: str, max_tokens: int = 2048, temperature: float = 0.3,
                 timeout: int = 60) -> str | None:
    """Make an LLM API call with timeout and retry. Returns content or None on failure."""
    for attempt in range(2):
        try:
            response = requests.post(
                "https://openrouter.ai/api/v1/chat/completions",
                headers={
                    "Authorization": f"Bearer {OPENROUTER_API_KEY}",
                    "Content-Type": "application/json",
                },
                json={
                    "model": "anthropic/claude-sonnet-4-5",
                    "messages": [{"role": "user", "content": prompt}],
                    "max_tokens": max_tokens,
                    "temperature": temperature,
                },
                timeout=timeout,
            )
            if response.status_code != 200:
                print(f" LLM error (HTTP {response.status_code}): {response.text[:200]}")
                if attempt == 0:
                    print(f" Retrying in 5s...")
                    time.sleep(5)
                    continue
                return None
            return response.json()["choices"][0]["message"]["content"].strip()
        except requests.Timeout:
            print(f" LLM request timed out ({timeout}s)")
            if attempt == 0:
                print(f" Retrying in 5s...")
                time.sleep(5)
                continue
            return None
        except Exception as e:
            print(f" LLM request failed: {e}")
            if attempt == 0:
                print(f" Retrying in 5s...")
                time.sleep(5)
                continue
            return None
    return None
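`_llm_request` folds three failure modes (non-200 status, timeout, any other exception) into the same "retry once after 5 seconds, then return None" path. A generic sketch of that control flow, with the HTTP call abstracted behind a hypothetical zero-argument `call` and an injectable `sleep` so it can be exercised without network access; `retry_once` is not a function in the script.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_once(call: Callable[[], Optional[T]], delay: float = 5.0,
               sleep: Callable[[float], None] = time.sleep) -> Optional[T]:
    """Run `call` up to twice; a None result or an exception triggers a single retry."""
    for attempt in range(2):
        try:
            result = call()
        except Exception as e:
            print(f" attempt {attempt + 1} failed: {e}")
            result = None
        if result is not None:
            return result
        if attempt == 0:
            sleep(delay)  # back off before the second and final attempt
    return None
```

Callers then only need a single `if content is None:` check, which is how the clip-selection, social-metadata, and polish call sites in this diff are rewritten.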
def _build_whisper_prompt(labeled_transcript: str) -> str:
"""Build an initial_prompt for Whisper from the labeled transcript.
@@ -186,7 +232,12 @@ def refine_clip_timestamps(audio_path: str, clips: list[dict],
"ffmpeg", "-y", "-ss", str(seg_start), "-t", str(seg_end - seg_start),
"-i", audio_path, "-ar", "16000", "-ac", "1", seg_path,
]
result = subprocess.run(cmd, capture_output=True, text=True)
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
except subprocess.TimeoutExpired:
print(f" Clip {i+1}: ffmpeg timed out (120s), skipping")
refined[i] = []
continue
if result.returncode != 0:
print(f" Clip {i+1}: Failed to extract segment")
refined[i] = []
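The ffmpeg and Remotion calls in this commit all follow one pattern: pass `timeout=` to `subprocess.run` and treat `subprocess.TimeoutExpired` as a failed step instead of letting it crash the run. A minimal helper capturing that pattern; `run_with_timeout` is a hypothetical name, and a Python child process stands in for ffmpeg so the sketch runs anywhere.

```python
import subprocess
import sys
from typing import Optional


def run_with_timeout(cmd: list[str], timeout: float) -> Optional[subprocess.CompletedProcess]:
    """Run cmd with captured text output; return None instead of raising on timeout."""
    try:
        return subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child at this point
        return None


if __name__ == "__main__":
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout=0.5))  # None
```

A `None` result is distinct from a non-zero `returncode`, so a caller can still log ffmpeg's `stderr` for ordinary failures while skipping the clip on a hang.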
@@ -279,25 +330,11 @@ IMPORTANT:
Respond with ONLY a JSON array, no markdown or explanation:
[{{"title": "...", "start_time": 0.0, "end_time": 0.0, "caption_text": "..."}}]"""
response = requests.post(
"https://openrouter.ai/api/v1/chat/completions",
headers={
"Authorization": f"Bearer {OPENROUTER_API_KEY}",
"Content-Type": "application/json",
},
json={
"model": "anthropic/claude-sonnet-4-5",
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 2048,
"temperature": 0.3,
},
)
content = _llm_request(prompt, max_tokens=2048, temperature=0.3, timeout=60)
if content is None:
print(" Failed to get clip selections from LLM — aborting")
return []
if response.status_code != 200:
print(f"Error from OpenRouter: {response.text}")
sys.exit(1)
content = response.json()["choices"][0]["message"]["content"].strip()
if content.startswith("```"):
content = re.sub(r"^```(?:json)?\n?", "", content)
content = re.sub(r"\n?```$", "", content)
@@ -307,7 +344,7 @@ Respond with ONLY a JSON array, no markdown or explanation:
except json.JSONDecodeError as e:
print(f"Error parsing LLM response: {e}")
print(f"Response was: {content[:500]}")
sys.exit(1)
return []
# Validate and clamp durations
validated = []
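The clip-selection path now returns `[]` instead of calling `sys.exit(1)` when the LLM reply isn't valid JSON. The fence-stripping and parsing steps, restated as one standalone function that returns `None` on failure; `parse_llm_json` is a hypothetical name, and the two regexes are the ones used in the script.

```python
import json
import re


def parse_llm_json(content: str):
    """Strip an optional ```json fence from an LLM reply and parse it; None if it isn't JSON."""
    content = content.strip()
    if content.startswith("```"):
        content = re.sub(r"^```(?:json)?\n?", "", content)
        content = re.sub(r"\n?```$", "", content)
    try:
        return json.loads(content)
    except json.JSONDecodeError:
        return None


if __name__ == "__main__":
    print(parse_llm_json('```json\n[{"title": "Road Stories"}]\n```'))  # [{'title': 'Road Stories'}]
```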
@@ -349,25 +386,11 @@ For each clip, generate:
Respond with ONLY a JSON array matching the clip order:
[{{"description": "...", "hashtags": ["#tag1", "#tag2", ...]}}]"""
response = requests.post(
"https://openrouter.ai/api/v1/chat/completions",
headers={
"Authorization": f"Bearer {OPENROUTER_API_KEY}",
"Content-Type": "application/json",
},
json={
"model": "anthropic/claude-sonnet-4-5",
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 2048,
"temperature": 0.7,
},
)
if response.status_code != 200:
print(f"Error from OpenRouter: {response.text}")
content = _llm_request(prompt, max_tokens=2048, temperature=0.7, timeout=60)
if content is None:
print(" Failed to generate social metadata — skipping")
return clips
content = response.json()["choices"][0]["message"]["content"].strip()
if content.startswith("```"):
content = re.sub(r"^```(?:json)?\n?", "", content)
content = re.sub(r"\n?```$", "", content)
@@ -777,43 +800,25 @@ RULES:
RAW TEXT ({len(words)} words):
{raw_text}"""
try:
response = requests.post(
"https://openrouter.ai/api/v1/chat/completions",
headers={
"Authorization": f"Bearer {OPENROUTER_API_KEY}",
"Content-Type": "application/json",
},
json={
"model": "anthropic/claude-sonnet-4-5",
"messages": [{"role": "user", "content": prompt}],
"max_tokens": 2048,
"temperature": 0,
},
timeout=30,
)
if response.status_code != 200:
print(f" Polish failed ({response.status_code}), using raw text")
return words
polished = _llm_request(prompt, max_tokens=2048, temperature=0, timeout=30)
if polished is None:
print(f" Polish failed, using raw text")
return words
polished = response.json()["choices"][0]["message"]["content"].strip()
polished_words = polished.split()
polished_words = polished.split()
if len(polished_words) != len(words):
print(f" Polish word count mismatch ({len(polished_words)} vs {len(words)}), using raw text")
return words
if len(polished_words) != len(words):
print(f" Polish word count mismatch ({len(polished_words)} vs {len(words)}), using raw text")
return words
changes = 0
for i, pw in enumerate(polished_words):
if pw != words[i]["word"]:
changes += 1
words[i]["word"] = pw
changes = 0
for i, pw in enumerate(polished_words):
if pw != words[i]["word"]:
changes += 1
words[i]["word"] = pw
if changes:
print(f" Polished {changes} words")
except Exception as e:
print(f" Polish error: {e}")
if changes:
print(f" Polished {changes} words")
return words
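The polish step is only safe because each polished token maps one-to-one onto a timestamped word; any change in token count would shift every caption that follows. That alignment rule, restated as a standalone function: `apply_polish` is a hypothetical name, and any dict keys beyond `word` (such as `start` in the usage) are assumptions.

```python
from typing import Optional


def apply_polish(words: list[dict], polished: str) -> Optional[int]:
    """Copy polished tokens onto timestamped words in place.

    Returns the number of changed words, or None when token counts differ,
    in which case the raw words are left exactly as they were.
    """
    polished_words = polished.split()
    if len(polished_words) != len(words):
        return None
    changes = 0
    for w, pw in zip(words, polished_words):
        if pw != w["word"]:
            w["word"] = pw  # only the text changes; timing keys are untouched
            changes += 1
    return changes


if __name__ == "__main__":
    ws = [{"word": "so", "start": 0.0}, {"word": "i", "start": 0.2}, {"word": "said", "start": 0.4}]
    print(apply_polish(ws, "So I said."))  # 3
```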
@@ -898,8 +903,12 @@ def extract_clip_audio(audio_path: str, start: float, end: float,
output_path,
]
result = subprocess.run(cmd, capture_output=True, text=True)
return result.returncode == 0
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=120)
return result.returncode == 0
except subprocess.TimeoutExpired:
print(f" ffmpeg audio extraction timed out (120s)")
return False
def generate_background_image(episode_number: int, clip_title: str,
@@ -1153,7 +1162,11 @@ def generate_clip_video(audio_path: str, background_path: str,
output_path,
]
result = subprocess.run(cmd, capture_output=True, text=True)
try:
result = subprocess.run(cmd, capture_output=True, text=True, timeout=300)
except subprocess.TimeoutExpired:
print(f" ffmpeg video generation timed out (300s)")
return False
if result.returncode != 0:
print(f" ffmpeg error: {result.stderr[-300:]}")
return False
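The subprocess calls in this file (audio extraction, ffmpeg video, Remotion render) now all pass a `timeout` of 120, 300 and 180 s respectively, and treat `TimeoutExpired` as a failed clip. Below is a sketch of how that repeated shape could be folded into one helper. `run_with_timeout` is hypothetical and not part of this commit. `subprocess.run` kills the child and waits for it before re-raising `TimeoutExpired`, so a hung ffmpeg does not outlive the call.

```python
import subprocess
import sys


def run_with_timeout(cmd: list[str], timeout: float, label: str, **kwargs) -> bool:
    """Run cmd; return True only if it exits 0 within timeout seconds."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True,
                                timeout=timeout, **kwargs)
    except subprocess.TimeoutExpired:
        print(f"  {label} timed out ({timeout:.0f}s)")
        return False
    if result.returncode != 0:
        # Keep only the tail of stderr; ffmpeg logs are long
        print(f"  {label} error: {result.stderr[-300:]}")
        return False
    return True


if __name__ == "__main__":
    print(run_with_timeout([sys.executable, "-c", "print('hi')"], 10, "demo"))
```

For the Remotion call this would read `run_with_timeout(cmd, 180, "Remotion render", cwd=str(REMOTION_DIR))`. The `props_path.unlink(missing_ok=True)` cleanup would then move into a `finally` so it runs on every path.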
@@ -1235,7 +1248,12 @@ def generate_clip_video_remotion(
output_path,
]
result = subprocess.run(cmd, capture_output=True, text=True, cwd=str(REMOTION_DIR))
try:
result = subprocess.run(cmd, capture_output=True, text=True, cwd=str(REMOTION_DIR), timeout=180)
except subprocess.TimeoutExpired:
props_path.unlink(missing_ok=True)
print(f" Remotion render timed out (180s)")
return False
props_path.unlink(missing_ok=True)
if result.returncode != 0:
@@ -1488,6 +1506,9 @@ def main():
print(f"\n[3/{step_total}] Selecting {args.count} best moments with LLM...")
clips = select_clips_with_llm(transcript_text, labeled_transcript,
chapters_json, args.count)
if not clips:
print("\nNo clips selected — aborting.")
return
# Snap to sentence boundaries so clips don't start/end mid-sentence
clips = snap_to_sentences(clips, segments)
@@ -1524,14 +1545,18 @@ def main():
extract_step = 6 if two_pass else 5
print(f"\n[{extract_step}/{step_total}] Extracting audio clips...")
for i, clip in enumerate(clips):
print(f" [{i+1}/{len(clips)}] \"{clip['title']}\"...")
slug = slugify(clip["title"])
mp3_path = output_dir / f"clip-{i+1}-{slug}.mp3"
if extract_clip_audio(str(audio_path), clip["start_time"], clip["end_time"],
str(mp3_path)):
print(f" Clip {i+1} audio: {mp3_path.name}")
else:
print(f" Error extracting clip {i+1} audio")
try:
if extract_clip_audio(str(audio_path), clip["start_time"], clip["end_time"],
str(mp3_path)):
print(f" Clip {i+1} audio: {mp3_path.name}")
else:
print(f" Error extracting clip {i+1} audio — skipping")
except Exception as e:
print(f" Clip {i+1} audio failed: {e} — skipping")
video_step = 7 if two_pass else 6
if args.audio_only:
@@ -1553,49 +1578,52 @@ def main():
mp4_path = output_dir / f"clip-{i+1}-{slug}.mp4"
duration = clip["end_time"] - clip["start_time"]
print(f" Clip {i+1}: Generating video...")
print(f" [{i+1}/{len(clips)}] \"{clip['title']}\" ({duration:.0f}s)...")
# Get word timestamps — use refined segments if available
word_source = refined[i] if (two_pass and i in refined and refined[i]) else segments
clip_words = get_words_in_range(word_source, clip["start_time"], clip["end_time"])
try:
# Get word timestamps — use refined segments if available
word_source = refined[i] if (two_pass and i in refined and refined[i]) else segments
clip_words = get_words_in_range(word_source, clip["start_time"], clip["end_time"])
# Add speaker labels
clip_words = add_speaker_labels(clip_words, labeled_transcript,
clip["start_time"], clip["end_time"],
word_source)
# Add speaker labels
clip_words = add_speaker_labels(clip_words, labeled_transcript,
clip["start_time"], clip["end_time"],
word_source)
# Polish text with LLM (fix punctuation, capitalization, mishearings)
clip_words = polish_clip_words(clip_words, labeled_transcript)
# Polish text with LLM (fix punctuation, capitalization, mishearings)
clip_words = polish_clip_words(clip_words, labeled_transcript)
# Group words into timed caption lines
caption_lines = group_words_into_lines(
clip_words, clip["start_time"], duration
)
# Group words into timed caption lines
caption_lines = group_words_into_lines(
clip_words, clip["start_time"], duration
)
if use_remotion:
if generate_clip_video_remotion(
str(mp3_path), caption_lines, clip["start_time"],
clip["title"], episode_number, str(mp4_path), duration
):
file_size = mp4_path.stat().st_size / (1024 * 1024)
print(f" Clip {i+1} video: {mp4_path.name} ({file_size:.1f} MB)")
if use_remotion:
if generate_clip_video_remotion(
str(mp3_path), caption_lines, clip["start_time"],
clip["title"], episode_number, str(mp4_path), duration
):
file_size = mp4_path.stat().st_size / (1024 * 1024)
print(f" Clip {i+1} video: {mp4_path.name} ({file_size:.1f} MB)")
else:
print(f" Clip {i+1} video failed (Remotion) — skipping")
else:
print(f" Error generating clip {i+1} video (Remotion)")
else:
# Legacy PIL+ffmpeg renderer
bg_path = str(tmp_dir / f"bg_{i}.png")
generate_background_image(episode_number, clip["title"], bg_path)
# Legacy PIL+ffmpeg renderer
bg_path = str(tmp_dir / f"bg_{i}.png")
generate_background_image(episode_number, clip["title"], bg_path)
clip_tmp = tmp_dir / f"clip_{i}"
clip_tmp.mkdir(exist_ok=True)
clip_tmp = tmp_dir / f"clip_{i}"
clip_tmp.mkdir(exist_ok=True)
if generate_clip_video(str(mp3_path), bg_path, caption_lines,
clip["start_time"], str(mp4_path),
duration, clip_tmp):
file_size = mp4_path.stat().st_size / (1024 * 1024)
print(f" Clip {i+1} video: {mp4_path.name} ({file_size:.1f} MB)")
else:
print(f" Error generating clip {i+1} video")
if generate_clip_video(str(mp3_path), bg_path, caption_lines,
clip["start_time"], str(mp4_path),
duration, clip_tmp):
file_size = mp4_path.stat().st_size / (1024 * 1024)
print(f" Clip {i+1} video: {mp4_path.name} ({file_size:.1f} MB)")
else:
print(f" Clip {i+1} video failed (ffmpeg) — skipping")
except Exception as e:
print(f" Clip {i+1} video failed: {e} — skipping")
# Save clips metadata for social upload
metadata_path = output_dir / "clips-metadata.json"
+132 -44
@@ -19,7 +19,8 @@ import shutil
import subprocess
import sys
import tempfile
from datetime import datetime, timezone
import time
from datetime import datetime, timedelta, timezone
from pathlib import Path
import ssl
@@ -303,7 +304,7 @@ TRANSCRIPT:
{timestamped_text}
Generate a JSON response with:
1. "title": A catchy episode title (include "Episode {episode_number}:" prefix)
1. "title": An episode title with "Episode {episode_number}:" prefix. The title MUST reference something SPECIFIC from this episode — a caller's name, their situation, a memorable quote, or a specific moment. Good titles sound like you're telling a friend what happened: "Episode 12: Gary's Goat Problem and the Worst Best Man Speech Ever", "Episode 8: The Lawnmower Feud, a Cursed Wedding Ring, and Darla Finally Calls Back". Bad titles are generic and could apply to any podcast episode: "Secrets and Confessions", "Late Night Tales", "Wild Stories and Hot Takes". Avoid the words: secrets, confessions, tales, chronicles, diaries, unfiltered, raw, real talk.
2. "description": A 2-4 sentence description summarizing the episode's content. Mention callers by name and their topics. End with something engaging.
3. "chapters": An array of chapter objects with "startTime" (in seconds) and "title". Include:
- "Intro" at 0 seconds
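The new title rules are enforced only through the prompt. If a post-generation check were wanted, a sketch like this could flag titles that break the stated rules: a missing `Episode N:` prefix, or one of the banned words. `title_problems` is a hypothetical helper and the word list is copied from the prompt. Matching is a simple case-insensitive word test, so "secret" and "secrets" count as different words.

```python
import re

# Copied from the title instructions in the metadata prompt
BANNED_WORDS = {"secrets", "confessions", "tales", "chronicles", "diaries",
                "unfiltered", "raw"}
BANNED_PHRASES = ("real talk",)


def title_problems(title: str, episode_number: int) -> list[str]:
    """Return a list of rule violations; an empty list means the title passes."""
    problems = []
    prefix = f"Episode {episode_number}:"
    if not title.startswith(prefix):
        problems.append(f"missing '{prefix}' prefix")
    lowered = title.lower()
    words = set(re.findall(r"[a-z']+", lowered))
    hits = sorted(words & BANNED_WORDS)
    hits += [p for p in BANNED_PHRASES if p in lowered]
    if hits:
        problems.append("banned words: " + ", ".join(hits))
    return problems
```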
@@ -1081,9 +1082,79 @@ def upload_image_to_postiz(image_path: str) -> dict | None:
return None
def post_to_social(metadata: dict, episode_slug: str, image_path: str = None):
def _build_platform_content(metadata: dict, episode_url: str, yt_url: str | None,
platform: str) -> str:
"""Generate platform-tailored social post content for episode announcements."""
title = metadata["title"]
desc = metadata["description"]
if platform == "x":
hook = desc.split(". ")[0] + "."
content = f"{hook}\n\n{episode_url}\n\n#LukeAtTheRoost #podcast"
if len(content) > 280:
content = f"{title}\n\n{episode_url}"[:280]
elif platform == "instagram":
hashtags = ("#podcast #LukeAtTheRoost #talkradio #callinshow #newepisode "
"#podcastlife #podcastrecommendations #comedy #advice "
"#latenightradio #aipodcast #talkshow")
content = f"New episode 🎙️\n\n{desc}\n\nLink in bio.\n\n{hashtags}"
elif platform == "threads":
content = (f"{title}\n\n{desc}\n\nlukeattheroost.com"
f"\n\n#podcast #LukeAtTheRoost #newepisode #callinshow")
elif platform == "bluesky":
content = f"{desc}\n\n{episode_url}"
if len(content) > 300:
avail = 300 - len(episode_url) - 2
content = desc[:avail].rsplit(" ", 1)[0] + "\n\n" + episode_url
elif platform == "mastodon":
content = f"{title}\n\n{desc}\n\n{episode_url}"
if yt_url:
content += f"\n{yt_url}"
elif platform == "linkedin":
content = f"{title}\n\n{desc}"
content += f"\n\nListen: {episode_url}"
if yt_url:
content += f"\nWatch: {yt_url}"
elif platform == "facebook":
content = f"New episode just dropped 🎙️\n\n{desc}\n\nListen free: {episode_url}"
if yt_url:
content += f"\nWatch: {yt_url}"
elif platform == "tiktok":
hook = desc.split(". ")[0] + "."
content = (f"New episode: {hook}"
f"\n\n#podcast #LukeAtTheRoost #callinshow #newepisode #fyp")
elif platform == "nostr":
content = f"{title}\n\n{desc}\n\n{episode_url}"
if yt_url:
content += f"\n{yt_url}"
else:
content = f"{title}\n\n{desc}\n\n{episode_url}"
return content
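For Bluesky, the builder keeps the URL intact and trims the description back to the last whole word that fits. It reserves two characters for the blank-line separator. The same arithmetic is shown below as a standalone sketch, where `fit_with_url` is a hypothetical name. Python's `len` counts code points; if a platform measures its limit in graphemes, emoji-heavy text can come out differently.

```python
def fit_with_url(desc: str, url: str, limit: int = 300) -> str:
    """Return desc + blank line + url, trimming desc at a word boundary to fit limit."""
    content = f"{desc}\n\n{url}"
    if len(content) <= limit:
        return content
    avail = limit - len(url) - 2  # 2 = the "\n\n" separator
    trimmed = desc[:avail].rsplit(" ", 1)[0]  # drop the (possibly partial) last word
    return f"{trimmed}\n\n{url}"
```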
# Platforms that post immediately vs scheduled (minutes offset from publish time)
_IMMEDIATE_PLATFORMS = {"x", "bluesky"}
_SCHEDULE_OFFSETS = {
"instagram": 30, "threads": 30,
"facebook": 60, "linkedin": 60,
"tiktok": 90, "mastodon": 120, "nostr": 120,
}
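With these tables, X and Bluesky go out immediately and the other platforms are staggered over the next two hours. Any platform missing from `_SCHEDULE_OFFSETS` gets offset 0 and also posts immediately. Below is a sketch of how each platform maps to a Postiz `type` and `date`. `plan_post` is a hypothetical helper mirroring the branch inside `post_to_social`. It assumes `now` is a timezone-aware UTC `datetime`, because the `.000Z` suffix is hard-coded and a local time would be mislabeled as UTC.

```python
from datetime import datetime, timedelta, timezone

IMMEDIATE_PLATFORMS = {"x", "bluesky"}
SCHEDULE_OFFSETS = {
    "instagram": 30, "threads": 30,
    "facebook": 60, "linkedin": 60,
    "tiktok": 90, "mastodon": 120, "nostr": 120,
}


def plan_post(platform: str, now: datetime) -> tuple[str, str]:
    """Return (post_type, ISO date string) for one platform."""
    offset_min = SCHEDULE_OFFSETS.get(platform, 0)
    if platform in IMMEDIATE_PLATFORMS or offset_min == 0:
        when, post_type = now, "now"
    else:
        when, post_type = now + timedelta(minutes=offset_min), "schedule"
    return post_type, when.strftime("%Y-%m-%dT%H:%M:%S.000Z")


if __name__ == "__main__":
    start = datetime(2026, 3, 31, 7, 0, tzinfo=timezone.utc)
    for p in ["x", "instagram", "mastodon", "youtube"]:
        print(p, plan_post(p, start))
```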
def post_to_social(metadata: dict, episode_slug: str, image_path: str = None,
yt_video_id: str = None):
"""Post episode announcement to all connected social channels via Postiz."""
print("[5.5/5] Posting to social media...")
print("[5.7] Posting to social media...")
token = _get_postiz_token()
@@ -1095,31 +1166,17 @@ def post_to_social(metadata: dict, episode_slug: str, image_path: str = None):
image_ids = [{"id": media["id"], "path": media.get("path", "")}]
episode_url = f"https://lukeattheroost.com/episode.html?slug={episode_slug}"
base_content = f"{metadata['title']}\n\n{metadata['description']}\n\n{episode_url}"
hashtags = "#podcast #LukeAtTheRoost #talkradio #callinshow #newepisode"
hashtag_platforms = {"instagram", "facebook", "bluesky", "mastodon", "nostr", "linkedin", "threads", "tiktok", "x"}
# Platform-specific content length limits
PLATFORM_MAX_LENGTH = {"bluesky": 300, "x": 280, "threads": 500, "tiktok": 2200}
yt_url = f"https://youtube.com/watch?v={yt_video_id}" if yt_video_id else None
now = datetime.now(timezone.utc)
# Post to each platform individually so one failure doesn't block others
posted = 0
for platform, intg_config in POSTIZ_INTEGRATIONS.items():
content = base_content
if platform in hashtag_platforms:
content += f"\n\n{hashtags}"
content = _build_platform_content(metadata, episode_url, yt_url, platform)
# Truncate for platforms with short limits
max_len = PLATFORM_MAX_LENGTH.get(platform)
if max_len and len(content) > max_len:
# Keep title + URL, truncate description
short = f"{metadata['title']}\n\n{episode_url}"
if platform in hashtag_platforms:
short += f"\n\n{hashtags}"
content = short[:max_len]
settings = {"post_type": "post"}
settings = {"__type": platform, "post_type": "post"}
if platform == "x":
settings["who_can_reply_post"] = "everyone"
if "channel" in intg_config:
settings["channel"] = intg_config["channel"]
@@ -1129,30 +1186,46 @@ def post_to_social(metadata: dict, episode_slug: str, image_path: str = None):
"settings": settings,
}
# Stagger: immediate for fast-moving platforms, scheduled for rest
offset_min = _SCHEDULE_OFFSETS.get(platform, 0)
if platform in _IMMEDIATE_PLATFORMS or offset_min == 0:
post_type = "now"
post_date = now.strftime("%Y-%m-%dT%H:%M:%S.000Z")
else:
post_type = "schedule"
scheduled = now + timedelta(minutes=offset_min)
post_date = scheduled.strftime("%Y-%m-%dT%H:%M:%S.000Z")
payload = {
"type": "now",
"type": post_type,
"shortLink": False,
"date": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z"),
"date": post_date,
"tags": [],
"posts": [post],
}
try:
resp = requests.post(
f"{POSTIZ_URL}/api/posts",
headers={"auth": token, "Content-Type": "application/json"},
json=payload,
timeout=60,
)
if resp.status_code in (200, 201):
posted += 1
print(f" Posted to {platform}")
else:
print(f" Warning: {platform} failed ({resp.status_code}): {resp.text[:150]}")
except Exception as e:
print(f" Warning: {platform} failed: {e}")
# Retry once on failure (2 attempts, 5s backoff)
for attempt in range(2):
try:
resp = requests.post(
f"{POSTIZ_URL}/api/posts",
headers={"auth": token, "Content-Type": "application/json"},
json=payload,
timeout=60,
)
if resp.status_code in (200, 201):
posted += 1
label = f"scheduled +{offset_min}m" if post_type == "schedule" else "posted"
print(f" {platform}: {label}")
break
else:
print(f" Warning: {platform} attempt {attempt + 1} failed ({resp.status_code}): {resp.text[:150]}")
except Exception as e:
print(f" Warning: {platform} attempt {attempt + 1} failed: {e}")
if attempt < 1:
time.sleep(5)
print(f" Posted to {posted}/{len(POSTIZ_INTEGRATIONS)} channels")
print(f" Posted/scheduled {posted}/{len(POSTIZ_INTEGRATIONS)} channels")
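The posting loop now gives each platform two attempts with a 5 s pause between them, and stops at the first 200/201. Below is a sketch of that retry shape with the HTTP call injected as a function, so it can be exercised without Postiz. `post_with_retry` is hypothetical. One caveat: if the first request times out after the server has already accepted it, the retry can create a duplicate post.

```python
import time
from typing import Callable


def post_with_retry(send: Callable[[], int], attempts: int = 2,
                    backoff_s: float = 5.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send() up to `attempts` times; True on the first 200/201 status."""
    for attempt in range(attempts):
        try:
            status = send()
            if status in (200, 201):
                return True
            print(f"  attempt {attempt + 1} failed ({status})")
        except Exception as e:  # network errors count as a failed attempt
            print(f"  attempt {attempt + 1} failed: {e}")
        if attempt < attempts - 1:
            sleep(backoff_s)  # no sleep after the final attempt
    return False
```

With `requests`, the injected call would look like `lambda: requests.post(url, json=payload, timeout=60).status_code`.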
def get_youtube_service():
@@ -1199,6 +1272,21 @@ def _check_youtube_duplicate(youtube, title: str) -> str | None:
return None
def _extract_youtube_tags(metadata: dict) -> list[str]:
"""Extract dynamic tags from episode metadata for YouTube SEO."""
base_tags = ["podcast", "Luke at the Roost", "talk radio", "call-in show",
"talk show", "comedy", "AI podcast", "late night radio", "advice"]
skip = {"intro", "outro", "opening", "closing", "wrap up", "wrap-up"}
dynamic = []
for ch in metadata.get("chapters", []):
title = ch.get("title", "").strip()
if title.lower() in skip or len(title) < 3:
continue
if len(title) <= 50:
dynamic.append(title)
return (base_tags + dynamic)[:25]
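`_extract_youtube_tags` works in three steps:

- It keeps the fixed base tags first.
- It appends chapter titles of 3 to 50 characters, skipping intro/outro-style chapters.
- It caps the list at 25.

Two chapters with the same title would both be added. A variant that also drops case-insensitive duplicates might look like the sketch below. The dedupe step and the `None` handling are additions, not part of the commit, and the base list is shortened here.

```python
BASE_TAGS = ["podcast", "Luke at the Roost", "talk radio", "call-in show"]
SKIP = {"intro", "outro", "opening", "closing", "wrap up", "wrap-up"}


def extract_tags(chapters: list[dict], max_tags: int = 25) -> list[str]:
    """Base tags plus usable chapter titles, deduplicated case-insensitively."""
    candidates = list(BASE_TAGS)
    for ch in chapters:
        title = (ch.get("title") or "").strip()
        if title.lower() in SKIP or len(title) < 3 or len(title) > 50:
            continue
        candidates.append(title)
    tags, seen = [], set()
    for tag in candidates:
        if tag.lower() not in seen:
            seen.add(tag.lower())
            tags.append(tag)
    return tags[:max_tags]
```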
def upload_to_youtube(audio_path: str, metadata: dict, chapters: list,
episode_slug: str) -> str | None:
"""Convert audio to video with cover art, upload to YouTube, add to podcast playlist."""
@@ -1257,8 +1345,7 @@ def upload_to_youtube(audio_path: str, metadata: dict, chapters: list,
"snippet": {
"title": metadata["title"][:100],
"description": description,
"tags": ["podcast", "Luke at the Roost", "talk radio", "call-in show",
"talk show", "comedy"],
"tags": _extract_youtube_tags(metadata),
"categoryId": "22",
},
"status": {
@@ -1629,7 +1716,8 @@ def main():
else:
social_image_path = str(audio_path.with_suffix(".social.jpg"))
generate_social_image(episode_number, metadata["description"], social_image_path)
post_to_social(metadata, episode["slug"], social_image_path)
post_to_social(metadata, episode["slug"], social_image_path,
yt_video_id=yt_video_id)
_mark_step_done(episode_number, "social")
# Step 6: Summary
+224 -67
@@ -9,12 +9,16 @@
---------------------------------------------------------------------------
local SILENCE_DB = -30 -- dBFS — anything below this is "silence"
local MIN_SILENCE_SEC = 6.0 -- same-speaker gaps: only remove silences longer than this
local MIN_SILENCE_TRANSITION_SEC = 2.5 -- cross-speaker gaps: shorter threshold for speaker transitions
local MAX_SILENCE_SEC = 999 -- no practical limit (IDENT/AD regions protect real breaks)
local MIN_SILENCE_TRANSITION_SEC = 5.0 -- cross-speaker gaps: threshold for caller TTS latency
local MIN_SILENCE_DEVON_SEC = 3.0 -- Devon gaps: interjections are prerendered (~2-3s gaps), conversational TTS is 6s+
local DEVON_TRACK = 2 -- 1-indexed: Devon track number
local MIN_VOICE_SEC = 0.3 -- ignore non-silent bursts shorter than this (filters transients)
local KEEP_PAD_SEC = 0.5 -- leave this much silence on each side of a cut
local BLOCK_SEC = 0.1 -- analysis block size (100ms)
local SAMPLE_RATE = 48000
local CHECK_TRACKS = {1, 2, 3, 4} -- 1-indexed: Host, Devon, Live Caller, AI Caller
local CHECK_TRACKS = {1, 2, 3, 4} -- 1-indexed: Host, Devon, AI Caller, Live Caller
local SFX_TRACK = 5 -- 1-indexed: SFX track
local IDENTS_TRACK = 6 -- 1-indexed: Idents track
local ADS_TRACK = 7 -- 1-indexed: Ads track
local MUSIC_TRACK = 8 -- 1-indexed: Music track
@@ -25,7 +29,6 @@ local YIELD_INTERVAL = 200 -- yield to REAPER every N blocks (~20s of audio)
local BLOCK_SAMPLES = math.floor(SAMPLE_RATE * BLOCK_SEC)
local THRESHOLD = 10 ^ (SILENCE_DB / 20)
local MIN_VOICE_BLOCKS = math.ceil(MIN_VOICE_SEC / BLOCK_SEC)
local function log(msg)
reaper.ShowConsoleMsg("[PostProd] " .. msg .. "\n")
end
@@ -306,13 +309,17 @@ local function read_block_peak_rms(ta, project_time)
end
-- find_loudest_track: returns 1-based index of the loudest track at a given time, or 0 if silent
-- Uses RMS (not peak) for speaker identification — ambient mic noise has high peaks but low RMS
local function find_loudest_track(track_audios, project_time)
local best_peak = 0
local best_rms = 0
local best_idx = 0
for i, ta in ipairs(track_audios) do
local peak, _ = read_block_peak_rms(ta, project_time)
if peak > best_peak then
best_peak = peak
local peak, sum_sq = read_block_peak_rms(ta, project_time)
if peak > best_peak then best_peak = peak end
local rms = math.sqrt(sum_sq / BLOCK_SAMPLES)
if rms > best_rms then
best_rms = rms
best_idx = i
end
end
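The switch from peak to RMS matters because a single transient, such as a mic bump or click, can have the highest peak in a block while carrying very little sustained energy. Below is a Python sketch of the same per-block comparison. It uses plain lists of float samples in place of REAPER's WAV reader, and the function names are illustrative.

```python
import math


def block_rms(samples: list[float]) -> float:
    """Root-mean-square level of one analysis block."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def loudest_track(blocks: list[list[float]]) -> int:
    """1-based index of the track with the highest RMS in this block, or 0 if all are silent."""
    best_rms, best_idx = 0.0, 0
    for i, samples in enumerate(blocks, start=1):
        rms = block_rms(samples)
        if rms > best_rms:
            best_rms, best_idx = rms, i
    return best_idx


if __name__ == "__main__":
    click = [0.0] * 99 + [0.9]   # one loud sample: peak 0.9, RMS 0.09
    speech = [0.3, -0.3] * 50    # steady signal: peak 0.3, RMS 0.3
    print(loudest_track([click, speech]))  # -> 2
```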
@@ -340,12 +347,17 @@ local function find_silences(region, track_audios, rms_acc, progress_fn)
while t < region.end_pos do
local best_peak = 0
local best_rms = 0
local best_sum = 0
local best_track = 0
for i, ta in ipairs(track_audios) do
local peak, sum_sq = read_block_peak_rms(ta, t)
if peak > best_peak then
best_peak = peak
if peak > best_peak then best_peak = peak end
-- Use RMS for speaker identification (sustained energy, not transient peaks)
-- Host mic ambient noise has high peaks but low RMS; TTS speech has high RMS
local rms = math.sqrt(sum_sq / BLOCK_SAMPLES)
if rms > best_rms then
best_rms = rms
best_sum = sum_sq
best_track = i
end
@@ -375,8 +387,11 @@ local function find_silences(region, track_audios, rms_acc, progress_fn)
local dur = voice_start - silence_start
local track_after = voice_run_track
local is_transition = track_before_silence ~= 0 and track_after ~= 0 and track_before_silence ~= track_after
local threshold = is_transition and MIN_SILENCE_TRANSITION_SEC or MIN_SILENCE_SEC
if dur >= threshold then
local devon_involved = track_before_silence == DEVON_TRACK or track_after == DEVON_TRACK
local threshold = devon_involved and MIN_SILENCE_DEVON_SEC
or (is_transition and MIN_SILENCE_TRANSITION_SEC or MIN_SILENCE_SEC)
if dur >= threshold and dur <= MAX_SILENCE_SEC then
table.insert(silences, {
start_pos = silence_start, end_pos = voice_start, duration = dur,
is_transition = is_transition,
@@ -410,7 +425,7 @@ local function find_silences(region, track_audios, rms_acc, progress_fn)
if in_silence then
local dur = region.end_pos - silence_start
if dur >= MIN_SILENCE_SEC then
if dur >= MIN_SILENCE_SEC and dur <= MAX_SILENCE_SEC then
table.insert(silences, {start_pos = silence_start, end_pos = region.end_pos, duration = dur})
end
end
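Gap thresholds now depend on which track spoke on either side of the silence:

- Any gap touching Devon's track is cut at 3 s.
- A hand-off between two different voice tracks is cut at 5 s.
- A same-speaker pause is cut at 6 s.

The trailing silence at the end of a region still uses `MIN_SILENCE_SEC` only. The selection logic is restated below in Python as an illustrative sketch, with the constants copied from the script.

```python
MIN_SILENCE_SEC = 6.0             # same-speaker gaps
MIN_SILENCE_TRANSITION_SEC = 5.0  # hand-off between two voice tracks
MIN_SILENCE_DEVON_SEC = 3.0       # any gap next to Devon's track
MAX_SILENCE_SEC = 999
DEVON_TRACK = 2                   # 1-based; 0 means "no track identified"


def silence_threshold(track_before: int, track_after: int) -> float:
    """Minimum gap length (seconds) that qualifies for removal."""
    if DEVON_TRACK in (track_before, track_after):
        return MIN_SILENCE_DEVON_SEC
    is_transition = (track_before != 0 and track_after != 0
                     and track_before != track_after)
    return MIN_SILENCE_TRANSITION_SEC if is_transition else MIN_SILENCE_SEC


def should_cut(duration: float, track_before: int, track_after: int) -> bool:
    """True if a silence of this length between these tracks gets removed."""
    return silence_threshold(track_before, track_after) <= duration <= MAX_SILENCE_SEC
```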
@@ -452,7 +467,10 @@ local function phase1_strip_silence(dialog_regions)
for _, r in ipairs(get_regions_by_type("^IDENT%s+%d+$")) do table.insert(protected_regions, r) end
table.sort(protected_regions, function(a, b) return a.start_pos < b.start_pos end)
if #protected_regions > 0 then
log(" Protecting " .. #protected_regions .. " AD/IDENT region(s) from silence removal")
log(" Protecting " .. #protected_regions .. " AD/IDENT region(s) from silence removal:")
for _, pr in ipairs(protected_regions) do
log(" " .. pr.name .. " at " .. string.format("%.1f", pr.start_pos) .. "-" .. string.format("%.1f", pr.end_pos) .. "s")
end
end
log("Phase 1: Analyzing using " .. tracks_loaded .. "/" .. #CHECK_TRACKS .. " voice tracks")
@@ -498,6 +516,11 @@ local function phase1_strip_silence(dialog_regions)
break
end
end
-- Preserve the very first silence (music intro before host starts talking)
if not protected and ri == 1 and #removals == 0 and s.start_pos <= rgn.start_pos + 1.0 then
protected = true
log(" KEEP " .. string.format("%.1f", rm_end - rm_start) .. "s at " .. string.format("%.1f", s.start_pos) .. "-" .. string.format("%.1f", s.end_pos) .. " (music intro)")
end
if not protected then
table.insert(removals, {start_pos = rm_start, end_pos = rm_end})
local tag = s.is_transition and " [transition]" or ""
@@ -588,64 +611,88 @@ end
-- Phase 2: Normalize AD/IDENT volume to match dialog
---------------------------------------------------------------------------
local function normalize_track_regions(track_idx, regions, target_db)
local function normalize_track_items(track_idx, target_db, label)
-- Normalize all items on a track that have audible content.
-- Uses direct WAV reading (not audio accessor) so it works after Phase 1 splits.
local track = reaper.GetTrack(0, track_idx - 1)
if not track or reaper.CountTrackMediaItems(track) == 0 then return end
if not track then
log(" " .. label .. ": track " .. track_idx .. " does not exist")
return
end
for _, rgn in ipairs(regions) do
local item = find_item_at(track, rgn.start_pos)
if not item then goto next_region end
local item_count = reaper.CountTrackMediaItems(track)
log(" " .. label .. ": " .. item_count .. " item(s) on track " .. track_idx)
if item_count == 0 then return end
local item_start = reaper.GetMediaItemInfo_Value(item, "D_POSITION")
local ta = get_track_audio(track_idx)
if not ta then
log(" " .. label .. ": get_track_audio() returned nil — no readable WAV sources")
return
end
log(" " .. label .. ": " .. #ta.segments .. " WAV segment(s), span " .. string.format("%.1f", ta.item_pos) .. "-" .. string.format("%.1f", ta.item_end) .. "s")
local segment = item
if item_start < rgn.start_pos - 0.01 then
segment = reaper.SplitMediaItem(item, rgn.start_pos)
if not segment then goto next_region end
end
local seg_end = reaper.GetMediaItemInfo_Value(segment, "D_POSITION")
+ reaper.GetMediaItemInfo_Value(segment, "D_LENGTH")
if rgn.end_pos < seg_end - 0.01 then
reaper.SplitMediaItem(segment, rgn.end_pos)
end
local take = reaper.GetActiveTake(segment)
if not take then goto next_region end
local seg_pos = reaper.GetMediaItemInfo_Value(segment, "D_POSITION")
local seg_len = reaper.GetMediaItemInfo_Value(segment, "D_LENGTH")
local seg_offset = reaper.GetMediaItemTakeInfo_Value(take, "D_STARTOFFS")
local accessor = reaper.CreateTakeAudioAccessor(take)
local adjusted = 0
local skipped_silent = 0
local skipped_small = 0
for i = 0, item_count - 1 do
local item = reaper.GetTrackMediaItem(track, i)
local item_pos = reaper.GetMediaItemInfo_Value(item, "D_POSITION")
local item_len = reaper.GetMediaItemInfo_Value(item, "D_LENGTH")
local item_end = item_pos + item_len
-- Measure RMS of audible content in this item
local sum_sq = 0
local count = 0
local t = seg_pos
while t < seg_pos + seg_len do
local source_time = t - seg_pos + seg_offset
local buf = reaper.new_array(BLOCK_SAMPLES)
reaper.GetAudioAccessorSamples(accessor, SAMPLE_RATE, 1, source_time, BLOCK_SAMPLES, buf)
for i = 1, BLOCK_SAMPLES do
sum_sq = sum_sq + buf[i] * buf[i]
local total_blocks = 0
local t = item_pos
while t < item_end do
local peak, s_sq = read_block_peak_rms(ta, t)
total_blocks = total_blocks + 1
if peak >= THRESHOLD then
sum_sq = sum_sq + s_sq
count = count + BLOCK_SAMPLES
end
count = count + BLOCK_SAMPLES
t = t + BLOCK_SEC
end
reaper.DestroyAudioAccessor(accessor)
local audible_blocks = math.floor(count / BLOCK_SAMPLES)
if count > 0 then
local item_rms = math.sqrt(sum_sq / count)
if item_rms > 0 then
local item_db = 20 * math.log(item_rms, 10)
local gain_db = target_db - item_db
local gain_linear = 10 ^ (gain_db / 20)
local current_vol = reaper.GetMediaItemInfo_Value(segment, "D_VOL")
reaper.SetMediaItemInfo_Value(segment, "D_VOL", current_vol * gain_linear)
log(" " .. rgn.name .. ": " .. string.format("%+.1f", gain_db) .. "dB adjustment")
local current_vol = reaper.GetMediaItemInfo_Value(item, "D_VOL")
log(" " .. label .. " item " .. (i+1) .. "/" .. item_count
.. " pos=" .. string.format("%.1f", item_pos) .. "s"
.. " len=" .. string.format("%.1f", item_len) .. "s"
.. " blocks=" .. total_blocks .. "/" .. audible_blocks .. " audible"
.. " RMS=" .. string.format("%.1f", item_db) .. "dB"
.. " target=" .. string.format("%.1f", target_db) .. "dB"
.. " gain=" .. string.format("%+.1f", gain_db) .. "dB"
.. " vol=" .. string.format("%.3f", current_vol))
-- Only adjust if the difference is significant (> 1dB)
if math.abs(gain_db) > 1.0 then
local gain_linear = 10 ^ (gain_db / 20)
reaper.SetMediaItemInfo_Value(item, "D_VOL", current_vol * gain_linear)
log(" -> APPLIED: vol " .. string.format("%.3f", current_vol) .. " -> " .. string.format("%.3f", current_vol * gain_linear))
adjusted = adjusted + 1
else
log(" -> SKIPPED: gain within 1dB threshold")
skipped_small = skipped_small + 1
end
end
else
log(" " .. label .. " item " .. (i+1) .. "/" .. item_count
.. " pos=" .. string.format("%.1f", item_pos) .. "s"
.. " len=" .. string.format("%.1f", item_len) .. "s"
.. " blocks=" .. total_blocks
.. " — NO AUDIBLE BLOCKS (all below " .. SILENCE_DB .. "dB)")
skipped_silent = skipped_silent + 1
end
::next_region::
end
destroy_track_audio(ta)
log(" " .. label .. " RESULT: " .. adjusted .. " adjusted, " .. skipped_small .. " within 1dB, " .. skipped_silent .. " silent")
end
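`normalize_track_items` now measures every item on the track instead of each AD/IDENT region. It works in four steps:

- It sums squared samples only over blocks whose peak clears the -30 dBFS silence threshold.
- It converts that RMS to dB.
- It computes the gain needed to reach the target.
- It changes the item volume only when that correction exceeds 1 dB.

The gain arithmetic is shown below as a Python sketch, where `item_gain` is an illustrative name. For ads, idents and SFX, the script passes a target of dialog RMS minus 4 dB (`AD_IDENT_OFFSET_DB`).

```python
import math
from typing import Optional


def item_gain(audible_sum_sq: float, audible_samples: int, target_db: float,
              min_change_db: float = 1.0) -> Optional[float]:
    """Linear volume multiplier for an item, or None if it should be left alone."""
    if audible_samples == 0:
        return None  # no block was above the silence threshold
    rms = math.sqrt(audible_sum_sq / audible_samples)
    if rms <= 0:
        return None
    item_db = 20 * math.log10(rms)
    gain_db = target_db - item_db
    if abs(gain_db) <= min_change_db:
        return None  # within tolerance: skip small corrections
    return 10 ** (gain_db / 20)


if __name__ == "__main__":
    # 100 samples at amplitude 0.1 -> RMS 0.1 = -20 dBFS
    dialog_db = -22.0
    target = dialog_db - 4.0  # AD/IDENT target: 4 dB below dialog
    print(item_gain(100 * 0.01, 100, target))  # -6 dB, about 0.501
```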
local function normalize_music_track(dialog_regions, target_db)
@@ -728,27 +775,35 @@ local function phase2_normalize(dialog_regions, ad_regions, ident_regions, dialo
end
log("Phase 2: Dialog RMS = " .. string.format("%.1f", dialog_rms_db) .. " dBFS")
local dialog_db = dialog_rms_db
if #ad_regions > 0 then
progress_detail = "Ads"
coroutine.yield()
log("Phase 2: Normalizing " .. #ad_regions .. " AD region(s)...")
normalize_track_regions(ADS_TRACK, ad_regions, dialog_db)
end
if #ident_regions > 0 then
progress_detail = "Idents"
progress_pct = 0.33
coroutine.yield()
log("Phase 2: Normalizing " .. #ident_regions .. " IDENT region(s)...")
normalize_track_regions(IDENTS_TRACK, ident_regions, dialog_db)
end
-- Ads/idents are pre-compressed dense audio, so they sound louder than dialog
-- at the same RMS. Target a few dB below dialog to match perceived loudness.
local AD_IDENT_OFFSET_DB = -4
local ad_ident_target = dialog_rms_db + AD_IDENT_OFFSET_DB
log("Phase 2: AD/IDENT target = " .. string.format("%.1f", ad_ident_target) .. " dBFS (" .. AD_IDENT_OFFSET_DB .. "dB offset from dialog)")
progress_detail = "Ads"
coroutine.yield()
log("Phase 2: Normalizing ads track...")
normalize_track_items(ADS_TRACK, ad_ident_target, "Ads")
progress_detail = "Idents"
progress_pct = 0.25
coroutine.yield()
log("Phase 2: Normalizing idents track...")
normalize_track_items(IDENTS_TRACK, ad_ident_target, "Idents")
progress_detail = "SFX"
progress_pct = 0.50
coroutine.yield()
log("Phase 2: Normalizing SFX track...")
normalize_track_items(SFX_TRACK, ad_ident_target, "SFX")
progress_detail = "Music"
progress_pct = 0.66
progress_pct = 0.75
coroutine.yield()
log("Phase 2: Normalizing music track...")
normalize_music_track(dialog_regions, dialog_db)
normalize_music_track(dialog_regions, dialog_rms_db)
progress_pct = 1.0
end
@@ -766,6 +821,75 @@ local function phase3_trim_music()
local music_track = reaper.GetTrack(0, MUSIC_TRACK - 1)
if not music_track then return end
-- Music lead-in: ensure audible music plays before first voice.
-- Strategy: skip the silent intro in the music WAV (adjust take offset),
-- then nudge all non-music tracks forward by MUSIC_LEAD_SEC so music plays first.
local MUSIC_LEAD_SEC = 3.0
-- Find where music becomes audible in the source WAV
local music_audible_offset = nil
local music_ta = get_track_audio(MUSIC_TRACK)
if music_ta then
local t = music_ta.item_pos
while t < music_ta.item_end do
local peak, _ = read_block_peak_rms(music_ta, t)
if peak >= THRESHOLD then
music_audible_offset = t - music_ta.item_pos -- offset into the WAV
break
end
t = t + BLOCK_SEC
end
destroy_track_audio(music_ta)
end
if false then -- Music lead-in disabled — intro silence is preserved instead
-- Skip the silent intro: set take offset so audible music starts at position 0
local first_music = reaper.GetTrackMediaItem(music_track, 0)
if first_music then
local take = reaper.GetActiveTake(first_music)
if take then
local current_offset = reaper.GetMediaItemTakeInfo_Value(take, "D_STARTOFFS")
reaper.SetMediaItemTakeInfo_Value(take, "D_STARTOFFS", current_offset + music_audible_offset)
-- Trim item length to account for skipped intro
local item_len = reaper.GetMediaItemInfo_Value(first_music, "D_LENGTH")
reaper.SetMediaItemInfo_Value(first_music, "D_LENGTH", item_len - music_audible_offset)
log("Phase 3: Skipped " .. string.format("%.1f", music_audible_offset) .. "s of silent music intro")
end
end
-- Nudge all non-music tracks forward by MUSIC_LEAD_SEC
log("Phase 3: Nudging non-music tracks forward by " .. MUSIC_LEAD_SEC .. "s for music lead-in")
for t = 0, reaper.CountTracks(0) - 1 do
if (t + 1) == MUSIC_TRACK then goto skip_music end
local track = reaper.GetTrack(0, t)
for i = 0, reaper.CountTrackMediaItems(track) - 1 do
local item = reaper.GetTrackMediaItem(track, i)
local pos = reaper.GetMediaItemInfo_Value(item, "D_POSITION")
reaper.SetMediaItemInfo_Value(item, "D_POSITION", pos + MUSIC_LEAD_SEC)
end
::skip_music::
end
-- Shift markers/regions forward too
local markers_to_update = {}
local _, num_markers, num_regions = reaper.CountProjectMarkers(0)
for i = 0, num_markers + num_regions - 1 do
local retval, is_region, pos, rgnend, name, idx, color = reaper.EnumProjectMarkers3(0, i)
if retval then
table.insert(markers_to_update, {is_region=is_region, pos=pos, rgnend=rgnend, name=name, idx=idx, color=color})
end
end
for _, m in ipairs(markers_to_update) do
if m.is_region then
reaper.SetProjectMarker3(0, m.idx, true, m.pos + MUSIC_LEAD_SEC, m.rgnend + MUSIC_LEAD_SEC, m.name, m.color)
else
reaper.SetProjectMarker3(0, m.idx, false, m.pos + MUSIC_LEAD_SEC, 0, m.name, m.color)
end
end
else
log("Phase 3: No silent music intro detected — skipping lead-in adjustment")
end
local last_end = 0
for _, tidx in ipairs(CHECK_TRACKS) do
local tr = reaper.GetTrack(0, tidx - 1)
@@ -912,6 +1036,39 @@ local function do_work()
log("Phase 4: No AD/IDENT regions found — skipping")
end
-- Set loop/time selection: start 0.5s before audible music, end at last item
local loop_start = 0
local music_ta = get_track_audio(MUSIC_TRACK)
if music_ta then
local t = music_ta.item_pos
while t < music_ta.item_end do
local peak, _ = read_block_peak_rms(music_ta, t)
if peak >= THRESHOLD then
loop_start = math.max(0, t - 0.5)
break
end
t = t + BLOCK_SEC
end
destroy_track_audio(music_ta)
end
local project_end = 0
for t = 0, reaper.CountTracks(0) - 1 do
local track = reaper.GetTrack(0, t)
local n = reaper.CountTrackMediaItems(track)
if n > 0 then
local last_item = reaper.GetTrackMediaItem(track, n - 1)
local item_end = reaper.GetMediaItemInfo_Value(last_item, "D_POSITION")
+ reaper.GetMediaItemInfo_Value(last_item, "D_LENGTH")
if item_end > project_end then project_end = item_end end
end
end
if project_end > 0 then
reaper.GetSet_LoopTimeRange(true, true, loop_start, project_end, false)
reaper.GetSet_LoopTimeRange(true, false, loop_start, project_end, false)
log("Loop range set: " .. string.format("%.1f", loop_start) .. " to " .. string.format("%.1f", project_end) .. "s (" .. string.format("%.1f", (project_end - loop_start) / 60) .. " min)")
end
reaper.PreventUIRefresh(-1)
reaper.Undo_EndBlock("Post-production: strip silence + music fades", -1)
reaper.UpdateArrange()
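The new loop range starts half a second before the first audible block on the music track, clamped at 0. It ends at the end of the last item on any track. Below is a Python sketch of the start-point scan, stepping through blocks the same way the Lua does. Blocks are plain lists of samples and the names are illustrative.

```python
from typing import Optional

SILENCE_DB = -30
THRESHOLD = 10 ** (SILENCE_DB / 20)  # -30 dBFS as a linear amplitude, about 0.0316


def first_audible_time(blocks: list[list[float]], item_pos: float,
                       block_sec: float = 0.1) -> Optional[float]:
    """Project time of the first block whose peak reaches THRESHOLD, or None."""
    t = item_pos
    for block in blocks:
        if max((abs(s) for s in block), default=0.0) >= THRESHOLD:
            return t
        t += block_sec
    return None


def loop_start(first_audible: Optional[float], pre_roll: float = 0.5) -> float:
    """Start pre_roll seconds early, never before 0; 0 if no audible music was found."""
    if first_audible is None:
        return 0.0
    return max(0.0, first_audible - pre_roll)
```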
+122
@@ -0,0 +1,122 @@
"""Scan music directory for tracks that contain vocals/lyrics.
Uses Whisper to transcribe a sample from each track — if it picks up
actual words, the track likely has vocals.
Usage:
python scan_music_vocals.py # scan and report
python scan_music_vocals.py --delete # scan and delete vocal tracks
"""
import argparse
import sys
from pathlib import Path
import librosa
import numpy as np
from faster_whisper import WhisperModel
MUSIC_DIR = Path(__file__).parent / "music"
WHISPER_MODEL = "distil-large-v3"
# Words Whisper hallucinates on silence/instrumental — ignore these
HALLUCINATION_PHRASES = {
"thank you", "thanks for watching", "subscribe", "like and subscribe",
"please subscribe", "thank you for watching", "thanks for listening",
"you", "the end", "bye", "okay",
}


def scan_track(model: WhisperModel, filepath: Path) -> tuple[bool, str]:
    """Check a single track for vocals. Returns (has_vocals, transcription)."""
    try:
        audio, sr = librosa.load(str(filepath), sr=16000, mono=True)
    except Exception as e:
        return False, f"[load error: {e}]"

    duration = len(audio) / sr
    if duration < 10:
        return False, "[too short]"

    # Sample 30s from the middle (most likely to have vocals)
    mid = len(audio) // 2
    half_window = int(15 * sr)  # 15s each side
    start = max(0, mid - half_window)
    end = min(len(audio), mid + half_window)
    sample = audio[start:end]

    segments, info = model.transcribe(
        sample,
        beam_size=3,
        language="en",
        vad_filter=True,
        vad_parameters=dict(min_speech_duration_ms=500),
    )
    segments_list = list(segments)
    text = " ".join(s.text for s in segments_list).strip()

    # Filter out Whisper hallucinations. Strip surrounding punctuation first,
    # since Whisper typically returns "Thank you." rather than "thank you".
    text_lower = text.lower().strip(" .,!?")
    if text_lower in HALLUCINATION_PHRASES or len(text_lower) < 4:
        return False, ""

    # If Whisper found substantial text, it's likely vocals
    word_count = len(text.split())
    has_vocals = word_count >= 3
    return has_vocals, text


def main():
    parser = argparse.ArgumentParser(description="Scan music for vocal tracks")
    parser.add_argument("--delete", action="store_true", help="Delete tracks with vocals")
    args = parser.parse_args()

    audio_files = sorted(
        f for f in MUSIC_DIR.iterdir()
        if f.suffix.lower() in {".mp3", ".wav", ".ogg", ".flac"}
    )
    if not audio_files:
        print("No audio files found in music/")
        return

    print(f"Loading Whisper {WHISPER_MODEL}...")
    model = WhisperModel(WHISPER_MODEL, device="cpu", compute_type="int8")
    print(f"Scanning {len(audio_files)} tracks for vocals...\n")

    vocal_tracks = []
    for i, f in enumerate(audio_files, 1):
        print(f"[{i}/{len(audio_files)}] {f.name}...", end=" ", flush=True)
        has_vocals, text = scan_track(model, f)
        if has_vocals:
            print(f"VOCALS: {text[:80]}")
            vocal_tracks.append((f, text))
        else:
            print("OK")

    print(f"\n{'='*60}")
    print(f"Results: {len(vocal_tracks)} tracks with vocals out of {len(audio_files)}\n")
    if not vocal_tracks:
        print("All tracks appear to be instrumental!")
        return

    for f, text in vocal_tracks:
        print(f"  {f.name}")
        print(f"    Lyrics: {text[:120]}")
        print()

    if args.delete:
        print(f"Deleting {len(vocal_tracks)} vocal tracks...")
        for f, _ in vocal_tracks:
            f.unlink()
            print(f"  Deleted: {f.name}")
        print("Done.")
    else:
        print("Run with --delete to remove these tracks.")


if __name__ == "__main__":
    main()
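The vocal check in `scan_track` reduces to two pure steps: pick a 30-second window centred on the track, then decide whether the transcript looks like real lyrics or a Whisper hallucination. The sketch below isolates those steps so they can be exercised without loading audio or a model. `sample_window` and `looks_like_vocals` are hypothetical helper names, not part of the committed script. The punctuation stripping is an assumption: Whisper typically returns phrases such as "Thank you." with trailing punctuation, which an exact set lookup would not match.

```python
import string

# Same phrase set as the script; lowercase and punctuation-free.
HALLUCINATION_PHRASES = {
    "thank you", "thanks for watching", "subscribe", "like and subscribe",
    "please subscribe", "thank you for watching", "thanks for listening",
    "you", "the end", "bye", "okay",
}


def sample_window(n_samples: int, sr: int, window_s: float = 30.0) -> tuple[int, int]:
    """Hypothetical helper: (start, end) sample indices of a window centred on the track."""
    mid = n_samples // 2
    half = int(window_s / 2 * sr)
    # Clamp to the track bounds so short tracks use whatever audio exists.
    return max(0, mid - half), min(n_samples, mid + half)


def looks_like_vocals(text: str, min_words: int = 3) -> bool:
    """Hypothetical helper: True if a transcript looks like lyrics, not a hallucination."""
    # Normalise case and strip surrounding whitespace/punctuation (assumption)
    # so "Thanks for watching!" matches the hallucination set.
    normalised = text.lower().strip().strip(string.punctuation + " ")
    if normalised in HALLUCINATION_PHRASES or len(normalised) < 4:
        return False
    return len(text.split()) >= min_words
```

At 16 kHz, a 60-second track (960000 samples) yields the window `(240000, 720000)`, while a 20-second track is clamped to `(0, 320000)`. Keeping the decision in a pure function makes it cheap to tune the word threshold against known-instrumental files before running the full Whisper scan.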
@@ -23,6 +23,7 @@ load_dotenv(Path(__file__).parent / ".env")
POSTIZ_API_KEY = os.getenv("POSTIZ_API_KEY")
POSTIZ_URL = os.getenv("POSTIZ_URL", "https://social.lukeattheroost.com")
POSTIZ_INTEGRATIONS = json.loads(os.getenv("POSTIZ_INTEGRATIONS", "{}"))
BSKY_HANDLE = os.getenv("BSKY_HANDLE", "lukeattheroost.bsky.social")
BSKY_APP_PASSWORD = os.getenv("BSKY_APP_PASSWORD")
@@ -95,8 +96,22 @@ def fetch_integrations() -> list[dict]:
    return resp.json()


BLOCKED_INTEGRATION_IDS = {
    "cmluam50j0001o46xifujx059",  # Personal LinkedIn (CareerPulse): never post podcast content here
}


def find_integration(integrations: list[dict], provider: str) -> dict | None:
    # Prefer hardcoded integration ID from .env (avoids picking wrong account)
    if provider in POSTIZ_INTEGRATIONS:
        target_id = POSTIZ_INTEGRATIONS[provider].get("id")
        if target_id:
            for integ in integrations:
                if integ.get("id") == target_id:
                    return integ
    # Fallback: first matching provider (skip blocked accounts)
    for integ in integrations:
        if integ.get("id") in BLOCKED_INTEGRATION_IDS:
            continue
        if integ.get("identifier", "").startswith(provider) and not integ.get("disabled"):
            return integ
    return None
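`find_integration` resolves an account in two passes: an ID pinned in `POSTIZ_INTEGRATIONS` wins if the API returned it; otherwise the first enabled integration whose `identifier` starts with the provider name and whose ID is not in `BLOCKED_INTEGRATION_IDS`. The sketch below mirrors that order with in-memory data so the precedence can be checked without calling Postiz. `pick_integration` is a hypothetical name, passing the pinned map and blocklist as parameters is an illustrative change, and the sample IDs and identifiers are made up, not real Postiz responses.

```python
from __future__ import annotations


def pick_integration(
    integrations: list[dict],
    provider: str,
    pinned: dict[str, dict],
    blocked: set[str],
) -> dict | None:
    """Hypothetical mirror of find_integration with its config passed in."""
    # Pass 1: an ID pinned in config wins, if the API returned that integration.
    target_id = pinned.get(provider, {}).get("id")
    if target_id:
        for integ in integrations:
            if integ.get("id") == target_id:
                return integ
    # Pass 2: first enabled, non-blocked integration whose identifier
    # starts with the provider name.
    for integ in integrations:
        if integ.get("id") in blocked:
            continue
        if integ.get("identifier", "").startswith(provider) and not integ.get("disabled"):
            return integ
    return None


# Made-up sample data for illustration only.
INTEGRATIONS = [
    {"id": "personal", "identifier": "linkedin"},
    {"id": "old-page", "identifier": "linkedin-page", "disabled": True},
    {"id": "show-page", "identifier": "linkedin-page"},
]
BLOCKED = {"personal"}
```

With no pin, `pick_integration(INTEGRATIONS, "linkedin", {}, BLOCKED)` skips the blocked and disabled entries and returns `show-page`; a pin pointing at an ID the API did not return falls through to the same result. As in the committed code, the blocklist is only consulted in the fallback pass, so a pinned ID is returned even if it is also blocked.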