luke 376265eec7 Show quality fixes + preflight check

Ep47 post-mortem: fixed theme ignored by callers (backgrounds now
regenerate when theme is set), style-to-model race condition (fallback
to sonnet instead of pool[0]), removed bad pronunciation fixes, added
age-awareness to voice matching, raised MIN_RESPONSE_WORDS to 50.

Swapped problematic model mappings: conspiracy→qwen, know_it_all→mistral,
quiet_nervous→llama, emotional→kimi.

Added GET /api/show/preflight endpoint with 4 checks: model diversity,
theme penetration, voice-age alignment, response coherence (2-exchange
simulation of all callers). Frontend preflight modal with expandable
check cards.

Fixed active caller button not highlighting (moved highlight code before
potentially-failing caller info panel code).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-03-31 01:17:34 -06:00

Show Quality Fixes — Episode 47 Post-Mortem

For Claude: REQUIRED SUB-SKILL: Use superpowers:executing-plans to implement this plan task-by-task.

Goal: Fix 5 bugs that ruined tonight's show: callers ignoring the theme, wrong LLM models assigned, pronunciation fixes read aloud as literal phonetic text, voice-age mismatch, and a minimum response length that was too low.

Architecture: All fixes are in backend/main.py, except voice-age matching, which also draws on VOICE_PROFILES from backend/services/tts.py. Each fix is independent, so there are no ordering dependencies between tasks.

Tech Stack: Python, FastAPI

Task 1: Regenerate caller backgrounds when theme is set

Problem: _pregenerate_backgrounds() runs on startup, when session.show_theme is still "". Setting the theme via POST /api/show-theme only stores the string and does not regenerate backgrounds, so callers have no connection to the theme.

Files:

Modify: backend/main.py:9891-9900 (set_show_theme endpoint)
Modify: backend/main.py:5899-5927 (_pregenerate_backgrounds)

Step 1: Modify set_show_theme to regenerate unused caller backgrounds

In backend/main.py, replace the set_show_theme endpoint (lines 9891-9900):

@app.post("/api/show-theme")
async def set_show_theme(data: dict):
    theme = data.get("theme", "").strip()[:100]
    old_theme = session.show_theme
    session.show_theme = theme
    if theme:
        print(f"[Theme] Show theme set: {theme}")
    elif old_theme:
        print(f"[Theme] Show theme cleared (was: {old_theme})")

    # Regenerate backgrounds for callers that haven't been on air yet
    if theme != old_theme:
        unused_keys = [k for k in CALLER_BASES if k not in session.used_callers]
        if unused_keys:
            print(f"[Theme] Regenerating {len(unused_keys)} unused caller backgrounds for theme: {theme or '(none)'}")
            asyncio.create_task(_regenerate_backgrounds_for_keys(unused_keys))

    return {"theme": session.show_theme}

Step 2: Add _regenerate_backgrounds_for_keys helper

Add this right after _pregenerate_backgrounds() (after line 5927):

async def _regenerate_backgrounds_for_keys(keys: list[str]):
    """Regenerate backgrounds for specific caller keys (e.g. after theme change)."""
    tasks = []
    for key in keys:
        base = CALLER_BASES.get(key)
        if base and not base.get("returning"):
            tasks.append((key, _generate_caller_background_llm(base)))

    if not tasks:
        return

    results = await asyncio.gather(*[t[1] for t in tasks], return_exceptions=True)
    for (key, _), result in zip(tasks, results):
        if isinstance(result, Exception):
            print(f"[Theme] Regen failed for caller {key}: {result}")
        else:
            session.caller_backgrounds[key] = result
            # Clear cached model so it re-evaluates with new style
            session.caller_models.pop(key, None)

    print(f"[Theme] Regenerated {sum(1 for r in results if not isinstance(r, Exception))}/{len(tasks)} backgrounds")
    _match_voices_to_styles()
    _sort_caller_queue()

Step 3: Verify used_callers exists on session

Check that session.used_callers tracks which callers have already been on air. If it doesn't exist, use session.call_history caller keys instead.
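
The check in Step 3 could be sketched as follows. This assumes, hypothetically, that each session.call_history entry exposes a caller_key field (as a dict key or an attribute); the real record shape should be confirmed against the code before relying on it. get_used_caller_keys is an illustrative name.

```python
from types import SimpleNamespace


def get_used_caller_keys(session) -> set[str]:
    """Return keys of callers that have already been on air.

    Prefers session.used_callers and falls back to call_history entries.
    The "caller_key" field name is an assumption, not taken from the codebase.
    """
    used = getattr(session, "used_callers", None)
    if used is not None:
        return set(used)

    keys: set[str] = set()
    for record in getattr(session, "call_history", []):
        if isinstance(record, dict):
            key = record.get("caller_key")
        else:
            key = getattr(record, "caller_key", None)
        if key is not None:
            keys.add(key)
    return keys
```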

Step 4: Test manually

# Start server
python -m uvicorn backend.main:app --reload --reload-dir backend --host 0.0.0.0 --port 8000
# Set theme and check logs for "[Theme] Regenerating..." messages
curl -X POST http://localhost:8000/api/show-theme -H "Content-Type: application/json" -d '{"theme": "Road Stories"}'

Step 5: Commit

git add backend/main.py
git commit -m "Regenerate caller backgrounds when show theme is set"

Task 2: Fix style-to-model matching race condition

Problem: get_caller_model() is called before caller_styles is populated. caller_styles.get(key) returns "", _normalize_style_key("") returns "", no match in caller_model_map → falls through to caller_model_pool[0] (grok-4.1-fast) for everyone.

Files:

Modify: backend/main.py:6848-6875 (get_caller_model)

Step 1: Fix get_caller_model to use the fallback model when style is unknown

Replace get_caller_model (lines 6848-6875):

    def get_caller_model(self, caller_key: str) -> str | None:
        """Get the assigned model for a caller, or assign one based on strategy.
        Returns None to use default category routing."""
        if self.caller_model_strategy == "single":
            return None  # use default category_models["caller_dialog"]

        # Already assigned — keep consistent for the whole call
        if caller_key in self.caller_models:
            return self.caller_models[caller_key]

        model = None
        if self.caller_model_strategy == "cycle":
            if self.caller_model_pool:
                model = self.caller_model_pool[self._caller_model_cycle_idx % len(self.caller_model_pool)]
                self._caller_model_cycle_idx += 1
        elif self.caller_model_strategy == "style_matched":
            raw_style = self.caller_styles.get(caller_key, "")
            style_key = _normalize_style_key(raw_style) if raw_style else ""
            if style_key:
                model = self.caller_model_map.get(style_key)
            if not model:
                # Style not yet populated or no mapping — use fallback, not pool[0]
                model = self.caller_model_fallback

        if model:
            self.caller_models[caller_key] = model
            caller_name = CALLER_BASES.get(caller_key, {}).get("name", caller_key)
            style_info = self.caller_styles.get(caller_key, "unknown")
            print(f"[CallerModel] Assigned {model} to {caller_name} (style={_normalize_style_key(style_info) if style_info else 'none'}, strategy={self.caller_model_strategy})")

        return model

The key change: when style_key is empty (style not yet populated) or has no mapping, use caller_model_fallback (claude-sonnet-4.6) instead of caller_model_pool[0] (grok-4.1-fast). Claude Sonnet is a much safer default — empathetic, verbose, coherent.
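
To illustrate the behavior, here is a stripped-down sketch of the style-matched branch. The fallback model name comes from this plan; pick_model, the sample map, and the simplified normalization are hypothetical stand-ins, not the real get_caller_model or _normalize_style_key.

```python
def normalize_style_key(raw: str) -> str:
    # Simplified stand-in for _normalize_style_key; the real one may differ.
    return raw.strip().lower().replace(" ", "_")


def pick_model(raw_style: str, model_map: dict[str, str], fallback: str) -> str:
    """Return the mapped model for a style, or the fallback when unknown."""
    style_key = normalize_style_key(raw_style) if raw_style else ""
    return model_map.get(style_key) or fallback


model_map = {"conspiracy": "qwen", "know_it_all": "mistral"}  # sample entries
fallback = "claude-sonnet-4.6"

# Style populated: the mapping wins.
mapped = pick_model("Know it all", model_map, fallback)
# Style not yet populated (the race): the fallback is used, never pool[0].
unpopulated = pick_model("", model_map, fallback)
```

Because get_caller_model caches its result in caller_models, a caller who receives the fallback keeps it until that cache entry is cleared, which _regenerate_backgrounds_for_keys in Task 1 does after a theme change.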

Step 2: Commit

git add backend/main.py
git commit -m "Fix style-to-model race condition — use fallback instead of pool[0]"

Task 3: Fix pronunciation fixes producing literal phonetic text

Problem: _PRONUNCIATION_FIXES replaces "Animas" with "Ah nee mahs" as literal text. TTS reads each word separately ("Ah" "nee" "mahs") instead of blending into the intended pronunciation.

Files:

Modify: backend/main.py:9141-9152 (_PRONUNCIATION_FIXES)
Modify: backend/main.py:9212-9216 (_apply_pronunciation_fixes)

Step 1: Remove pronunciation fixes that sound worse than originals

The Inworld TTS handles most proper nouns fine. The fixes were added speculatively and do more harm than good. Remove the place names that TTS can already handle; keep only the abbreviations and the Castopod entry:

Replace _PRONUNCIATION_FIXES (lines 9141-9152):

_PRONUNCIATION_FIXES = {
    "Castopod": "Casto pod",
    "vs": "versus",
    "govt": "government",
    "dept": "department",
}

Remove Lordsburg, Hachita, Deming, Bootheel, Animas, and Rodeo. These place names either sound fine through TTS or the phonetic replacement sounds worse.
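
Short keys such as "vs" are only safe if the replacement matches whole words. This plan does not show how _apply_pronunciation_fixes matches its keys, so the following is a sketch of one whole-word approach rather than the existing implementation; apply_fixes is a hypothetical name.

```python
import re

PRONUNCIATION_FIXES = {
    "Castopod": "Casto pod",
    "vs": "versus",
    "govt": "government",
    "dept": "department",
}

# One alternation with \b anchors so "vs" does not match inside "revs".
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in PRONUNCIATION_FIXES) + r")\b"
)


def apply_fixes(text: str) -> str:
    """Replace whole-word matches only; matching is case-sensitive."""
    return _PATTERN.sub(lambda m: PRONUNCIATION_FIXES[m.group(1)], text)
```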

Step 2: Commit

git add backend/main.py
git commit -m "Remove pronunciation fixes that produce worse TTS output"

Task 4: Add age-awareness to voice matching

Problem: Brandy (55 years old) got "Kayla" (young-sounding voice). _match_voices_to_styles() scores on style dimensions (weight, energy, warmth, age_feel) but the age_feel preference comes from the communication style, not the character's actual age. A "confrontational" style prefers age_feel: None (no preference), so a 55-year-old can get a young voice.

Files:

Modify: backend/main.py:6106-6156 (_match_voices_to_styles)

Step 1: Add character age to voice scoring

In _match_voices_to_styles, after getting the style preferences, override age_feel based on the caller's actual age from their background:

def _match_voices_to_styles():
    """Re-assign voices to match caller communication styles after backgrounds are generated."""
    from .services.tts import VOICE_PROFILES

    for key, base in CALLER_BASES.items():
        if base.get("returning"):
            continue

        style_raw = session.caller_styles.get(key, "")
        if not style_raw:
            continue

        style_key = _normalize_style_key(style_raw)
        prefs = STYLE_VOICE_PREFERENCES.get(style_key)
        if not prefs:
            continue

        # Copy prefs so we don't mutate the shared dict
        prefs = dict(prefs)

        # Override age_feel based on character's actual age
        bg = session.caller_backgrounds.get(key)
        if isinstance(bg, CallerBackground) and bg.age:
            if bg.age >= 50:
                prefs["age_feel"] = "mature"
            elif bg.age >= 35:
                prefs["age_feel"] = "middle"
            elif bg.age < 25:
                prefs["age_feel"] = "young"
            # 25-34: keep style preference or None

        gender = base["gender"]
        pool = INWORLD_MALE_VOICES if gender == "male" else INWORLD_FEMALE_VOICES
        voice_pool = [v for v in pool if v not in BLACKLISTED_VOICES]

        scored = []
        for voice_name in voice_pool:
            profile = VOICE_PROFILES.get(voice_name)
            if not profile:
                scored.append((voice_name, 0))
                continue
            score = 0
            for dim in ["weight", "energy", "warmth", "age_feel"]:
                pref_val = prefs.get(dim)
                if pref_val and profile.get(dim) == pref_val:
                    score += 1
            scored.append((voice_name, score))

        if scored:
            names = [s[0] for s in scored]
            weights = [max(1, s[1] * 3) for s in scored]
            chosen = random.choices(names, weights=weights, k=1)[0]

            used_voices = {CALLER_BASES[k]["voice"] for k in CALLER_BASES if k != key and "voice" in CALLER_BASES[k]}
            if chosen in used_voices:
                alternatives = [(n, w) for n, w in zip(names, weights) if n not in used_voices]
                if alternatives:
                    alt_names, alt_weights = zip(*alternatives)
                    chosen = random.choices(alt_names, weights=alt_weights, k=1)[0]

            old_voice = base.get("voice", "")
            base["voice"] = chosen
            if old_voice != chosen:
                print(f"[VoiceMatch] {base.get('name', key)}: {old_voice} → {chosen} (style: {style_key}, age: {bg.age if isinstance(bg, CallerBackground) else '?'})")

Step 2: Commit

git add backend/main.py
git commit -m "Add age-awareness to voice matching so older callers favor mature voices"

Task 5: Raise minimum response word count

Problem: MIN_RESPONSE_WORDS = 30 lets through fragmented, telegram-style responses that are technically 30+ words but terrible radio.

Files:

Modify: backend/main.py:8844 (MIN_RESPONSE_WORDS)

Step 1: Raise the minimum

Change line 8844:

MIN_RESPONSE_WORDS = 50  # Retry if response is shorter than this

50 words is roughly 2-3 spoken sentences — enough to be a coherent radio response without being overly demanding for short-form exchanges.
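
A sketch of how the threshold is presumably applied. The plan shows only the constant, so needs_retry and the whitespace-based word count are assumptions, not the existing retry logic.

```python
MIN_RESPONSE_WORDS = 50  # value from this plan


def needs_retry(response: str, min_words: int = MIN_RESPONSE_WORDS) -> bool:
    """Return True when the response has fewer words than the minimum."""
    return len(response.split()) < min_words
```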

Step 2: Commit

git add backend/main.py
git commit -m "Raise MIN_RESPONSE_WORDS from 30 to 50"
