- Primary model gets 15s, then auto-falls back through gemini-flash, gpt-4o-mini, llama-3.1-8b (10s each); see the sketch below
- Always returns a response — canned in-character line as last resort
- Reuse httpx client instead of creating new one per request
- Remove asyncio.timeout wrappers that were killing requests before the LLM service could try fallbacks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
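A minimal Python sketch of the flow these bullets describe, assuming an httpx-based async service. The model names, per-model timeouts, and the canned last-resort reply mirror the bullets above; the `LLMService` class, endpoint URL, payload shape, and `complete`/`aclose` helpers are illustrative assumptions, not the actual code.

```python
import httpx

# Ordered fallback chain: primary gets 15s, each fallback gets 10s.
# Model identifiers come from the bullets above; everything else
# (class, endpoint, payload, response shape) is a hypothetical sketch.
FALLBACK_CHAIN = [
    ("primary-model", 15.0),   # placeholder name for the primary model
    ("gemini-flash", 10.0),
    ("gpt-4o-mini", 10.0),
    ("llama-3.1-8b", 10.0),
]

CANNED_REPLY = "…"  # in-character last-resort line (actual text elided)


class LLMService:
    def __init__(self) -> None:
        # One shared AsyncClient for the service's lifetime instead of a new
        # client per request, so connections are pooled and reused.
        self._client = httpx.AsyncClient()

    async def complete(self, prompt: str) -> str:
        for model, timeout in FALLBACK_CHAIN:
            try:
                # The timeout lives on the HTTP call itself; there is no outer
                # asyncio.timeout() that could cancel the whole chain before
                # the later fallbacks get a chance to run.
                resp = await self._client.post(
                    "https://llm.internal.example/v1/complete",  # assumed endpoint
                    json={"model": model, "prompt": prompt},     # assumed payload
                    timeout=timeout,
                )
                resp.raise_for_status()
                return resp.json()["text"]                       # assumed response shape
            except httpx.HTTPError:
                # Covers timeouts, transport errors, and non-2xx statuses:
                # fall through to the next model in the chain.
                continue
        # Every model failed or timed out: still answer, in character.
        return CANNED_REPLY

    async def aclose(self) -> None:
        await self._client.aclose()
```

Keeping the timeout on the request itself, rather than wrapping the caller in `asyncio.timeout()`, is what lets the loop reach the later models when the primary stalls instead of cancelling the whole chain.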