- Add 30s timeout to all frontend fetch calls (safeFetch)
- Add 20s asyncio.timeout around lock+LLM in chat, ai-respond, auto-respond
- Reduce OpenRouter timeout from 60s to 25s
- Reduce Inworld TTS timeout from 60s to 25s
- Return graceful fallback responses on timeout instead of hanging
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove gpt-4o-realtime (WebSocket-only) from OpenRouter models
- Increase OpenRouter timeout to 60s and max_tokens to 150
- Handle empty LLM responses
- Fix publish_episode.py for current Castopod API fields
- Add port conflict check and graceful shutdown to run.sh
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Replace aggressive sentence-count limiting with ensure_complete_thought(),
  which only trims if the LLM was actually cut off mid-sentence
- Soften prompt guidance to encourage natural brevity instead of a rigid sentence count
- Set max_tokens to 100 as a natural length cap
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Raise max_tokens back to 150 so the LLM can finish its thoughts
- Add limit_sentences(), which keeps only the first 2 complete sentences
- Never cut mid-sentence; output always ends at punctuation
- Applied to both chat and auto-respond paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
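
One way limit_sentences() could behave, as a sketch only. The regex-based splitting and the `max_sentences` parameter are assumptions, not the committed implementation:

```python
import re

# A "sentence" here is any run of text ending in ., ! or ? (an assumption;
# abbreviations and decimals such as "3.5" would be split incorrectly).
_SENTENCE = re.compile(r"[^.!?]*[.!?]+")


def limit_sentences(text, max_sentences=2):
    """Keep at most the first `max_sentences` complete sentences."""
    sentences = [m.group().strip() for m in _SENTENCE.finditer(text)]
    sentences = [s for s in sentences if s]
    if not sentences:
        # No terminal punctuation at all: return the text unchanged.
        return text.strip()
    return " ".join(sentences[:max_sentences])
```

Because only punctuation-terminated matches are kept, a trailing fragment left by the max_tokens cap is dropped rather than returned half-finished.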
- Reduce max_tokens from 100 to 75 for shorter output
- Add truncate_to_complete_sentence() to trim at last punctuation
- Applied to both chat and auto-respond paths
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
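
A sketch of trimming at the last punctuation mark, assuming "punctuation" means `.`, `!`, or `?`. The body is illustrative and not the committed code:

```python
def truncate_to_complete_sentence(text):
    """Drop any trailing fragment after the last sentence-ending mark."""
    text = text.strip()
    # rfind returns -1 when a mark is absent, so max() is -1 only if none exist.
    last = max(text.rfind(mark) for mark in ".!?")
    if last == -1:
        # Nothing to anchor on; leave the text as it is.
        return text
    return text[: last + 1]
```

Unlike a sentence-count limit, this keeps every complete sentence and only removes the part that the token cap cut off.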
- Use much stronger prompt language: "no more than 2 sentences EVER"
- Add "DO NOT ramble" instruction
- Reduce max_tokens back to 100 as a hard limit
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Increase max_tokens from 100 to 150 to avoid mid-sentence truncation
- Tighten prompt to 1-2 short sentences with emphasis on completing them
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>