Fix unnatural response cutoffs

- Replace aggressive sentence-count limiting with ensure_complete_thought()
  which only trims if the LLM was actually cut off mid-sentence
- Soften prompt guidance toward natural brevity instead of a rigid sentence count
- Set max_tokens to 100 as a natural length cap
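
A minimal sketch of what `ensure_complete_thought()` could look like. The function name comes from the commit message; the trimming heuristic shown here (drop a trailing fragment only when the reply does not end at a sentence boundary) is an assumption, not the actual implementation:

```python
def ensure_complete_thought(text: str) -> str:
    """Trim a trailing fragment only if the LLM was cut off mid-sentence.

    Hypothetical sketch based on the commit message; the real
    implementation in this repo may differ.
    """
    text = text.strip()
    if not text:
        return text
    # Already ends at a sentence boundary: leave the reply untouched.
    if text[-1] in ".!?\"'":
        return text
    # Otherwise drop the incomplete final sentence, keeping the rest.
    cut = max(text.rfind(p) for p in ".!?")
    if cut != -1:
        return text[: cut + 1]
    # No complete sentence at all: better to return the text as-is
    # than to produce an empty reply.
    return text
```

Unlike a hard sentence-count limit, this only intervenes when truncation actually happened, so naturally short but complete replies pass through unchanged.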

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-05 17:18:22 -07:00
parent 9d4b8a0d22
commit a1c94a3682
2 changed files with 16 additions and 27 deletions

@@ -124,7 +124,7 @@ class LLMService:
             json={
                 "model": self.openrouter_model,
                 "messages": messages,
-                "max_tokens": 150,
+                "max_tokens": 100,
             },
         )
         response.raise_for_status()