Fix live caller audio latency and choppiness

- Reduce capture chunk from 4096 to 640 samples (256ms → 40ms)
- Replace BufferSource scheduling with AudioWorklet playback ring buffer
- Add 80ms jitter buffer with linear interpolation upsampling
- Reduce host mic and live caller stream blocksizes from 4096/2048 to 1024
- Replace librosa.resample with numpy interpolation in send_audio_to_caller

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
2026-02-05 16:32:27 -07:00
parent ab36ad8d5b
commit 7aed4d9c34
5 changed files with 89 additions and 27 deletions

View File

@@ -119,9 +119,12 @@ class CallerService:
try:
if sample_rate != 16000:
import numpy as np
import librosa
audio = np.frombuffer(pcm_data, dtype=np.int16).astype(np.float32) / 32768.0
audio = librosa.resample(audio, orig_sr=sample_rate, target_sr=16000)
ratio = 16000 / sample_rate
out_len = int(len(audio) * ratio)
indices = (np.arange(out_len) / ratio).astype(int)
indices = np.clip(indices, 0, len(audio) - 1)
audio = audio[indices]
pcm_data = (audio * 32767).astype(np.int16).tobytes()
await ws.send_bytes(pcm_data)
except Exception as e: