AI Voicemail Assistant
An Italian-language voicemail agent that screens incoming calls and streams a live transcript to a paired mobile app. Built solo, end-to-end, deployed on a personal VPS.
Stack: Python, FastAPI, OpenAI Realtime API, Silero VAD (PyTorch), Twilio, React Native, Redis, Loki, Grafana — behind a hardened Docker setup and a Caddy reverse proxy.
What made it interesting:
Two interchangeable conversation engines behind a unified interface: OpenAI Realtime (~$0.08/min) and a custom STT+LLM+TTS pipeline (~$0.02/min, 4× cheaper). The engine is selectable per deployment through an environment variable and per user through a toggle. Both share the same call orchestrator, post-call pipeline, and per-call cost ledger.
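One way to picture the shared interface and the cost ledger is the minimal sketch below. Every name in it (`ConversationEngine`, `RealtimeEngine`, `PipelineEngine`, `CostLedger`, the `VOICEMAIL_ENGINE` variable) is hypothetical, and the engines are stubs that make no network calls. It shows only how a per-user toggle can override a deployment default, and how billed seconds turn into per-call cost at the rates quoted above.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field
from typing import Optional, Protocol


class ConversationEngine(Protocol):
    """Hypothetical common interface that both engines satisfy."""

    name: str
    usd_per_minute: float

    def handle_turn(self, caller_audio: bytes) -> bytes:
        """Consume caller audio and return synthesized reply audio."""
        ...


@dataclass
class RealtimeEngine:
    name: str = "openai_realtime"
    usd_per_minute: float = 0.08  # approximate rate from the text

    def handle_turn(self, caller_audio: bytes) -> bytes:
        return b""  # stub: a real engine would stream to the Realtime API


@dataclass
class PipelineEngine:
    name: str = "stt_llm_tts"
    usd_per_minute: float = 0.02  # approximate rate from the text

    def handle_turn(self, caller_audio: bytes) -> bytes:
        return b""  # stub: a real engine would run STT -> LLM -> TTS


ENGINES = {"realtime": RealtimeEngine, "pipeline": PipelineEngine}


def select_engine(user_override: Optional[str] = None) -> ConversationEngine:
    # The per-user toggle, when set, wins over the deployment-wide env default.
    key = user_override or os.environ.get("VOICEMAIL_ENGINE", "pipeline")
    return ENGINES[key]()


@dataclass
class CostLedger:
    # Each entry: (call_id, engine name, cost in USD).
    entries: list[tuple[str, str, float]] = field(default_factory=list)

    def record(self, call_id: str, engine: ConversationEngine, seconds: float) -> float:
        cost = round(engine.usd_per_minute * seconds / 60, 4)
        self.entries.append((call_id, engine.name, cost))
        return cost

    def total(self, call_id: str) -> float:
        return round(sum(c for cid, _, c in self.entries if cid == call_id), 4)
```

Because the orchestrator depends only on the protocol, swapping engines changes the per-minute rate recorded in the ledger but not the call flow: 90 seconds on the Realtime stub records $0.12, and the same 90 seconds on the pipeline stub records $0.03.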
Audio behavior required serious tuning of turn-taking, echo suppression, barge-in detection, and VAD thresholds, diagnosed through end-to-end testing with real phone calls rather than synthetic benchmarks. The DSP behind it runs in the project's own code: Silero VAD (PyTorch), FFT-based echo suppression, and dual-threshold barge-in windows.
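The write-up does not spell out what "dual-threshold" means. A common reading is hysteresis on per-frame VAD speech probabilities: a higher onset threshold starts a candidate speech run, a lower release threshold ends it, and barge-in fires only once the run has lasted long enough while the agent is talking. The sketch below assumes that reading. `BargeInDetector` and its parameter values are hypothetical and untuned, and the probabilities would come from a VAD such as Silero.

```python
from dataclasses import dataclass, field


@dataclass
class BargeInDetector:
    """Hysteresis barge-in detector over per-frame VAD speech probabilities.

    Hypothetical sketch; thresholds and frame counts are illustrative only.
    """

    onset: float = 0.6     # probability needed to start a speech run
    release: float = 0.35  # probability below which the run is abandoned
    min_frames: int = 6    # frames the run must last before barge-in fires
    _run: int = field(default=0, init=False, repr=False)
    _active: bool = field(default=False, init=False, repr=False)

    def update(self, speech_prob: float, agent_speaking: bool) -> bool:
        """Feed one frame; return True only on the frame where barge-in fires."""
        if not agent_speaking:
            # Caller speech while the agent is silent is a normal turn, not a barge-in.
            self._run, self._active = 0, False
            return False
        if self._active:
            if speech_prob < self.release:
                # Dropped below the lower threshold: discard the run.
                self._run, self._active = 0, False
                return False
            # Between release and onset still counts (the hysteresis band).
            self._run += 1
        elif speech_prob >= self.onset:
            self._active, self._run = True, 1
        else:
            return False
        # Fire exactly once, when the run first reaches the required length.
        return self._run == self.min_frames
```

The gap between the two thresholds keeps a caller's wavering "ehm" from repeatedly starting and stopping runs, and the minimum length stops short echo leaks or line noise from interrupting the agent.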
Production hardening on a one-person codebase: read-only container filesystems, dropped Linux capabilities, non-root containers, and multi-tier Redis rate limiting (per IP, call, phone number, user, device, and global) with sliding windows and a circuit breaker. Security-focused CI/CD with Bandit, Semgrep, Safety, pip-audit, TruffleHog, and Trivy, plus Codecov for coverage reporting.
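The multi-tier idea can be sketched with a sliding-window log: each tier key keeps the timestamps of its recent requests, and a request is admitted only if every tier it touches still has room. The version below is an in-memory stand-in for what would usually be one Redis sorted set per key. `SlidingWindowLimiter`, `TIER_LIMITS`, and the limit values are hypothetical, and the circuit breaker is not shown.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Mapping, Tuple

# Hypothetical per-tier limits: (max requests, window length in seconds).
TIER_LIMITS: Dict[str, Tuple[int, float]] = {
    "ip": (20, 60.0),
    "phone": (5, 3600.0),
    "global": (200, 60.0),
}


class SlidingWindowLimiter:
    """In-memory sliding-window log, one timestamp deque per tier key."""

    def __init__(
        self,
        limits: Mapping[str, Tuple[int, float]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limits = dict(limits)
        self.clock = clock
        self.hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, identities: Mapping[str, str]) -> bool:
        """Admit a request only if every listed tier is under its limit.

        `identities` maps tier name to identifier, e.g. {"ip": "203.0.113.7"}.
        """
        now = self.clock()
        keys: List[str] = []
        for tier, ident in identities.items():
            limit, window = self.limits[tier]
            key = f"{tier}:{ident}"
            log = self.hits[key]
            # Evict timestamps that have slid out of this tier's window.
            while log and log[0] <= now - window:
                log.popleft()
            if len(log) >= limit:
                # Reject without recording, so refused requests use no quota.
                return False
            keys.append(key)
        # All tiers had room: record the request against each of them.
        for key in keys:
            self.hits[key].append(now)
        return True
```

Checking every tier before recording in any of them means one exhausted tier (say, a single noisy phone number) blocks the request without eating into the IP or global budgets.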
EU AI Act disclosure (callers are told they are speaking with an AI) is enforced both in the greeting and at the audio gate. Call audio is retained for 7 days, in line with GDPR.