[{"content":"Over the past year and a half I\u0026rsquo;ve been the sole engineer on the retrieval and evaluation stack of a chatbot deployed nationally for Italian Public Administration — a system called Camilla that helps civil servants navigate public tenders.\nIt processes real queries from real users, in a regulated context, with no room to quietly paper over the failures. That experience forced clarity on questions I don\u0026rsquo;t often see written up: how do you evaluate an agentic system in a way that actually catches what breaks in production? How do you build hybrid retrieval over a corpus that\u0026rsquo;s inconsistent by design? How do you ship an AI product to the Italian PA under EU AI Act disclosure requirements?\nI wrote about the architecture and the governance decisions — the ones that turned out to matter more than the ML choices — in a piece for Agenda Digitale.\nCamilla, l\u0026rsquo;assistente AI che sfida la burocrazia nella PA →\n","permalink":"https://pietroferragamo.com/blog/camilla-pa-chatbot/","summary":"Published on Agenda Digitale: architecture and governance choices behind a national-scale chatbot for Italian Public Administration.","title":"Camilla, l'assistente AI che sfida la burocrazia nella PA"},{"content":"I\u0026rsquo;m a Senior GenAI Engineer at CSI Piemonte, Italy\u0026rsquo;s largest regional in-house IT provider. I build LLM systems deployed at national scale for Italian Public Administration — the kind of systems where evaluation and governance matter as much as the ML.\nMy current work centers on a tender-assistance chatbot called Camilla: I designed and own its semantic retrieval layer (100K+ queries in six months), its behavioral evaluation framework (167 adversarial test scenarios, LLM-as-a-judge), and the internal MCP server that composes its agent tools. 
I care a lot about the unglamorous parts — latency budgeting, cost tracking, compliance with the EU AI Act and GDPR, keeping things working when the corpus is messy and the stakes are real.\nAlongside this I\u0026rsquo;m completing an MSc in Computer Science (ML specialization) at Georgia Tech, maintaining a 4.0 GPA across courses in ML, AI, NLP, and robotics.\nOlder background: years as a business analyst working with geospatial systems and Oracle databases for public-sector clients. Turns out knowing what bad data looks like is useful when you\u0026rsquo;re building retrieval systems over it.\nOn the personal side: I hold a diploma in classical guitar from the Conservatorio G. Verdi in Turin, where I studied for eight years. The patience transfers.\n","permalink":"https://pietroferragamo.com/about/","summary":"About me","title":"About"},{"content":"Senior GenAI Engineer — LLM systems, agentic architectures, evaluation \u0026amp; governance\nTurin, Italy · pietro.ferragamo.121@gmail.com · linkedin.com/in/pietro-ferragamo · github.com/pietro121orteip\nApplied AI engineer with a mathematical background, building LLM systems deployed at national scale for Italian Public Administration. Sole author of the semantic retrieval layer (100K+ production queries), the behavioral evaluation framework, and the internal MCP server behind a public-sector tender-assistance chatbot. 
Currently completing an MSc in Computer Science (ML specialization) at Georgia Tech alongside a senior IC role at CSI Piemonte, with deep interest in evaluation, governance, and safety of agentic AI systems in production.\nTechnical skills LLM \u0026amp; agentic stack — LangChain, LangGraph, Model Context Protocol (MCP), Azure OpenAI, prompt engineering, tool-use design, LLM-as-a-judge, RAG architectures, agentic orchestration, behavioral evaluation \u0026amp; red-teaming\nEvaluation \u0026amp; governance — Multi-rubric LLM judges, regression suites, behavioral probe taxonomies (prompt injection, fairness, temporal integrity, tool-use verification), latency/cost/quality tradeoffs, EU AI Act and GDPR considerations in production AI, observability for AI services\nRetrieval \u0026amp; data — Hybrid retrieval (dense + sparse + trigram), Weaviate, Redis, pgvector, entity extraction, metadata-driven retrieval, embedding-collection quality analysis\nLanguages \u0026amp; infra — Python, SQL (PostgreSQL advanced, Oracle OCP-certified), FastAPI, Docker, Bash, Git; PyTorch (regression/classification, NLP); CI/CD (Jenkins, GitHub Actions); observability (Grafana, Prometheus, Loki, Promtail)\nCloud \u0026amp; deployment — Azure (Azure OpenAI, container deployments); Helm-chart authorship in progress\nMathematical foundations — Stochastic processes, probability, optimization, Bayesian filtering (Kalman, particle filters, SLAM)\nExperience CSI-Piemonte — AI Engineer, R\u0026amp;D — Turin, Italy Sep 2024 – Present\nR\u0026amp;D on LLM-based systems for Italy\u0026rsquo;s largest regional in-house IT provider. Sole engineer on the retrieval, evaluation, and tool-composition layers of a national-scale tender-assistance chatbot deployed for Italian Public Administration. 
Stack: Python, Azure OpenAI, LangChain/LangGraph, Weaviate, PostgreSQL, FastAPI, Docker.\nSemantic retrieval engine for national-scale tender assistance — 101K production queries over 6 months, 2.3s average end-to-end latency\nDesigned and built from scratch as the core retrieval tool of a chatbot deployed nationally over Italian public tenders. LLM-based entity extraction with deterministic fallback, relational expansion over hand-curated dictionaries (ISTAT geographic hierarchy, degree equivalences, institutional taxonomies) declared as PostgreSQL business rules. Hybrid filtering combines exact match, trigram similarity, and Weaviate dense retrieval; exposed as a FastAPI service and registered as an agent tool. Defect rate on the order of one client-reported miss per several weeks of live traffic; internal regression suite of several hundred annotated queries at 97% pass rate.\nBehavioral evaluation \u0026amp; governance framework for LLM agentic systems\nTask-oriented evaluator built end-to-end (PostgreSQL, Python, FastAPI workers, Docker, shell CLI). DAG-based scheduler with dependency rules declared in PostgreSQL; workers poll a queue function that resolves the implicit DAG at dispatch time, enabling horizontal scaling with correct ordering. LLM-as-a-judge with task-specific multi-rubric evaluation across heterogeneous types: pseudo-ground-truth verification, dynamic multi-turn scope-switching conversations, search-engine behavior checks, embedding-collection quality tests. Tool-use verification probes enforce a desired-tool contract per turn and flag claims produced without invoking the corresponding retrieval tool.\nBehavioral probe suite: 167 multi-turn adversarial scenarios covering prompt injection \u0026amp; jailbreak resistance, information-boundary protection, fairness \u0026amp; non-discrimination, temporal integrity, language-policy enforcement, abuse resilience, recovery to baseline post-attack. 
Self-designed taxonomy derived from production interactions.\nUnified MCP server for agent tool composition\nConsolidated fragmented per-use-case tools into a single internal MCP server (STDIO), exposing agent tools as compositions of reusable primitives (memory ops; retrieval ops: semantic search, exact metadata filter, metadata discovery). Tools declared in YAML configuration, not code — new agent tools ship without touching the server.\nCSI-Piemonte — Senior Business Analyst — Cuneo, Italy Sep 2022 – Sep 2024\nLead analyst on geospatial components of regional IT systems administering €1B+ in EU, national, and regional funds. Drove the region above EU geolocalization thresholds, avoiding partial decommitment of funds. Functional and technical analysis, testing, and quantitative reporting for the Integrated Administration and Control System (IACS) and related EU financial-statistics requirements.\naizoOn Technology Consulting — Business Data Analyst — Cuneo, Italy Nov 2016 – Sep 2022\nFunctional analysis and PL/SQL development (Oracle 11, including Oracle Spatial) for public-sector clients. Use-case design, stored-procedure development and tuning, software testing, and SQL-based reporting for state officials and EU regulatory authorities.\nPersonal projects AI voicemail assistant — Python, FastAPI, OpenAI, Silero VAD, Twilio, React Native; Hetzner VPS — GitHub\nSolo end-to-end build of an Italian-language voicemail agent that screens incoming calls and streams a live transcript to a paired mobile app. Two interchangeable streaming-conversation engines (OpenAI Realtime ~$0.08/min vs. custom STT+LLM+TTS pipeline ~$0.02/min) behind a unified interface. EU AI Act disclosure enforced in the greeting and at the audio gate; GDPR-compliant 7-day audio retention. 
Hardened Docker, multi-tier Redis rate limiting, security-focused CI/CD (Bandit, Semgrep, Safety, Trufflehog, Trivy).\nPublications \u0026amp; writing Camilla, l\u0026rsquo;assistente AI che sfida la burocrazia nella PA — agendadigitale.eu, April 2026 — Architecture and governance choices behind the national PA chatbot: hybrid retrieval, automated evaluation pipelines, LLM-as-a-judge methodology. [Forthcoming] Architectural extensibility of AI assistants via internal MCP servers — agendadigitale.eu [In drafting] Red-teaming and behavioral evaluation of agentic AI systems in regulated contexts — agendadigitale.eu Optimal Control of an Ecogenetic Model — International Journal of Applied and Computational Mathematics, 2017 — Co-authored peer-reviewed article. Education Georgia Institute of Technology — MSc Computer Science, ML specialization — Aug 2024 – Apr 2027 (in progress) 4.0 GPA across the 6 courses completed so far (of 10). Completed: Machine Learning, Artificial Intelligence, Knowledge-Based AI, AI for Robotics, Ethics in AI, Human-Computer Interaction. In progress: Natural Language Processing. Planned: Deep Learning, Reinforcement Learning, Graduate Algorithms.\nUniversità degli Studi di Torino — MSc Mathematics — 2014–2016 Final grade: 110/110 cum laude with honorable mention. Thesis: Lévy Processes in Finance.\nUniversità degli Studi di Torino — BSc Mathematics — 2010–2014\nConservatorio \u0026ldquo;G. Verdi\u0026rdquo; di Torino — Diploma, Classical Guitar — 2001–2008\nLanguages: Italian (native), English (C2 — IELTS Academic Band 8.5, January 2024).\n","permalink":"https://pietroferragamo.com/cv/","summary":"Curriculum Vitae","title":"CV"},{"content":"AI Voicemail Assistant GitHub →\nAn Italian-language voicemail agent that screens incoming calls and streams a live transcript to a paired mobile app. 
Built solo, end-to-end, deployed on a personal VPS.\nStack: Python, FastAPI, OpenAI Realtime API, Silero VAD (PyTorch), Twilio, React Native, Redis, Loki, Grafana — behind a hardened Docker setup and a Caddy reverse proxy.\nWhat made it interesting:\nTwo interchangeable conversation engines behind a unified interface — OpenAI Realtime (~$0.08/min) and a custom STT+LLM+TTS pipeline (~$0.02/min, 4× cheaper) — selectable per deployment via env and per-user toggle. Both share the same call orchestrator, post-call pipeline, and per-call cost ledger.\nAudio behavior required serious tuning: turn-taking, echo suppression, barge-in detection, VAD thresholds — diagnosed through end-to-end testing with real phone calls, not synthetic benchmarks. The underlying DSP (Silero VAD via PyTorch, FFT-based echo suppression, dual-threshold barge-in windows) is real.\nProduction hardening on a one-person codebase: read-only Docker FS, dropped capabilities, non-root containers, multi-tier Redis rate limiting (IP/call/phone/user/device/global) with sliding windows and a circuit breaker. Security-focused CI/CD: Bandit, Semgrep, Safety, pip-audit, Trufflehog, Trivy, Codecov.\nEU AI Act disclosure enforced in the greeting and at the audio gate. GDPR-compliant 7-day audio retention.\n","permalink":"https://pietroferragamo.com/projects/","summary":"Things I\u0026rsquo;ve built","title":"Projects"}]