CV | Pietro Ferragamo

Applied AI Engineer — LLM evaluation, behavioral testing & production AI systems

Turin, Italy · [email protected] · linkedin.com/in/pietro-ferragamo · github.com/pietro121orteip

Applied AI engineer with an MSc in Mathematics: I build and evaluate LLM systems for production. My center of gravity is evaluation: the methodologies and infrastructure that make model quality measurable and iteration robust. Sole author of the behavioral evaluation framework, the two-stage retrieval architecture (100K+ production queries), and the internal MCP server behind a national-scale tender-assistance chatbot for Italian Public Administration. Currently completing a second MSc, in Computer Science (ML specialization), at Georgia Tech alongside a senior IC role at CSI-Piemonte.

Experience

CSI-Piemonte — AI Engineer, R&D — Turin, Italy

Sep 2024 – Present

R&D on LLM-based systems at one of Italy’s largest in-house IT providers for the public sector. I own the evaluation, retrieval, and tool-composition layers of the chatbot above (Camilla). Stack: Python, Azure OpenAI, LangChain/LangGraph, Weaviate, PostgreSQL, FastAPI, Docker.

Behavioral evaluation & governance framework for LLM agentic systems

200+ adversarial scenarios

~100 tenders tested per day

OWASP LLM Top 10 taxonomy

Evaluation methodology designed end-to-end: multi-rubric LLM-as-a-judge across heterogeneous test types (pseudo-ground-truth, dynamic multi-turn scope switching, search-engine behavior, embedding-collection quality), with grounding enforced by a per-turn required-tool contract.
Multi-turn adversarial probe suite for institutional role preservation under pressure: prompt injection and jailbreaks, information boundaries, fairness and non-discrimination, temporal integrity, post-attack recovery; self-designed taxonomy, derived from production interactions and integrated with the OWASP Top 10 for LLM Applications.
Entity extraction/verification: a dedicated task re-extracts every tender’s metadata from the source text (regex first, LLM last) and diffs it against what production serves, escalating only the genuinely ambiguous cases to the judge.
Deliberately minimal orchestration: PostgreSQL as the sole contact point, DAG dependencies resolved in SQL, horizontally scaled FastAPI workers, clean draining and cheap restarts; cost and latency tracked per cycle in the service dashboard.

Full write-up on the project page.

Two-stage retrieval architecture for national-scale tender assistance

101K production queries

2.3s avg e2e latency

97% regression pass rate

Built from scratch around one separation: deciding which tender the user means (LLM entity extraction, relational expansion over curated dictionaries declared as PostgreSQL rules, exact filters, lexical reranking) vs answering about it (Weaviate hybrid retrieval, BM25 plus embeddings, scoped to the pinned tender).
Flat latency across the full corpus (~10,000 tenders): candidates filtered through B-tree indexes using LLM-inferred user intent, trigram similarity over a GIN-indexed materialized view, index building shifted to ingestion time. Recall never drops to zero by design: every failure degrades to full-text search instead of erroring.
High reliability: client-reported misses occur only about once every several weeks of live traffic, and an annotated regression suite of several hundred queries gates every change before production. End-to-end ownership: schema, dictionaries, API, deployment, monitoring.

Full write-up on the project page.

Internal MCP server for agent tool composition

4–5d → ½d agent build time

~10 lines of YAML per tool

0 code to ship a new tool

Consolidated fragmented per-client tools into a single internal MCP server (STDIO): reusable primitives maintained and security-reviewed centrally, composed into agent-scoped tools in YAML.
Orchestrator-agnostic by construction; the authoring bottleneck moved from developers to domain experts, with a back-office UI on the way. Sole-byline article on Agenda Digitale (June 2026).

Full write-up on the project page.

Also: every system above ships cost- and latency-budgeted (model tier chosen per task and evaluated quantitatively; per-product cost dashboards). RAG assistants built and tuned over four further knowledge bases (technical documentation, regional health, education, support), owning ingestion, retrieval tuning, prompting, and client-facing specification and demos; migrated a legacy LangChain stack to a LangGraph agentic architecture with short/long-term memory.

CSI-Piemonte — Senior Business Analyst — Cuneo, Italy

Sep 2022 – Sep 2024

Lead analyst on geospatial components of regional IT systems administering €1B+ in EU, national, and regional funds. Drove the region above EU geolocalization thresholds, helping Regione Piemonte avoid partial decommitment of funds. Specification, feasibility, and demos with PA clients; software testing across the stack; quantitative reporting for EU regulatory authorities (Integrated Administration and Control System, IACS).

aizoOn Technology Consulting — Business Data Analyst — Cuneo, Italy

Nov 2016 – Sep 2022

PL/SQL development (Oracle 11, Oracle Spatial) for public-sector clients: use-case design, stored-procedure development and tuning, software testing, SQL-based reporting for state officials and EU regulatory authorities.

Publications & writing

Camilla, l’assistente AI che sfida la burocrazia nella PA — agendadigitale.eu, 30 April 2026 — Sole-byline article on architecture and governance of the national PA chatbot at CSI: hybrid retrieval, automated evaluation, LLM-as-a-judge methodology.
Assistenti AI nella PA: come estenderli con strumenti riusabili — agendadigitale.eu, 22 June 2026 — Sole-byline article on extending production AI assistants via an internal MCP server: reusable primitives, declarative (YAML) tool composition, and the governance/auditability tradeoffs of agentic vs. wide-context approaches in the public sector.
[Forthcoming, July 2026] Red-teaming and behavioral evaluation of agentic AI systems in regulated contexts — agendadigitale.eu.
Optimal Control of an Ecogenetic Model — International Journal of Applied and Computational Mathematics, 2017 — Co-authored peer-reviewed article on optimal control of ecogenetic dynamics.

Education

Georgia Institute of Technology — MSc Computer Science, Machine Learning specialization — Aug 2024 – Apr 2027 (in progress). GPA 4.0 across 6 of 10 courses completed. Completed: Machine Learning, Artificial Intelligence, Knowledge-Based AI, AI for Robotics, Ethics in AI, Human-Computer Interaction. In progress: Natural Language Processing. Planned: Deep Learning, Reinforcement Learning, Graduate Algorithms.

Università degli Studi di Torino — MSc Mathematics — 2014–2016. Final grade: 110/110 cum laude with special mention (110L/110 e menzione). Focus: stochastic processes, probability, optimization. Thesis: Lévy Processes in Finance (Gaussian vs Lévy models for asset-price time series).

Università degli Studi di Torino — BSc Mathematics — 2010–2014

Conservatorio “G. Verdi” di Torino — Diploma, Classical Guitar — 2001–2008

Languages: Italian (native), English (C2 — IELTS Academic Band 8.5, January 2024).

Personal projects

AI voicemail assistant

Python, FastAPI, OpenAI, Twilio, React Native, Redis, Docker — GitHub

2 conversation engines

<1 s to first audio

4–5× cheaper than Realtime

AGPL-licensed phone agent (backend plus a paired mobile app), built entirely by me, that answers calls on the user’s behalf: natural dialogue with the caller, live topic notifications streamed to the app (where you can let it continue, take over, or hang up), and post-call transcription, summary, classification, and entity extraction.

Cost-optionality by design: two interchangeable streaming-conversation engines behind a unified interface, OpenAI Realtime (audio-to-audio, more expensive) or a custom STT+LLM+TTS pipeline (4–5× cheaper, under a second to first audio, at the price of extra complexity around turn detection and barge-in).
Working and deployable, not a product: a formal security review with a blocking MUST-FIX list gated that claim; EU AI Act disclosure enforced as a state of the barge-in machine rather than a disclaimer, GDPR-compliant retention, hardened containers, multi-tier rate limiting, security-oriented CI.

Full write-up on the project page.

Technical skills

LLM systems — LangChain/LangGraph, Model Context Protocol (MCP), Azure OpenAI, RAG architectures, agentic tool design, prompt engineering

Evaluation — LLM-as-a-judge and rubric design, adversarial probe taxonomies and red-teaming (OWASP LLM Top 10), regression suites, evaluation cost/latency budgeting

Data & infra — Python, PostgreSQL (advanced), PL/SQL (Oracle OCP-certified), Weaviate, Redis, FastAPI, Docker, CI/CD (GitHub Actions, Jenkins), observability (Grafana, Loki, Prometheus); PyTorch (coursework, ongoing in NLP)

Foundations & workflow — Stochastic processes, probability, optimization; EU AI Act and GDPR in production AI (retention, endpoint selection, mandatory disclosure); daily agentic AI-assisted development (Claude Code, GitHub Copilot)
