Senior GenAI Engineer — LLM systems, agentic architectures, evaluation & governance

Turin, Italy · [email protected] · linkedin.com/in/pietro-ferragamo · github.com/pietro121orteip

Applied AI engineer with a mathematical background, building LLM systems deployed at national scale for Italian Public Administration. Sole author of the semantic retrieval layer (100K+ production queries), the behavioral evaluation framework, and the internal MCP server behind a public-sector tender-assistance chatbot. Currently completing an MSc in Computer Science (ML specialization) at Georgia Tech alongside a senior IC role at CSI-Piemonte, with a deep interest in the evaluation, governance, and safety of agentic AI systems in production.


Technical skills

LLM & agentic stack — LangChain, LangGraph, Model Context Protocol (MCP), Azure OpenAI, prompt engineering, tool-use design, LLM-as-a-judge, RAG architectures, agentic orchestration, behavioral evaluation & red-teaming

Evaluation & governance — Multi-rubric LLM judges, regression suites, behavioral probe taxonomies (prompt injection, fairness, temporal integrity, tool-use verification), latency/cost/quality tradeoffs, EU AI Act and GDPR considerations in production AI, observability for AI services

Retrieval & data — Hybrid retrieval (dense + sparse + trigram), Weaviate, Redis, pgvector, entity extraction, metadata-driven retrieval, embedding-collection quality analysis

Languages & infra — Python, SQL (PostgreSQL advanced, Oracle OCP-certified), FastAPI, Docker, Bash, Git; PyTorch (regression/classification, NLP); CI/CD (Jenkins, GitHub Actions); observability (Grafana, Prometheus, Loki, Promtail)

Cloud & deployment — Azure (Azure OpenAI, container deployments); Helm-chart authorship in progress

Mathematical foundations — Stochastic processes, probability, optimization, Bayesian filtering (Kalman, particle filters, SLAM)


Experience

CSI-Piemonte — AI Engineer, R&D — Turin, Italy

Sep 2024 – Present

R&D on LLM-based systems for Italy’s largest regional in-house IT provider. Sole engineer on the retrieval, evaluation, and tool-composition layers of a national-scale tender-assistance chatbot deployed for Italian Public Administration. Stack: Python, Azure OpenAI, LangChain/LangGraph, Weaviate, PostgreSQL, FastAPI, Docker.

Semantic retrieval engine for national-scale tender assistance — 101K production queries over 6 months, 2.3s average end-to-end latency

Designed and built from scratch as the core retrieval tool of a chatbot deployed nationally over Italian public tenders. LLM-based entity extraction with deterministic fallback, relational expansion over hand-curated dictionaries (ISTAT geographic hierarchy, degree equivalences, institutional taxonomies) declared as PostgreSQL business rules. Hybrid filtering combines exact match, trigram similarity, and Weaviate dense retrieval; exposed as a FastAPI service and registered as an agent tool. Defect rate on the order of one client-reported miss per several weeks of live traffic; internal regression suite of several hundred annotated queries at 97% pass rate.

Behavioral evaluation & governance framework for LLM agentic systems

Task-oriented evaluator built end-to-end (PostgreSQL, Python, FastAPI workers, Docker, shell CLI). DAG-based scheduler with dependency rules declared in PostgreSQL; workers poll a queue function that resolves the implicit DAG at dispatch time, enabling horizontal scaling with correct ordering. LLM-as-a-judge with task-specific multi-rubric evaluation across heterogeneous types: pseudo-ground-truth verification, dynamic multi-turn scope-switching conversations, search-engine behavior checks, embedding-collection quality tests. Tool-use verification probes enforce a desired-tool contract per turn and flag claims produced without invoking the corresponding retrieval tool.

Behavioral probe suite: 167 multi-turn adversarial scenarios covering prompt injection & jailbreak resistance, information-boundary protection, fairness & non-discrimination, temporal integrity, language-policy enforcement, abuse resilience, and recovery to baseline post-attack. Self-designed taxonomy derived from production interactions.

Unified MCP server for agent tool composition

Consolidated fragmented per-use-case tools into a single internal MCP server (STDIO), exposing agent tools as compositions of reusable primitives (memory ops; retrieval ops: semantic search, exact metadata filter, metadata discovery). Tools declared in YAML configuration, not code — new agent tools ship without touching the server.


CSI-Piemonte — Senior Business Analyst — Cuneo, Italy

Sep 2022 – Sep 2024

Lead analyst on geospatial components of regional IT systems administering €1B+ in EU, national, and regional funds. Drove the region above EU geolocation thresholds, avoiding partial decommitment of funds. Functional and technical analysis, testing, and quantitative reporting for the Integrated Administration and Control System (IACS) and related EU financial-statistics requirements.


aizoOn Technology Consulting — Business Data Analyst — Cuneo, Italy

Nov 2016 – Sep 2022

Functional analysis and PL/SQL development (Oracle 11, including Oracle Spatial) for public-sector clients. Use-case design, stored-procedure development and tuning, software testing, and SQL-based reporting for state officials and EU regulatory authorities.


Personal projects

AI voicemail assistant — Python, FastAPI, OpenAI, Silero VAD, Twilio, React Native; Hetzner VPS — GitHub

Solo end-to-end build of an Italian-language voicemail agent that screens incoming calls and streams a live transcript to a paired mobile app. Two interchangeable streaming-conversation engines (OpenAI Realtime ~$0.08/min vs. custom STT+LLM+TTS pipeline ~$0.02/min) behind a unified interface. EU AI Act disclosure enforced in greeting and at the audio gate; GDPR-compliant 7-day audio retention. Hardened Docker, multi-tier Redis rate limiting, security-focused CI/CD (Bandit, Semgrep, Safety, TruffleHog, Trivy).


Publications & writing

  • Camilla, l’assistente AI che sfida la burocrazia nella PA (agendadigitale.eu, April 2026) — Architecture and governance choices behind the national PA chatbot: hybrid retrieval, automated evaluation pipelines, LLM-as-a-judge methodology.
  • [Forthcoming] Architectural extensibility of AI assistants via internal MCP servers — agendadigitale.eu
  • [In drafting] Red-teaming and behavioral evaluation of agentic AI systems in regulated contexts — agendadigitale.eu
  • Optimal Control of an Ecogenetic Model (International Journal of Applied and Computational Mathematics, 2017) — Co-authored peer-reviewed article.

Education

Georgia Institute of Technology — MSc Computer Science, ML specialization — Aug 2024 – Apr 2027 (in progress). GPA 4.0 across the 6 of 10 courses completed so far. Completed: Machine Learning, Artificial Intelligence, Knowledge-Based AI, AI for Robotics, Ethics in AI, Human-Computer Interaction. In progress: Natural Language Processing. Planned: Deep Learning, Reinforcement Learning, Graduate Algorithms.

Università degli Studi di Torino — MSc Mathematics — 2014–2016. Final grade: 110/110 cum laude with honorable mention (110 e lode con menzione). Thesis: Lévy Processes in Finance.

Università degli Studi di Torino — BSc Mathematics — 2010–2014

Conservatorio “G. Verdi” di Torino — Diploma, Classical Guitar — 2001–2008

Languages: Italian (native), English (C2 — IELTS Academic Band 8.5, January 2024).