What AI Actually Costs: 27 Sessions of Real Data

Lorenzo Girardi published on 2026-07-26

Introduction
The Question Nobody Answers
The Dataset: 27 Sessions, 61 Days
Where the Money Actually Goes
- Cache Read Is the Bill
- The Whale Session
- Head and Tail
The Model Mix Problem
Output Compression: The Lever Everyone Reaches For First
The Velocity Half of the Question
- km/l Is Not the Metric. km/€ Is.
- The Efficiency Gap Between People
A Framework for Getting the Data
- Step 1: Parse Your Own Logs
- Step 2: The Ticket Protocol
- Step 3: The Metrics That Matter
From Measurement to a Development Framework
- Rule 1: One Ticket, One Session
- Rule 2: Route the Model to the Task
- Rule 3: Compress the Output
- Rule 4: Build Skills, Not Prompts
The Maturity Path
The Honest Caveats
Conclusion
Reflections

Here we are. Every conversation about AI in engineering right now runs on vibes. It’s faster. It’s cheaper. It’s transformative. Ask for a number and the room goes quiet.

The Safe Zone: Where AI Actually Belongs in Business Processes

Lorenzo Girardi published on 2026-07-24

Introduction
The Problem With Urgency
The Safe Zone Framework
The Case: Algolia Usage Exporter
- Before / After
- How It’s Built
- The Skill Framework: Three Skills, Not One Prompt
- From API Schema to Natural-Language Contract
- What It Exposes
- Numbers That Weren’t Visible Before
Why the Pattern Replicates
The Honest Challenges
- AI Amplifies Whatever Maturity Already Exists
- Tech Debt Doesn’t Disappear
- Creating Is Easy. Deploying on Shared Platform Isn’t
- Ownership Is Still Informal
- The Escalation Risk: Safe Today, Critical Tomorrow
A Maturity Model, Not a Tech Roadmap
Security Considerations
Conclusion
Reflections

Here we are. I recently put together an internal presentation on introducing AI into business processes, and the hardest part wasn’t the technology. It was convincing people to look in the opposite direction of where they instinctively point AI.

I Made Claude Code Talk Like a Caveman for 61 Days, Then Did the Math

Lorenzo Girardi published on 2026-07-19

Introduction
The Problem: Verbose Agents Cost Real Money
Enter Caveman Mode
The Question I Actually Wanted Answered
Method: Mining 27 Session Logs
The Results, Per Model
Where the Money Actually Goes
How Big Were These Sessions, Really?
The Fable 5 Shift
From a Homelab Toy to an Enterprise Line Item
Reflections
Conclusion

Strange… an AI assistant that talks less should cost less. Obvious, right? I wanted a number, not a vibe.

Who Is Your AI Agent Acting For? RFC 8693 On-Behalf-Of Delegation

Lorenzo Girardi published on 2026-07-18

Introduction
The Problem: agents are anonymous proxies
Enter RFC 8693: Token Exchange, On-Behalf-Of
The Architecture
The Identity Flow, Step by Step
Token Anatomy
Inside the JWT: Claims, Exchange Mechanics, Group-Based Permissions
Where Authorization Actually Happens
Observability: Watching Delegation Happen
Security Properties
Conclusion
Reflections

Here we are. Everyone is wiring AI agents to real systems — Kubernetes clusters, CI pipelines, internal APIs — and almost nobody is asking the boring question first: when the agent calls a tool, who is it?

Running a Local LLM on AMD Radeon 780M — gfx1103, ROCm, and the GPU That Wasn't Supposed to Work

Lorenzo Girardi published on 2026-06-07

The Machine
The Problem: gfx1103 Doesn’t Exist
GTT Memory — 24 GB for Free
The ROCm Stack
Getting GPU Inference Working
Optimizing: The Hidden GPU Clock Problem
Benchmarks — Every Configuration Tested
The Surprising Finding: CPU Beats GPU on Generation
The Real Bottleneck: Single-Channel RAM
The Breakthrough: MoE on CPU
Monitoring with Collectd and Grafana
What the Dashboard Actually Shows
Lessons Learned

I wanted a local AI box. Not a cloud API with latency and per-token billing. Not a GPU workstation that sounds like a jet engine. A quiet mini-PC that runs a capable model at home, on my desk, forever, for free.

AI Agentic Development Changes Who Builds Software — and That's an Infrastructure Problem

Lorenzo Girardi published on 2026-06-06

The Shift Is Already Happening
The Problem Nobody Prepared For
The Design Principle: Safe by Default
The Platform Contract — What Every App Must Be
- Supported Languages and Base Images
- Supported Components
- T-shirt Sizing
- Port Contract
- Health Endpoints
- Secrets: Sealed, Always
- Images: Commit SHA, Never Latest
- Network: Default Deny, Every Time
The Three-Tier Monitoring Contract
- System Tier — Automatic
- Framework Tier — App Metrics
- Business Tier — What the App Actually Does
The Review Gate — 35+ Checks Before Deploy
The Helm Chart — Five Questions, Full Platform
The CI/CD Pipeline — AI App Meets GitOps
The Skill Pipeline — AI Onboarding an AI App
The RACI Collapse
Conclusion

Here we are. Somewhere in the last twelve months, something quietly changed.

Monitoring Contentful Usage — Building a Prometheus Exporter Because the UI Won't Tell You

Lorenzo Girardi published on 2026-06-02

Introduction
The Problem
The Architecture
How It Works
CLI Mode — One-Shot Reports
Prometheus Mode — Continuous Monitoring
Deploying on Kubernetes with Helm
Grafana Dashboards
Security Considerations
Conclusion
Reflections

Here we are. If you’ve ever managed a Contentful space at scale — I mean real scale, with thousands of entries, a dozen environments, and a team that publishes hourly — you’ve hit the wall. The Contentful web app shows you… not much. A few dashboard widgets, some high-level numbers, but nothing you can export, alert on, or trend over time.

Algolia Prometheus Exporter

Lorenzo Girardi published on 2026-05-26

Algolia Usage Exporter — A Case Study in AI-Assisted Tooling

What is this?

A lightweight Prometheus exporter that collects usage and infrastructure metrics from the Algolia search API and exposes them at /metrics for Prometheus / Datadog scraping.

It runs as a standalone HTTP server (single binary via Docker/Podman), requires minimal configuration (just two API keys), and exports metrics like:

Usage statistics: search operations, records, processing time, QPS, write operations
Infrastructure metrics: CPU, RAM, SSD utilization, build times (Premium plan only)
Health signals: scrape success status, timestamps

The entire project lives in a single Python package with ~50KB of code, a Helm chart for Kubernetes deployment, and full test coverage across unit, integration, and e2e layers.
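As a rough illustration of what "exposes them at /metrics" means in practice, the sketch below renders a handful of samples in the Prometheus text exposition format. It is not the exporter's actual code: the function name `render_metrics`, the `algolia_` prefix, and the sample metric names are assumptions made for this example.

```python
from typing import Mapping


def render_metrics(samples: Mapping[str, float], prefix: str = "algolia_") -> str:
    """Render samples as Prometheus text exposition lines.

    Every sample is emitted as a gauge. The prefix and the metric names
    are hypothetical, chosen only for this sketch.
    """
    lines = []
    for name, value in sorted(samples.items()):
        metric = f"{prefix}{name}"
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    # The exposition format expects a trailing newline after the last sample.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical values; a real exporter would fetch these from the API.
    print(render_metrics({"search_operations": 1200.0, "scrape_success": 1.0}), end="")
```

A real exporter would serve this string over HTTP and refresh the values on each scrape or on a timer; the rendering step is the part that stays the same.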

Stargate LLM Gateway

Lorenzo Girardi published on 2026-05-24

Stargate LLM Gateway

OpenAI-compatible LLM Gateway for AWS Bedrock with user management, usage tracking, and cost monitoring. Powered by LiteLLM.

Overview
Architecture
Quick Start
User Management
Security
Monitoring & Observability
Documentation Index
API Reference

Overview

flowchart LR
    subgraph Clients
        cc[Claude Code]
        api[API Clients]
    end
    subgraph Gateway["LiteLLM Gateway"]
        auth[API Key Auth]
        budget[Budget Manager]
        router[Model Router]
        metrics[Prometheus Metrics]
    end
    subgraph Storage
        pg[(PostgreSQL)]
    end
    subgraph Bedrock["AWS Bedrock"]
        haiku[Claude Haiku 4.5]
        sonnet[Claude Sonnet 4.5]
        opus[Claude Opus 4.5]
    end
    cc --> auth
    api --> auth
    auth --> budget
    budget --> router
    router --> haiku & sonnet & opus
    budget <--> pg
    router --> metrics
    style haiku fill:#90EE90
    style sonnet fill:#FFD700
    style opus fill:#FF6B6B

Features

Feature	Status	Description
OpenAI-compatible API	✅	Drop-in replacement for Claude Code
User Management	✅	Create users with budgets and model restrictions
API Key Generation	✅	Per-user API keys with limits
Usage Tracking	✅	Token and cost tracking per user
Prometheus Metrics	✅	Full observability stack
Langfuse Tracing	✅	LLM call logging and analysis
Streaming Support	✅	SSE for real-time responses
Admin Endpoint Security	✅	Blocked from public access

Architecture

Request Flow

sequenceDiagram
    participant Client as Claude Code
    participant CF as CloudFront
    participant LiteLLM as LiteLLM Proxy
    participant PG as PostgreSQL
    participant Bedrock as AWS Bedrock
    participant VM as Victoria Metrics
    Client->>CF: POST /v1/chat/completions
    CF->>LiteLLM: Forward request
    LiteLLM->>LiteLLM: Validate API Key
    LiteLLM->>PG: Check User Budget
    alt Budget OK
        LiteLLM->>Bedrock: InvokeModel (SigV4)
        Bedrock-->>LiteLLM: Response + tokens
        LiteLLM->>PG: Update spend
        LiteLLM->>LiteLLM: Record metrics
        LiteLLM-->>Client: 200 OK
    else Budget Exceeded
        LiteLLM-->>Client: 429 Budget Exceeded
    end
    VM->>LiteLLM: Scrape /metrics/
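The budget branch in the request flow can be sketched in a few lines. This is an illustration of the control flow only, not LiteLLM's implementation: `UserBudget`, `handle_request`, and the `call_model` callback are hypothetical names, and the in-memory spend counter stands in for the PostgreSQL lookup and update.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class UserBudget:
    """In-memory stand-in for the per-user budget row (hypothetical)."""
    max_budget: float
    spend: float = 0.0


def handle_request(
    budget: UserBudget,
    call_model: Callable[[], Tuple[str, float]],
) -> Tuple[int, str]:
    """Check the budget, call the model, then record the spend.

    call_model returns (response_text, cost). It stands in for the
    Bedrock invocation and is an assumption of this sketch.
    """
    if budget.spend >= budget.max_budget:
        # Budget exhausted: reject before any model call is made.
        return 429, "Budget Exceeded"
    text, cost = call_model()
    budget.spend += cost  # "Update spend" step from the diagram
    return 200, text
```

Note that the check runs before the call, so a single request can push spend past the limit; the next request is the one that gets rejected. Whether the real gateway behaves the same way is not stated in the source.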

Deployment Architecture

flowchart TB
    subgraph Internet
        clients[Claude Code / API Clients]
    end
    subgraph AWS["AWS Cloud (us-west-1)"]
        cf[CloudFront Admin endpoints blocked]
        subgraph VPC["VPC 10.10.0.0/16"]
            subgraph Public["Public Subnets"]
                alb[Application Load Balancer]
                nat[NAT Instance]
            end
            subgraph Private["Private Subnets"]
                subgraph ECS["ECS Fargate Cluster"]
                    litellm[LiteLLM Proxy 1 vCPU, 2GB]
                    langfuse[Langfuse 0.5 vCPU, 1GB]
                    grafana[Grafana 0.25 vCPU, 0.5GB]
                    victoria[Victoria Metrics 0.25 vCPU, 0.5GB]
                end
                rds[(RDS PostgreSQL db.t4g.micro)]
            end
        end
        bedrock[AWS Bedrock]
        secrets[Secrets Manager]
    end
    clients --> cf
    cf --> alb
    alb --> litellm
    alb --> grafana
    alb --> langfuse
    litellm --> bedrock
    litellm <--> rds
    litellm --> langfuse
    langfuse <--> rds
    victoria --> litellm
    grafana --> victoria
    litellm -.-> secrets
    style cf fill:#ffcccc,stroke:#ff0000
    style rds fill:#336791,color:#fff
    style langfuse fill:#e6f3ff

Security Model

flowchart TB
    subgraph Public["Public Access (CloudFront)"]
        chat["/v1/chat/completions ✅"]
        models["/v1/models ✅"]
        health["/health/* ✅"]
        metrics["/metrics/ ✅"]
    end
    subgraph Blocked["Blocked from CloudFront (403)"]
        user["/user/* ❌"]
        key["/key/* ❌"]
        model["/model/* ❌"]
        spend["/spend/* ❌"]
    end
    subgraph Internal["Internal Access Only (ALB)"]
        user_alb["/user/* ✅"]
        key_alb["/key/* ✅"]
        model_alb["/model/* ✅"]
        spend_alb["/spend/* ✅"]
    end
    cf[CloudFront] --> Public
    cf -.->|403 Forbidden| Blocked
    alb[ALB Direct] --> Internal
    style Blocked fill:#ffcccc
    style Internal fill:#ccffcc
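The path split in the security model amounts to a prefix check. The sketch below uses the paths from the diagram; the function name `allowed_via_cloudfront` is hypothetical, and treating any path not listed in the diagram as blocked is an assumption of this example, since the source does not say what happens to unlisted paths.

```python
# Paths taken from the security model diagram.
PUBLIC_PREFIXES = ("/v1/chat/completions", "/v1/models", "/health/", "/metrics/")
ADMIN_PREFIXES = ("/user/", "/key/", "/model/", "/spend/")


def allowed_via_cloudfront(path: str) -> bool:
    """Return True if the path should be reachable through CloudFront.

    Admin prefixes are always rejected (the diagram shows a 403).
    Unlisted paths are rejected too, which is an assumption of this
    sketch rather than documented behavior.
    """
    if path.startswith(ADMIN_PREFIXES):
        return False
    return path.startswith(PUBLIC_PREFIXES)
```

Checking the admin prefixes first keeps the deny rule authoritative even if a public prefix is later widened by mistake.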

Quick Start

For Claude Code Users

Add these environment variables to your shell:

Redefining E-commerce for the Age of Conversational AI

Lorenzo Girardi published on 2025-08-31

Introduction

This article began as a classic exploration into JSON-LD and its impact on structured data for e-commerce. However, as research and experimentation progressed, a fundamental realization emerged: what’s actually changing is the entire user experience paradigm. The way end-users interact with digital products is shifting beyond traditional web and app models — toward dynamic, conversational discovery driven by AI.

For years, the typical customer journey followed one of several well-established routes:
