All Posts - Lorenzo's Blog

Monitoring Contentful Usage — Building a Prometheus Exporter Because the UI Won't Tell You

Lorenzo Girardi — Tue, 02 Jun 2026 00:00:00 +0000

Introduction
The Problem
The Architecture
How It Works
CLI Mode — One-Shot Reports
Prometheus Mode — Continuous Monitoring
Deploying on Kubernetes with Helm
Grafana Dashboards
Security Considerations
Conclusion
Reflections

Here we are. If you’ve ever managed a Contentful space at scale — I mean real scale, with thousands of entries, a dozen environments, and a team that publishes hourly — you’ve hit the wall. The Contentful web app shows you… not much. A few dashboard widgets, some high-level numbers, but nothing you can export, alert on, or trend over time.

Algolia Prometheus Exporter

Lorenzo Girardi — Tue, 26 May 2026 00:00:00 +0000

Algolia Usage Exporter — A Case Study in AI-Assisted Tooling

What is this?

A lightweight Prometheus exporter that collects usage and infrastructure metrics from the Algolia search API and exposes them at /metrics for Prometheus / Datadog scraping.

It runs as a standalone HTTP server (single binary via Docker/Podman), requires minimal configuration (just two API keys), and exports metrics like:

Usage statistics: search operations, records, processing time, QPS, write operations
Infrastructure metrics: CPU, RAM, SSD utilization, build times (Premium plan only)
Health signals: scrape success status, timestamps

The entire project lives in a single Python package of roughly 50 KB of code, with a Helm chart for Kubernetes deployment and full test coverage across unit, integration, and end-to-end (e2e) layers.
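To make the /metrics idea concrete, here is a minimal sketch of how usage numbers can be rendered in the Prometheus text exposition format using only the standard library. It is not the exporter's actual code: the `Gauge` class, the `REGISTRY` list, the metric names, and the port are all assumptions made for illustration, and the real project may well rely on a Prometheus client library instead.

```python
from dataclasses import dataclass, field
from http.server import BaseHTTPRequestHandler, HTTPServer


@dataclass
class Gauge:
    """One gauge metric; each distinct label set holds one value."""
    name: str
    help_text: str
    samples: dict = field(default_factory=dict)  # label tuple -> value

    def set(self, value, **labels):
        self.samples[tuple(sorted(labels.items()))] = float(value)

    def render(self):
        # Note: label values are not escaped here; a real exporter must
        # escape backslashes, quotes, and newlines.
        lines = [f"# HELP {self.name} {self.help_text}",
                 f"# TYPE {self.name} gauge"]
        for labels, value in self.samples.items():
            label_str = ",".join(f'{k}="{v}"' for k, v in labels)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


REGISTRY = []  # hypothetical module-level list of Gauge objects


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = "".join(g.render() for g in REGISTRY).encode()
        self.send_response(200)
        self.send_header("Content-Type",
                         "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9100):
    """Block forever, answering scrapes on /metrics."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Appending a `Gauge` to `REGISTRY`, updating it on each collection cycle, and calling `serve()` would be enough for a scraper to read the values. A health signal such as scrape success fits the same shape: a gauge set to 1 or 0.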

Stargate LLM Gateway

Lorenzo Girardi — Sun, 24 May 2026 00:00:00 +0000

Stargate LLM Gateway

OpenAI-compatible LLM Gateway for AWS Bedrock with user management, usage tracking, and cost monitoring. Powered by LiteLLM.

Overview
Architecture
Quick Start
User Management
Security
Monitoring & Observability
Documentation Index
API Reference

Overview

flowchart LR
    subgraph Clients
        cc[Claude Code]
        api[API Clients]
    end
    subgraph Gateway["LiteLLM Gateway"]
        auth[API Key Auth]
        budget[Budget Manager]
        router[Model Router]
        metrics[Prometheus Metrics]
    end
    subgraph Storage
        pg[(PostgreSQL)]
    end
    subgraph Bedrock["AWS Bedrock"]
        haiku[Claude Haiku 4.5]
        sonnet[Claude Sonnet 4.5]
        opus[Claude Opus 4.5]
    end
    cc --> auth
    api --> auth
    auth --> budget
    budget --> router
    router --> haiku & sonnet & opus
    budget <--> pg
    router --> metrics
    style haiku fill:#90EE90
    style sonnet fill:#FFD700
    style opus fill:#FF6B6B

Features

Feature	Status	Description
OpenAI-compatible API	✅	Drop-in replacement for Claude Code
User Management	✅	Create users with budgets and model restrictions
API Key Generation	✅	Per-user API keys with limits
Usage Tracking	✅	Token and cost tracking per user
Prometheus Metrics	✅	Full observability stack
Langfuse Tracing	✅	LLM call logging and analysis
Streaming Support	✅	SSE for real-time responses
Admin Endpoint Security	✅	Blocked from public access

Architecture

Request Flow

sequenceDiagram
    participant Client as Claude Code
    participant CF as CloudFront
    participant LiteLLM as LiteLLM Proxy
    participant PG as PostgreSQL
    participant Bedrock as AWS Bedrock
    participant VM as Victoria Metrics
    Client->>CF: POST /v1/chat/completions
    CF->>LiteLLM: Forward request
    LiteLLM->>LiteLLM: Validate API Key
    LiteLLM->>PG: Check User Budget
    alt Budget OK
        LiteLLM->>Bedrock: InvokeModel (SigV4)
        Bedrock-->>LiteLLM: Response + tokens
        LiteLLM->>PG: Update spend
        LiteLLM->>LiteLLM: Record metrics
        LiteLLM-->>Client: 200 OK
    else Budget Exceeded
        LiteLLM-->>Client: 429 Budget Exceeded
    end
    VM->>LiteLLM: Scrape /metrics/
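The budget branch in the request flow above can be sketched in a few lines. This is an in-memory illustration, not LiteLLM's implementation: the `User` record, the `handle_chat_completion` function, the `call_model` callback, and the 401 response for an unknown key are all assumptions. In the real gateway the budget lives in PostgreSQL and the model call goes to Bedrock.

```python
from dataclasses import dataclass


@dataclass
class User:
    max_budget: float   # spending cap in USD (hypothetical field)
    spend: float = 0.0  # running total in USD


def handle_chat_completion(api_key, users, call_model):
    """Return (status_code, body) for one request.

    users:      dict mapping API key -> User (stands in for PostgreSQL)
    call_model: callable returning (text, cost_in_usd) (stands in for Bedrock)
    """
    user = users.get(api_key)
    if user is None:
        # Assumption: the diagram does not show the invalid-key case.
        return 401, {"error": "invalid API key"}
    if user.spend >= user.max_budget:
        return 429, {"error": "budget exceeded"}
    text, cost = call_model()
    user.spend += cost  # "Update spend" step
    return 200, {"content": text}
```

The check happens before the model call, so a user can overshoot the cap by at most one request's cost. Whether that is acceptable depends on how tight the budget needs to be.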

Deployment Architecture

flowchart TB
    subgraph Internet
        clients[Claude Code / API Clients]
    end
    subgraph AWS["AWS Cloud (us-west-1)"]
        cf["CloudFront<br/>Admin endpoints blocked"]
        subgraph VPC["VPC 10.10.0.0/16"]
            subgraph Public["Public Subnets"]
                alb[Application Load Balancer]
                nat[NAT Instance]
            end
            subgraph Private["Private Subnets"]
                subgraph ECS["ECS Fargate Cluster"]
                    litellm["LiteLLM Proxy<br/>1 vCPU, 2GB"]
                    langfuse["Langfuse<br/>0.5 vCPU, 1GB"]
                    grafana["Grafana<br/>0.25 vCPU, 0.5GB"]
                    victoria["Victoria Metrics<br/>0.25 vCPU, 0.5GB"]
                end
                rds[("RDS PostgreSQL<br/>db.t4g.micro")]
            end
        end
        bedrock[AWS Bedrock]
        secrets[Secrets Manager]
    end
    clients --> cf
    cf --> alb
    alb --> litellm
    alb --> grafana
    alb --> langfuse
    litellm --> bedrock
    litellm <--> rds
    litellm --> langfuse
    langfuse <--> rds
    victoria --> litellm
    grafana --> victoria
    litellm -.-> secrets
    style cf fill:#ffcccc,stroke:#ff0000
    style rds fill:#336791,color:#fff
    style langfuse fill:#e6f3ff

Security Model

flowchart TB
    subgraph Public["Public Access (CloudFront)"]
        chat["/v1/chat/completions ✅"]
        models["/v1/models ✅"]
        health["/health/* ✅"]
        metrics["/metrics/ ✅"]
    end
    subgraph Blocked["Blocked from CloudFront (403)"]
        user["/user/* ❌"]
        key["/key/* ❌"]
        model["/model/* ❌"]
        spend["/spend/* ❌"]
    end
    subgraph Internal["Internal Access Only (ALB)"]
        user_alb["/user/* ✅"]
        key_alb["/key/* ✅"]
        model_alb["/model/* ✅"]
        spend_alb["/spend/* ✅"]
    end
    cf[CloudFront] --> Public
    cf -.->|403 Forbidden| Blocked
    alb[ALB Direct] --> Internal
    style Blocked fill:#ffcccc
    style Internal fill:#ccffcc
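The path rules in the security model above boil down to a prefix check. The sketch below mirrors the diagram's path lists. The function name `is_allowed`, the treatment of public paths on the ALB, and the choice to reject unlisted paths are assumptions, since the diagram does not cover those cases.

```python
# Path prefixes taken from the security model diagram.
PUBLIC_PREFIXES = ("/v1/chat/completions", "/v1/models", "/health/", "/metrics/")
ADMIN_PREFIXES = ("/user/", "/key/", "/model/", "/spend/")


def is_allowed(path, via_cloudfront):
    """Return True if a request for `path` should be served.

    via_cloudfront: True for public traffic, False for direct ALB access.
    """
    if path.startswith(ADMIN_PREFIXES):
        # Admin endpoints are reachable only through the internal ALB.
        return not via_cloudfront
    if path.startswith(PUBLIC_PREFIXES):
        # Assumption: public endpoints are served on both entry points.
        return True
    # Assumption: anything not listed in the diagram is rejected.
    return False
```

For example, `is_allowed("/key/generate", via_cloudfront=True)` is false, which corresponds to the 403 CloudFront returns, while the same path through the ALB is allowed.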

Quick Start

For Claude Code Users

Add these environment variables to your shell:

Redefining E-commerce for the Age of Conversational AI

Lorenzo Girardi — Sun, 31 Aug 2025 00:00:00 +0000

Introduction

This article began as a classic exploration into JSON-LD and its impact on structured data for e-commerce. However, as research and experimentation progressed, a fundamental realization emerged: what’s actually changing is the entire user experience paradigm. The way end-users interact with digital products is shifting beyond traditional web and app models — toward dynamic, conversational discovery driven by AI.

For years, the typical customer journey followed one of several well-established routes:

Migrating Homelab from VMware ESXi to Proxmox: A New Era

Lorenzo Girardi — Mon, 28 Apr 2025 00:00:00 +0000

Introduction

For years, VMware ESXi was the foundation of my homelab — stable, dependable, and familiar. Then Broadcom acquired VMware, and the writing was on the wall.

The free ESXi license disappeared. Support for consumer-grade hardware like MiniPCs and NUCs became problematic. The platform that had “just worked” for years was now actively working against the homelab use case.

Enter Proxmox — an open-source virtualization platform built on Debian Linux, offering KVM, LXC, ZFS, and native clustering without a licensing fee.

Building a Scalable Image CDN with MinIO, imgproxy, and Cloudflare

Lorenzo Girardi — Thu, 24 Apr 2025 00:00:00 +0000

Intro

In today’s digital landscape, efficiently serving images is critical for website performance. Users expect fast-loading, responsive websites, and images often account for the majority of a page’s weight. In this article, I’ll walk you through building a powerful, scalable image CDN using open-source tools that you can deploy in your own infrastructure.

The Architecture

Our image CDN consists of three main components:

MinIO — An S3-compatible object storage backend that stores original images
imgproxy — A fast and secure image processing service that resizes and optimizes images on-the-fly
Cloudflare — Providing CDN capabilities through Cloudflare Tunnel
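To show what "on-the-fly" means in practice, here is a hedged sketch of building a signed imgproxy URL path for an object stored in MinIO. It assumes imgproxy is configured with S3 support pointing at MinIO and with a signing key and salt. The bucket name, object key, and helper names are hypothetical. The signature follows imgproxy's documented scheme: HMAC-SHA256 over salt plus path, encoded as URL-safe Base64 without padding.

```python
import base64
import hashlib
import hmac


def resize_path(bucket, object_key, width, height):
    """Build the unsigned imgproxy path for a 'fit' resize of an S3 object."""
    return f"/rs:fit:{width}:{height}/plain/s3://{bucket}/{object_key}"


def sign_imgproxy_path(path, key_hex, salt_hex):
    """Prefix `path` with its signature; key and salt are hex strings."""
    key = bytes.fromhex(key_hex)
    salt = bytes.fromhex(salt_hex)
    digest = hmac.new(key, salt + path.encode(), hashlib.sha256).digest()
    signature = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return f"/{signature}{path}"
```

The signed path is appended to whatever hostname the tunnel exposes. Because the signature covers the processing options, a client cannot request arbitrary sizes, which keeps the cache from filling up with variants nobody asked for.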

Websocket, Cloudflare Tunnel, Apache httpd and a Bit of Security

Lorenzo Girardi — Thu, 03 Aug 2023 00:00:00 +0000

The Infrastructure Overview

Exposing home lab services to the internet can be both necessary and risky. Traditional methods — port forwarding, VPNs, reverse proxies with open inbound ports — come with their own set of challenges. This is where Cloudflare Tunnel (formerly Argo Tunnel) comes in as an elegant solution.

In this article, I’ll walk you through how I’ve implemented a secure infrastructure using Cloudflare Tunnel with WebSocket support, running on a Kubernetes cluster with Apache HTTPD as a reverse proxy. This setup allows me to securely expose internal services without opening ports on my residential firewall.

HPA vs Rate-limit

Lorenzo Girardi — Tue, 14 Feb 2023 00:00:00 +0000

INTRO

Strange… we are using HPA to increase availability and introducing rate limiting to reduce it?

Well, let’s create the context.

This analysis is based on specific assumptions:

Cloud environment
Dynamic infrastructure
Minimum resources available

HPA

In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), scaling the number of pods up or down to match demand.
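The core of that scaling decision is a single ratio, documented by Kubernetes as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The sketch below implements only that formula. The function name is made up for illustration, and it deliberately ignores min/max replica bounds, stabilization windows, and pod readiness handling.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     tolerance=0.1):
    """Replica count suggested by the basic HPA formula."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the workload alone.
        return current_replicas
    return math.ceil(current_replicas * ratio)
```

With 4 pods at 90% average CPU against a 60% target, the function returns 6. At 63% it returns 4, because the ratio falls inside the tolerance band.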

Patterns

Type	Behaviour
Slow and temporary	Daily fluctuations, peaking during the day and troughing at night
Rapid and temporary	Short bursts from poorly-behaved downstream services
Slow and persistent	Request volume slowly increases as the product sees adoption
Rapid and persistent	Abrupt shift from low to high volumes — e.g. called by batch jobs

Ideal Practice

Type	Ideal Practice
Slow and temporary	HPA should add and remove pods as necessary
Rapid and temporary	HPA should NOT modify pod count — leave headroom for brief spikes
Slow and persistent	HPA should add and remove pods as necessary
Rapid and persistent	Leave headroom; HPA adds pods quickly to restore target utilization

Rate Limit

A rate limit is the maximum number of API calls an app or user can make within a given time period. If this limit is exceeded, or if CPU or time limits are exceeded, the app may be throttled. Throttled requests fail.
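The "N calls per time period" definition maps directly onto a fixed-window counter. The sketch below is one simple way to express it, not a description of any particular product's limiter. The class name and the injectable `clock` parameter are assumptions added to keep the example testable.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` calls per client in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.state = {}  # client -> (window index, calls in that window)

    def allow(self, client):
        window_index = int(self.clock() // self.window)
        index, calls = self.state.get(client, (window_index, 0))
        if index != window_index:
            # A new window has started: reset the counter.
            index, calls = window_index, 0
        if calls >= self.limit:
            return False  # throttled: the caller should fail the request
        self.state[client] = (index, calls + 1)
        return True
```

A fixed window is easy to reason about but lets a client send up to twice the limit across a window boundary. Sliding windows or token buckets smooth that out at the cost of a little more state.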

Application Rate Limit

Lorenzo Girardi — Sat, 11 Feb 2023 00:00:00 +0000

I needed to implement rate limiting within an application for reasons I’ll get into in a follow-up post. When you start thinking about this, you basically have two paths:

Logic embedded directly in the application code
A sidecar container that handles the rate limiting role

Both work. Both have trade-offs. Let me go through each one.

The Code Way

This is the simpler approach on the surface, but it comes with some annoying limitations. It can only be reused by applications written in the same programming language. And putting rate limiting logic inside the application gives it a secondary role: the request interceptor consumes CPU in the same process and can skew the application's metrics if you're not tracking it carefully.
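As an illustration of the in-code approach, here is a minimal sketch of a request interceptor written as a Python decorator around a token bucket. It is not the implementation from the follow-up post: `TokenBucket`, `rate_limited`, and the `(status, body)` handler convention are assumptions chosen to keep the example small.

```python
import time
from functools import wraps


class TokenBucket:
    """Refills `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def rate_limited(bucket):
    """Wrap a handler so it returns 429 when the bucket is empty."""
    def decorator(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            if not bucket.try_acquire():
                return 429, "Too Many Requests"
            return handler(*args, **kwargs)
        return wrapper
    return decorator
```

Both limitations from the paragraph above show up here: the decorator only helps Python services, and every request pays for `try_acquire()` inside the application process, so that time lands in the application's own latency and CPU figures.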

YA VPN Service in Kubernetes

Lorenzo Girardi — Mon, 18 Jul 2022 00:00:00 +0000

Why

I had my beloved IPsec setup based on strongswan running in Kubernetes for a while — you can read about that here. It worked fine. I wasn’t looking to change it. Then a colleague pointed out WireGuard’s overhead numbers and I got curious enough to evaluate it myself.

WireGuard is a modern VPN protocol that lives in the Linux kernel. It’s designed to be simple, fast, and have a minimal attack surface compared to IPsec or OpenVPN. The numbers people throw around are impressive, but I wanted to see them in practice.
