Monitoring Contentful Usage — Building a Prometheus Exporter Because the UI Won't Tell You

Table of Contents

  • Introduction
  • The Problem
  • The Architecture
  • How It Works
  • CLI Mode — One-Shot Reports
  • Prometheus Mode — Continuous Monitoring
  • Deploying on Kubernetes with Helm
  • Grafana Dashboards
  • Security Considerations
  • Conclusion
  • Reflections

If you’ve ever managed a Contentful space at scale (I mean real scale: thousands of entries, a dozen environments, and a team that publishes hourly), you’ve hit the wall. The Contentful web app shows you… not much. A few dashboard widgets, some high-level numbers, but nothing you can export, alert on, or trend over time.

Algolia Prometheus Exporter

Algolia Usage Exporter — A Case Study in AI-Assisted Tooling

What is this?

A lightweight Prometheus exporter that collects usage and infrastructure metrics from the Algolia search API and exposes them at /metrics for Prometheus / Datadog scraping.

It runs as a standalone HTTP server (a single container via Docker/Podman), requires minimal configuration (just two API keys), and exports metrics like:

  • Usage statistics: search operations, records, processing time, QPS, write operations
  • Infrastructure metrics: CPU, RAM, SSD utilization, build times (Premium plan only)
  • Health signals: scrape success status, timestamps
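For readers new to exporters: /metrics just serves plain text in the Prometheus exposition format. Below is a standard-library-only sketch of that shape. The metric names and the `collect()` data are hypothetical placeholders, not this exporter's actual metrics, and the sketch does not assume which client library the project really uses.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def _escape(value):
    # Label values must escape backslash, double quote and newline
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_metrics(samples):
    """Render (name, help, type, labels, value) tuples as Prometheus text format."""
    lines = []
    seen = set()
    # Stable sort keeps each metric family's samples contiguous, as the format expects
    for name, help_text, metric_type, labels, value in sorted(samples, key=lambda s: s[0]):
        if name not in seen:
            lines.append(f"# HELP {name} {help_text}")
            lines.append(f"# TYPE {name} {metric_type}")
            seen.add(name)
        label_str = ",".join(f'{k}="{_escape(v)}"' for k, v in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}" if label_str else f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect():
    # Hypothetical: a real exporter would query the Algolia API here
    return [
        ("algolia_scrape_success", "1 if the last Algolia API call succeeded", "gauge", {}, 1),
        ("algolia_records", "Number of records per index", "gauge", {"index": "products"}, 1200),
    ]


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(collect()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    # Blocks forever; the port is an arbitrary example
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Keeping collection and rendering separate makes the text format easy to unit-test without starting a server.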

The entire project lives in a single Python package with ~50KB of code, a Helm chart for Kubernetes deployment, and full test coverage across unit, integration, and e2e layers.

Stargate LLM Gateway

OpenAI-compatible LLM Gateway for AWS Bedrock with user management, usage tracking, and cost monitoring. Powered by LiteLLM.

Overview

```mermaid
flowchart LR
    subgraph Clients
        cc[Claude Code]
        api[API Clients]
    end
    subgraph Gateway["LiteLLM Gateway"]
        auth[API Key Auth]
        budget[Budget Manager]
        router[Model Router]
        metrics[Prometheus Metrics]
    end
    subgraph Storage
        pg[(PostgreSQL)]
    end
    subgraph Bedrock["AWS Bedrock"]
        haiku[Claude Haiku 4.5]
        sonnet[Claude Sonnet 4.5]
        opus[Claude Opus 4.5]
    end
    cc --> auth
    api --> auth
    auth --> budget
    budget --> router
    router --> haiku & sonnet & opus
    budget <--> pg
    router --> metrics
    style haiku fill:#90EE90
    style sonnet fill:#FFD700
    style opus fill:#FF6B6B
```

Features

| Feature | Description |
|---|---|
| OpenAI-compatible API | Drop-in backend for Claude Code |
| User Management | Create users with budgets and model restrictions |
| API Key Generation | Per-user API keys with limits |
| Usage Tracking | Token and cost tracking per user |
| Prometheus Metrics | Full observability stack |
| Langfuse Tracing | LLM call logging and analysis |
| Streaming Support | SSE for real-time responses |
| Admin Endpoint Security | Admin endpoints blocked from public access |

Architecture

Request Flow

```mermaid
sequenceDiagram
    participant Client as Claude Code
    participant CF as CloudFront
    participant LiteLLM as LiteLLM Proxy
    participant PG as PostgreSQL
    participant Bedrock as AWS Bedrock
    participant VM as Victoria Metrics
    Client->>CF: POST /v1/chat/completions
    CF->>LiteLLM: Forward request
    LiteLLM->>LiteLLM: Validate API Key
    LiteLLM->>PG: Check User Budget
    alt Budget OK
        LiteLLM->>Bedrock: InvokeModel (SigV4)
        Bedrock-->>LiteLLM: Response + tokens
        LiteLLM->>PG: Update spend
        LiteLLM->>LiteLLM: Record metrics
        LiteLLM-->>Client: 200 OK
    else Budget Exceeded
        LiteLLM-->>Client: 429 Budget Exceeded
    end
    VM->>LiteLLM: Scrape /metrics/
```
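The request flow boils down to: validate the key, check the budget, call the model, record the spend. Here is a minimal in-memory sketch of that gate, assuming a dict in place of PostgreSQL and a caller-supplied `invoke` function standing in for the Bedrock call. The class and field names, and the 401 for an unknown key, are hypothetical; this is not LiteLLM's implementation.

```python
from dataclasses import dataclass


@dataclass
class User:
    api_key: str
    max_budget_usd: float
    spend_usd: float = 0.0


class BudgetGate:
    """Sketch of validate key -> check budget -> invoke -> update spend."""

    def __init__(self, users):
        # Stand-in for the PostgreSQL users/spend tables
        self._by_key = {u.api_key: u for u in users}

    def handle(self, api_key, invoke):
        user = self._by_key.get(api_key)
        if user is None:
            return 401, "Invalid API key"
        if user.spend_usd >= user.max_budget_usd:
            return 429, "Budget Exceeded"
        # invoke() stands in for the model call and returns (response, cost_usd)
        response, cost_usd = invoke()
        user.spend_usd += cost_usd
        return 200, response
```

Because the budget is checked before the call and spend is recorded after it, the last allowed request can push a user slightly over their limit; the next one is rejected.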

Deployment Architecture

```mermaid
flowchart TB
    subgraph Internet
        clients[Claude Code / API Clients]
    end
    subgraph AWS["AWS Cloud (us-west-1)"]
        cf[CloudFront<br/>Admin endpoints blocked]
        subgraph VPC["VPC 10.10.0.0/16"]
            subgraph Public["Public Subnets"]
                alb[Application Load Balancer]
                nat[NAT Instance]
            end
            subgraph Private["Private Subnets"]
                subgraph ECS["ECS Fargate Cluster"]
                    litellm[LiteLLM Proxy<br/>1 vCPU, 2GB]
                    langfuse[Langfuse<br/>0.5 vCPU, 1GB]
                    grafana[Grafana<br/>0.25 vCPU, 0.5GB]
                    victoria[Victoria Metrics<br/>0.25 vCPU, 0.5GB]
                end
                rds[(RDS PostgreSQL<br/>db.t4g.micro)]
            end
        end
        bedrock[AWS Bedrock]
        secrets[Secrets Manager]
    end
    clients --> cf
    cf --> alb
    alb --> litellm
    alb --> grafana
    alb --> langfuse
    litellm --> bedrock
    litellm <--> rds
    litellm --> langfuse
    langfuse <--> rds
    victoria --> litellm
    grafana --> victoria
    litellm -.-> secrets
    style cf fill:#ffcccc,stroke:#ff0000
    style rds fill:#336791,color:#fff
    style langfuse fill:#e6f3ff
```

Security Model

```mermaid
flowchart TB
    subgraph Public["Public Access (CloudFront)"]
        chat["/v1/chat/completions ✅"]
        models["/v1/models ✅"]
        health["/health/* ✅"]
        metrics["/metrics/ ✅"]
    end
    subgraph Blocked["Blocked from CloudFront (403)"]
        user["/user/* ❌"]
        key["/key/* ❌"]
        model["/model/* ❌"]
        spend["/spend/* ❌"]
    end
    subgraph Internal["Internal Access Only (ALB)"]
        user_alb["/user/* ✅"]
        key_alb["/key/* ✅"]
        model_alb["/model/* ✅"]
        spend_alb["/spend/* ✅"]
    end
    cf[CloudFront] --> Public
    cf -.->|403 Forbidden| Blocked
    alb[ALB Direct] --> Internal
    style Blocked fill:#ffcccc
    style Internal fill:#ccffcc
```
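The security model is essentially a path-prefix rule applied only on the public edge. The sketch below models that rule in Python so it can be tested; the prefix list is copied from the diagram, the function names are hypothetical, and it is not CloudFront configuration. Note the exact-segment match: `/v1/models` must stay public even though `/model/*` is blocked.

```python
# Admin prefixes that the public edge rejects, taken from the diagram above
ADMIN_PREFIXES = ("/user", "/key", "/model", "/spend")


def is_admin_path(path: str) -> bool:
    # Match "/key" and "/key/..." but not unrelated paths like "/keyboard"
    return any(path == p or path.startswith(p + "/") for p in ADMIN_PREFIXES)


def route(path: str, via_cloudfront: bool) -> int:
    """Return 403 for admin paths arriving through the public edge, else 200 (forward)."""
    if via_cloudfront and is_admin_path(path):
        return 403
    return 200
```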

Quick Start

For Claude Code Users

Add these environment variables to your shell:

Redefining E-commerce for the Age of Conversational AI

Introduction

This article began as a classic exploration into JSON-LD and its impact on structured data for e-commerce. However, as research and experimentation progressed, a fundamental realization emerged: what’s actually changing is the entire user experience paradigm. The way end-users interact with digital products is shifting beyond traditional web and app models — toward dynamic, conversational discovery driven by AI.

For years, the typical customer journey followed one of several well-established routes:

Migrating Homelab from VMware ESXi to Proxmox: A New Era

Introduction

For years, VMware ESXi was the foundation of my homelab — stable, dependable, and familiar. Then Broadcom acquired VMware, and the writing was on the wall.

The free ESXi license disappeared. Support for consumer-grade hardware like MiniPCs and NUCs became problematic. The platform that had “just worked” for years was now actively working against the homelab use case.

Enter Proxmox — an open-source virtualization platform built on Debian Linux, offering KVM, LXC, ZFS, and native clustering without a licensing fee.

Building a Scalable Image CDN with MinIO, imgproxy, and Cloudflare

Intro

Serving images efficiently is critical for website performance. Users expect fast-loading, responsive websites, and images often account for the majority of a page’s weight. In this article, I’ll walk you through building a scalable image CDN using open-source tools that you can deploy on your own infrastructure.

The Architecture

Our image CDN consists of three main components:

  1. MinIO — An S3-compatible object storage backend that stores original images
  2. imgproxy — A fast and secure image processing service that resizes and optimizes images on-the-fly
  3. Cloudflare — Providing CDN capabilities through Cloudflare Tunnel
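Since imgproxy sits in front of the bucket and resizes on request, you normally want signed URLs so nobody can ask it for arbitrary sizes or sources. imgproxy documents the signature as URL-safe base64 (no padding) of HMAC-SHA256 over the salt plus the path, using the hex-decoded IMGPROXY_KEY and IMGPROXY_SALT. A minimal sketch of a "resize to fit" URL follows; the function name and the source URL are hypothetical, and imgproxy must be configured to fetch that source (for example, S3 access pointed at MinIO).

```python
import base64
import hashlib
import hmac


def _b64url(data: bytes) -> str:
    # URL-safe base64 with the trailing "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def signed_imgproxy_path(key_hex: str, salt_hex: str, source_url: str,
                         width: int, height: int, ext: str = "webp") -> str:
    """Return /<signature>/rs:fit:<w>:<h>/<encoded source>.<ext> for imgproxy."""
    key = bytes.fromhex(key_hex)
    salt = bytes.fromhex(salt_hex)
    # Processing options, then the base64url-encoded source URL and output format
    path = f"/rs:fit:{width}:{height}/{_b64url(source_url.encode('utf-8'))}.{ext}"
    # Signature: HMAC-SHA256 keyed with the decoded key, over salt + path
    digest = hmac.new(key, salt + path.encode("ascii"), hashlib.sha256).digest()
    return f"/{_b64url(digest)}{path}"
```

The signed path is deterministic for a given image, size, and format, so the CDN layer can cache each variant under a stable URL.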

Websocket, Cloudflare Tunnel, Apache httpd and a Bit of Security

The Infrastructure Overview

Exposing home lab services to the internet can be both necessary and risky. Traditional methods — port forwarding, VPNs, reverse proxies with open inbound ports — come with their own set of challenges. This is where Cloudflare Tunnel (formerly Argo Tunnel) comes in as an elegant solution.

In this article, I’ll walk you through how I’ve implemented a secure infrastructure using Cloudflare Tunnel with WebSocket support, running on a Kubernetes cluster with Apache HTTPD as a reverse proxy. This setup allows me to securely expose internal services without opening ports on my residential firewall.
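WebSocket traffic only survives a reverse proxy or tunnel if the HTTP Upgrade handshake reaches the backend intact. Per RFC 6455, the server answers with a Sec-WebSocket-Accept value derived from the client's Sec-WebSocket-Key and a fixed GUID. A Python sketch of that calculation (helper names are hypothetical): if Apache httpd or the tunnel drops the Upgrade and Connection headers, the backend sees a plain HTTP request and this step never happens.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    # base64(SHA-1(key + GUID)), as specified by RFC 6455
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def upgrade_response_headers(request_headers: dict) -> dict:
    """Headers a backend sends with its 101 Switching Protocols reply."""
    return {
        "Upgrade": "websocket",
        "Connection": "Upgrade",
        "Sec-WebSocket-Accept": websocket_accept(request_headers["Sec-WebSocket-Key"]),
    }
```

The RFC's own example key `dGhlIHNhbXBsZSBub25jZQ==` yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`, a handy check when debugging the proxy chain.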

HPA vs Rate-limit

INTRO

Strange… we are using HPA to increase availability and introducing rate limiting to reduce it?

Well, let’s set the context.

This analysis is based on specific assumptions:

  • Cloud environment
  • Dynamic infrastructure
  • Minimal resources available

HPA

In Kubernetes, a HorizontalPodAutoscaler automatically updates a workload resource (such as a Deployment or StatefulSet), scaling the number of pods to match demand.
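The Kubernetes documentation gives the core of the HPA algorithm as desiredReplicas = ceil[currentReplicas × (currentMetricValue / desiredMetricValue)], with no change when the ratio is within a tolerance (0.1 by default). A simplified Python sketch of that calculation; it ignores details the real controller handles (missing metrics, not-yet-ready pods, multiple metrics), and the min/max defaults here are arbitrary.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: leave the replica count unchanged
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Clamp to the HPA's configured bounds
    return max(min_replicas, min(max_replicas, desired))
```

With 4 replicas at 90% CPU against a 60% target, the sketch asks for 6.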

Patterns

| Type | Behaviour |
|---|---|
| Slow and temporary | Daily fluctuations, peaking during the day and troughing at night |
| Rapid and temporary | Short bursts from poorly-behaved downstream services |
| Slow and persistent | Request volume slowly increases as the product sees adoption |
| Rapid and persistent | Abrupt shift from low to high volume (e.g. when called by batch jobs) |

Ideal Practice

| Type | Ideal Practice |
|---|---|
| Slow and temporary | HPA should add and remove pods as necessary |
| Rapid and temporary | HPA should NOT modify the pod count; leave headroom to absorb brief spikes |
| Slow and persistent | HPA should add and remove pods as necessary |
| Rapid and persistent | Leave headroom; HPA adds pods quickly to restore target utilization |
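In practice, "don't react to brief spikes" maps onto the HPA `behavior` stabilization windows. By default, scale-down uses a 300-second window and scale-up none. Within a window the controller scales up only as far as the lowest recent recommendation, and down only as far as the highest, so a short spike or dip is ignored. A simplified sketch of that rule follows; class and parameter names are hypothetical, and the real controller also applies rate policies.

```python
from collections import deque


class Stabilizer:
    """Simplified HPA-style stabilization over recent replica recommendations."""

    def __init__(self, up_window=0, down_window=300):
        self.up_window = up_window      # seconds; Kubernetes default for scale-up is 0
        self.down_window = down_window  # seconds; Kubernetes default for scale-down is 300
        self.history = deque()          # (timestamp, recommended replicas)

    def stabilize(self, now, current, proposed):
        self.history.append((now, proposed))
        up_values = [r for t, r in self.history if t >= now - self.up_window]
        down_values = [r for t, r in self.history if t >= now - self.down_window]
        result = current
        # Scale up only as far as the lowest recommendation in the scale-up window
        result = max(result, min(up_values))
        # Scale down only as far as the highest recommendation in the scale-down window
        result = min(result, max(down_values))
        # Forget recommendations older than both windows
        horizon = now - max(self.up_window, self.down_window)
        while self.history and self.history[0][0] < horizon:
            self.history.popleft()
        return result
```

Setting a non-zero scale-up window is the knob for the "rapid and temporary" row: a single high recommendation inside the window no longer adds pods.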

Rate Limit

A rate limit is the number of API calls an app or user can make within a given time period. If this limit is exceeded — or if CPU or time limits are exceeded — the app may be throttled. Throttled requests fail.
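That definition (N calls per time period) is exactly a fixed-window counter. Below is a minimal in-memory sketch, assuming a per-client key and an injectable clock for testing; the class name is hypothetical.

```python
import time


class FixedWindowLimiter:
    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counts = {}  # client_id -> (window_start, count)

    def allow(self, client_id):
        now = self.clock()
        start = now - (now % self.window)  # start of the current window
        window_start, count = self.counts.get(client_id, (start, 0))
        if window_start != start:
            # A new window has begun: reset the counter
            window_start, count = start, 0
        if count >= self.limit:
            return False  # throttled
        self.counts[client_id] = (window_start, count + 1)
        return True
```

Fixed windows are the simplest option, but a client can send up to twice the limit across a window boundary; sliding windows or token buckets smooth that out.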

Application Rate Limit

I needed to implement rate limiting within an application for reasons I’ll get into in a follow-up post. When you start thinking about this, you basically have two paths:

  1. Logic embedded directly in the application code
  2. A sidecar container that handles the rate limiting role

Both work. Both have trade-offs. Let me go through each one.

The Code Way

This is the simpler approach on the surface, but it comes with some annoying limitations. It can only be reused by applications written in the same programming language. And adding rate limiting logic inside the application gives it a secondary role: the request interceptor consumes CPU in the same process and may produce misleading metrics if you’re not tracking it carefully.
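To make the in-code option concrete, here is a minimal sketch of an in-process interceptor: WSGI middleware that wraps a Python app and answers 429 when a token bucket is empty. The class names, rate, burst size, and Retry-After value are assumptions, not the author's implementation. It runs in the application's own process, so its CPU cost lands on the app, and it is only reusable by other Python/WSGI services.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def take(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class RateLimitMiddleware:
    """WSGI middleware acting as the in-process request interceptor."""

    def __init__(self, app, bucket):
        self.app = app
        self.bucket = bucket

    def __call__(self, environ, start_response):
        if not self.bucket.take():
            start_response("429 Too Many Requests",
                           [("Content-Type", "text/plain"), ("Retry-After", "1")])
            return [b"rate limit exceeded\n"]
        return self.app(environ, start_response)
```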

YA VPN Service in Kubernetes

Why

I had my beloved IPsec setup based on strongSwan running in Kubernetes for a while (you can read about that here). It worked fine. I wasn’t looking to change it. Then a colleague pointed out WireGuard’s overhead numbers and I got curious enough to evaluate it myself.

WireGuard is a modern VPN protocol that lives in the Linux kernel. It’s designed to be simple, fast, and have a minimal attack surface compared to IPsec or OpenVPN. The numbers people throw around are impressive, but I wanted to see them in practice.