Stargate LLM Gateway

https://github.com/lorenzogirardi/gw_llm/actions/workflows/ci.yml/badge.svg https://github.com/lorenzogirardi/gw_llm/actions/workflows/terraform-plan.yml/badge.svg

OpenAI-compatible LLM Gateway for AWS Bedrock with user management, usage tracking, and cost monitoring. Powered by LiteLLM.

Overview

flowchart LR
    subgraph Clients
        cc[Claude Code]
        api[API Clients]
    end
    subgraph Gateway["LiteLLM Gateway"]
        auth[API Key Auth]
        budget[Budget Manager]
        router[Model Router]
        metrics[Prometheus Metrics]
    end
    subgraph Storage
        pg[(PostgreSQL)]
    end
    subgraph Bedrock["AWS Bedrock"]
        haiku[Claude Haiku 4.5]
        sonnet[Claude Sonnet 4.5]
        opus[Claude Opus 4.5]
    end
    cc --> auth
    api --> auth
    auth --> budget
    budget --> router
    router --> haiku & sonnet & opus
    budget <--> pg
    router --> metrics
    style haiku fill:#90EE90
    style sonnet fill:#FFD700
    style opus fill:#FF6B6B

Features

| Feature | Description |
|---|---|
| OpenAI-compatible API | Drop-in replacement for Claude Code |
| User Management | Create users with budgets and model restrictions |
| API Key Generation | Per-user API keys with limits |
| Usage Tracking | Token and cost tracking per user |
| Prometheus Metrics | Full observability stack |
| Langfuse Tracing | LLM call logging and analysis |
| Streaming Support | SSE for real-time responses |
| Admin Endpoint Security | Blocked from public access |

Architecture

Request Flow

sequenceDiagram
    participant Client as Claude Code
    participant CF as CloudFront
    participant LiteLLM as LiteLLM Proxy
    participant PG as PostgreSQL
    participant Bedrock as AWS Bedrock
    participant VM as Victoria Metrics
    Client->>CF: POST /v1/chat/completions
    CF->>LiteLLM: Forward request
    LiteLLM->>LiteLLM: Validate API Key
    LiteLLM->>PG: Check User Budget
    alt Budget OK
        LiteLLM->>Bedrock: InvokeModel (SigV4)
        Bedrock-->>LiteLLM: Response + tokens
        LiteLLM->>PG: Update spend
        LiteLLM->>LiteLLM: Record metrics
        LiteLLM-->>Client: 200 OK
    else Budget Exceeded
        LiteLLM-->>Client: 429 Budget Exceeded
    end
    VM->>LiteLLM: Scrape /metrics/
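The branch in the request flow can be sketched as a small shell function. This is only an illustration of the decision shown in the diagram, not LiteLLM's actual implementation: the function name `gateway_status`, its arguments, and the rule "spend at or above the budget counts as exceeded" are all assumptions.

```shell
# Hypothetical helper: the status code the gateway would return.
#   $1 = "yes" if the API key is valid, anything else otherwise
#   $2 = spend so far (USD)
#   $3 = maximum budget (USD)
gateway_status() {
  if [ "$1" != "yes" ]; then
    echo 401                      # invalid or missing API key
  elif awk -v s="$2" -v b="$3" 'BEGIN { exit !(s < b) }'; then
    echo 200                      # budget remains: forward to Bedrock
  else
    echo 429                      # assumed rule: spend >= budget is "exceeded"
  fi
}
```

For example, `gateway_status yes 12.50 50` takes the "Budget OK" branch, while `gateway_status yes 50 50` takes the "Budget Exceeded" branch.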

Deployment Architecture

flowchart TB
    subgraph Internet
        clients[Claude Code / API Clients]
    end
    subgraph AWS["AWS Cloud (us-west-1)"]
        cf[CloudFront<br/>Admin endpoints blocked]
        subgraph VPC["VPC 10.10.0.0/16"]
            subgraph Public["Public Subnets"]
                alb[Application Load Balancer]
                nat[NAT Instance]
            end
            subgraph Private["Private Subnets"]
                subgraph ECS["ECS Fargate Cluster"]
                    litellm[LiteLLM Proxy<br/>1 vCPU, 2GB]
                    langfuse[Langfuse<br/>0.5 vCPU, 1GB]
                    grafana[Grafana<br/>0.25 vCPU, 0.5GB]
                    victoria[Victoria Metrics<br/>0.25 vCPU, 0.5GB]
                end
                rds[(RDS PostgreSQL<br/>db.t4g.micro)]
            end
        end
        bedrock[AWS Bedrock]
        secrets[Secrets Manager]
    end
    clients --> cf
    cf --> alb
    alb --> litellm
    alb --> grafana
    alb --> langfuse
    litellm --> bedrock
    litellm <--> rds
    litellm --> langfuse
    langfuse <--> rds
    victoria --> litellm
    grafana --> victoria
    litellm -.-> secrets
    style cf fill:#ffcccc,stroke:#ff0000
    style rds fill:#336791,color:#fff
    style langfuse fill:#e6f3ff

Security Model

flowchart TB
    subgraph Public["Public Access (CloudFront)"]
        chat["/v1/chat/completions ✅"]
        models["/v1/models ✅"]
        health["/health/* ✅"]
        metrics["/metrics/ ✅"]
    end
    subgraph Blocked["Blocked from CloudFront (403)"]
        user["/user/* ❌"]
        key["/key/* ❌"]
        model["/model/* ❌"]
        spend["/spend/* ❌"]
    end
    subgraph Internal["Internal Access Only (ALB)"]
        user_alb["/user/* ✅"]
        key_alb["/key/* ✅"]
        model_alb["/model/* ✅"]
        spend_alb["/spend/* ✅"]
    end
    cf[CloudFront] --> Public
    cf -.->|403 Forbidden| Blocked
    alb[ALB Direct] --> Internal
    style Blocked fill:#ffcccc
    style Internal fill:#ccffcc

Quick Start

For Claude Code Users

Add these environment variables to your shell:

export ANTHROPIC_BASE_URL="https://d18l8nt8fin3hz.cloudfront.net"
export ANTHROPIC_API_KEY="<your-api-key>"

Then use Claude Code as usual; its requests are routed through the gateway.

Test API Access

# Test with curl
curl -X POST "$ANTHROPIC_BASE_URL/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
  -d '{
    "model": "claude-haiku-4-5",
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 50
  }'

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-claude-code-test.png

View Dashboards

| Dashboard | URL |
|---|---|
| Grafana (Metrics) | https://d18l8nt8fin3hz.cloudfront.net/grafana |
| Langfuse (Traces) | https://d18l8nt8fin3hz.cloudfront.net/langfuse/ |

User Management

Using the Management Script

cd scripts/

# Show help
./litellm-users.sh help

# Create a user with budget
./litellm-users.sh create-user --email "<user-email>" --budget 50 --duration monthly

# Create API key with model restrictions
./litellm-users.sh create-key --alias "haiku-only" --models '["claude-haiku-4-5"]' --budget 10

# List users and keys
./litellm-users.sh list-users
./litellm-users.sh list-keys

Available Models

| Model Name | Description | Status |
|---|---|---|
| claude-haiku-4-5 | Fast, cost-effective | ✅ Working |
| claude-sonnet-4-5 | Balanced performance | ⚠️ Pending AWS approval |
| claude-opus-4-5 | Most capable | ⚠️ Pending AWS approval |

Model-User Restrictions

flowchart TB
    subgraph Users
        admin[Admin<br/>All models, unlimited]
        dev[Developer<br/>Haiku + Sonnet, $50/mo]
        intern[Intern<br/>Haiku only, $10/mo]
    end
    subgraph Models
        haiku[Claude Haiku 4.5]
        sonnet[Claude Sonnet 4.5]
        opus[Claude Opus 4.5]
    end
    admin --> haiku & sonnet & opus
    dev --> haiku & sonnet
    intern --> haiku
    style admin fill:#ff9999
    style dev fill:#99ff99
    style intern fill:#9999ff

See User Management Guide for full documentation.


Security

Admin Endpoint Protection

Admin endpoints are blocked at CloudFront and can be reached only by calling the ALB directly from inside the VPC:

| Endpoint | CloudFront | ALB Direct |
|---|---|---|
| /v1/chat/completions | ✅ Works | ✅ Works |
| /v1/models | ✅ Works | ✅ Works |
| /user/* | ❌ 403 Forbidden | ✅ Works |
| /key/* | ❌ 403 Forbidden | ✅ Works |
| /model/* | ❌ 403 Forbidden | ✅ Works |
| /spend/* | ❌ 403 Forbidden | ✅ Works |
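The blocking rule in the table can be expressed as a path match. The sketch below is a local illustration only; the real rule lives in the CloudFront configuration, and the function names `is_admin_path` and `cloudfront_access` are hypothetical.

```shell
# Hypothetical helper: succeeds if the path is one of the admin prefixes
# listed in the table above.
is_admin_path() {
  case "$1" in
    /user/*|/key/*|/model/*|/spend/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: what a client should expect through CloudFront.
cloudfront_access() {
  if is_admin_path "$1"; then
    echo "blocked"    # CloudFront answers 403 Forbidden
  else
    echo "allowed"
  fi
}
```

Note that `/v1/models` does not match the `/model/*` prefix, so `cloudfront_access /v1/models` prints `allowed` while `cloudfront_access /model/info` prints `blocked`.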

Managing Users from VPC

# Via ALB direct (from VPC/VPN)
curl -X POST "http://kong-llm-gateway-poc-xxx.us-west-1.elb.amazonaws.com/user/new" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MASTER_KEY" \
  -d '{"user_email":"<user-email>","max_budget":10}'

# Via CloudFront (blocked)
curl -X POST "https://d18l8nt8fin3hz.cloudfront.net/user/new"
# Returns: 403 Forbidden - Admin endpoints not accessible via CloudFront

Monitoring & Observability

Observability Stack (Verified ✅)

flowchart TB
    subgraph LiteLLM["LiteLLM Proxy"]
        api[API Handler]
        prom[Prometheus Callback]
        lf_cb[Langfuse Callback]
    end
    subgraph Metrics["Victoria Metrics"]
        scrape[Scraper]
        store[(TSDB)]
    end
    subgraph Tracing["Langfuse"]
        traces[(Traces)]
        generations[(Generations)]
    end
    subgraph Dashboards["Grafana"]
        llm_dash[LLM Usage Dashboard]
        infra_dash[Infrastructure Dashboard]
    end
    api --> prom & lf_cb
    prom -->|/metrics/| scrape
    lf_cb --> traces
    scrape --> store
    llm_dash --> store
    llm_dash -.->|correlate| traces
    infra_dash --> store
    style Tracing fill:#e6f3ff
    style Metrics fill:#f5f5f5

Dashboards

| Dashboard | Purpose | URL | Status |
|---|---|---|---|
| Grafana - LLM Usage | Token tracking, costs, latency | /grafana | ✅ Working |
| Grafana - Infrastructure | ECS, ALB, CloudFront, Langfuse, Victoria Metrics | /grafana | ✅ Working |
| Langfuse - Traces | Request/response logging | /langfuse/ | ✅ Working |
| Langfuse - Generations | LLM call analysis | /langfuse/ | ✅ Working |

Grafana - Infrastructure Metrics

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-grafana-infrastructure-metrics.png

Grafana - Token Usage & Costs

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-grafana-token-metrics.png

Grafana - Architecture Flow

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-grafana-architecture-metrics.png

Grafana - PostgreSQL Metrics

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-grafana-sql-metrics.png

Langfuse - LLM Traces

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-langfuse-metrics.png

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-langfuse-trace.png

Metrics & Traces Integration

Grafana dashboards correlate Prometheus metrics with Langfuse traces:

| Data Source | Metrics | Use Case |
|---|---|---|
| Victoria Metrics | litellm_proxy_total_requests_metric | Request counts per user |
| Victoria Metrics | litellm_total_tokens_metric | Token usage tracking |
| Victoria Metrics | litellm_spend_metric | Cost monitoring |
| Langfuse | Traces | Full request/response logging |
| Langfuse | Generations | LLM call details and metadata |

Key Metrics

# Total requests
sum(litellm_proxy_total_requests_metric_total)

# Total tokens by model
sum(litellm_total_tokens_metric_total) by (model)

# Total spend by user
sum(litellm_spend_metric_total) by (user)

# Latency P95
histogram_quantile(0.95, sum(rate(litellm_llm_api_latency_metric_bucket[5m])) by (le))
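The queries above can also be run outside Grafana. The sketch below assumes that Victoria Metrics is reachable from your shell and exposes the Prometheus-compatible `/api/v1/query` endpoint. `VM_URL` and both function names are hypothetical, and the `user` label is taken from the `by (user)` query above.

```shell
# Hypothetical helper: PromQL for a single user's total spend.
spend_query_for_user() {
  printf 'sum(litellm_spend_metric_total{user="%s"})\n' "$1"
}

# Hypothetical helper: run an instant query against a Prometheus-compatible
# HTTP API. VM_URL is an assumed variable pointing at Victoria Metrics.
promql_query() {
  curl -sS -G "$VM_URL/api/v1/query" --data-urlencode "query=$1"
}
```

A call such as `promql_query "$(spend_query_for_user alice)"` would return the JSON result for that user, where `alice` is a placeholder.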

Cost Tracking

| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude Haiku 4.5 | $0.80 | $4.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Opus 4.5 | $15.00 | $75.00 |
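As a worked example of how these rates translate into a per-request cost, the sketch below multiplies token counts by the rates in the table. The function name `estimate_cost` is hypothetical, and the gateway's own accounting may differ (for example in rounding).

```shell
# Hypothetical helper: estimated cost in USD for one request.
#   $1 = model name, $2 = input tokens, $3 = output tokens
# Rates are USD per 1M tokens, copied from the Cost Tracking table.
estimate_cost() {
  case "$1" in
    claude-haiku-4-5)  rate_in=0.80;  rate_out=4.00 ;;
    claude-sonnet-4-5) rate_in=3.00;  rate_out=15.00 ;;
    claude-opus-4-5)   rate_in=15.00; rate_out=75.00 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
  awk -v tin="$2" -v tout="$3" -v rin="$rate_in" -v rout="$rate_out" \
    'BEGIN { printf "%.4f\n", (tin * rin + tout * rout) / 1000000 }'
}
```

For instance, `estimate_cost claude-sonnet-4-5 2000 500` computes (2000 × 3.00 + 500 × 15.00) / 1,000,000 and prints `0.0135`.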

Documentation Index

Architecture & Design

| Document | Description |
|---|---|
| C4 Architecture | System context, containers, components diagrams |

Observability

| Document | Description |
|---|---|
| Langfuse Integration | LLM tracing, call logging, and analysis |

Operations

| Document | Description |
|---|---|
| User Management | Create users, API keys, budgets |
| High Error Rate Runbook | Troubleshooting 5xx errors |
| High Latency Runbook | Diagnosing slow requests |
| Token Quota Exceeded | Managing token usage |

Configuration

| Document | Description |
|---|---|
| Claude Code Config | Setup guide for Claude Code users |
| API Examples | curl commands for all endpoints |
| Proxy & Token Metering | How the gateway and token tracking works |

Infrastructure

| Directory | Description |
|---|---|
| infra/terraform | Terraform modules and environments |
| infra/grafana | Grafana dashboards and provisioning |
| infra/README.md | Local development with Docker Compose |
| scripts/ | User management scripts |

GitHub Actions

| Workflow | Description |
|---|---|
| ci.yml | CI pipeline: security, validate, build, deploy |
| terraform-plan.yml | Terraform plan on PRs |
| ecs-restart.yml | Restart ECS services |
| ecs-logs.yml | Download container logs |
| ecs-status.yml | View cluster status |
| ecs-scale.yml | Scale ECS services |
| ecs-exec.yml | Execute commands in containers |
| rds-status.yml | Database status and metrics |

API Reference

Endpoints

| Endpoint | Method | Access | Description |
|---|---|---|---|
| /v1/chat/completions | POST | Public | Chat completion (OpenAI-compatible) |
| /v1/models | GET | Public | List available models |
| /health/liveliness | GET | Public | Health check |
| /metrics/ | GET | Public | Prometheus metrics |
| /user/* | * | Internal | User management |
| /key/* | * | Internal | API key management |

Request Format

{
  "model": "claude-haiku-4-5",
  "messages": [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"}
  ],
  "max_tokens": 1024,
  "temperature": 0.7,
  "stream": false
}
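The request above sets `"stream": false`. With `"stream": true`, OpenAI-compatible endpoints typically answer with server-sent events, one `data:` line per chunk. A minimal sketch of a streaming call follows; `chat_payload` and `stream_chat` are hypothetical helper names, and the payload builder does no JSON escaping, so the prompt must not contain double quotes or backslashes.

```shell
# Hypothetical helper: build a minimal chat request body.
#   $1 = model, $2 = prompt (no double quotes or backslashes), $3 = true|false
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}],"max_tokens":50,"stream":%s}\n' \
    "$1" "$2" "$3"
}

# Hypothetical helper: send a streaming request through the gateway.
# -N turns off curl's output buffering so chunks print as they arrive.
stream_chat() {
  curl -sS -N -X POST "$ANTHROPIC_BASE_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer $ANTHROPIC_API_KEY" \
    -d "$(chat_payload "$1" "$2" true)"
}
```

Usage, assuming the environment variables from the Quick Start are set: `stream_chat claude-haiku-4-5 "Hello!"`.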

Authentication

Include the API key in the Authorization header:

Authorization: Bearer <your-api-key>

Error Responses

| Code | Description |
|---|---|
| 401 | Invalid or missing API key |
| 403 | Admin endpoint blocked (use ALB) |
| 429 | Rate limit or budget exceeded |
| 500 | Internal server error |
| 502 | Bedrock service unavailable |
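A client script can branch on these codes. The sketch below is one possible mapping, not behaviour defined by the gateway: `http_status` and `error_action` are hypothetical helper names, and the suggested actions are assumptions based on the descriptions in the table.

```shell
# Print only the HTTP status code of a request; all arguments go to curl.
http_status() {
  curl -sS -o /dev/null -w '%{http_code}' "$@"
}

# Hypothetical mapping from the status codes above to a next step.
error_action() {
  case "$1" in
    2??)     echo "ok" ;;
    401)     echo "check the API key" ;;
    403)     echo "admin endpoint: call the ALB from inside the VPC" ;;
    429)     echo "back off, or ask for a higher budget" ;;
    500|502) echo "retry later" ;;
    *)       echo "unexpected status $1" ;;
  esac
}
```

For example, `error_action "$(http_status "$ANTHROPIC_BASE_URL/v1/models" -H "Authorization: Bearer $ANTHROPIC_API_KEY")"` prints the suggested action for a model-list request.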

Development

Local Development

Run the full stack locally with Docker Compose:

cd infra
cp .env.example .env
# Edit .env with your AWS credentials

docker-compose up -d

Local services:

CI/CD Pipeline

flowchart LR
    subgraph PR["Pull Request"]
        scan1[Security Scan]
        tf_plan[Terraform Plan]
    end
    subgraph Main["Push to Main"]
        scan2[Security Scan]
        trivy[Trivy Scan]
        checkov[Checkov Scan]
        validate[TF Validate]
        changes{Grafana<br/>changed?}
        build[Build Grafana]
        deploy[Deploy POC]
    end
    scan1 --> tf_plan
    scan2 --> trivy --> checkov --> validate --> changes
    changes -->|Yes| build --> deploy
    changes -->|No| skip[Skip]

| Workflow | Description | Trigger |
|---|---|---|
| ci.yml | Security scan, Terraform validate, build, deploy | Push to main |
| terraform-plan.yml | Terraform plan with PR comment | Pull requests |

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-github-pipeline.png

https://res.cloudinary.com/ethzero/image/upload/v1769078071/ai/stargate-llm/stargate-github-actions.png

Operational Workflows (Manual)

| Workflow | Description | Inputs |
|---|---|---|
| ecs-restart.yml | Restart ECS services | service (or all) |
| ecs-logs.yml | Download container logs | service, duration, filter |
| ecs-status.yml | View cluster status | - |
| ecs-scale.yml | Scale services (0-3 tasks) | service, count |
| ecs-exec.yml | Execute debug commands | service, command |
| rds-status.yml | Database metrics | - |

Note: Operational workflows require poc environment approval for security.

Estimated Costs

| Component | Monthly Cost |
|---|---|
| ECS Fargate (LiteLLM + Langfuse + Grafana + Victoria) | ~$55 |
| RDS PostgreSQL (db.t4g.micro) | ~$12 |
| NAT Instance (t3.nano) | ~$3 |
| CloudFront | ~$1-5 |
| Total Fixed | ~$70-75 |
| Bedrock Usage | Pay per use |

License

MIT License - See LICENSE for details.