Stargate LLM Gateway

OpenAI-compatible LLM Gateway for AWS Bedrock with user management, usage tracking, and cost monitoring. Powered by LiteLLM.
Table of Contents
Overview
flowchart LR
subgraph Clients
cc[Claude Code]
api[API Clients]
end
subgraph Gateway["LiteLLM Gateway"]
auth[API Key Auth]
budget[Budget Manager]
router[Model Router]
metrics[Prometheus Metrics]
end
subgraph Storage
pg[(PostgreSQL)]
end
subgraph Bedrock["AWS Bedrock"]
haiku[Claude Haiku 4.5]
sonnet[Claude Sonnet 4.5]
opus[Claude Opus 4.5]
end
cc --> auth
api --> auth
auth --> budget
budget --> router
router --> haiku & sonnet & opus
budget <--> pg
router --> metrics
style haiku fill:#90EE90
style sonnet fill:#FFD700
style opus fill:#FF6B6B
Features
| Feature | Status | Description |
|---|
| OpenAI-compatible API | ✅ | Drop-in replacement for Claude Code |
| User Management | ✅ | Create users with budgets and model restrictions |
| API Key Generation | ✅ | Per-user API keys with limits |
| Usage Tracking | ✅ | Token and cost tracking per user |
| Prometheus Metrics | ✅ | Full observability stack |
| Langfuse Tracing | ✅ | LLM call logging and analysis |
| Streaming Support | ✅ | SSE for real-time responses |
| Admin Endpoint Security | ✅ | Blocked from public access |
Architecture
Request Flow
sequenceDiagram
participant Client as Claude Code
participant CF as CloudFront
participant LiteLLM as LiteLLM Proxy
participant PG as PostgreSQL
participant Bedrock as AWS Bedrock
participant VM as Victoria Metrics
Client->>CF: POST /v1/chat/completions
CF->>LiteLLM: Forward request
LiteLLM->>LiteLLM: Validate API Key
LiteLLM->>PG: Check User Budget
alt Budget OK
LiteLLM->>Bedrock: InvokeModel (SigV4)
Bedrock-->>LiteLLM: Response + tokens
LiteLLM->>PG: Update spend
LiteLLM->>LiteLLM: Record metrics
LiteLLM-->>Client: 200 OK
else Budget Exceeded
LiteLLM-->>Client: 429 Budget Exceeded
end
VM->>LiteLLM: Scrape /metrics/
Deployment Architecture
flowchart TB
subgraph Internet
clients[Claude Code / API Clients]
end
subgraph AWS["AWS Cloud (us-west-1)"]
cf[CloudFront<br/>Admin endpoints blocked]
subgraph VPC["VPC 10.10.0.0/16"]
subgraph Public["Public Subnets"]
alb[Application Load Balancer]
nat[NAT Instance]
end
subgraph Private["Private Subnets"]
subgraph ECS["ECS Fargate Cluster"]
litellm[LiteLLM Proxy<br/>1 vCPU, 2GB]
langfuse[Langfuse<br/>0.5 vCPU, 1GB]
grafana[Grafana<br/>0.25 vCPU, 0.5GB]
victoria[Victoria Metrics<br/>0.25 vCPU, 0.5GB]
end
rds[(RDS PostgreSQL<br/>db.t4g.micro)]
end
end
bedrock[AWS Bedrock]
secrets[Secrets Manager]
end
clients --> cf
cf --> alb
alb --> litellm
alb --> grafana
alb --> langfuse
litellm --> bedrock
litellm <--> rds
litellm --> langfuse
langfuse <--> rds
victoria --> litellm
grafana --> victoria
litellm -.-> secrets
style cf fill:#ffcccc,stroke:#ff0000
style rds fill:#336791,color:#fff
style langfuse fill:#e6f3ff
Security Model
flowchart TB
subgraph Public["Public Access (CloudFront)"]
chat["/v1/chat/completions ✅"]
models["/v1/models ✅"]
health["/health/* ✅"]
metrics["/metrics/ ✅"]
end
subgraph Blocked["Blocked from CloudFront (403)"]
user["/user/* ❌"]
key["/key/* ❌"]
model["/model/* ❌"]
spend["/spend/* ❌"]
end
subgraph Internal["Internal Access Only (ALB)"]
user_alb["/user/* ✅"]
key_alb["/key/* ✅"]
model_alb["/model/* ✅"]
spend_alb["/spend/* ✅"]
end
cf[CloudFront] --> Public
cf -.->|403 Forbidden| Blocked
alb[ALB Direct] --> Internal
style Blocked fill:#ffcccc
style Internal fill:#ccffcc
Quick Start
For Claude Code Users
Add these environment variables to your shell: