<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Llm - Tag - Lorenzo's Blog</title><link>https://www.k8s.it/tags/llm/</link><description>Llm - Tag - Lorenzo's Blog</description><generator>Hugo -- gohugo.io</generator><language>en</language><lastBuildDate>Sun, 24 May 2026 00:00:00 +0000</lastBuildDate><atom:link href="https://www.k8s.it/tags/llm/" rel="self" type="application/rss+xml"/><item><title>Stargate LLM Gateway</title><link>https://www.k8s.it/posts/stargate-llm-gateway/</link><pubDate>Sun, 24 May 2026 00:00:00 +0000</pubDate><author>Lorenzo Girardi</author><guid>https://www.k8s.it/posts/stargate-llm-gateway/</guid><description><![CDATA[<div class="featured-image">
                <img src="/images/herostargate.png" referrerpolicy="no-referrer">
            </div><h1 id="stargate-llm-gateway">Stargate LLM Gateway</h1>
<p>
</p>
<p>OpenAI-compatible LLM Gateway for AWS Bedrock with user management, usage tracking, and cost monitoring. Powered by LiteLLM.</p>
<h2 id="table-of-contents">Table of Contents</h2>
<ul>
<li><a href="#overview" rel="">Overview</a></li>
<li><a href="#architecture" rel="">Architecture</a></li>
<li><a href="#quick-start" rel="">Quick Start</a></li>
<li><a href="#user-management" rel="">User Management</a></li>
<li><a href="#security" rel="">Security</a></li>
<li><a href="#monitoring--observability" rel="">Monitoring &amp; Observability</a></li>
<li><a href="#documentation-index" rel="">Documentation Index</a></li>
<li><a href="#api-reference" rel="">API Reference</a></li>
</ul>
<hr>
<h2 id="overview">Overview</h2>
<div class="mermaid" id="id-8">flowchart LR
    subgraph Clients
        cc[Claude Code]
        api[API Clients]
    end

    subgraph Gateway[&#34;LiteLLM Gateway&#34;]
        auth[API Key Auth]
        budget[Budget Manager]
        router[Model Router]
        metrics[Prometheus Metrics]
    end

    subgraph Storage
        pg[(PostgreSQL)]
    end

    subgraph Bedrock[&#34;AWS Bedrock&#34;]
        haiku[Claude Haiku 4.5]
        sonnet[Claude Sonnet 4.5]
        opus[Claude Opus 4.5]
    end

    cc --&gt; auth
    api --&gt; auth
    auth --&gt; budget
    budget --&gt; router
    router --&gt; haiku &amp; sonnet &amp; opus
    budget &lt;--&gt; pg
    router --&gt; metrics

    style haiku fill:#90EE90
    style sonnet fill:#FFD700
    style opus fill:#FF6B6B</div><h3 id="features">Features</h3>
<table>
	<thead>
			<tr>
					<th>Feature</th>
					<th>Status</th>
					<th>Description</th>
			</tr>
	</thead>
	<tbody>
			<tr>
					<td>OpenAI-compatible API</td>
					<td>✅</td>
					<td>Drop-in replacement for Claude Code</td>
			</tr>
			<tr>
					<td>User Management</td>
					<td>✅</td>
					<td>Create users with budgets and model restrictions</td>
			</tr>
			<tr>
					<td>API Key Generation</td>
					<td>✅</td>
					<td>Per-user API keys with limits</td>
			</tr>
			<tr>
					<td>Usage Tracking</td>
					<td>✅</td>
					<td>Token and cost tracking per user</td>
			</tr>
			<tr>
					<td>Prometheus Metrics</td>
					<td>✅</td>
					<td>Full observability stack</td>
			</tr>
			<tr>
					<td>Langfuse Tracing</td>
					<td>✅</td>
					<td>LLM call logging and analysis</td>
			</tr>
			<tr>
					<td>Streaming Support</td>
					<td>✅</td>
					<td>SSE for real-time responses</td>
			</tr>
			<tr>
					<td>Admin Endpoint Security</td>
					<td>✅</td>
					<td>Blocked from public access</td>
			</tr>
	</tbody>
</table>
<hr>
<h2 id="architecture">Architecture</h2>
<h3 id="request-flow">Request Flow</h3>
<div class="mermaid" id="id-9">sequenceDiagram
    participant Client as Claude Code
    participant CF as CloudFront
    participant LiteLLM as LiteLLM Proxy
    participant PG as PostgreSQL
    participant Bedrock as AWS Bedrock
    participant VM as Victoria Metrics

    Client-&gt;&gt;CF: POST /v1/chat/completions
    CF-&gt;&gt;LiteLLM: Forward request

    LiteLLM-&gt;&gt;LiteLLM: Validate API Key
    LiteLLM-&gt;&gt;PG: Check User Budget

    alt Budget OK
        LiteLLM-&gt;&gt;Bedrock: InvokeModel (SigV4)
        Bedrock--&gt;&gt;LiteLLM: Response &#43; tokens
        LiteLLM-&gt;&gt;PG: Update spend
        LiteLLM-&gt;&gt;LiteLLM: Record metrics
        LiteLLM--&gt;&gt;Client: 200 OK
    else Budget Exceeded
        LiteLLM--&gt;&gt;Client: 429 Budget Exceeded
    end

    VM-&gt;&gt;LiteLLM: Scrape /metrics/</div><h3 id="deployment-architecture">Deployment Architecture</h3>
<div class="mermaid" id="id-10">flowchart TB
    subgraph Internet
        clients[Claude Code / API Clients]
    end

    subgraph AWS[&#34;AWS Cloud (us-west-1)&#34;]
        cf[CloudFront&lt;br/&gt;Admin endpoints blocked]

        subgraph VPC[&#34;VPC 10.10.0.0/16&#34;]
            subgraph Public[&#34;Public Subnets&#34;]
                alb[Application Load Balancer]
                nat[NAT Instance]
            end

            subgraph Private[&#34;Private Subnets&#34;]
                subgraph ECS[&#34;ECS Fargate Cluster&#34;]
                    litellm[LiteLLM Proxy&lt;br/&gt;1 vCPU, 2GB]
                    langfuse[Langfuse&lt;br/&gt;0.5 vCPU, 1GB]
                    grafana[Grafana&lt;br/&gt;0.25 vCPU, 0.5GB]
                    victoria[Victoria Metrics&lt;br/&gt;0.25 vCPU, 0.5GB]
                end

                rds[(RDS PostgreSQL&lt;br/&gt;db.t4g.micro)]
            end
        end

        bedrock[AWS Bedrock]
        secrets[Secrets Manager]
    end

    clients --&gt; cf
    cf --&gt; alb
    alb --&gt; litellm
    alb --&gt; grafana
    alb --&gt; langfuse
    litellm --&gt; bedrock
    litellm &lt;--&gt; rds
    litellm --&gt; langfuse
    langfuse &lt;--&gt; rds
    victoria --&gt; litellm
    grafana --&gt; victoria
    litellm -.-&gt; secrets

    style cf fill:#ffcccc,stroke:#ff0000
    style rds fill:#336791,color:#fff
    style langfuse fill:#e6f3ff</div><h3 id="security-model">Security Model</h3>
<div class="mermaid" id="id-11">flowchart TB
    subgraph Public[&#34;Public Access (CloudFront)&#34;]
        chat[&#34;/v1/chat/completions ✅&#34;]
        models[&#34;/v1/models ✅&#34;]
        health[&#34;/health/* ✅&#34;]
        metrics[&#34;/metrics/ ✅&#34;]
    end

    subgraph Blocked[&#34;Blocked from CloudFront (403)&#34;]
        user[&#34;/user/* ❌&#34;]
        key[&#34;/key/* ❌&#34;]
        model[&#34;/model/* ❌&#34;]
        spend[&#34;/spend/* ❌&#34;]
    end

    subgraph Internal[&#34;Internal Access Only (ALB)&#34;]
        user_alb[&#34;/user/* ✅&#34;]
        key_alb[&#34;/key/* ✅&#34;]
        model_alb[&#34;/model/* ✅&#34;]
        spend_alb[&#34;/spend/* ✅&#34;]
    end

    cf[CloudFront] --&gt; Public
    cf -.-&gt;|403 Forbidden| Blocked
    alb[ALB Direct] --&gt; Internal

    style Blocked fill:#ffcccc
    style Internal fill:#ccffcc</div><hr>
<h2 id="quick-start">Quick Start</h2>
<h3 id="for-claude-code-users">For Claude Code Users</h3>
<p>Add these environment variables to your shell:</p>]]></description></item></channel></rss>