
I Handed Claude Code the Keys to a Fresh VM and Walked Away. Here's What Broke.

Lessons learned from letting an AI agent build a self-hosted inference stack and the five configuration traps it missed.


Not "here's what failed catastrophically." The stack actually came up. But there were five things that needed fixing before it was really usable, and at least two of them are traps I'd probly have fallen into manually anyway.

This is the full breakdown: what I was building, how I prompted it, what came out, and where I had to go back in with a screwdriver.


What I Was Actually Trying to Build

A self-hosted AI inference stack on an Ubuntu Server 24.04 VM running inside VMware ESXi 8. CPU-only for now (this is a dev environment, not the homelab box with the RTX 3060). The goal was something I could use to test local models, play with an LLM gateway, and have a real web interface to interact with everything.

Four services, one Docker bridge network:

Service        Role
Ollama         Local LLM runtime
LiteLLM        OpenAI-compatible proxy/gateway
Open WebUI     Web chat frontend
PostgreSQL 16  Backend database for LiteLLM

VM specs were intentionally modest:

Component  Spec
OS         Ubuntu Server 24.04
RAM        32 GB
vCPUs      8
Disk       150 GB
GPU        None (CPU-only)
Platform   VMware ESXi 8

The architecture I wanted: Open WebUI talks to LiteLLM as its OpenAI-compatible backend. LiteLLM routes to local Ollama models or Ollama Cloud depending on what's selected. Ollama itself has no host port binding — completely internal to the Docker bridge network. Only ports 3000 and 4000 are exposed to the outside, with UFW blocking everything else.


The Prompt Was the Whole Game

Before running anything, I spent real time writing the Claude Code prompt. This ended up being the part that mattered most.

I covered directory structure, secret generation strategy, the full Docker Compose configuration, healthcheck logic, UFW rules, and a required credential summary at the end. A few things I was careful about:

Secrets handling. All passwords and API keys generated with openssl rand, stored in a .env file immediately chmod 600'd, never embedded in any config file. LiteLLM's config.yaml uses os.environ/ references throughout. The .env gets passed to containers via Docker's env_file: directive, which injects values as environment variables rather than mounting the file anywhere web-accessible.
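
The generation step, as a sketch (the variable names here are illustrative, not necessarily the exact ones from the run):

umask 077   # new files default to owner-only permissions even before the chmod
cat > /opt/ai-stack/.env <<EOF
POSTGRES_PASSWORD=$(openssl rand -hex 32)
LITELLM_MASTER_KEY=sk-$(openssl rand -hex 24)
WEBUI_SECRET_KEY=$(openssl rand -hex 32)
EOF
chmod 600 /opt/ai-stack/.env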

Non-interactive execution. I added an explicit execution mode block:

You have full sudo access. Execute every step autonomously without pausing to ask for confirmation, approval, or clarification. Do not prompt the user at any point during the run. Treat every step as pre-approved.

Without this, Claude Code gates on nearly every tool use. File writes, sudo commands, service restarts — all of it. That one paragraph is the difference between a fully autonomous run and sitting there clicking "approve" for twenty minutes.

Network isolation. All four containers on a single internal Docker bridge network called ai-net. They talk to each other by service name. Ollama intentionally has no host port binding. The only way to reach it from outside the Docker network is through LiteLLM.
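
In compose terms, that isolation is just a service with a networks entry and no ports mapping (an abbreviated sketch, not the full file):

networks:
  ai-net:
    driver: bridge

services:
  ollama:
    image: ollama/ollama
    networks:
      - ai-net
    # no ports: section, so it's reachable only as http://ollama:11434 inside ai-net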


What the Autonomous Run Actually Did

I ran the prompt and left it alone. The sequence, without any input from me:

  1. Installed Docker Engine and the Compose plugin from the official Docker apt repo

  2. Created /opt/ai-stack/ with the full directory structure

  3. Generated all secrets with openssl rand

  4. Wrote .env, immediately chmod 600'd it, added it to .gitignore

  5. Wrote litellm/config.yaml using os.environ/ references (no hardcoded secrets)

  6. Created prometheus/prometheus.yml as a required placeholder file (if this doesn't exist as a file, Docker creates it as a directory and the compose fails)

  7. Wrote docker-compose.yml with all four services, healthchecks, and dependency ordering

  8. Configured UFW: SSH rule first, then enabled with --force (exact sequence shown after this list)

  9. Ran docker compose pull then docker compose up -d

  10. Verified each service

  11. Printed a full credential summary
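
The ordering in step 8 matters more than it looks: enable UFW before allowing SSH and you can cut off your own session. The sequence it ran was, in effect:

sudo ufw allow 22/tcp        # SSH rule first, so enabling can't lock you out
sudo ufw allow 3000/tcp      # Open WebUI
sudo ufw allow 4000/tcp      # LiteLLM
sudo ufw --force enable      # --force skips the interactive confirmation prompt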

One unattended session. No prompts, no approvals, no intervention.


The Architecture That Came Out

Internet
    │
    ├── :3000 ──► Open WebUI
    └── :4000 ──► LiteLLM Proxy
                      │
              ┌───────┴────────┐
              │                │
           Ollama          PostgreSQL
        (internal only)   (internal only)

All containers on Docker bridge network: ai-net

Open WebUI doesn't talk to Ollama directly. It connects to LiteLLM as its OpenAI-compatible backend. LiteLLM routes to local Ollama models or Ollama Cloud depending on which model is selected. Everything goes through the gateway, so you get key management and spend tracking even on local model requests.
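
Concretely, a single config.yaml model_list can carry both routes. This sketch is assembled from the entries shown later in this post:

model_list:
  - model_name: llama3.2              # local: routed to the Ollama container
    litellm_params:
      model: ollama/llama3.2
      api_base: http://ollama:11434
  - model_name: my-cloud-model        # remote: routed to Ollama Cloud
    litellm_params:
      model: openai/gpt-oss-120b
      api_base: https://ollama.com/v1
      api_key: os.environ/OLLAMA_API_KEY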


Where It Landed: Mostly Solid, Five Fixes Required

The core infrastructure was genuinely solid. Containers came up in the correct order, health checks resolved, the dependency chain worked as intended (Open WebUI waits for LiteLLM healthy, LiteLLM waits for Postgres healthy), UFW was configured correctly, and the credential summary printed cleanly at the end.

Verified externally: API keys and passwords not exposed, .env not web-accessible, secrets injected as environment variables rather than mounted files, nothing sensitive showing up in HTTP responses from either service. That part worked exactly as designed.

The five issues that needed fixing weren't fundamental architecture problems. They were configuration details, three of which are actually useful things to know regardless of how you built the stack.


Fix 1: LiteLLM Healthcheck Endpoint and Tooling

The original compose file used /health with curl for the LiteLLM healthcheck. Two problems with that:

  • /health on LiteLLM requires an API key. Without one, it returns 401 Unauthorized, which Docker interprets as a failed healthcheck.

  • curl isn't installed in the LiteLLM Docker image.

The fix: switch the endpoint to /health/liveliness (no auth required) and replace the curl command with a Python one-liner using urllib:

healthcheck:
  test: ["CMD-SHELL", "python3 -c \"import urllib.request; urllib.request.urlopen('http://localhost:4000/health/liveliness')\""]
  interval: 15s
  timeout: 10s
  retries: 5
  start_period: 30s

This one had the biggest impact. Open WebUI's depends_on condition is service_healthy for LiteLLM. Broken healthcheck means Open WebUI never starts. Fix the healthcheck and everything downstream resolves.
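
For reference, the dependency chain behind that looks like this in the compose file (abbreviated):

open-webui:
  depends_on:
    litellm:
      condition: service_healthy   # never satisfied while the healthcheck 401s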


Fix 2: ENABLE_OLLAMA_API Was Set to False

The initial config had ENABLE_OLLAMA_API: "false" on the Open WebUI service. Local Ollama models pulled into the container weren't showing up in the model selector at all, only LiteLLM-proxied models.

Setting it to "true" gives Open WebUI a direct connection to Ollama for listing and running local models, while still using LiteLLM as the gateway for API-keyed requests. Both paths active simultaneously.
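
The relevant Open WebUI environment block ends up looking roughly like this. OLLAMA_BASE_URL is Open WebUI's standard variable for pointing at an Ollama instance; treat the exact pairing here as an assumption about this stack, not the generated file verbatim:

open-webui:
  environment:
    ENABLE_OLLAMA_API: "true"
    OLLAMA_BASE_URL: http://ollama:11434   # service name on ai-net, not localhost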


Fix 3: DATABASE_URL Leaking Into Open WebUI

This was the trickiest one, and honestly the most useful thing in this whole post.

Using env_file: .env on a service in Docker Compose passes all variables in that file to the container. Every one. If multiple services share the same .env, anything in there goes to all of them.

DATABASE_URL (LiteLLM's PostgreSQL connection string) was in the .env file. Open WebUI picked it up, tried to connect to LiteLLM's Postgres instance, and crashed immediately on missing tables — it was looking for calendar_event and others that exist in Open WebUI's own schema, not LiteLLM's. The web UI stopped loading entirely.

The fix: remove DATABASE_URL from .env entirely. Set it inline on the LiteLLM service only:

litellm:
  environment:
    DATABASE_URL: postgresql://litellm:${POSTGRES_PASSWORD}@postgres:5432/litellm

Open WebUI then correctly fell back to its default SQLite database. Any admin account created during the misconfiguration was in the wrong database and was gone after the fix, so a fresh account had to be created. Minor, but worth knowing ahead of time.

This is actually a Docker pattern worth keeping in mind well outside of this specific stack. Shared .env files and service-specific secrets don't mix cleanly. Anything that's genuinely owned by one service should be set inline on that service, not in the shared file.
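
A minimal illustration of that rule, assuming both services share the same .env:

# Anti-pattern: DATABASE_URL lives in .env, so every service inherits it
litellm:
  env_file: .env
open-webui:
  env_file: .env          # also receives DATABASE_URL, and crashes on it

# Pattern: shared file for common values, service-owned secrets set inline
litellm:
  env_file: .env
  environment:
    DATABASE_URL: postgresql://litellm:${POSTGRES_PASSWORD}@postgres:5432/litellm
open-webui:
  env_file: .env          # no DATABASE_URL, falls back to its SQLite default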


Fix 4: Ollama Cloud api_base Wrong

When I added Ollama Cloud models to LiteLLM after the initial setup, the api_base I used was wrong. LiteLLM's OpenAI provider appends /chat/completions to whatever api_base you give it. The path needs to already include /v1:

What I used              What LiteLLM constructed                  Result
https://ollama.com       https://ollama.com/chat/completions       404
https://ollama.com/api   https://ollama.com/api/chat/completions   404
https://ollama.com/v1    https://ollama.com/v1/chat/completions    200

The correct config.yaml entry:

- model_name: my-cloud-model
  litellm_params:
    model: openai/gpt-oss-120b
    api_base: https://ollama.com/v1
    api_key: os.environ/OLLAMA_API_KEY
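
A quick way to sanity-check an endpoint before touching config.yaml is to hit the constructed URL directly (this assumes OLLAMA_API_KEY is exported in your shell):

curl -s https://ollama.com/v1/chat/completions \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-oss-120b", "messages": [{"role": "user", "content": "ping"}]}'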

Fix 5: Ollama Is in Docker — Its CLI Is Too

This one's on me. After setup I ran ollama pull llama3.2 on the host and got a command not found error. The Ollama binary isn't on the host because Ollama is running inside a container.

All Ollama operations in a Dockerized install go through docker exec:

# Pull a model
docker exec -it ollama ollama pull llama3.2

# List models
docker exec ollama ollama list

No functional impact. Just operator awareness, and the kind of thing you forget in the moment.
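
If the docker exec prefix gets old, a host-side shell alias papers over it. Purely a convenience, not part of the generated stack:

# Makes "ollama pull ..." on the host transparently run inside the container
alias ollama='docker exec -it ollama ollama'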


Adding Models to the Stack

Getting a new model pulled, registered in LiteLLM, and visible in Open WebUI is three steps:

Step 1 — Pull the model into the Ollama container:

sudo docker exec -it ollama ollama pull llama3.2

Step 2 — Add it to /opt/ai-stack/litellm/config.yaml:

model_list:
  - model_name: llama3.2
    litellm_params:
      model: ollama/llama3.2
      api_base: http://ollama:11434

Step 3 — Restart LiteLLM:

sudo docker compose -f /opt/ai-stack/docker-compose.yml restart litellm

It'll appear in Open WebUI's model selector. Local Ollama models also show up directly in Open WebUI via the Ollama connection (bypassing LiteLLM) because ENABLE_OLLAMA_API is enabled.
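
To confirm LiteLLM picked up the new entry without opening the UI, list models through the gateway. The variable name here is illustrative; use whatever your master key is stored as:

curl -s http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITELLM_MASTER_KEY"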


What I Actually Took Away From This

Prompt quality determines output quality, full stop. The reason this went as smoothly as it did is that the prompt was thorough. Directory structure, secret generation, network topology, healthcheck logic, UFW ordering — all of it specified explicitly. Gaps in the prompt become gaps in the output. That's not a Claude Code problem, that's just how agentic automation works.

The non-interactive directive is not optional. Without it you're not running autonomous infrastructure work, you're running a slightly faster manual process.

The env_file scope issue is the most interesting one to me, because it's not obvious until it bites you and it has nothing to do with AI-generated configs specifically. It's a Docker behavior. Worth knowing.

90% correct on first run for a full four-service stack with network isolation, secrets management, and firewall configuration is genuinely good. The five issues were all configuration details. Nothing had to be rebuilt. The services came up, the network worked, the secrets were handled correctly, and the firewall was sane.

For a CPU-only dev environment on a clean VM with no prior configuration, I'll take that result.


The Final Stack

Service           Access         Purpose
Open WebUI        :3000          Chat interface
LiteLLM Admin UI  :4000/ui       Key management, spend tracking
LiteLLM API       :4000          OpenAI-compatible gateway
Ollama            Internal only  Local model runtime
PostgreSQL        Internal only  LiteLLM database

Firewall: ports 22, 3000, and 4000 open. Everything else blocked inbound.

Secrets: generated with openssl rand, stored in chmod 600 .env, injected as environment variables. Not web-accessible.


Stack: Ollama + LiteLLM + Open WebUI + PostgreSQL | Docker Compose | Ubuntu Server 24.04 | VMware ESXi 8 | CPU-only
