Building a Private AI Stack: From Mini PC to Autonomous Agents

For the past several years I have been thinking carefully about what it means to run AI infrastructure that I actually own, control, and understand from the ground up. The rapid proliferation of frontier model APIs, agentic coding tools, and open-weight model releases in 2025-2026 finally made this tractable at a price and complexity point that a single person could manage. This post documents the architecture I settled on: a self-hosted, Docker-based stack running on a mini PC, unified by a single OpenAI-compatible model gateway, and surfaced through a collection of local inference servers, agentic CLI tools, autonomous agent frameworks, open-source Cowork alternatives, and a task-bounded command harness built around structured queues. My goals were privacy, sovereignty, reproducibility, and the ability to swap components without rebuilding everything from scratch.

Motivation: Why Self-Host?

The short answer is control. When I work on grant-funded AI research, handle student data for my courses, or draft institutional planning documents, I want to know exactly where inference is happening and what data is leaving my environment. The longer answer is pedagogical: I cannot credibly teach AI literacy, AI ethics, and responsible deployment if I have not made serious, hands-on architectural decisions myself. Running your own stack is humbling in the right ways.

There is also a strategic argument. As the open-source AI agent framework ecosystem matured through early 2026, it became clear that the layered architecture of these systems, separating the model layer, the orchestration layer, the tool-access layer, and the user-interface layer, was stabilizing into recognizable patterns. A well-designed self-hosted stack can plug into any of these layers without being locked to a single vendor. That flexibility is worth the setup cost.

Hardware: The Mini PC

The physical foundation of the stack is a mini PC running Linux Mint, which gives me a clean Debian-lineage environment with full access to the upstream Docker Engine repository and no virtualization layer between the container workloads and the host kernel. Everything runs directly on the host, which simplifies networking, volume permissions, and service lifecycle management considerably compared to hypervisor-based setups.

Installing Docker Engine

Linux Mint ships an older Docker build from the distribution mirror, so I install from the upstream Docker repository instead. The setup sequence is:

# Remove any distribution-packaged versions
sudo apt-get remove docker docker-engine docker.io containerd runc
 
sudo apt-get update
sudo apt-get install -y ca-certificates curl gnupg lsb-release
 
# Add Docker's official GPG key
sudo mkdir -m 0755 -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
  sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
 
# Point the apt source at the Ubuntu codename that Mint is based on
echo \
  "deb [arch=$(dpkg --print-architecture) \
  signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu \
  $(. /etc/os-release && echo $UBUNTU_CODENAME) stable" | \
  sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
 
sudo apt-get update
sudo apt-get install -y docker-ce docker-ce-cli containerd.io \
  docker-buildx-plugin docker-compose-plugin
 
sudo docker run hello-world
sudo usermod -aG docker $USER
newgrp docker

The UBUNTU_CODENAME variable in the source line is the key detail for Mint: it resolves to the underlying Ubuntu release name rather than the Mint release name, which is what the Docker repository actually indexes.
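
Before writing the source line, you can confirm what it will resolve to; on current Mint releases this prints the underlying Ubuntu LTS codename (for example noble) rather than the Mint codename:

. /etc/os-release && echo "$UBUNTU_CODENAME"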

Installing Ollama

Ollama runs as a native Linux service, not inside a container, which keeps model-weight I/O off the Docker networking path and avoids bind-mount overhead for the weight files:

curl -fsSL https://ollama.com/install.sh | sh
systemctl status ollama

Ollama listens on http://localhost:11434 by default and is reachable from within containers via --add-host=host.docker.internal:host-gateway, which every container in the stack declares. This flag is required on Linux Docker Engine; unlike Docker Desktop, the Linux engine does not inject host.docker.internal automatically.
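
A quick sanity check of both paths verifies the wiring; the /api/tags route lists locally installed models, and curlimages/curl is just a convenient throwaway image for the in-container test:

# From the host
curl -s http://localhost:11434/api/tags

# From inside a container, via the host-gateway alias
docker run --rm --add-host=host.docker.internal:host-gateway \
  curlimages/curl -s http://host.docker.internal:11434/api/tags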

Workspace Layout

All agent state lives under a single root in the home directory:

$HOME/agents/
├── workspace/              # Shared project files (all TUI tools mount here)
├── skills/core/            # Read-only skills package (mounted :ro)
├── litellm/                # LiteLLM config and Compose files
├── kilocode/home/          # KiloCode identity
├── opencode/home/          # opencode.ai identity
├── pi/home/                # pi.dev identity, models.json, settings.json
├── hermes/home/            # Hermes agent identity
├── gnhf/                   # Good Night Have Fun harness
├── open-design/            # Open Design app data and pi identity
├── commercial/             # Claude Code, Codex, Gemini CLI, Copilot, CCR
├── mastra/data/            # Mastra SQLite database
├── a0/data/                # Agent Zero persistent user data
├── archon/data/            # Archon workflow state
├── ollama/data/            # Downloaded model weights (~/.ollama)
├── localai/                # LocalAI models/, backends/, config/
├── googleworkspacecli/     # Google Workspace CLI (gcloud/)
├── googleagentscli/        # Google ADK CLI
└── openwebui/data/         # Open WebUI persistent data

Creating the full tree is a single command drawn from the deploy script:

BASE="$HOME/agents"
mkdir -p \
  "$BASE/workspace" "$BASE/skills/core" "$BASE/litellm" \
  "$BASE/kilocode/home" "$BASE/opencode/home" "$BASE/pi/home" \
  "$BASE/hermes/home" "$BASE/ollama/data" \
  "$BASE/localai/models" "$BASE/localai/backends" "$BASE/localai/config" \
  "$BASE/mastra/data" "$BASE/openwebui/data" \
  "$BASE/commercial/claude/workspace" "$BASE/commercial/claude/home" \
  "$BASE/commercial/claude/npm" "$BASE/commercial/claude/config" \
  "$BASE/commercial/claude/cache" \
  "$BASE/googleworkspacecli/gcloud" \
  "$BASE/googleagentscli/data" "$BASE/googleagentscli/config" \
  "$BASE/googleagentscli/cache/uv" "$BASE/googleagentscli/cache/npm" \
  "$BASE/googleagentscli/evals" "$BASE/googleagentscli/logs" \
  "$BASE/archon/data" "$BASE/a0/data" \
  "$BASE/gnhf/home" "$BASE/gnhf/npm" "$BASE/gnhf/config" "$BASE/gnhf/cache" \
  "$BASE/open-design/data" "$BASE/open-design/pi"
echo "Directory tree created under $BASE"

The Core Design Principle: A Unified Model Gateway

The single most important architectural decision I made was to route all LLM inference through a single OpenAI-compatible endpoint rather than having each tool reach out to Ollama, Anthropic, or OpenRouter independently. I use LiteLLM for this. Every service in the stack sends its requests to http://localhost:4000/v1 with the bearer token sk-litellm-local. LiteLLM translates those requests to whatever backend is appropriate, whether a local Ollama model, a LocalAI GGUF endpoint, or an OpenRouter free-tier model, according to a YAML routing configuration.

model_list:
  - model_name: llama3
    litellm_params:
      model: ollama/llama3
      api_base: http://host.docker.internal:11434
 
  - model_name: qwen2.5-3b
    litellm_params:
      model: ollama/qwen2.5:3b
      api_base: http://host.docker.internal:11434
 
  - model_name: hermes3
    litellm_params:
      model: ollama/hermes3:8b
      api_base: http://host.docker.internal:11434
 
  - model_name: openrouter/auto
    litellm_params:
      model: openrouter/auto
      api_key: ${OPENROUTER_API_KEY}
      api_base: https://openrouter.ai/api/v1

The practical consequence is that changing a model, adding a provider, or adjusting routing requires editing one YAML file and restarting one service. No other container needs to know that anything changed. This is the same composability principle I teach in software engineering: minimize coupling, maximize cohesion.

Connecting a commercial CLI tool like Claude Code to the local stack requires only two environment variable exports:

export ANTHROPIC_BASE_URL=http://localhost:4000
export ANTHROPIC_API_KEY=sk-litellm-local
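
The same base URL and key support a raw smoke test of the gateway; this exercises the llama3 alias from the routing configuration above:

curl -s http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer sk-litellm-local" \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3", "messages": [{"role": "user", "content": "Say hello in five words."}]}'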

LiteLLM Deployment

LiteLLM runs as a Docker Compose service. The docker-compose.yml pulls the pre-built image; everything else is bind-mounted configuration.

# litellm/docker-compose.yml
version: "3.9"
services:
  litellm:
    image: ghcr.io/berriai/litellm:main-latest
    container_name: litellm-${USER}
    restart: unless-stopped
    ports:
      - "4000:4000"
    volumes:
      - ./litellm_config.yaml:/app/config.yaml:ro
    env_file:
      - .env
    command: ["--config", "/app/config.yaml", "--port", "4000", "--num_workers", "2"]
    extra_hosts:
      - "host.docker.internal:host-gateway"

The .env file holds the master key and any cloud provider keys:

# litellm/.env
LITELLM_MASTER_KEY=sk-litellm-local
OPENROUTER_API_KEY=YOUR_OPENROUTER_API_KEY

Build and run scripts are minimal wrappers around Compose:

# litellm/build.sh — pull image, start, verify endpoint
cd "$HOME/agents/litellm"
docker compose pull
docker compose up -d
sleep 20
curl -s http://localhost:4000/models \
  -H "Authorization: Bearer sk-litellm-local" | python3 -m json.tool | head -20
 
# litellm/run.sh — start after reboot
cd "$HOME/agents/litellm" && docker compose up -d
 
# litellm/attach.sh — tail live logs
cd "$HOME/agents/litellm" && docker compose logs --tail=30 -f litellm-${USER}

Restart after a config change: cd $HOME/agents/litellm && docker compose down && docker compose up -d.

The Full Stack: Services and Ports

Service       Port    Purpose
LiteLLM       4000    Unified model gateway (OpenAI-compatible)
Ollama        11434   Local LLM inference (native systemd service)
LocalAI       8080    GGUF model inference (OpenAI-compatible)
Open WebUI    3000    Browser-based LLM frontend with MCP tool calling
Mastra        4111    TypeScript AI agent server (API + Studio UI)
Agent Zero    8081    Autonomous hierarchical agent with web UI
Archon        3090    Workflow-driven agent runner
Open Design   5173    Collaborative design canvas with embedded pi agent
Portainer     9000    Docker management UI

Local Inference: Ollama and LocalAI

The stack runs two local inference backends with distinct tradeoffs, and LiteLLM routes between them transparently based on model alias.

Ollama is the primary inference backend. Its systemd service model keeps it available before Docker is fully up, its model management CLI is clean, and its HTTP API is stable. The models I maintain are selected for RAM footprint first: phi4-mini, smollm2, gemma4:e2b, qwen2.5:1.5b, qwen2.5:3b, llama3, and hermes3:8b.

# ollama/run.sh — containerized alternative to the native systemd install
# (use one or the other; both bind port 11434)
docker run -d \
  --name ollama-${USER} \
  --restart no \
  --add-host=host.docker.internal:host-gateway \
  -p 11434:11434 \
  -v "$HOME/agents/ollama/data:/root/.ollama" \
  ollama/ollama:latest
 
# Pull models after starting
for model in phi4-mini smollm2 "gemma4:e2b" "qwen2.5:1.5b" "qwen2.5:3b" llama3 "hermes3:8b"; do
  docker exec ollama-${USER} ollama pull "$model"
done

LocalAI (github.com/mudler/LocalAI) is the secondary inference backend, running on port 8080 behind an OpenAI-compatible API surface. It supports llama.cpp for text generation, whisper.cpp for speech transcription, and Stable Diffusion for image generation, all behind the same endpoint. LocalAI is organized around three bind-mounted directories: models/ holds GGUF weight files, backends/ holds compiled backend binaries, and config/ holds per-model YAML configuration. Note that the host directory is named config/ while the container mount path is /configuration; this asymmetry is intentional and must be preserved.

# localai/run.sh
docker run -d \
  --name localai-${USER} \
  --restart no \
  --add-host=host.docker.internal:host-gateway \
  -p 8080:8080 \
  -v "$HOME/agents/localai/models:/models" \
  -v "$HOME/agents/localai/backends:/backends" \
  -v "$HOME/agents/localai/config:/configuration" \
  localai/localai:latest

A per-model YAML config controls backend selection and context length:

# localai/config/phi4-mini.yaml
name: phi4-mini
backend: llama
parameters:
  model: phi-4-mini-instruct.Q4_K_M.gguf
  context_size: 8192
  threads: 8
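
Because LocalAI exposes the same OpenAI dialect, the model defined above can be exercised directly once its GGUF file is present in models/:

curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "phi4-mini", "messages": [{"role": "user", "content": "Introduce yourself in one sentence."}]}'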

Browser Frontend: Open WebUI

Open WebUI is the browser-based interface for direct LLM interaction, running on port 3000. It connects to Ollama directly and enumerates available models automatically. Its native MCP tool-calling support (version 0.4 and later) intercepts tool-call responses, dispatches them to registered MCP servers, and injects results back into the conversation.

# openwebui/run.sh
docker run -d -p 3000:8080 \
  --add-host=host.docker.internal:host-gateway \
  -v "$HOME/agents/openwebui/data:/app/backend/data" \
  --name open-webui-${USER} \
  ghcr.io/open-webui/open-webui:main

After launch, connect to LiteLLM via Admin Settings → Connections → OpenAI: set the URL to http://host.docker.internal:4000/v1 and the API key to sk-litellm-local. Models confirmed to work reliably with Open WebUI tool calling include hermes3:8b, llama3.1:8b, qwen2.5:7b, qwen2.5:14b, and mistral-nemo:12b.

Agentic CLI Tools: A Comparative Survey

By April 2026, a mature set of agentic coding CLI tools has emerged with distinct architectural philosophies, and I run all of them through the unified LiteLLM gateway. Each tool runs inside a dedicated Docker container with an identity bind mount, a shared workspace mount, and an optional skills mount. The sections below show the Dockerfile, build script, and run script for each.

Claude Code (Anthropic, Node.js) is the most fully featured in terms of built-in subagent support, MCP integration, and permission gate granularity. It uses CLAUDE.md files for project context and .claude/agents/ Markdown files for custom subagent definitions.

OpenAI Codex CLI (Rust) supports native multi-provider configuration through a TOML config file. Custom providers are defined as named sections:

model = "llama3.3:70b"
model_provider = "openwebui"
 
[model_providers.openwebui]
name = "Open WebUI"
base_url = "http://localhost:3000/openai"
env_key = "OPENWEBUI_API_KEY"

Gemini CLI (Google, Node.js) uses GEMINI.md files for project context and a three-tier discovery hierarchy for skills. Routing it through an OpenAI-compatible endpoint requires the open-gemini-cli fork, which injects an adapter layer that translates Gemini’s internal message format.

OpenCode (opencode.ai, Go) is the most flexible in terms of provider support, relying on the @ai-sdk/openai-compatible adapter to connect to any OpenAI-compatible backend:

{
  "provider": {
    "litellm": {
      "npm": "@ai-sdk/openai-compatible",
      "options": { "baseURL": "http://localhost:4000/v1" },
      "models": { "llama3": {}, "qwen2.5-3b": {}, "hermes3": {} }
    }
  }
}

Commercial Tools Deployment (Claude Code, Codex, Gemini CLI, Copilot, CCR)

Claude Code, Codex, Gemini CLI, GitHub Copilot, and the Claude Code Router all share a single Docker image. The tool to launch is selected at runtime as an argument to run.sh.

# commercial/Dockerfile
FROM node:22-bookworm
 
RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl ca-certificates ripgrep less nano vim \
    && rm -rf /var/lib/apt/lists/*
 
RUN npm install -g \
    @anthropic-ai/claude-code \
    @openai/codex \
    @google/gemini-cli \
    @github/copilot \
    @musistudio/claude-code-router
 
RUN mkdir -p /root/.claude-code-router
WORKDIR /workspace
CMD ["/bin/bash"]
# commercial/build.sh (excerpt)
docker build -t commercial-ai:latest "$HOME/agents/commercial"
 
# Create per-tool identity directories
for tool in claude codex gemini ccr copilot; do
  mkdir -p "$HOME/agents/commercial/$tool/"{workspace,home,npm,config,cache}
done
# commercial/run.sh (excerpt — tool is the first argument)
TOOL="$1"   # claude | codex | gemini | copilot | ccr
# The full script resolves API_VAR (the provider key variable name,
# e.g. ANTHROPIC_API_KEY for claude) and CLI_CMD (the launch command) from $TOOL
 
docker run --rm -it \
  --name "commercial-${TOOL}-${USER}" \
  --add-host=host.docker.internal:host-gateway \
  -e "${API_VAR}=${!API_VAR}" \
  -e HOME="/home/agent" \
  -v "$HOME/agents/commercial/${TOOL}/workspace:/workspace" \
  -v "$HOME/agents/commercial/${TOOL}/home:/home/agent" \
  -v "$HOME/agents/commercial/${TOOL}/npm:/home/agent/.npm" \
  -v "$HOME/agents/commercial/${TOOL}/config:/home/agent/.config" \
  -v "$HOME/agents/commercial/${TOOL}/cache:/home/agent/.cache" \
  -w /workspace \
  commercial-ai:latest ${CLI_CMD}

The ccr (Claude Code Router) variant additionally mounts its routing config read-only:

// commercial/ccr/config.json  routing table
{
  "Router": {
    "default":          "ollama,llama3:latest",
    "background":       "ollama,qwen2.5:1.5b",
    "think":            "ollama,gemma4:e2b",
    "longContext":      "openrouter,google/gemini-2.5-pro-exp-03-25:free",
    "longContextThreshold": 60000,
    "webSearch":        "openrouter,google/gemini-2.5-pro-exp-03-25:online"
  }
}

Switching the active model within a running Claude Code session is a single slash command: /model ollama,llama3:latest.

OpenCode Deployment

# opencode/Dockerfile
FROM node:20-bookworm-slim
 
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        curl ca-certificates git bash findutils \
    && rm -rf /var/lib/apt/lists/*
 
RUN curl -fsSL https://opencode.ai/install | bash \
    && mkdir -p /opt/opencode/bin \
    && cp "$(find /root -type f -name opencode | head -n 1)" /opt/opencode/bin/opencode \
    && chmod 755 /opt/opencode/bin/opencode
 
ENV PATH="/opt/opencode/bin:${PATH}"
ENV HOME=/home/opencode
VOLUME ["/workspace"]
WORKDIR /workspace
ENTRYPOINT ["/opt/opencode/bin/opencode"]
# opencode/build.sh
docker build -t opencode:local "$HOME/agents/opencode"
mkdir -p "$HOME/agents/opencode/home"
 
# opencode/run.sh
docker run --restart no -it \
  --name opencode-${USER} \
  --add-host=host.docker.internal:host-gateway \
  -v "$HOME/agents/opencode/home:/home/opencode" \
  -v "$HOME/agents/workspace:/workspace" \
  -v "$HOME/agents/skills/core:/app/skills/core:ro" \
  opencode:local
 
# opencode/attach.sh
docker start -ai opencode-${USER}

KiloCode (VS Code extension, Node.js) is the VS Code-native member of the survey. It brings the agentic loop into the editor rather than the terminal, with direct access to the VS Code language server for diagnostics, symbol navigation, and refactoring. Connecting it to LiteLLM is a one-field change in its settings JSON.

KiloCode Deployment

# kilocode/Dockerfile
FROM debian:bookworm-slim
 
RUN apt-get update && \
    apt-get install -y --no-install-recommends \
        ca-certificates git bash wget curl && \
    rm -rf /var/lib/apt/lists/*
 
ARG TARGETARCH=amd64
 
RUN ARCH=${TARGETARCH} && \
    if [ "$ARCH" = "arm64" ]; then ARCH="arm64"; else ARCH="x64"; fi && \
    wget -qO /tmp/kilo.tar.gz \
        "https://github.com/Kilo-Org/kilocode/releases/latest/download/kilo-linux-${ARCH}.tar.gz" && \
    tar -xzf /tmp/kilo.tar.gz -C /usr/local/bin && \
    chmod +x /usr/local/bin/kilo && \
    rm /tmp/kilo.tar.gz
 
VOLUME ["/workspace"]
WORKDIR /workspace
ENTRYPOINT ["kilo"]
# kilocode/build.sh
docker build -t kilocode:local "$HOME/agents/kilocode"
mkdir -p "$HOME/agents/kilocode/home"
chown -R $(id -u):$(id -g) "$HOME/agents/kilocode/home"
chmod -R u+rwX "$HOME/agents/kilocode/home"
 
# kilocode/run.sh
docker run --restart no -it \
  --user $(id -u):$(id -g) \
  --add-host=host.docker.internal:host-gateway \
  -e HOME=/home/kilo \
  -e XDG_CONFIG_HOME=/home/kilo/.config \
  -e XDG_DATA_HOME=/home/kilo/.local/share \
  -e XDG_CACHE_HOME=/home/kilo/.cache \
  --name kilocode-${USER} \
  -e TERM=xterm-256color \
  -v "$HOME/agents/kilocode/home:/home/kilo" \
  -v "$HOME/agents/workspace:/workspace" \
  -v "$HOME/agents/skills/core:/app/skills/core:ro" \
  kilocode:local
 
# kilocode/attach.sh
docker start -ai kilocode-${USER}

pi (pi.dev, Node.js) takes a deliberately minimal stance, omitting built-in MCP, plan mode, and permission gates in favor of a package-based extensibility model. It supports OpenRouter directly through a provider block in models.json, and NVIDIA NIM endpoints are equally accessible through the same mechanism. I reach for pi primarily for rapid exploratory work precisely because it does not impose an opinionated workflow.

Pi Deployment

# pi/Dockerfile
FROM node:22-bookworm-slim
 
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
    bash ca-certificates curl git openssh-client \
 && rm -rf /var/lib/apt/lists/*
 
RUN npm install -g @mariozechner/pi-coding-agent
 
ENV HOME=/home/pi-agent
VOLUME ["/workspace"]
WORKDIR /workspace
ENTRYPOINT ["pi"]
# pi/build.sh
docker build -t pi:local "$HOME/agents/pi"
mkdir -p "$HOME/agents/pi/home/.pi/agent"
 
# Write default models.json pointing at Ollama
cat > "$HOME/agents/pi/home/.pi/agent/models.json" << 'EOF'
{
  "providers": {
    "ollama": {
      "baseUrl": "http://host.docker.internal:11434/v1",
      "api": "openai-completions",
      "apiKey": "ollama",
      "models": [
        {
          "id": "qwen2.5:7b",
          "name": "Qwen 2.5 7B (Local)",
          "contextWindow": 32768,
          "maxTokens": 8192,
          "cost": { "input": 0, "output": 0 }
        }
      ]
    },
    "openrouter": {
      "baseUrl": "https://openrouter.ai/api/v1",
      "apiKey": "${OPENROUTER_API_KEY}",
      "models": [
        { "id": "google/gemini-2.5-pro-exp-03-25:free", "contextWindow": 1000000 },
        { "id": "meta-llama/llama-4-maverick:free",      "contextWindow": 128000  }
      ]
    }
  }
}
EOF
 
# pi/run.sh
docker run --restart no -it \
  --name pi-${USER} \
  --add-host=host.docker.internal:host-gateway \
  -e TERM=xterm-256color \
  -e OLLAMA_HOST="http://host.docker.internal:11434" \
  -e OPENROUTER_API_KEY="${OPENROUTER_API_KEY:-YOUR_OPENROUTER_API_KEY}" \
  -v "$HOME/agents/pi/home:/home/pi-agent" \
  -v "$HOME/agents/workspace:/workspace" \
  -v "$HOME/agents/skills/core:/app/skills/core:ro" \
  pi:local
 
# pi/attach.sh
docker start -ai pi-${USER}

Hermes is a named agent identity maintained around Nous Research’s Hermes 3 model family. The hermes/home/ directory holds a configuration and prompt library tuned for Hermes 3’s specific instruction format and function-calling conventions. Because Hermes 3 handles structured output and tool-use with uncommon consistency, I route all tool-calling-heavy workloads to this identity.

Hermes Agent Deployment

# hermes/build.sh — probe first, then run
# Confirm the image's runtime user and home directory before committing a mount path:
docker pull nousresearch/hermes-agent:latest
docker run --rm --entrypoint sh nousresearch/hermes-agent:latest \
  -c 'id && echo HOME=$HOME'
 
mkdir -p "$HOME/agents/hermes/home"
 
# hermes/run.sh
# Always launch with -it — running detached causes immediate exit
docker run --restart no -it \
  --name hermes-${USER} \
  --add-host=host.docker.internal:host-gateway \
  -e TERM=xterm-256color \
  -v "$HOME/agents/hermes/home:/home/hermes/.hermes" \
  -v "$HOME/agents/workspace:/workspace" \
  nousresearch/hermes-agent:latest
 
# hermes/attach.sh
docker start -ai hermes-${USER}

GNHF (Good Night Have Fun) is the task-bounded harness I use when I want a single-purpose agent that executes a defined workflow, reports results, and stops. The name is a ham radio sign-off, which fits its character: polite, brief, and does exactly what it was asked to do. Unlike Claude Code or pi, which are interactive and session-oriented, GNHF takes a task description and a workspace path as inputs, executes against the LiteLLM gateway, writes its outputs to the workspace, and exits.

GNHF Deployment

GNHF requires a Dockerfile placed at $HOME/agents/gnhf/Dockerfile before build.sh can execute, because the gnhf binary distribution mechanism is external to this stack. A representative starting point:

# gnhf/Dockerfile — adapt to your gnhf binary distribution
FROM node:22-bookworm
 
RUN apt-get update && apt-get install -y --no-install-recommends \
    git curl ca-certificates ripgrep less \
    && rm -rf /var/lib/apt/lists/*
 
# Install the agent CLIs that gnhf wraps
RUN npm install -g \
    @anthropic-ai/claude-code \
    @openai/codex \
    @github/copilot
 
# Install gnhf — adapt to your distribution method:
# RUN npm install -g gnhf
# or: COPY gnhf /usr/local/bin/gnhf && chmod +x /usr/local/bin/gnhf
 
WORKDIR /workspace
CMD ["/bin/bash"]
# gnhf/build.sh
if [[ ! -f "$HOME/agents/gnhf/Dockerfile" ]]; then
  echo "ERROR: place Dockerfile in $HOME/agents/gnhf/ first"; exit 1
fi
docker build -t gnhf:latest "$HOME/agents/gnhf"
 
# gnhf/run.sh — key arguments shown; full script handles key resolution per agent
# Usage: ./run.sh --agent <codex|claude|copilot> --repo <path> \
#                 [--max-iterations N] [--max-tokens N] "task description"
docker run --rm -it \
  --name "gnhf-${AGENT}-${USER}" \
  -e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-YOUR_ANTHROPIC_API_KEY}" \
  -e HOME="/home/agent" \
  -v "$HOME/agents/workspace:/workspace" \
  -v "$HOME/agents/gnhf/home:/home/agent" \
  -v "$HOME/agents/gnhf/npm:/home/agent/.npm" \
  -v "$HOME/agents/gnhf/config:/home/agent/.config" \
  -v "$HOME/agents/gnhf/cache:/home/agent/.cache" \
  gnhf:latest \
  bash -lc '
    cd "$1" &&
    git config --global --add safe.directory "$1" &&
    shift && exec "$@"
  ' _ "${REPO_PATH}" gnhf --agent "${AGENT}" "${PROMPT}"

All six surveyed tools use project context files (CLAUDE.md, AGENTS.md, GEMINI.md) to provide persistent project instructions without the user restating them on every turn, and all six are converging on MCP as the standard for tool integration.
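
As a concrete illustration, a context file is ordinary Markdown dropped at the project root; the contents below are illustrative, and each tool looks for its own filename:

cat > "$HOME/agents/workspace/CLAUDE.md" << 'EOF'
# Project context
- Source lives in src/, tests in tests/; run tests with `make test`
- Never commit directly to main; create a feature branch instead
- Prefer small, reviewable diffs; flag any uncertainty with TODO comments
EOF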

The Google Ecosystem: Agents CLI and Workspace CLI

Two Google-specific tools occupy their own tier in the stack, with independent container identities and a shared philosophy of treating Google’s API surface as a set of agent-accessible tools.

Google Agents CLI is the command-line interface for Google’s Agent Development Kit (ADK), a Python framework for building multi-agent systems that run on Google’s infrastructure and interact with Gemini models. The uv cache indicates a Python-heavy dependency footprint, the evals/ directory holds evaluation datasets and result logs, and the container runs as a non-root user to match the bind-mounted volume permissions.

Google Agents CLI Deployment

# googleagentscli/Dockerfile
FROM python:3.12-slim
 
RUN apt-get update && apt-get install -y --no-install-recommends \
        ca-certificates curl git gnupg lsb-release unzip wget \
        jq vim less procps build-essential \
    && rm -rf /var/lib/apt/lists/*
 
RUN curl -fsSL https://deb.nodesource.com/setup_lts.x | bash - \
    && apt-get install -y --no-install-recommends nodejs \
    && rm -rf /var/lib/apt/lists/*
 
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /usr/local/bin/
 
# Google Cloud SDK
RUN echo "deb [signed-by=/usr/share/keyrings/cloud.google.gpg] \
        https://packages.cloud.google.com/apt cloud-sdk main" \
        | tee /etc/apt/sources.list.d/google-cloud-sdk.list \
    && curl -fsSL https://packages.cloud.google.com/apt/doc/apt-key.gpg \
        | gpg --dearmor -o /usr/share/keyrings/cloud.google.gpg \
    && apt-get update && apt-get install -y --no-install-recommends google-cloud-cli \
    && rm -rf /var/lib/apt/lists/*
 
RUN groupadd -g 1000 agent && useradd -m -u 1000 -g agent -s /bin/bash agent
 
ENV UV_TOOL_DIR=/usr/local/uv-tools
ENV PATH="${UV_TOOL_DIR}/bin:${PATH}"
 
RUN uv tool install google-agents-cli && chown -R agent:agent "${UV_TOOL_DIR}"
 
RUN mkdir -p /workspace /home/agent/.config/agents-cli /home/agent/.config/gcloud \
        /home/agent/.cache/uv /home/agent/.cache/npm /home/agent/evals /home/agent/logs \
    && chown -R agent:agent /workspace /home/agent/.config /home/agent/.cache \
        /home/agent/evals /home/agent/logs
 
USER agent
WORKDIR /workspace
CMD ["sleep", "infinity"]
# googleagentscli/build.sh
docker build --progress=plain -t google-agents-cli:local \
  "$HOME/agents/googleagentscli"
 
# googleagentscli/run.sh — starts detached; exec in with attach.sh
GCLOUD_MOUNT=()
[[ -d "$HOME/.config/gcloud" ]] && \
  GCLOUD_MOUNT=(-v "$HOME/.config/gcloud:/home/agent/.config/gcloud:ro")
 
docker run \
  --detach \
  --name "googleagentscli-${USER}" \
  --restart unless-stopped \
  --add-host=host.docker.internal:host-gateway \
  -v "$HOME/agents/googleagentscli/data:/workspace" \
  -v "$HOME/agents/googleagentscli/config:/home/agent/.config/agents-cli" \
  -v "$HOME/agents/googleagentscli/cache/uv:/home/agent/.cache/uv" \
  -v "$HOME/agents/googleagentscli/cache/npm:/home/agent/.cache/npm" \
  -v "$HOME/agents/googleagentscli/evals:/home/agent/evals" \
  -v "$HOME/agents/googleagentscli/logs:/home/agent/logs" \
  "${GCLOUD_MOUNT[@]}" \
  -e GOOGLE_API_KEY="${GOOGLE_API_KEY:-YOUR_GOOGLE_API_KEY}" \
  --workdir /workspace \
  google-agents-cli:local
 
# googleagentscli/attach.sh
docker exec -it --user agent --workdir /workspace \
  "googleagentscli-${USER}" bash --login

Post-launch authentication: Option A (AI Studio key, no Cloud billing) is docker exec -it googleagentscli-${USER} agents-cli login. Option B (Google Cloud ADC for production workloads) requires running gcloud auth application-default login on the host machine; the run.sh mounts ~/.config/gcloud read-only into the container automatically.

Google Workspace CLI is a containerized gcloud environment configured with the scopes necessary to drive the Google Workspace APIs programmatically: Gmail, Drive, Calendar, Sheets, and Docs. Authenticate once inside the container; the bind-mounted credentials directory persists across container recreations.

Google Workspace CLI Deployment

# googleworkspacecli/run.sh
docker run --restart no -it \
  --name gworkspace-${USER} \
  --add-host=host.docker.internal:host-gateway \
  -v "$HOME/agents/googleworkspacecli/gcloud:/root/.config/gcloud" \
  -v "$HOME/agents/workspace:/workspace" \
  google/cloud-sdk:slim \
  bash
 
# Inside container (first time only):
# gcloud auth login
# gcloud config set project YOUR_PROJECT_ID
 
# googleworkspacecli/attach.sh
docker start -ai gworkspace-${USER}

The authentication separation between the Workspace CLI container and the rest of the stack is intentional: the container that holds the Google credentials has no access to any other agent's identity directory, and no other container mounts the credentials directory. Anything this container exchanges with the rest of the stack flows through the shared workspace volume only.

Agent Frameworks: Agent Zero, Archon, and Mastra

The stack runs three agent frameworks that occupy distinct positions on the spectrum from fully autonomous to fully programmable.

Agent Zero

Agent Zero is the most autonomous framework in the stack, designed around the premise that the agent should be able to self-improve its own instructions and tools over the course of a session. It runs as a web UI on port 8081 and exposes a chat interface backed by a hierarchical agent system where the primary agent can spawn specialized subagents. The persistent state in a0/data/ includes the agent’s memory bank, its accumulated tool library, and its evolving system prompt, all of which carry forward across container restarts.

# a0/build.sh
docker pull agent0ai/agent-zero
mkdir -p "$HOME/agents/a0/data"
 
# a0/run.sh
docker run -d \
  --name "a0-${USER}" \
  --restart no \
  --add-host=host.docker.internal:host-gateway \
  -p 8081:80 \
  -v "$HOME/agents/a0/data:/a0/usr" \
  agent0ai/agent-zero
 
# a0/attach.sh — tail live logs (Ctrl+C safe; container keeps running)
docker logs --follow --timestamps "a0-${USER}"

Open http://localhost:8081 in a browser after the container starts.

Archon

Archon occupies a meta-level in the stack: it is an agent framework whose purpose is to help build other agent frameworks. Its Streamlit-based UI presents a development environment where I describe the agent I want to build in natural language, and Archon generates the scaffolding, tool definitions, system prompt, and evaluation harness for that agent. Archon-generated agents are configured at generation time to use the LiteLLM endpoint, so they enter the stack already wired to the unified gateway without any post-generation modification.

# archon/build.sh
docker pull ghcr.io/coleam00/archon:latest
mkdir -p "$HOME/agents/archon/data/workflows"
 
# archon/data/config.yaml — default model routing
cat > "$HOME/agents/archon/data/config.yaml" << 'EOF'
assistant: pi
assistants:
  pi:
    provider: openrouter
    model: openrouter/openrouter/free
EOF
 
# archon/run.sh — ephemeral (--rm); exits after task
[[ -z "${OPENROUTER_API_KEY:-}" ]] && \
  { echo "ERROR: OPENROUTER_API_KEY not set"; exit 1; }
 
docker run --rm \
  --name "archon-${USER}" \
  --user "$(id -u):$(id -g)" \
  -v "$HOME/agents/workspace:/home/bun/.archon/workspaces" \
  -v "$HOME/agents/archon/data:/home/bun/.archon" \
  -p 3090:3090 \
  -e OPENROUTER_API_KEY="${OPENROUTER_API_KEY}" \
  -e DEFAULT_AI_ASSISTANT=pi \
  ghcr.io/coleam00/archon:latest workflow list

Mastra

Mastra is a TypeScript-based AI agent framework that runs as a Docker Compose service exposing a REST API and a Studio UI on port 4111. It uses LibSQL for persistent conversation history, meaning that agent memory survives container restarts. Mastra occupies the programmable end of the spectrum: rather than autonomous self-direction, it provides a typed API for defining agents, tools, workflows, and memory retrievers in TypeScript.

The Mastra image is a multi-stage build. Critically, it handles the instrumentation.mjs file conditionally, since its presence varies across Mastra version upgrades.

# mastra/Dockerfile (multi-stage)
FROM node:22-alpine AS builder
WORKDIR /app
RUN apk add --no-cache gcompat
COPY package*.json ./
RUN npm install
COPY tsconfig*.json ./
COPY src ./src
RUN npx mastra build
 
FROM node:22-alpine AS runner
WORKDIR /app
RUN apk add --no-cache gcompat wget
RUN addgroup -g 1001 -S nodejs && adduser -S mastra -u 1001
COPY --from=builder --chown=mastra:nodejs /app/.mastra/output ./.mastra/output
COPY --from=builder --chown=mastra:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=mastra:nodejs /app/package.json ./package.json
RUN mkdir -p /app/data && chown mastra:nodejs /app/data
USER mastra
ENV PORT=4111
ENV NODE_ENV=production
ENV DATABASE_URL="file:/app/data/mastra.db"
EXPOSE 4111
HEALTHCHECK --interval=30s --timeout=10s --start-period=20s --retries=3 \
    CMD wget -qO- http://localhost:4111/api > /dev/null || exit 1
# Conditional instrumentation: works across Mastra versions
CMD ["sh", "-c", "if [ -f .mastra/output/instrumentation.mjs ]; then \
  node --import=./.mastra/output/instrumentation.mjs .mastra/output/index.mjs; \
  else node .mastra/output/index.mjs; fi"]

The agent definition is deliberately minimal:

// src/mastra/agents/assistant.ts
import { Agent } from "@mastra/core/agent";
 
export const assistant = new Agent({
  name: "assistant",
  instructions: "You are a helpful, concise, and accurate assistant.",
  model: "openrouter/meta-llama/llama-3.1-8b-instruct:free",
  // Other free-tier options:
  // openrouter/mistralai/mistral-7b-instruct:free
  // openrouter/google/gemma-3-12b-it:free
});
# mastra/docker-compose.yml
services:
  mastra:
    build:
      context: .
      dockerfile: Dockerfile
    container_name: mastra-${USER}
    ports:
      - "4111:4111"
    environment:
      OPENROUTER_API_KEY: ${OPENROUTER_API_KEY}
      NODE_ENV: production
      DATABASE_URL: "file:/app/data/mastra.db"
    volumes:
      - /home/${USER}/agents/mastra/data:/app/data
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "wget", "-qO-", "http://localhost:4111/api"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 20s
# mastra/build.sh — self-contained; creates src files if absent, prompts for key
cd "$HOME/agents/mastra"
docker compose up --build -d
 
# mastra/run.sh — start after reboot without rebuild
cd "$HOME/agents/mastra" && docker compose up -d
 
# mastra/attach.sh — tail live logs
docker logs --follow --timestamps "mastra-${USER}"

The agent server exposes a standard REST endpoint:

curl -X POST http://localhost:4111/api/agents/assistant/generate \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Summarize this document."}]}'

Open Design: Collaborative Canvas with Embedded Agent

Open Design is a web-based collaborative design canvas with an embedded pi coding agent, running on port 5173. It is the visual layer of the stack, useful for design work where the agent can read and modify the canvas state directly. The pi identity directory is separate from the main pi tool identity, so Open Design’s agent configuration does not interfere with standalone pi sessions.

Open Design Deployment

# open-design/Dockerfile
FROM node:24-bookworm
 
ARG OPEN_DESIGN_REPO=https://github.com/nexu-io/open-design.git
ARG OPEN_DESIGN_REF=main
 
ENV APP_DIR=/opt/open-design
ENV PNPM_HOME=/root/.local/share/pnpm
ENV PATH=/root/.local/share/pnpm:/usr/local/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/sbin:/bin
ENV PORT=5173
ENV HOST=0.0.0.0
ENV OD_HOST=0.0.0.0
ENV OD_ALLOWED_DEV_ORIGINS=127.0.0.1,localhost
 
RUN apt-get update && apt-get install -y --no-install-recommends \
    git ca-certificates curl bash python3 \
    && rm -rf /var/lib/apt/lists/*
 
RUN corepack enable
RUN npm install -g @mariozechner/pi-coding-agent
 
RUN git clone --branch "${OPEN_DESIGN_REF}" --depth 1 \
    "${OPEN_DESIGN_REPO}" "${APP_DIR}"
 
WORKDIR ${APP_DIR}
 
# Patch next.config.ts to accept OD_ALLOWED_DEV_ORIGINS from environment
RUN python3 - << 'PY'
from pathlib import Path
p = Path("apps/web/next.config.ts")
s = p.read_text()
old = "allowedDevOrigins: ['127.0.0.1'],"
new = """allowedDevOrigins: (
    process.env.OD_ALLOWED_DEV_ORIGINS
      ? process.env.OD_ALLOWED_DEV_ORIGINS.split(',').map((s) => s.trim()).filter(Boolean)
      : ['127.0.0.1']
  ),"""
if old not in s:
    raise SystemExit("Could not find allowedDevOrigins line")
p.write_text(s.replace(old, new))
PY
 
RUN corepack pnpm --version && pnpm install
 
EXPOSE 5173
CMD ["pnpm", "tools-dev", "run", "web", "--web-port", "5173"]
# open-design/build.sh
docker build -t open-design-pi "$HOME/agents/open-design"
mkdir -p "$HOME/agents/open-design/data"
mkdir -p "$HOME/agents/open-design/pi"
 
# First-time pi setup (run once to authenticate)
docker run --rm -it \
  -v "$HOME/agents/open-design/pi:/root/.pi" \
  open-design-pi \
  pi
# Inside pi session: /login
 
# open-design/run.sh
# OD_ALLOWED_DEV_ORIGINS must match the IP the browser uses to reach the container
docker run --rm -it \
  --name open-design-${USER} \
  -e "OD_ALLOWED_DEV_ORIGINS=YOUR_HOST_IP" \
  -p 5173:5173 \
  -v "$HOME/agents/open-design/data:/opt/open-design/.od" \
  -v "$HOME/agents/open-design/pi:/root/.pi" \
  open-design-pi

Open the canvas at http://YOUR_HOST_IP:5173. OD_ALLOWED_DEV_ORIGINS must match the IP address the browser uses to reach the container. To detect it automatically, substitute $(hostname -I | awk '{print $1}') for the hardcoded value.

Docker Volume Architecture: Identity, Workspace, Skills

One of the more carefully considered design decisions in this stack is the separation of Docker bind mounts into four independent tiers, which I call identity, workspace, skills, and tool data. This separation means that swapping a user identity, adding a skills package, or destroying an experimental container affects only its own tier; the other three are untouched.

Tier        Host Path Pattern             Container Mount       Shared?   Destroyable?
Identity    $HOME/agents/{tool}/home      varies per tool       No        Backup first
Workspace   $HOME/agents/workspace        /workspace            Yes       No
Skills      $HOME/agents/skills/{name}    /app/skills/{name}    Yes       Yes
Tool Data   $HOME/agents/{tool}/data      varies                No        Snapshot first

The identity hot-swap pattern is simple enough to describe in three commands: stop the container, remove it (volumes are untouched), and rerun with a different home/ path. The workspace and skills mount points are identical in both invocations. This makes it straightforward to work on the same project files under different API key contexts or with different tool configurations.
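
Concretely, using the pi container as the example (home-workacct/ is a hypothetical second identity directory):

docker stop pi-${USER} && docker rm pi-${USER}   # bind-mounted directories survive

# Rerun with a different identity mount; workspace and skills mounts are unchanged
docker run --restart no -it \
  --name pi-${USER} \
  --add-host=host.docker.internal:host-gateway \
  -v "$HOME/agents/pi/home-workacct:/home/pi-agent" \
  -v "$HOME/agents/workspace:/workspace" \
  -v "$HOME/agents/skills/core:/app/skills/core:ro" \
  pi:local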

Permission repair across containers with differing internal UID/GID values is handled by a disposable Alpine container:

docker run --rm \
  -v "$HOME/agents/some-tool/home:/mnt/target" \
  alpine \
  chown -R 1000:1000 /mnt/target

Container-Isolated Tool Invocation

One of the more practically useful habits I have developed with this stack is running agentic CLI tools inside dedicated Docker containers rather than installing them to my host user environment. The motivation is threefold: environment isolation, workspace scope control, and plugin sandboxing.

Workspace scope control is where the bind-mount architecture pays its most direct dividend. Rather than giving a tool access to the entire home filesystem, I mount only the specific project directories I want it to operate on:

docker run --rm -it \
  --add-host=host.docker.internal:host-gateway \
  -e ANTHROPIC_API_KEY="${ANTHROPIC_API_KEY:-YOUR_ANTHROPIC_API_KEY}" \
  -v "$HOME/agents/commercial/claude/home:/home/agent" \
  -v "$HOME/projects/project-alpha:/workspace/project-alpha" \
  -v "$HOME/projects/project-beta:/workspace/project-beta" \
  -w /workspace/project-alpha \
  commercial-ai:latest claude

From inside the container, the agent sees exactly two project directories and nothing else. For work involving student data or grant-sensitive materials, this mount-scoping discipline is not optional; it is the architectural enforcement of the data minimization principle.

Plugin sandboxing makes it practical to evaluate new tools without risk. I can install an untrusted npm package, register a new MCP server, or try an experimental integration inside an ephemeral container with a scratch identity directory, observe its behavior against a scoped workspace mount, and discard the container entirely if I decide against it. The two-stage pattern, scratch evaluation followed by deliberate promotion to the production identity directory, is something the tiered bind-mount architecture makes nearly effortless.
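
A sketch of that two-stage pattern, again using pi as the example tool (the scratch path and promoted file set are illustrative):

# Stage 1: evaluate against a throwaway identity directory
SCRATCH=$(mktemp -d)
docker run --rm -it \
  --add-host=host.docker.internal:host-gateway \
  -v "$SCRATCH:/home/pi-agent" \
  -v "$HOME/agents/workspace:/workspace" \
  pi:local

# Stage 2: promote the configuration into the production identity, or discard it
cp -r "$SCRATCH/.pi" "$HOME/agents/pi/home/"
rm -rf "$SCRATCH"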

Open-Source Cowork Alternatives

Anthropic’s Claude Cowork launch in January 2026 triggered a vigorous open-source response. I track several of the resulting projects.

OpenWork is the most actively developed, functioning as a control surface for agentic workflows with hot-reloadable skills, session management, and SSE event stream subscriptions. It is ejectable to OpenCode, which provides a meaningful portability guarantee.

OpenWork Deployment

# openwork/Dockerfile
FROM node:22-bookworm-slim
 
ARG OPENWORK_ORCHESTRATOR_VERSION=latest
 
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
    ca-certificates curl git tar unzip \
 && rm -rf /var/lib/apt/lists/*
 
RUN npm install -g "openwork-orchestrator@${OPENWORK_ORCHESTRATOR_VERSION}"
 
ENV OPENWORK_DATA_DIR=/data/openwork-orchestrator
ENV OPENWORK_SIDECAR_DIR=/data/sidecars
ENV OPENWORK_WORKSPACE=/workspace
 
EXPOSE 8787
VOLUME ["/workspace", "/data"]
 
CMD ["openwork", "serve", "--workspace", "/workspace", "--remote-access", \
     "--openwork-port", "8787", "--opencode-host", "127.0.0.1", \
     "--opencode-port", "4096", "--connect-host", "127.0.0.1", \
     "--cors", "*", "--approval", "manual", "--no-opencode-router"]
# openwork/build.sh
docker build -t openwork:local "$HOME/agents/openwork"
mkdir -p "$HOME/agents/openwork/workspace" "$HOME/agents/openwork/data"
 
# openwork/run.sh
docker run -it \
  --restart no \
  --add-host=host.docker.internal:host-gateway \
  -p 8787:8787 \
  -v "$HOME/agents/openwork/workspace:/workspace" \
  -v "$HOME/agents/openwork/data:/data" \
  -e OPENWORK_TOKEN=dev-token \
  -e OPENWORK_HOST_TOKEN=dev-host-token \
  --name openwork-${USER} \
  openwork:local
 
# openwork/attach.sh
docker start -ai openwork-${USER}

Accomplish takes a “BYO-AI” stance: it provides the orchestration layer (the hands and eyes) while leaving model choice to the user, supporting OpenAI, Anthropic, Google, and Ollama backends without lock-in. Kuse Cowork implements the agent runtime in Rust with Docker-based sandboxing. OpenCoworkAI explicitly commits to VM/bwrap isolation and checkpoint-rollback capability.

Multica occupies a different conceptual position as a managed agents platform rather than a desktop agent. It assigns issues to AI agents as you would assign them to human teammates, and it implements skill compounding: when an agent completes a task successfully, the solution is saved as a reusable skill that future tasks can leverage. This maps interestingly onto organizational learning theory, though a thoughtful critique in the project’s issue tracker notes that the human-management metaphor may be insufficient for genuinely autonomous AI orchestration at scale. I find this a productive tension worth thinking about seriously.

MCP Servers: Extending the Stack with Custom Tools

The Model Context Protocol (MCP), introduced by Anthropic in late 2024, defines how AI assistants communicate with external tools through four primitive types: tools (callable functions), resources (readable data streams), prompts (reusable templates), and sampling (delegated inference requests). Running MCP servers in Docker and connecting them to the local stack over a shared external network is straightforward.

docker network create mcp-shared

With this network in place, any container that declares mcp-shared as an external network can reach an MCP server at its container DNS name, such as http://mcp-bibliography:8000/mcp, without host port exposure. For backends that use OpenAI function-calling format rather than MCP’s JSON-RPC protocol, a thin Flask adapter service translates between the two on startup:

import json
import urllib.request

# Container DNS name on the mcp-shared network; adjust to your MCP server
MCP_URL = "http://mcp-bibliography:8000/mcp"

def mcp_post(method: str, params: dict, req_id: int = 1) -> dict:
    """Send one JSON-RPC 2.0 request to the MCP server and return the parsed reply."""
    payload = json.dumps({
        "jsonrpc": "2.0", "id": req_id,
        "method": method, "params": params
    }).encode()
    req = urllib.request.Request(
        MCP_URL, data=payload,
        headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=10) as r:
        return json.loads(r.read())
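
Attaching containers to the shared network at run time is all the DNS resolution requires; the server image name below is hypothetical:

# Start an MCP server on the shared network (image name is illustrative)
docker run -d --name mcp-bibliography \
  --network mcp-shared \
  your-mcp-bibliography-image:latest

# Any container on the same network reaches it by container name
docker run --rm --network mcp-shared curlimages/curl \
  -s http://mcp-bibliography:8000/mcp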

OpenRouter: A Cloud Model Interface

OpenRouter serves as the cloud model interface for tasks I route away from local inference. It exposes multiple model providers through an OpenAI-compatible API surface, so the same client code that talks to the local LiteLLM gateway can talk to OpenRouter without modification. Model identifiers follow a provider-qualified format:

anthropic/claude-3-opus
openai/gpt-4o
mistralai/mixtral-8x7b
meta-llama/llama-3-70b-instruct

The free-tier model list changes, but as of early 2026 includes Gemini 2.5 Pro, DeepSeek Chat v3.5, and LLaMA 4 Maverick. I use OpenRouter as a fallback in the Mastra agent server, as the primary cloud provider for pi, and as the escalation path from GNHF when a batch task exceeds local model capability. All API key management is handled through environment variables; no key is ever written into a container image.
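
Because the surface is OpenAI-compatible, the gateway smoke test from earlier works against OpenRouter with only the base URL, key, and model identifier changed:

curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer ${OPENROUTER_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/llama-3-70b-instruct", "messages": [{"role": "user", "content": "Hello"}]}'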

Reproducibility: The One-Shot Deploy Script

The entire stack, including all Dockerfiles, helper scripts, LiteLLM configuration, Mastra project files, and the directory tree, is generated by a single bash script called deploy-agents.sh. Running this script on a new machine, after providing the necessary API keys, produces a fully functional environment covering all the services described above with no manual steps (except placing a Dockerfile in gnhf/ for the task-bounded harness).

The script follows a fixed sequence: create the directory tree; write Dockerfiles for each custom image; write LiteLLM, LocalAI, and CCR configs; write Mastra project files; write GNHF task modules; write Agent Zero and Archon startup scripts; build custom images; clone and build Open Design; pull pre-built images; and finally start all daemon containers. This design means the script itself is the documentation, and any configuration drift between machines is detected by diffing the generated files against a known-good reference.

Publishing a custom Docker image to GitHub Container Registry follows the same minimal GitHub Actions pattern:

- uses: docker/login-action@v3
  with:
    registry: ghcr.io
    username: ${{ github.actor }}
    password: ${{ secrets.GITHUB_TOKEN }}
 
- uses: docker/build-push-action@v6
  with:
    context: .
    push: true
    tags: ghcr.io/${{ github.repository }}:main

After the workflow runs, making the resulting package public in the GitHub package settings allows anyone to docker pull ghcr.io/yourusername/yourrepo:main without further configuration.

Model Selection Notes

A few observations on local model selection from operational experience. For the think-heavy routing slot I use gemma4:e2b; for lightweight background tasks such as file summarization and classification I use qwen2.5:1.5b; for tool-calling workflows in Open WebUI and via the Hermes agent identity I use hermes3:8b; for general interactive sessions I use llama3. The selection criteria are principally RAM footprint and whether the model’s function-calling format is well-supported by the tool in question.