May 2026 · subject to drift

stack

The exact hardware, AI tooling, open-weight models, and web stack I run. Reproducible, auditable, local-first wherever local-first works.

Note: Every link goes to the official product page. I only list things I actually use.

L0 Hardware

Framework 16 · maxed

L1 OS

Ubuntu 26.04 LTS · Resolute Raccoon

Wayland-only GNOME 50, memory-safe Rust coreutils, systemd 259 with mandatory cgroup v2, and TPM-backed full-disk encryption out of the box. AMDGPU support for Strix Point ships in the GA tree. Boring on purpose.

L2 Inference (GPU + model serving)

L3 Models · local

downloaded · running on this laptop

I do not run Llama. The 2026 open-weight frontier shifted decisively to Qwen, Gemma, Kimi, and DeepSeek, and these are all sitting on the NVMe in this laptop, not behind a third-party API.

Qwen 3.6 27B Dense (coder)

Daily driver for agentic coding loops. 77.2% on SWE-bench Verified. The 27B Dense actually beats Qwen's own 397B MoE flagship on coding tasks. Apache 2.0.

Qwen 3.6 35B-A3B (mixture)

The MoE variant in the same family. I rotate it in for fan-out turns where I want throughput over peak intelligence. Apache 2.0.

Gemma 4 31B Dense

Default for UI generation with Tailwind. Google trained the family heavily on frontend code. The 31B Dense lands #3 on the open Arena leaderboard. Apache 2.0.

Kimi K2.6 (quantized GGUF)

Moonshot AI's 1T-parameter MoE, run as an aggressively quantized GGUF through llama.cpp. Strongest local model I have for turning natural-language prompts into Awwwards-grade UI.

DeepSeek V4 Lite

The ~200B parameter local-friendly variant of V4. Multimodal plus spatial reasoning: diagram parsing, screenshot-to-code, document extraction. MIT.
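For a sense of what "quantized" buys, here is a rough back-of-the-envelope sketch of weight size as a function of bits per weight. The parameter counts are the ones stated in the list above. The bits-per-weight value is an assumption (the exact quantization level used for each model is not stated here), and the estimate covers weights only, ignoring KV cache, activations, and runtime overhead.

```python
def weight_footprint_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    # params * bits / 8 gives bytes; with params in billions the result is in GB.
    return params_billions * bits_per_weight / 8


# Total parameter counts in billions, as stated in the list above.
# For MoE models this is the total count, since all experts must be stored.
MODELS = {
    "Qwen 3.6 27B Dense": 27,
    "Qwen 3.6 35B-A3B": 35,
    "Gemma 4 31B Dense": 31,
    "Kimi K2.6": 1000,
    "DeepSeek V4 Lite": 200,
}


def footprint_table(bits_per_weight: float) -> dict[str, float]:
    """Weight-only footprint for every listed model at one quantization level."""
    return {
        name: weight_footprint_gb(params, bits_per_weight)
        for name, params in MODELS.items()
    }


if __name__ == "__main__":
    # 4.0 bits/weight is an illustrative assumption, not a measured setting.
    for name, gb in footprint_table(4.0).items():
        print(f"{name:<20} ~{gb:,.1f} GB of weights")
```

Lowering `bits_per_weight` scales every row linearly, which is why the largest models are the ones that need the most aggressive quantization.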

L4 Models · frontier APIs

~5% of turns

The only models I call over the network. Reserved for when local genuinely cannot do the job. Two providers by design, no single-vendor lock-in.

Claude (Sonnet 4.5 / Opus 4.x)

My pick for the ~5% of agentic turns that genuinely need the smartest model in the world. Skills, hooks, and the harness around Claude Code are best in class as of May 2026.

GPT-5

Second frontier option. I rotate between the two when one regresses. Two providers, one harness, no lock-in.
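One way to picture "two providers, one harness" is an ordered preference list that the harness walks through. The sketch below is illustrative only: the provider callables are hypothetical stubs, not real SDK calls, and the automatic failover on exceptions is an added assumption. Rotating after a quality regression is a judgement call, which here amounts to reordering the list.

```python
from typing import Callable, Sequence

# A provider is any callable that takes a prompt and returns a reply.
# Real clients would wrap a vendor SDK; these are hypothetical stand-ins.
Provider = Callable[[str], str]


class AllProvidersFailed(RuntimeError):
    """Raised when every provider in the preference list errors out."""


def complete(prompt: str, providers: Sequence[tuple[str, Provider]]) -> tuple[str, str]:
    """Try providers in order and return (provider_name, reply) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real harness would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


def flaky_primary(prompt: str) -> str:
    # Stub that simulates an outage.
    raise TimeoutError("primary unavailable")


def steady_secondary(prompt: str) -> str:
    # Stub that simulates a healthy provider.
    return f"echo: {prompt}"


if __name__ == "__main__":
    preference = [("primary", flaky_primary), ("secondary", steady_secondary)]
    print(complete("hi", preference))
```

Because the harness only depends on the `Provider` signature, swapping the preferred vendor is a one-line change to the list order.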

L5 Data: ingest, validate, store, retrieve

The full data path: get it in (Apify), parse documents (LlamaIndex), shape and validate (Pydantic), store it where it belongs (Chroma for vectors, Neo4j for graph, Postgres for everything else).
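The validate-then-route step of that path can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the actual pipeline: it uses a standard-library dataclass as a stand-in for a Pydantic model so it runs without dependencies, the `Record` fields are hypothetical, and plain lists stand in for the vector, graph, and relational stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical shape of one ingested document (stand-in for a Pydantic model)."""
    source_url: str
    text: str
    links: tuple[str, ...] = ()

    def __post_init__(self) -> None:
        # Reject bad input at the boundary, before anything is stored.
        if not self.source_url.startswith(("http://", "https://")):
            raise ValueError(f"source_url must be http(s): {self.source_url!r}")
        if not self.text.strip():
            raise ValueError("text must not be empty")


def route(record: Record) -> dict[str, list]:
    """Split a validated record into the payload each kind of store would receive."""
    return {
        # Text that would be embedded and written to the vector store.
        "vectors": [record.text],
        # Edges that would be written to the graph store.
        "graph": [(record.source_url, "LINKS_TO", target) for target in record.links],
        # Plain metadata row for the relational store.
        "rows": [{"source_url": record.source_url, "chars": len(record.text)}],
    }


def ingest(raw_items: list[dict]) -> tuple[list[dict], list[tuple[dict, str]]]:
    """Validate raw scraped items; return routed payloads and rejected items with reasons."""
    accepted, rejected = [], []
    for raw in raw_items:
        try:
            accepted.append(route(Record(**raw)))
        except (TypeError, ValueError) as exc:  # TypeError covers unknown or missing fields
            rejected.append((raw, str(exc)))
    return accepted, rejected
```

Keeping rejected items alongside their error message, instead of dropping them silently, makes a scraper change visible the first time it breaks the schema.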

L6 Orchestration

Workflow runtimes that turn a one-off prompt into a recurring, governed, durable system.

L7 Coding agents

The daily harnesses I rotate through. Different strengths, same project memory file.

L8 Web stack

The framework, deployment, and CMS the site runs on. Picked because every coding agent has read enough of these to be genuinely useful in them.

L9 Comms

The four pieces that handle every inbound and outbound message: mail server, transactional email, newsletter, customer chat.

L10 Bizops

Sales, money, and operations. Replaces ~$1,500/month of managed SaaS at the cost of one VPS.

Ops & knowledge

Configs

The ROCm install commands, GART kernel parameters, llama.cpp HIP build flags, n8n workflow templates, MCP bridge config for Ollama, and the docker-compose for the self-hosted PostHog / GlitchTip / Listmonk / Umami stack land on github.com/sudosoph as apu-config and solo-stack.