May 2026 · subject to drift

stack

The exact hardware, AI tooling, open-weight models, and web stack I run. Reproducible, auditable, local-first wherever local-first works.

Jump to layer

L0 Hardware
L1 OS
L2 Inference
L3 Models · local
L4 Models · frontier
L5 Data
L6 Orchestration
L7 Coding agents
L8 Web stack
L9 Comms
L10 Bizops

Note: every link goes to the official product page. I only list things I actually use.

L0 Hardware

Framework 16 · maxed

Framework 16 (AMD)↗

Ryzen AI 9 HX 370 · Strix Point · Zen 5 + RDNA 3.5 + XDNA 2

Repairable, upgradeable, the only laptop I trust to still be running this stack in 2030.

Radeon RX 7700S↗

8GB GDDR6 · 100W TGP · Navi 33

Discrete GPU module for cold starts and bursty image / video workloads.

Memory↗

96GB DDR5-5600 (2 × 48GB SO-DIMM)

The number that matters: 90GB of it is mapped as GART, which lets the iGPU work directly out of system RAM.
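This is a minimal sketch of the unit conversion behind that carve-out. It treats "90GB" as 90 GiB and assumes your kernel honours `amdgpu.gttsize` (in MiB) and `ttm.pages_limit` (a count of 4 KiB pages). Newer kernels rely more on the TTM limit, so check the amdgpu and TTM docs for your kernel version before putting either value on the command line.

```python
"""Convert a GPU-addressable RAM budget into amdgpu/TTM kernel parameters.

Assumptions (check your kernel's documentation before rebooting):
  * the budget is given in GiB (1 GiB = 1024 MiB)
  * amdgpu.gttsize takes MiB
  * ttm.pages_limit takes a page count, with 4 KiB pages
"""

PAGE_SIZE_BYTES = 4 * 1024  # assumed base page size on x86-64
BYTES_PER_MIB = 1024 * 1024


def gtt_kernel_params(budget_gib: int) -> dict[str, int]:
    """Return kernel parameter values for a GTT budget given in GiB."""
    budget_mib = budget_gib * 1024
    return {
        "amdgpu.gttsize": budget_mib,
        "ttm.pages_limit": budget_mib * BYTES_PER_MIB // PAGE_SIZE_BYTES,
    }


def cmdline_fragment(params: dict[str, int]) -> str:
    """Render the values as a kernel command-line fragment (e.g. for GRUB)."""
    return " ".join(f"{key}={value}" for key, value in params.items())


if __name__ == "__main__":
    print(cmdline_fragment(gtt_kernel_params(90)))
    # amdgpu.gttsize=92160 ttm.pages_limit=23592960
```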

Storage↗

4TB primary + 2TB secondary (NVMe Gen 4)

Models, datasets, vector indexes, and the inevitable swap when you push the loop too hard.

Audio↗

AIAIAI TMA-2 Studio Wireless+

Modular, repairable headphones from a Danish industrial-design house. Same philosophy as Framework.

L1 OS

Ubuntu 26.04 LTS · Resolute Raccoon

GNOME 50 on Wayland-only, memory-safe Rust coreutils, systemd 259 with mandatory cgroup v2, TPM-backed full-disk encryption out of the box. AMDGPU support for Strix Point ships in the GA tree. Boring on purpose.

L2 Inference (GPU + model serving)

ROCm 7.3↗

AMD GPU runtime. First release where Strix Point + RDNA 3.5 is genuinely supported as a pair.

Vulkan↗

Cross-platform GPU runtime. The fallback when a model or quant misbehaves on HIP, and increasingly competitive on AMD for inference.

llama.cpp↗

LLM inference engine. Built from source against gfx1150 + gfx1102. The most-used binary on this machine.
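This is a minimal sketch of that build, driven from Python. It assumes ROCm's `hipconfig` is on PATH and that your llama.cpp checkout's `docs/build.md` uses `-DGGML_HIP=ON` with `-DGPU_TARGETS=...`; older trees spelled the target list `AMDGPU_TARGETS`, so check yours. The `HIPCXX` and `HIP_PATH` variables mirror what llama.cpp's HIP instructions set. By default the script only prints the commands, and it runs them when you pass `--run`.

```python
"""Assemble, and optionally run, a llama.cpp HIP build for two AMD GPU targets.

Assumes ROCm's `hipconfig` is on PATH and a llama.cpp checkout whose
docs/build.md uses GGML_HIP and GPU_TARGETS (older trees: AMDGPU_TARGETS).
Run from the repository root; pass --run to execute instead of printing.
"""
import os
import shlex
import subprocess
import sys

GPU_TARGETS = ["gfx1150", "gfx1102"]  # Strix Point iGPU + RX 7700S (Navi 33)


def configure_cmd(targets: list[str], build_dir: str = "build") -> list[str]:
    """CMake configure step: HIP backend on, code objects for each target."""
    return [
        "cmake", "-S", ".", "-B", build_dir,
        "-DGGML_HIP=ON",
        f"-DGPU_TARGETS={';'.join(targets)}",  # CMake list, so ';'-separated
        "-DCMAKE_BUILD_TYPE=Release",
    ]


def build_cmd(build_dir: str = "build", jobs: int = 16) -> list[str]:
    """CMake build step with a parallel job count."""
    return ["cmake", "--build", build_dir, "--config", "Release", "-j", str(jobs)]


def hip_env() -> dict[str, str]:
    """Point CMake at ROCm's clang, mirroring the variables llama.cpp's docs set."""
    clang_dir = subprocess.check_output(["hipconfig", "-l"], text=True).strip()
    rocm_path = subprocess.check_output(["hipconfig", "-R"], text=True).strip()
    return {**os.environ, "HIPCXX": f"{clang_dir}/clang", "HIP_PATH": rocm_path}


if __name__ == "__main__":
    commands = [configure_cmd(GPU_TARGETS), build_cmd()]
    if "--run" in sys.argv:
        env = hip_env()
        for cmd in commands:
            subprocess.run(cmd, check=True, env=env)
    else:
        for cmd in commands:
            print(shlex.join(cmd))  # dry run: show what would execute
```

Building one binary for both targets lets the same `llama-*` tools pick either GPU at runtime instead of keeping two separate builds.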

Ollama↗

Model server with REST API. Wrapped by an MCP bridge for tool-using agents (Ollama still does not speak MCP natively).
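Whatever the MCP bridge looks like, underneath it presumably ends up making plain HTTP calls to Ollama's REST API. This is a minimal stdlib sketch of one such call. It assumes Ollama's default address (`http://localhost:11434`) and its `/api/generate` endpoint with streaming turned off. The model tag is a hypothetical placeholder.

```python
"""Build and send a non-streaming request to a local Ollama server (stdlib only).

Assumes Ollama's default listen address and its /api/generate endpoint.
The model tag below is a hypothetical placeholder.
"""
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"
MODEL_TAG = "your-local-model"  # hypothetical; use a tag from `ollama list`


def build_generate_request(prompt: str, model: str = MODEL_TAG,
                           base_url: str = OLLAMA_URL) -> urllib.request.Request:
    """POST /api/generate with streaming disabled, so the reply is one JSON object."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = MODEL_TAG, timeout: float = 120.0) -> str:
    """Send the request and return the generated text from the `response` field."""
    req = build_generate_request(prompt, model)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from sending makes it easy to log or test the exact payload a tool-using agent would emit.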

L3 Models · local

downloaded · running on this laptop

I do not run Llama. The 2026 open-weight frontier shifted decisively to Qwen, Gemma, Kimi, and DeepSeek, and these are all sitting on the NVMe in this laptop, not behind a third-party API.

Qwen 3.6 27B Dense (coder)

Daily driver for agentic coding loops. 77.2% on SWE-bench Verified. The 27B Dense actually beats Qwen's own 397B MoE flagship on coding tasks. Apache 2.0.

Qwen 3.6 35B-A3B (mixture)

The MoE variant in the same family. I rotate it in for fan-out turns where I want throughput over peak intelligence. Apache 2.0.
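A back-of-envelope estimate shows why the MoE variant is the throughput pick. It takes the two Qwen names at face value: 27B dense, and 35B total with roughly 3B active. It also assumes about 4.5 bits per weight for a 4-bit GGUF quant, counts weights only (no KV cache, activations, or runtime buffers), and treats per-token compute as proportional to active parameters. None of these are measured figures.

```python
"""Back-of-envelope weight memory and per-token compute for dense vs MoE models.

Assumptions: ~4.5 bits per weight (roughly a 4-bit GGUF quant with overhead),
weights only (no KV cache, activations, or runtime buffers), and per-token
compute proportional to *active* parameters.
"""
from dataclasses import dataclass

BITS_PER_WEIGHT = 4.5  # assumption, not a measured figure


@dataclass(frozen=True)
class Model:
    name: str
    total_params_b: float   # billions of parameters stored
    active_params_b: float  # billions of parameters used per token


def weight_gb(model: Model, bits: float = BITS_PER_WEIGHT) -> float:
    """Every expert must stay resident, so memory scales with *total* parameters."""
    return model.total_params_b * 1e9 * bits / 8 / 1e9


def relative_compute(a: Model, b: Model) -> float:
    """How many times more per-token compute `a` needs than `b`."""
    return a.active_params_b / b.active_params_b


dense = Model("27B dense", total_params_b=27, active_params_b=27)
moe = Model("35B-A3B MoE", total_params_b=35, active_params_b=3)

for m in (dense, moe):
    print(f"{m.name}: ~{weight_gb(m):.1f} GB of weights")
print(f"dense needs ~{relative_compute(dense, moe):.0f}x the per-token compute")
```

Under these assumptions the MoE needs a little more memory, because every expert stays loaded, but roughly a ninth of the per-token compute. Single-stream decoding is usually memory-bandwidth bound, so touching fewer weights per token generally means more tokens per second.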

Gemma 4 31B Dense

Default for UI generation with Tailwind. Google trained the family heavily on frontend code. The 31B Dense lands #3 on the open Arena leaderboard. Apache 2.0.

Kimi K2.6 (quantized GGUF)

Moonshot AI's 1T-parameter MoE, run aggressively quantized via llama.cpp's INT4 path. Strongest local model I have for natural-language to Awwwards-grade UI.

DeepSeek V4 Lite

The ~200B parameter local-friendly variant of V4. Multimodal plus spatial reasoning: diagram parsing, screenshot-to-code, document extraction. MIT.

L4 Models · frontier APIs

~5% of turns

The only things I call by network. Reserved for when local genuinely cannot do the job. Two providers by design, no single-vendor lock-in.

Claude (Sonnet 4.5 / Opus 4.x)

My pick for the ~5% of agentic turns that genuinely need the smartest model in the world. Skills, hooks, and the harness around Claude Code are the best in class as of May 2026.

GPT-5

Second frontier option, driven from the same harness. I rotate between the two when one regresses.
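The local-first policy plus the two-provider fallback can be sketched as a small router. Everything here is hypothetical: the provider callables and the `needs_frontier` flag stand in for whatever the real harness does, and how a turn gets flagged is left open. The sketch only shows the ordering. Most turns stay local, and flagged turns try frontier providers in order until one succeeds.

```python
"""Local-first routing with ordered fallback across frontier providers.

All names are hypothetical stand-ins; wire in real clients yourself.
"""
from typing import Callable, Sequence

Provider = Callable[[str], str]  # takes a prompt, returns a completion


class AllProvidersFailed(RuntimeError):
    """Raised when every frontier provider in the list has failed."""


def route(prompt: str, *, needs_frontier: bool, local: Provider,
          frontier: Sequence[tuple[str, Provider]]) -> tuple[str, str]:
    """Return (provider_name, completion).

    Unflagged turns go to the local model. Flagged turns go out over the
    network, trying providers in order so an outage at one falls through
    to the next.
    """
    if not needs_frontier:
        return "local", local(prompt)
    errors: list[str] = []
    for name, call in frontier:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any failure means "try the next one"
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

A quality regression is a judgment call, not an exception, so handling one in practice means reordering the `frontier` list. The router only handles hard failures automatically.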

L5 Data: ingest, validate, store, retrieve

The full data path: get it in (Apify), parse documents (LlamaIndex), shape and validate (Pydantic), store it where it belongs (Chroma for vectors, Neo4j for graph, Postgres for everything else).

Apify↗

Web scraping and data acquisition. A marketplace of 25K+ ready-made tools that agents can call, so you are not rolling your own scrapers. The way external data gets into the loop.

LlamaIndex↗

Document parsing and ingest. PDFs, scans, tables, charts in. Structured agent-ready output out. The OCR plus reasoning pipeline most agentic doc workloads need.

Pydantic↗

Schemas and validation. The reason structured-output extraction works. Used under the hood by FastMCP, CrewAI, and most production agent frameworks.

Chroma↗

Vector store and agentic-search tooling. Their context-rot research is required reading before you reach for a million-token window.

Neo4j↗

Property graph database. The substrate for knowledge graphs agents traverse. GraphRAG patterns run here.

Postgres + pgvector↗

Boring, fast, audit-friendly. Default for relational plus vector storage when a use case outgrows flat files.

L6 Orchestration

Workflow runtimes that turn a one-off prompt into a recurring, governed, durable system.

n8n (self-hosted)↗

Low-code workflow automation. The factory floor for every recurring agentic workflow behind this site and the business.

Temporal↗

Durable distributed execution. Used when an agentic workflow has to survive process restarts, deploys, and the kind of failures that do not happen on demos.
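The core idea behind durable execution can be shown without Temporal. The sketch below is a stdlib toy, not Temporal's SDK. Each step's result goes into a JSON-lines journal, and after a restart, completed steps are replayed from the journal instead of re-executed. That way a side effect such as sending an email runs once even if the process dies mid-workflow. Temporal does this with an event history kept by its server plus deterministic workflow code. The journal path and step names here are hypothetical.

```python
"""Toy durable-execution runner: journal step results, replay them after a crash.

Not Temporal's API; it only illustrates the event-history idea.
"""
import json
from pathlib import Path
from typing import Any, Callable


class DurableRun:
    def __init__(self, journal: Path):
        self.journal = journal
        self.done: dict[str, Any] = {}
        if journal.exists():  # restart: reload every completed step
            for line in journal.read_text().splitlines():
                entry = json.loads(line)
                self.done[entry["step"]] = entry["result"]

    def step(self, name: str, fn: Callable[[], Any]) -> Any:
        """Run `fn` once per workflow; later runs return the journaled result."""
        if name in self.done:
            return self.done[name]          # replay: no side effect
        result = fn()                       # first time: do the real work
        with self.journal.open("a") as f:   # persist before moving on
            f.write(json.dumps({"step": name, "result": result}) + "\n")
        self.done[name] = result
        return result
```

A step that crashes never reaches the journal, so it runs again on restart while earlier steps are skipped. Temporal adds what the toy lacks: timers, retries with backoff, and a server that survives the worker dying.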

CrewAI↗

Multi-agent orchestrator for role-based crews. The framework I reach for when a workflow decomposes into specialists.

LangGraph↗

Graph-shaped agent execution. The architecture the Bain HR Services payroll agent uses (8 subgraphs, 98% accuracy on 3K live emails per day).
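Graph-shaped execution makes the agent's control flow explicit. Nodes transform a shared state, and edges (some conditional) pick the next node. The stdlib toy below is not LangGraph's API, which has its own `StateGraph` builder, and it is not the payroll agent mentioned above. It shows the pattern with a hypothetical email-triage flow that loops back to extraction when validation fails.

```python
"""Toy state-graph runner: nodes update shared state, edges choose the next node.

Not LangGraph's API; node names and state keys are hypothetical.
"""
from typing import Callable

State = dict
Node = Callable[[State], State]
Router = Callable[[State], str]
END = "__end__"


def run_graph(nodes: dict[str, Node], edges: dict[str, Router],
              start: str, state: State, max_steps: int = 20) -> State:
    """Walk the graph from `start` until a router returns END."""
    current = start
    for _ in range(max_steps):
        state = nodes[current](state)
        current = edges[current](state)
        if current == END:
            return state
    raise RuntimeError("step budget exhausted; possible cycle")


def classify(s: State) -> State:
    return {**s, "kind": "payroll" if "salary" in s["email"].lower() else "other"}


def extract(s: State) -> State:
    attempts = s.get("attempts", 0) + 1
    # Pretend the first extraction attempt misses a field.
    fields = {"employee": "A. Jones", "amount": 1200} if attempts > 1 else {"employee": "A. Jones"}
    return {**s, "attempts": attempts, "fields": fields}


def validate(s: State) -> State:
    return {**s, "valid": {"employee", "amount"}.issubset(s["fields"])}


nodes = {"classify": classify, "extract": extract, "validate": validate}
edges = {
    "classify": lambda s: "extract" if s["kind"] == "payroll" else END,
    "extract": lambda s: "validate",
    # Retry extraction at most once before giving up.
    "validate": lambda s: END if s["valid"] or s["attempts"] >= 2 else "extract",
}

final = run_graph(nodes, edges, "classify", {"email": "Salary query for March"})
print(final["valid"], final["attempts"])  # True 2
```

Because the retry loop is an edge rather than logic buried inside a prompt, it can be inspected, capped, and tested on its own. That visibility is the main argument for graph-shaped agents.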

L7 Coding agents

The daily harnesses I rotate through. Different strengths, same project memory file.

Claude Code↗

Daily driver. Skills + hooks + project memory. The harness most of this site was built in.

OpenAI Codex↗

Second-string daily driver. Better at long detail-oriented refactors. Worse at the snappy interactive loop.

Aider↗

Terminal-native pair-programmer. The honest middle ground between vibe-coding and writing every line.

OpenClaw↗

Persistent terminal-resident agent that lives across sessions. The CEO interface on top of n8n.

L8 Web stack

The framework, deployment, and CMS the site runs on. Picked because every coding agent has read enough of these to be genuinely useful in them.

Next.js↗

App Router, Turbopack, React Server Components.

Tailwind CSS↗

Utility-first CSS that LLMs were trained on so heavily it almost generates itself. Pairs with Gemma 4 / Kimi for instant UI.

Cloudflare Workers↗

The site you are reading runs on a single Worker via @opennextjs/cloudflare. Free tier handles real traffic.

Keystatic↗

Git-backed CMS. Content lives in MDX in the repo, not in someone else's database.

L9 Comms

The pieces that handle every inbound and outbound message: mail routing and the mail server, transactional email, newsletter, customer chat, and human-in-the-loop push notifications.

Cloudflare Email Routing↗

Today's forwarder. sophia@agenticarchitect.ai routes through Cloudflare, lands in Gmail, sends as the custom domain. Free, zero infra. Migration target is Stalwart on Hetzner once the agent loop needs an inbox it can write to as well as read.

Stalwart Mail (self-hosted, Q3 2026)↗

Open-source mail server. The Fastmail / Google Workspace replacement when you want to own the inbox the agent reads. Migration begins after BSW.

Plunk (self-hosted)↗

Open-source transactional email, MIT-licensed. The Resend alternative when you want to own deliverability. Resend is the managed fallback.

Listmonk (self-hosted)↗

Newsletter platform. Sends The Architect's Notebook. Replaces Buttondown / Substack at near-zero marginal cost.

Chatwoot (self-hosted)↗

Customer service plus live chat widget. Inbound conversations route through the triage agent before reaching me.

ntfy (self-hosted)↗

Push notifications for the human-in-the-loop. When an agent needs me to confirm or look at something, ntfy delivers the ping to my phone. Pairs with HITL flows in n8n. Apache 2.0.
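Whatever triggers it, whether an n8n step or a script, an ntfy ping is a single HTTP POST to `<server>/<topic>` with the message as the body. This is a minimal stdlib sketch. `Title`, `Priority` and `Tags` are standard ntfy publish headers, and the bearer token is needed only when the server has access control enabled. The server URL, topic name and token are hypothetical placeholders.

```python
"""Publish a human-in-the-loop ping to a self-hosted ntfy server (stdlib only).

Base URL, topic and token are hypothetical placeholders.
"""
import urllib.request
from typing import Optional

NTFY_BASE = "https://ntfy.example.com"  # hypothetical self-hosted instance
TOPIC = "agent-approvals"               # hypothetical topic name


def build_ping(message: str, title: str, priority: str = "high",
               tags: tuple[str, ...] = ("warning",), token: Optional[str] = None,
               base: str = NTFY_BASE, topic: str = TOPIC) -> urllib.request.Request:
    """Build the publish request: message in the body, metadata in headers."""
    headers = {"Title": title, "Priority": priority, "Tags": ",".join(tags)}
    if token:  # only needed when the server enforces access control
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(f"{base}/{topic}", data=message.encode("utf-8"),
                                  headers=headers, method="POST")


def send_ping(req: urllib.request.Request, timeout: float = 10.0) -> int:
    """Send the ping and return the HTTP status code."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status
```

Uptime Kuma can target the same topic, so outage alerts and agent approval requests arrive on the phone through one channel.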

L10 Bizops

Sales, money, and operations. Replaces ~$1,500/month of managed SaaS at the cost of one VPS.

Sales & money

Cal.com (self-hosted)↗

Open-source booking. Calendly replacement. Routes inbound bookings through n8n so an agent prepares context before the meeting.

EspoCRM (self-hosted)↗

Open-source CRM. The HubSpot alternative when you want the agent to read and write customer state without sending everything to a vendor.

Stripe + Lago↗

Stripe as the payment processor (there is no real open-source option for moving money; payments are regulated). Lago for the billing infrastructure on top: usage metering, invoicing, plan management. AGPL-licensed.

Mercury · Relay · Bunq↗

Founder-friendly business banking. Mercury and Relay (US) for no-fee operating accounts. Bunq Business (EU) for all-in-one. Wise for multi-currency.

Ops & knowledge

Plane (self-hosted)↗

Open-source issue tracking. The Linear alternative when you want the polish without the per-seat tax. AGPL-licensed.

PostHog (self-hosted)↗

Product analytics, session replay, feature flags, surveys. Replaces ~$300/month of SaaS at zero marginal cost.

Umami (self-hosted)↗

Privacy-friendly web analytics for the public site. Lighter touch than PostHog where session replay is overkill.

GlitchTip (self-hosted)↗

Sentry-compatible error tracking. ~5x cheaper than Sentry at scale, MIT-licensed.

Outline (self-hosted)↗

Team knowledge base. Notion alternative. Stores skills, runbooks, and the docs the coding agents read alongside the codebase.

Vaultwarden (self-hosted)↗

Password manager. Bitwarden-compatible server, Rust rewrite. Where every API key the agent needs to rotate actually lives.

Hetzner Cloud↗

The $5/mo VPS that hosts the self-hosted half of this stack. EU-based, frugal, predictable.

Uptime Kuma (self-hosted)↗

Open-source uptime monitoring. The Pingdom / Better Uptime alternative. Watches every self-hosted service in this stack, pings ntfy when something goes down. MIT-licensed.

Configs

The ROCm install commands, GART kernel parameters, llama.cpp HIP build flags, n8n workflow templates, MCP bridge config for Ollama, and the docker-compose for the self-hosted PostHog / GlitchTip / Listmonk / Umami stack land on github.com/sudosoph as apu-config and solo-stack.