Sophia Stein · Boulder, CO · online

AI Architect.
Local-first by design.

I design and ship agentic systems that run on hardware you own. Local inference, smaller models, deliberate architecture. The benchmark below is real.

28.4 t/s · GLM-4 9B Q8_0 · ROCm 7.3 · 90GB GART · AMD Strix Point

Local-First AI

One email a week on local-first AI and agentic architecture: n8n templates, cost teardowns, and what is actually working in 2026. No drip sequences, no upsells. Reply to opt out.

read the blog → · beginner guides → · work with me →

Start here

I Run a Whole AI Stack on a Laptop for Three Cents an Hour

28.4 tokens per second on a laptop running GLM-4 9B, three cents of electricity per session, and the moment local inference stopped being a hobby.

The 40K Token Wall

Why bigger context windows are not the answer, and what production-tuned engineers actually trust in 2026.

When to Run Locally and When to Pay Anthropic

Real numbers, real workloads, real break-even points. When local is the obvious answer, when cloud is, and the hybrid that wins for most teams.

Speaking next · Boulder Startup Week 2026

Architecting Agentic Workflows for the Lean 2026 Startup

Thu May 7 · 11:00 AM · RegenHub

materials · rsvp ↗

Pre-order · The Sovereign Stack

An interactive e-book on production agents on hardware you own — built to update for free as the local-AI stack changes.

Hardware selection, ROCm + Metal setup, model picks across Qwen / Gemma / Kimi / DeepSeek with the benchmark numbers I actually ran, agent harness design with paste-able eval harnesses, n8n + MCP orchestration patterns from production. PDF + EPUB + browsable web version. ~220 pages, ships fall 2026. Pre-order locks in lifetime free updates.

see the table of contents →

What I do

Architecture

OSS Tools

Field Notes

Recent posts

Cost-Quality Pareto for Coding Agents (May 2026)

Building an Eval Harness That Catches Regressions

MCP Server Audit: Which Ones Actually Work in 2026

Mixing and Matching Open-Weight Models: A Recipe Book

The Multi-Agent Orchestration Frontier in May 2026
