Cost-Quality Pareto for Coding Agents (May 2026)
Paying $5 per task no longer gets you the frontier. Qwen3.6-27B running locally hits 77.2% SWE-bench Verified at ~$0.04/task, within 10pp of Opus 4.7 for ~130x less money.
blog
Deep dives and field notes on local-first AI, agentic architecture, and what is actually working in 2026, with primary sources and reproducible benchmarks.
Showing 11 of 38 posts in Local models · clear
Long-form research articles with primary sources, benchmarks, and reference tables.
Paying $5 per task no longer gets you the frontier. Qwen3.6-27B running locally hits 77.2% SWE-bench Verified at ~$0.04/task, within 10pp of Opus 4.7 for ~130x less money.
An 84% cost reduction on a real SaaS workload, a 97% reduction on agentic dev loops, and the three-tier mix that actually ships in May 2026.
Local hardware vs rented GPUs vs serverless OSS APIs. Real prices, real benchmarks, and the workload-shape question that decides which path is right.
What the Model Context Protocol actually is, what it gets right, where it leaks, and why the local-first version is the cleaner story.
The engineering math behind preventing an agentic loop from burning through your monthly runway in one night.
28.4 tokens per second on a laptop running GLM-4 9B, three cents of electricity per session, and the moment local inference stopped being a hobby.
The build, the OpenClaw config, and the first agent worth running. End to end on a Framework 16 with 96GB unified memory.
A framework for finding which 20% of your tasks are agent-ready before you write a line of code.
Triage that does not just summarize. It prepares the drafts and fetches the data, and you approve. The 60-line config that actually works.
Real numbers, real workloads, real break-even points. When local is the obvious answer, when cloud is, and the hybrid that wins for most teams.
The repairable, AMD-powered laptop that runs my entire AI stack at three cents per session. The hardware case for Framework 16 in 2026.