
Local-First AI

Deep dives and field notes on local-first AI, agentic architecture, and what is actually working in 2026, with primary sources and reproducible benchmarks.


Showing 29 of 38 posts in Agent architecture

Deep dives

Long-form research articles with primary sources, benchmarks, and reference tables.

Building an Eval Harness That Catches Regressions

May 7, 2026

Anthropic shipped three concurrent regressions over six weeks and their eval suite caught none of them. Even Anthropic ships blind. Here is the six-layer harness pattern that would have caught them.

Deep dive · evals · agents · production · testing · anthropic · shadow-testing

MCP Server Audit: Which Ones Actually Work in 2026

May 7, 2026

Of the ~14,000 MCP servers in PulseMCP's hand-curated index, fewer than 30 are demonstrably production-ready. Here is the list, the criteria, and the failure modes.

Deep dive · mcp · agents · security · production · oauth · review

Mixing and Matching Open-Weight Models: A Recipe Book

May 7, 2026

An 84% cost reduction on a real SaaS workload, a 97% reduction on agentic dev loops, and the three-tier mix that actually ships in May 2026.

Deep dive · local-models · cost · open-weights · agents · qwen · kimi · deepseek

The Multi-Agent Orchestration Frontier in May 2026

May 7, 2026

Nine frameworks, one durable-execution wedge, zero unbroken benchmarks. An honest map of the multi-agent ecosystem in May 2026, anchored in Anthropic's 90.2%/15× receipts and Berkeley's eight-benchmark exploit.

Deep dive · agents · frameworks · production · langgraph · crewai

Field notes

AI Agent Conference NYC: 10 Takeaways

May 5, 2026

The 10 things from AI Agent Conference 2026 NYC (May 4-5, NY Hilton Midtown) that are actually load-bearing if you ship agents in 2026. The trust paradox, CrewAI's 42% AI-authored code, the iceberg under every project, AX as the new UX, and what the panels from Datadog, LanceDB, Carta, and the Codex/Linear/Graphite room actually said.

agents · production · conferences · hitl · evals

AI Dev SF: 10 Takeaways

May 5, 2026

The 10 things from AI Dev 26 SF (April 28-29, Pier 48) that are actually load-bearing if you build agentic systems in 2026. Marc Brooker on defects, Andrew Ng on PM bottlenecks, Bain's 8-subgraph payroll system, the 4-legged identity, hybrid doc OCR, and the simulation sandbox every action-taking agent needs.

agents · production · conferences · evals · security

Shadow Testing: From 70% to 98% in Four Weeks

May 4, 2026

The single highest-leverage decision when shipping mission-critical autonomous agents. Production is the only truth.

agents · production · testing · evals

Swarm vs Monolith

May 4, 2026

Why five specialized $0.01 agents beat one $0.50 god model, and what the multi-agent crowd gets wrong about it.

agents · architecture · patterns · production

Notes on Human-in-the-Loop

May 4, 2026

Why human-in-the-loop is the only ethical and profitable way to scale agentic AI in a world of bot fatigue.

hitl · agents · ethics · production

AX: When Your Users Are Agents

May 3, 2026

Agent Experience is the new SEO. Here is what it means, what changes, and the four-step audit to figure out how your product looks to the agents already using it.

ax · agents · ux · seo · design

Below the Waterline: What Decides Whether Your Agent Ships

May 3, 2026

The hidden engineering that decides whether your agent makes it to production. The 65/95 gap and the three foundations underneath it.

agents · production · infrastructure · evals

Document OCR for Agentic Workflows

May 3, 2026

90% of enterprise data is locked in PDFs. The 2026 pipeline that gets it out is not RAG, not vision-only, and not the OCR you remember from 2018.

ocr · agents · documents · vlm · llamaindex

Notes on the EU AI Act Deadline for Agentic Systems

May 2, 2026

Privacy, security, and consent when agents have access to your terminal and your sensitive data. The 2026 framework, and the EU AI Act deadline most teams are sleeping on.

ethics · agents · security · privacy · ai-act

MCP, Honestly: What It Is and What It Is Not

May 2, 2026

What the Model Context Protocol actually is, what it gets right, where it leaks, and why the local-first version is the cleaner story.

mcp · agents · protocols · local-first · ollama

How Agents Burn Through Runway, and How to Stop Them

May 2, 2026

The engineering math behind preventing an agentic loop from burning through your monthly runway in one night.

agents · economics · local-inference · ops

How Agents Remember, and How They Forget

May 1, 2026

Why agents forget by default, what the four types of memory actually are, and how to build a system that compounds across sessions.

memory · agents · context-engineering · neo4j

OAuth Was Built for Three Actors. Agents Are the Fourth.

May 1, 2026

OAuth was designed for three actors. Agentic systems have four. Here is what breaks, what RFC 8693 fixes, and why most teams are shipping shared credentials anyway.

identity · oauth · agents · mcp · security

The 40K Token Wall

Apr 30, 2026

Why bigger context windows are not the answer, and what production-tuned engineers actually trust in 2026.

context-engineering · agents · rag · evals

Simulation Sandboxes for Agents

Apr 30, 2026

The fastest 2026 teams are testing autonomous agents in synthetic enterprise environments before any customer is exposed. With the case for it and the open-source pieces to build one.

agents · testing · simulation · evals · production

The Death of the Junior Dev

Apr 29, 2026

What agentic workflows are actually doing to entry-level engineering, and what to do about it.

careers · agents · ai-native-teams · hiring

What Vibe Coding Became

Apr 29, 2026

How to lead a codebase by stating intent instead of writing syntax, and the discipline that keeps it from falling apart.

agents · vibe-coding · claude-code · ai-native

Plan-and-Execute vs ReAct: Picking Your Agent's Brain

Apr 26, 2026

The two dominant agent reasoning patterns in 2026, what they get right, where each one fails, and how to know which to pick.

agents · patterns · react · plan-and-execute · design

How a Solo Founder Hits $1M ARR in 2026

Apr 25, 2026

Pieter Levels at $420K a month. Marc Lou at $1M a year across twelve micro-SaaS. Tony Dinh at $1M working twenty hours a week. The narrow real pattern, with sources, costs, and where it breaks.

startups · ai-native · agents · operations · founders

Building a Personal AI Agent in a Weekend

Apr 24, 2026

The build, the OpenClaw config, and the first agent worth running. End to end on a Framework 16 with 96GB unified memory.

amd · rocm · openclaw · agents · local-first

Self-Healing CI/CD Patterns

Apr 23, 2026

Agentic loops that detect, diagnose, and fix deployment errors before you see the notification. With the workflow that actually works in 2026.

devops · ci-cd · agents · production · observability

Find Your Agent-Ready Tasks in 90 Minutes

Apr 22, 2026

A framework for finding which 20% of your tasks are agent-ready before you write a line of code.

smb · agents · audit · automation · framework

An Agent for Competitive Intelligence

Apr 21, 2026

Using Apify, Firecrawl, and a local model to monitor every move your competitors make in real time. With the architecture and the weekly digest format that actually gets read.

competitive-intel · agents · apify · firecrawl · monitoring

The Cold Email Agent That Still Works in 2026

Apr 20, 2026

A research-first outbound agent that scrapes news, LinkedIn, and financials before drafting an email. With the architecture, the prompts, and the guardrails.

sales · agents · n8n · outbound · automation

The Zero-Inbox Agent

Apr 19, 2026

Triage that does not just summarize. It prepares the drafts and fetches the data, and you approve. The 60-line config that actually works.

agents · n8n · ollama · inbox · automation

