Build the Founder's Discovery Engine

A Code-Along Tutorial

Build a customer-discovery agent that watches HN, Reddit, and Product Hunt for ICP signals, drafts personalized emails in your voice, and drops them in Gmail Drafts for human approval. Open source, ~$8–$15/month, runs while you sleep.

Audience: Early-stage founders, lean operators, anyone who can configure a Google Sheet.
Time: 20 minutes live demo · 45–60 minutes self-paced if new to n8n.
License: MIT. Fork freely, ship yours.

Repo: github.com/sudosoph/bsw26-agentic-workflows
Workshop: Boulder Startup Week 2026 · Sophia Stein · agenticarchitect.ai/blog
Live reference workflow: https://sophiastein.app.n8n.cloud/workflow/bd3bleAE0dPG4SgN (replace sophiastein with your own n8n Cloud subdomain when you import it; the workflow ID stays the same)

⚠️ Before you start, scan n8n/README.md — it captures every platform gotcha we hit while building this live (Reddit blocking n8n Cloud IPs, voice.md DOCX issue, paired-item lookups, 60s code timeout, Gmail "To" field expression mode, etc.). Saves at least an hour of debugging.

What you'll have when you're done

✅ A scheduled n8n workflow that wakes up daily at 7 AM
✅ Drafts ~5 personalized customer-discovery emails per run
✅ Drops them in your Gmail Drafts folder (never sends — you send)
✅ Logs every action to a Google Sheet
✅ Sends you a daily digest email at 7:30 AM
✅ Follows up on no-replies after 5 days
✅ Costs about $0.21 per run · $6/month at the paid tier

Prerequisites

Before you start, you need 5 accounts. Free tiers or trials cover everything except the Anthropic API, which needs prepaid credit (see the Cost column).

Account	Why	Get it from	Cost
n8n.cloud	Workflow runtime	n8n.cloud	Free trial · $24/mo after
Anthropic API	Claude model + web search	console.anthropic.com	$20 budget covers months
Firecrawl	Web extraction	firecrawl.dev	500 free credits
Google Workspace	Sheets, Drive, Gmail	What you already use	Free
GitHub	Repo to fork	github.com	Free

Optional: cal.com for the Office Hours calendar link (free tier).

⚡ Autopilot · let an AI browser agent do the setup for you (~10 min hands-off)

If you have Claude for Chrome (Max/Pro subscribers · claude.com/chrome) or Gemini in Chrome (built into Chrome 127+) or OpenAI Operator, the entire setup phase can run autonomously while you make coffee.

Paste this into the agent:

"Set up the BSW Growth Agent demo. Walk through these tasks in order. Pause for me only when you need me to enter a password, accept an OAuth consent screen, or confirm payment.

1. Sign me in to console.anthropic.com · create a new API key called bsw-growth-agent · copy it to my clipboard · note: I'll add billing manually later.

2. Sign me in to firecrawl.dev · go to Dashboard → API Keys · copy the free-tier key · I shouldn't need a credit card for the trial.

3. Sign me in to console.groq.com · create an API key for the free tier called bsw-growth-agent · copy it.

4. Open n8n.cloud · sign in or start the 14-day trial · create a new workflow called BSW Growth Agent · then go to Workflows → Add → Import from File · I'll provide the JSON path.

5. Open Google Sheets · create a new sheet called Discovery Engine with three tabs named ICP, Sent, and Runs · paste the headers from handouts/google-sheet-seed.csv from the BSW repo into each tab.

6. Open Google Drive · create a folder called agentic-architect · inside it, create a Google Doc named voice.md and paste the contents of handouts/voice-md-template.md from the BSW repo (which I'll customize after).

7. When done, give me back: the Anthropic key, the Firecrawl key, the Groq key, the Sheet ID (from the URL), and the voice.md file ID. I'll wire these into n8n manually."

The agent handles signups + form fills + key copy. You only intervene at OAuth consent screens and payment forms — typically 2 minutes of your real time across the whole 10-minute autopilot run.

Cannot autopilot via Claude Code CLI / shell-only AI. Browser interaction needs a browser-controlling agent. If you don't have one, follow the manual steps below — they take ~30 minutes.

Setup (5 min · do this before the build)

1. Get API keys

Anthropic:

Sign in to console.anthropic.com
Settings → API Keys → Create Key
Name it n8n-discovery-engine · copy the key (starts with sk-ant-...)
Add $20 in Billing if not already

Firecrawl:

Sign in at firecrawl.dev
Dashboard → API Keys → copy (starts with fc-...)

2. Create the Google Sheet

In Google Sheets, create one spreadsheet called Discovery Engine with three tabs:

Tab 1 — ICP (1 row of config)

icp_description	signal_keywords
Early-stage SaaS founders, pre-PMF, technical-leaning, complaining about outbound costs	n8n cost, Lindy credits, hired SDR, founder-led sales, Sonnet 4.6 cost

Tab 2 — Sent (just the headers, agent will append rows)

date · person · signal_type · source_url · score · draft_subject · status

Tab 3 — Runs (just the headers, agent will append rows)

date · leads_found · qualified · drafts · errors · notes

Save the sheet ID from the URL — looks like 1AbC...XyZ between /d/ and /edit.

3. Create the Drive folder + voice.md

In Google Drive, create a folder called agentic-architect.

Inside it, create a Google Doc named voice.md. Paste the contents of handouts/voice-md-template.md from the repo. Edit it to be your voice. 5 example emails minimum.

Right-click voice.md → Get link → copy. Save the file ID (the long string after /d/ in the link; for a Google Doc it sits between /d/ and /edit).

4. Get your n8n workflow scaffold

Two options:

Option A — fork the repo (fastest)

# In your terminal
gh repo fork sudosoph/bsw26-agentic-workflows --clone
cd bsw26-agentic-workflows

Then in n8n: Workflows → Add → Import from File → upload n8n/bsw-growth-agent.json. The full workflow appears with placeholder credentials. You only need to wire the credentials and replace the placeholder IDs (covered below).

Option B — build from scratch (this tutorial) Follow the 17 steps below.

The 20-minute build · 17 steps

Time budget per step shown in [X min]. Total: about 21 minutes building (the sum of the step budgets) + 2 minutes testing.

Step 1 · Create the workflow + Schedule Trigger `[1 min]`

n8n → Workflows → Add Workflow → Blank
Name it Founder's Discovery Engine
Drag a Schedule Trigger node onto canvas
Configure:
- Trigger Interval: Custom (Cron)
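- Cron Expression: 0 13 * * * (every day at 7:00 AM Boulder/Denver during MDT, which is 13:00 UTC). This assumes the workflow timezone is UTC; n8n evaluates cron in the workflow's timezone (Workflow Settings → Timezone), so if yours is already set to local time, use 0 7 * * * instead. UTC hours for 7 AM in other zones during daylight time: PT=14, ET=11, UK=06, CET=05.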

Why cron? Almost every SMB agent starts as a daily scheduled job. You want predictable behavior, not chaos.
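The offset arithmetic behind that cron string can be sketched in a few lines. This is only an illustration, not part of the repo: `cronForLocalHour` is a hypothetical helper, and it assumes the trigger evaluates cron in UTC and that you supply the zone's current UTC offset yourself.

```javascript
// Hypothetical helper (not from the repo): build a daily cron expression
// for a local wall-clock hour, given that zone's UTC offset in hours.
// Assumption: the scheduler evaluates cron expressions in UTC.
function cronForLocalHour(localHour, utcOffsetHours) {
  // local = UTC + offset, so UTC = local - offset; wrap into 0..23.
  const utcHour = (((localHour - utcOffsetHours) % 24) + 24) % 24;
  return `0 ${utcHour} * * *`;
}

console.log(cronForLocalHour(7, -6)); // MDT is UTC-6 → "0 13 * * *"
console.log(cronForLocalHour(7, -4)); // EDT is UTC-4 → "0 11 * * *"
```

Offsets shift when daylight saving time starts or ends, so a UTC-based cron drifts by an hour twice a year unless you update it.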

Step 2 · Add a Manual Trigger `[1 min]`

For testing and live demos, add a manual trigger as a parallel entry point.

Drag a Manual Trigger node next to the Schedule Trigger
Both will feed into Step 3

Why both? Cron runs daily without you. Manual lets you test the workflow on demand without waiting until 7 AM.

Step 3 · Read ICP from Sheets `[2 min]`

Drag a Google Sheets node, connect from both triggers
Operation: Read Rows from Sheet
Authentication: OAuth2 → Connect new credential → sign into your Google account → grant access
Document: paste your Sheet ID from setup step 2
Sheet: ICP
Output options: leave defaults

Test it: Click "Execute Node" — you should see your icp_description and signal_keywords come back.

Pattern: Config-as-files. The agent reads its operating instructions from a Sheet. You can edit the Sheet without touching n8n.

Step 4 · Read voice.md from Drive `[1 min]`

Drag a Google Drive node, connect from both triggers (parallel to the Sheets read)
Operation: Download File
File: paste your voice.md file ID
Binary Property: voiceMd (we'll reference this name later)

Test it: Click "Execute Node" — you should see the file content as base64. We'll decode it in Step 11.
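For reference, turning a base64 payload back into text looks like this in plain Node.js. Treat it as a sketch: the sample content is invented, `decodeBase64Text` is a hypothetical name, and inside n8n the binary property may be stored by reference rather than inline, in which case you would need n8n's own binary handling instead.

```javascript
// Sketch: decode a base64 string into UTF-8 text with Node's Buffer.
function decodeBase64Text(b64) {
  return Buffer.from(b64, 'base64').toString('utf-8');
}

// Invented sample standing in for the downloaded voice.md content.
const sampleVoice = 'Tone: direct, warm.\nSign-off: Sophia';
const encoded = Buffer.from(sampleVoice, 'utf-8').toString('base64');

console.log(decodeBase64Text(encoded));
```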

Step 5 · Discovery via Claude Haiku 4.5 + web_search `[3 min]`

This is sub-agent #1. One Claude call does both web search and structured extraction.

Drag an HTTP Request node, connect from the Sheets node
Method: POST
URL: https://api.anthropic.com/v1/messages
Authentication → Generic Credential Type → HTTP Header Auth
- Create new credential: Name=x-api-key, Value=your Anthropic key
Headers (add):
- anthropic-version: 2023-06-01
- content-type: application/json
Body → JSON:

{
  "model": "claude-haiku-4-5",
  "max_tokens": 2000,
  "tools": [
    { "type": "web_search_20250305", "name": "web_search", "max_uses": 5 }
  ],
  "system": "You are a customer-discovery research agent for a lean startup.\n\nICP context:\n{{ $('Google Sheets').item.json.icp_description }}\n\nSignal keywords: {{ $('Google Sheets').item.json.signal_keywords }}\n\nReturn ONLY a JSON array. No prose.",
  "messages": [
    {
      "role": "user",
      "content": "Search Hacker News, Reddit (r/SaaS, r/Entrepreneur, r/AI_Agents), and Product Hunt for posts from the last 7 days where founders or operators are publicly displaying signals matching my ICP. Look for: pain mentions, hiring posts I could solve, complaints about competitors, asks for tools.\n\nReturn JSON array, max 30 items, schema:\n[\n  {\n    \"person\": \"@handle or display name\",\n    \"signal_type\": \"pain | hiring | complaint | tool_ask\",\n    \"source_url\": \"full URL to the original post\",\n    \"evidence_quote\": \"verbatim 1-2 sentence quote\",\n    \"score\": 0-10,\n    \"company\": \"company name if discoverable\",\n    \"company_url\": \"company website if discoverable\"\n  }\n]"
    }
  ]
}

Why Haiku, not Sonnet? Sub-agent #1 is classification. $1/$5 per million tokens is plenty. Save Sonnet for nuance. This is the cascade pattern that cuts your bill 60–70%.

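Test it: Execute. You should see a Claude response whose content array includes one or more text blocks containing a JSON array. With web search enabled, the JSON is often not in content[0], because tool-use and search-result blocks are interleaved with the text. May take 15–30 sec because of web search.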

Step 6 · Parse JSON from Claude `[1 min]`

Claude returns content blocks. We need to extract the JSON array.

Drag a Code node after the Claude HTTP Request
Mode: Run Once for All Items
Language: JavaScript
Paste:

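// Step 5's HTTP Request returns one item holding the raw Anthropic response.
const response = $input.first().json;
const contentBlocks = response.content || [];

// The response mixes tool blocks and text blocks, so concatenate every
// text block instead of reading only content[0].
let rawText = '';
for (const block of contentBlocks) {
  if (block.type === 'text' && block.text) rawText += block.text;
}

// Take everything from the first '[' to the last ']' to drop surrounding prose.
const jsonMatch = rawText.match(/\[[\s\S]*\]/);
if (!jsonMatch) {
  return [{ json: { error: 'No JSON in response', raw: rawText, leads: [] } }];
}

let leads;
try { leads = JSON.parse(jsonMatch[0]); }
catch (e) { return [{ json: { error: 'Parse failed', message: e.message } }]; }

// Keep only leads scored 6 or higher; each one becomes its own n8n item.
const qualified = leads.filter(l => (l.score ?? 0) >= 6);
return qualified.map(lead => ({ json: lead }));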

What this does: Strips Claude's prose around the JSON, parses it, filters to leads scored 6 or higher. Each surviving lead becomes a separate item flowing through the workflow.
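To see the same extraction outside n8n, here is a standalone sketch. The function name and the sample response are invented for illustration; the sample only imitates the general shape of a response that mixes tool blocks with text blocks.

```javascript
// Standalone version of the parsing idea: join text blocks, cut out the
// JSON array, parse it, and keep leads at or above a minimum score.
function extractQualifiedLeads(response, minScore = 6) {
  const text = (response.content || [])
    .filter(b => b.type === 'text' && b.text)
    .map(b => b.text)
    .join('');
  const match = text.match(/\[[\s\S]*\]/);
  if (!match) return [];
  let leads;
  try { leads = JSON.parse(match[0]); } catch (e) { return []; }
  if (!Array.isArray(leads)) return [];
  return leads.filter(l => (l.score ?? 0) >= minScore);
}

// Invented sample: the JSON array is split across two text blocks.
const sampleResponse = {
  content: [
    { type: 'server_tool_use', name: 'web_search' },
    { type: 'text', text: 'Here are the results:\n[{"person":"@ana","score":8},' },
    { type: 'text', text: '{"person":"@bo","score":4}]\nLet me know if you want more.' },
  ],
};

console.log(extractQualifiedLeads(sampleResponse)); // only @ana survives the filter
```

Because the regex is greedy, stray square brackets in the surrounding prose (a citation marker, for instance) get swept into the match and make the parse fail, which is one reason to keep the "Return ONLY a JSON array" instruction in the system prompt.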

Step 7 · Read the Sent log for dedup `[1 min]`

Drag another Google Sheets node, connect from the Code node
Operation: Read Rows from Sheet
Same Document ID as before
Sheet: Sent

Why? We need to know who we've already contacted so we don't double-email anyone. This is idempotency — re-run safety.

Step 8 · Dedup + keep top 5 `[1 min]`

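Drag another Code node
Connect it from the Read Sent node only. Step 7 already chains Parse → Read Sent, so at this point $input carries the Sent rows; the leads are read back from the Parse node by name. If the Sent tab is still empty, turn on Settings → Always Output Data on the Read Sent node so the workflow doesn't stop there.
Paste:

const leads = $('Code').all().map(i => i.json);               // your Parse node name (Step 6)
const sentRows = $('Google Sheets1').all().map(i => i.json);  // your Sent-read node name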

const contactedUrls    = new Set(sentRows.map(r => r.source_url).filter(Boolean));
const contactedHandles = new Set(sentRows.map(r => r.person).filter(Boolean));

const fresh = leads.filter(lead => {
  if (lead.source_url && contactedUrls.has(lead.source_url)) return false;
  if (lead.person     && contactedHandles.has(lead.person))    return false;
  return true;
});

fresh.sort((a, b) => (b.score ?? 0) - (a.score ?? 0));
const top5 = fresh.slice(0, 5);

return top5.map(lead => ({ json: lead }));

What this does: Filters out anyone in the Sent log, sorts by score, keeps top 5. Progressive enrichment starts here — we only spend money researching the 5 best candidates.
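The dedup-and-rank logic is easy to check on its own with sample data. The sketch below is a plain-JavaScript restatement; `dedupTopN` and the sample rows are hypothetical.

```javascript
// Drop leads already present in the Sent log (by URL or by person),
// then keep the n highest-scoring ones.
function dedupTopN(leads, sentRows, n = 5) {
  const urls = new Set(sentRows.map(r => r.source_url).filter(Boolean));
  const people = new Set(sentRows.map(r => r.person).filter(Boolean));
  return leads
    .filter(l => !(l.source_url && urls.has(l.source_url)))
    .filter(l => !(l.person && people.has(l.person)))
    .sort((a, b) => (b.score ?? 0) - (a.score ?? 0))
    .slice(0, n);
}

// Invented sample data.
const sampleLeads = [
  { person: '@a', source_url: 'https://example.com/1', score: 7 },
  { person: '@b', source_url: 'https://example.com/2', score: 9 },
  { person: '@c', source_url: 'https://example.com/3', score: 8 },
];
const sampleSent = [{ person: '@a', source_url: 'https://example.com/old' }];

console.log(dedupTopN(sampleLeads, sampleSent).map(l => l.person)); // → [ '@b', '@c' ]
```

Matching on either field means a person contacted once is skipped even when they show up again under a new post URL.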

Step 9 · Enrich top 5 with Firecrawl `[1 min]`

Drag an HTTP Request node
Method: POST
URL: https://api.firecrawl.dev/v1/scrape
Authentication → HTTP Header Auth (new credential):
- Name: Authorization
- Value: Bearer YOUR_FIRECRAWL_KEY
Body → JSON:

{
  "url": "{{ $json.company_url || $json.source_url }}",
  "formats": ["markdown"],
  "onlyMainContent": true
}

Options → Timeout: 30000 (Firecrawl can take a few seconds)

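Why Firecrawl? Best signal-to-noise on JavaScript-heavy sites. The core is open source (AGPL-3.0 at the time of writing, so check the license terms if you plan to modify it), which means you can self-host on Docker for free, or use Cloud at $83/mo to skip operations.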

Step 10 · Summarize the company in 2 sentences `[1 min]`

Drag another HTTP Request to Anthropic (same credential and headers as Step 5)
Method: POST · URL: https://api.anthropic.com/v1/messages
Body → JSON:

{
  "model": "claude-haiku-4-5",
  "max_tokens": 300,
  "system": "You write concise 2-line company summaries. Output plain text. No markdown.",
  "messages": [{
    "role": "user",
    "content": "Company markdown extract:\n\n{{ $json.data.markdown.slice(0, 4000) }}\n\nIn exactly two sentences, what does this company do and what's their current state?"
  }]
}

Pattern: Cascade again — Haiku for the cheap summary, Sonnet for the next step's drafting.
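One thing to watch when templating scraped text straight into a JSON body: markdown usually contains quotes and newlines, and those have to be escaped for the JSON to stay valid. A minimal sketch of building the same request as an object and letting `JSON.stringify` do the escaping follows. `buildSummaryBody` is a hypothetical name, and how you hand the result to the HTTP Request node depends on your setup.

```javascript
// Build the Step 10 request body as a plain object so that quotes and
// newlines inside the scraped markdown cannot break the JSON.
function buildSummaryBody(markdown) {
  const extract = String(markdown || '').slice(0, 4000); // same 4000-char cap
  return {
    model: 'claude-haiku-4-5',
    max_tokens: 300,
    system: 'You write concise 2-line company summaries. Output plain text. No markdown.',
    messages: [{
      role: 'user',
      content: `Company markdown extract:\n\n${extract}\n\nIn exactly two sentences, what does this company do and what's their current state?`,
    }],
  };
}

// Invented sample containing both double quotes and a newline.
const summaryBody = buildSummaryBody('# Acme "AI" tools\nWe build agents.');
const wireJson = JSON.stringify(summaryBody);

// Round-trips cleanly: the quotes survive serialization and parsing.
console.log(JSON.parse(wireJson).messages[0].content.includes('"AI"')); // → true
```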

Step 11 · Draft the email with Sonnet 4.6 + voice.md `[2 min]`

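This is the main premium-token step in the workflow (the digest in Step 15 also uses Sonnet, but only once per run). The voice.md block is marked for prompt caching: the first call pays a cache-write premium (1.25× the base input price for the 5-minute cache), and every later call that hits the cache within 5 minutes pays about 0.1× for that block. With one run per day, the savings can only apply to the later drafts within the same run, not across days. The cached prefix also has to meet the model's minimum cacheable length, so a very short voice.md will not be cached.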

Drag another HTTP Request to Anthropic
Add header: anthropic-beta: prompt-caching-2024-07-31
Body → JSON:

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 600,
  "system": [
    {
      "type": "text",
      "text": "You write customer-discovery emails for a lean startup founder. Goal: an INTERVIEW ASK — not a pitch. Soft CTA. 80–110 words. Follow voice.md exactly."
    },
    {
      "type": "text",
      "text": "voice.md contents:\n\n{{ $('Google Drive').item.binary.voiceMd.toString('utf-8') }}",
      "cache_control": { "type": "ephemeral" }
    }
  ],
  "messages": [{
    "role": "user",
    "content": "Draft a customer-discovery email.\n\nPerson: {{ $('Code1').item.json.person }}\nSignal: {{ $('Code1').item.json.signal_type }} — {{ $('Code1').item.json.evidence_quote }}\nSource: {{ $('Code1').item.json.source_url }}\nCompany context: {{ $('HTTP Request2').item.json.content[0].text }}\n\nReturn:\nSUBJECT: [subject]\nBODY:\n[body]"
  }]
}

Note: Replace Code1 and HTTP Request2 with your actual node names. n8n auto-names them; rename for clarity if you want.
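The caching economics for this step can be estimated with some quick arithmetic. The sketch below assumes a 1.25× multiplier for the cache write and 0.1× for each cache read, and that the calls run one after another so every call after the first hits the cache. The token count and the price per million tokens are placeholder inputs; substitute your own figures from the pricing page.

```javascript
// Estimate the input cost of a cached prompt prefix across several calls.
// Assumptions: one cache write at 1.25x, then (calls - 1) reads at 0.1x.
function cachedPrefixCost(prefixTokens, calls, inputPricePerMTok) {
  const base = (prefixTokens / 1e6) * inputPricePerMTok; // one uncached pass
  const uncached = base * calls;
  const cached = calls > 0 ? base * (1.25 + 0.1 * (calls - 1)) : 0;
  const savedFraction = uncached > 0 ? 1 - cached / uncached : 0;
  return { uncached, cached, savedFraction };
}

// Placeholder inputs: a 2,000-token voice.md, 5 drafts, $3 per million input tokens.
const estimate = cachedPrefixCost(2000, 5, 3);
console.log(estimate.uncached.toFixed(4), estimate.cached.toFixed(4)); // → 0.0300 0.0099
```

With a single call the cached path is the more expensive one (1.25× versus 1×), so caching only pays off when several drafts share the same prefix inside the cache window.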

Step 12 · Parse SUBJECT/BODY from Claude `[1 min]`

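Drag a Code node
Mode: Run Once for Each Item (so every one of the 5 drafts is parsed, not just the first)
Paste:

// Runs once per draft response coming out of Step 11.
const response = $input.item.json;
const text = (response.content?.[0]?.text || '').trim();

// Expected shape: "SUBJECT: ..." on the first line, then "BODY:" and the email text.
const subjectMatch = text.match(/^SUBJECT:\s*(.+?)\s*\n/);
const bodyMatch    = text.match(/BODY:\s*\n([\s\S]+)$/);

// Fall back to a generic subject and the raw text if the format is off.
const subject = subjectMatch ? subjectMatch[1].trim() : 're: a quick question';
const body    = bodyMatch ? bodyMatch[1].trim() : text;

const lead = $('Code1').item.json;  // carry forward the matching lead from Step 8
return { json: { subject, body, lead } };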

Why SUBJECT/BODY format and not nested JSON? LLMs occasionally produce malformed JSON when nesting multi-line strings. Plain SUBJECT: / BODY: is robust to imperfect output.
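The robustness claim is easy to verify with a standalone copy of the two regexes. `parseDraft` is a hypothetical wrapper and the sample strings are invented.

```javascript
// Same SUBJECT/BODY parsing as the Code node, wrapped as a plain function.
function parseDraft(text) {
  const t = (text || '').trim();
  const subjectMatch = t.match(/^SUBJECT:\s*(.+?)\s*\n/);
  const bodyMatch = t.match(/BODY:\s*\n([\s\S]+)$/);
  return {
    subject: subjectMatch ? subjectMatch[1].trim() : 're: a quick question',
    body: bodyMatch ? bodyMatch[1].trim() : t,
  };
}

// Well-formed output parses into its two parts.
console.log(parseDraft('SUBJECT: Your HN post on SDR costs\nBODY:\nHi Ana,\n\nSaw your post.'));

// Output that ignores the format still yields a usable draft.
console.log(parseDraft('Hi Ana, saw your post.'));
```

In the second case the whole reply becomes the body under the fallback subject, so a formatting slip costs you a manual subject edit rather than a failed run.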

Step 13 · Create Gmail draft (HITL gate) `[1 min]`

Drag a Gmail node
Authentication → OAuth2 → sign in to your Google account · grant gmail.compose scope
Resource: Draft
Operation: Create
Subject: ={{ $json.subject }}
Message: ={{ $json.body }}
Options → To: ={{ $json.lead.person ? $json.lead.person + '@unknown.example' : 'TODO_RESOLVE@example.com' }}

HITL gate. The agent NEVER sends. It always creates a Draft. You review and click Send manually.

Note: Resolving the prospect's actual email address is out of scope today. Add an Apollo or Hunter API node here when productionizing.

Step 14 · Append to Sent log `[1 min]`

Drag a Google Sheets node
Operation: Append
Document: same ID
Sheet: Sent
Mapping mode: Define Below
- date: ={{ new Date().toISOString().slice(0,10) }}
- person: ={{ $json.lead.person }}
- signal_type: ={{ $json.lead.signal_type }}
- source_url: ={{ $json.lead.source_url }}
- score: ={{ $json.lead.score }}
- draft_subject: ={{ $json.subject }}
- status: pending_review

Why log immediately? Idempotency. If you re-run the workflow, dedup (Step 8) sees the already-logged person and skips them.

Step 15 · Generate the daily digest `[1 min]`

After all leads have been processed (i.e., all draft emails created), generate one summary email.

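Drag another HTTP Request to Anthropic (Sonnet, same credential and headers as Step 5). In the node's Settings, enable Execute Once so the five incoming items produce a single digest call.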
Body → JSON:

{
  "model": "claude-sonnet-4-6",
  "max_tokens": 400,
  "system": "You write a brief daily digest email for a founder running a customer-discovery agent. One paragraph. Friendly. Specific numbers.",
  "messages": [{
    "role": "user",
    "content": "Today the agent processed {{ $('Code').all().length }} discovery results, qualified {{ $('Code1').all().length }} leads, drafted {{ $('Code2').all().length }} emails sitting in Gmail Drafts. Top signal types: {{ $('Code1').all().map(i => i.json.signal_type).join(', ') }}. Write a short morning digest. Subject + body. Tell them to approve drafts before 5pm."
  }]
}

Step 16 · Send the digest to yourself `[1 min]`

Drag a Gmail node
Resource: Message · Operation: Send
To: your own email (e.g., sophia@agenticarchitect.ai)
Subject: parse from the digest response (use a small Code node before, or paste a static one for now)
Message: parse from the digest response

This is the only place the agent actually sends — and it's only sending to YOU. Everyone else gets drafts.
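The "small Code node" mentioned for the subject could look roughly like this. It is a sketch that assumes the digest reply starts with a line beginning `Subject:` followed by the body; the prompt in Step 15 does not guarantee that format, so a static fallback subject is included. `splitDigest` and the fallback text are hypothetical.

```javascript
// Split a digest reply into subject and body. If the first line is not
// a "Subject:" line, use a static subject and treat everything as body.
function splitDigest(text, fallbackSubject = 'Discovery Engine daily digest') {
  const lines = (text || '').trim().split('\n');
  const m = lines[0].match(/^\s*subject:\s*(.+)$/i);
  if (!m) return { subject: fallbackSubject, body: lines.join('\n').trim() };
  return { subject: m[1].trim(), body: lines.slice(1).join('\n').trim() };
}

// Invented sample reply.
console.log(splitDigest('Subject: 5 drafts ready for review\n\nGood morning! Five drafts are waiting.'));
```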

Step 17 · Append to Runs audit log `[1 min]`

Final step — log this run for trust-building.

Drag a Google Sheets node
Operation: Append
Sheet: Runs
Map:
- date: today's date
- leads_found: count from Step 5
- qualified: count from Step 6
- drafts: count from Step 12
- errors: 0 for now
- notes: auto

This is your trust-building primitive. After 14 runs of clean data, you have evidence to graduate the agent from Pending Review to Conditional Auto.

Test the workflow

Click the Manual Trigger node → Execute Workflow
Watch the execution flow through all 17 steps
Check Gmail Drafts — you should see ~5 fresh drafts
Check the Sent Sheet — 5 new rows
Check the Runs Sheet — 1 new row
Check your inbox — 1 digest email

Total run time: ~30–60 seconds depending on Firecrawl latency.

Cost per run: Check the Anthropic Console → Usage. Should be around $0.20.

Common errors

Error	Cause	Fix
`401 Unauthorized` from Anthropic	API key not pasted right	Re-paste in n8n credential, check for whitespace
`model not found`	Wrong model string	Use `claude-sonnet-4-6` and `claude-haiku-4-5` exactly
`402 Payment Required` from Firecrawl	Trial credits used up	Top up or self-host Firecrawl on Docker
Sheets append fails	OAuth scope missing	Re-auth Sheets credential, ensure write scope enabled
Gmail draft creation fails	OAuth scope `gmail.compose` not granted	Re-auth Gmail with full compose permissions
No drafts appear	Recipient address is placeholder	Drafts are still in Gmail Drafts folder · check there
`parse failed` in Step 6	Claude returned malformed JSON	Try the workflow again — usually transient

Activate the workflow

Once tests pass:

Click the workflow Active toggle (top right of n8n)
The Schedule Trigger will fire daily at 7 AM
Drafts wait for you in Gmail every morning

Verify: The next morning, check your Drafts folder. If there are 5 new drafts, you're live.

Add the follow-up sub-workflow (optional, +5 min)

Same architecture, different inputs. Drafts a follow-up if no reply in 5 days.

Create a second n8n workflow: Discovery Engine · Follow-Up
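Schedule Trigger at 0 8 * * * (8 AM daily in the workflow's timezone; if your workflow runs on UTC as assumed in Step 1, use 0 14 * * * for 8 AM MDT)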
Gmail node → Get Many Threads → filter by label Sent — Discovery
Code node → for each thread, check elapsed days + reply count
If 5+ days AND no reply: HTTP Request to Sonnet 4.6 with the thread context, ask for a follow-up draft that references the original thread (not a copy-paste reminder)
Gmail → Create Draft with the follow-up

The same six primitives (trigger, read state, sub-agent, draft, gate, log) apply.
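The elapsed-days and reply-count check in the Code node might be sketched like this. The thread shape (`sentAt`, `replyCount`) is an assumption made for illustration; map it from whatever fields your Gmail node actually returns.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Decide whether a thread is due for a follow-up draft.
// Assumed thread shape: { sentAt: ISO date string, replyCount: number }.
function needsFollowUp(thread, now = new Date(), minDays = 5) {
  const elapsedDays = (now.getTime() - new Date(thread.sentAt).getTime()) / DAY_MS;
  return elapsedDays >= minDays && thread.replyCount === 0;
}

// Invented sample: sent exactly five days before "now", no replies yet.
const sampleThread = { sentAt: '2026-05-01T13:00:00Z', replyCount: 0 };
console.log(needsFollowUp(sampleThread, new Date('2026-05-06T13:00:00Z'))); // → true
```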

Going to production · the trust ladder

Today's workflow ships at Stage 2 · Pending Review of the trust ladder. The agent drafts, you send. Over time, you can graduate.

Stage	What changes	Required guardrails
2 · Pending Review (today)	Drafts only · never sends	Schema validation · audit log · do-not-contact filter
3 · Conditional Auto	Auto-fires drafts where score ≥ 8 AND voice match ≥ 0.85	+ rate limit per recipient · anomaly detection · sandbox isolation
4 · Full Auto	Fires all · alerts on exceptions	+ A/B replay testing · drift detection · automatic rollback

Promotion criteria:

Stage 2 → 3: ≥ 95% approval rate over 2 weeks
Stage 3 → 4: auto-error rate < 1% sustained 30+ days

Watch the Runs Sheet. The data tells you when you're ready.
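As a rough illustration of the Stage 2 → 3 check, the sketch below computes an approval rate from logged rows. It assumes you update the status column to `approved` or `rejected` after reviewing each draft; only `pending_review` is defined in this tutorial, so those two values and both function names are hypothetical. Pass in only the rows from your two-week window.

```javascript
// Share of reviewed drafts that were approved, or null if none were reviewed.
function approvalRate(rows) {
  const reviewed = rows.filter(r => r.status === 'approved' || r.status === 'rejected');
  if (reviewed.length === 0) return null;
  return reviewed.filter(r => r.status === 'approved').length / reviewed.length;
}

// True once the approval rate meets the promotion threshold (95% by default).
function readyForStage3(rows, minRate = 0.95) {
  const rate = approvalRate(rows);
  return rate !== null && rate >= minRate;
}

// Invented sample: 19 approved, 1 rejected, 2 still pending.
const sampleRows = [
  ...Array.from({ length: 19 }, () => ({ status: 'approved' })),
  { status: 'rejected' },
  { status: 'pending_review' },
  { status: 'pending_review' },
];
console.log(approvalRate(sampleRows), readyForStage3(sampleRows)); // → 0.95 true
```

Pending rows are left out of the denominator, so an unreviewed backlog neither helps nor hurts the rate.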

Cost optimization · once you're running

Lever	Reduction	Effort
Cascade (Haiku for routing, Sonnet for drafting)	-60% to -70%	Built into this tutorial
5-min cache on voice.md + ICP	-30% additional	Built into Step 11
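Batch API on the daily digest	-50% on that node	Submit the digest through the Message Batches endpoint (/v1/messages/batches); results arrive asynchronously, so poll for them before sending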
Self-host n8n on $5 Hetzner VPS	-$24/mo	One weekend of Docker
Self-host Firecrawl on Docker	-$83/mo	Open-source image (AGPL-3.0), 30 min
Local Qwen 3.6 for classification (Strix Halo / Mac Studio)	API → ~$0	Hardware up front, then ~free

Ship Tier 1 (all SaaS) this week. Move to Tier 3 (lean self-hosted) when your bill hits $50/mo. Skip Tier 4 unless you have privacy-sensitive workloads.

Customize for your startup

The architecture you just built is general. Same six primitives, different content:

Your agent	Trigger change	Search target	Output target
Inbound Triage	Form fill webhook	Lead's company website	Gmail draft
Competitor Pulse	Cron daily	Competitor changelogs	Email digest
Support Draft	Helpdesk webhook	Your KB + ticket history	Helpdesk draft
Content Repurposer	Blog publish webhook	Original blog post	Drafts in Drive
Investor Update	Cron monthly	Your metrics dashboards	Gmail draft

Fork the repo, change the trigger, change icp.md and voice.md, ship. Same architecture, different agent.

Resources

Repo: github.com/sudosoph/bsw26-agentic-workflows
Workflow JSON: n8n/bsw-growth-agent.json
voice.md template: handouts/voice-md-template.md
ICP template: handouts/icp-md-template.md
Schemas reference: handouts/schemas-md-template.md
Workshop slides: slides/index.html
Speaker script: script/speaker-notes.md

Office Hours · 5 free 30-min audits for BSW attendees: cal.com/sophia-stein/architect-audit-bsw (capped at 5 bookings)

Newsletter: agenticarchitect.ai/blog — weekly deep-dives on agentic architecture for lean founders

Email: sophia@agenticarchitect.ai

Tutorial v1.0 · MIT licensed · fork freely · ship yours · — Sophia Stein, AI Architect
