April 21, 2026
An Agent for Competitive Intelligence
Using Apify, Firecrawl, and a local model to monitor every move your competitors make in real time. With the architecture and the weekly digest format that actually gets read.
TL;DR. The competitive intelligence playbook of 2024 was a junior PM with a Google Alerts inbox. The 2026 version is a small agent that pulls competitor changelogs, pricing pages, social posts, hiring activity, and product releases on a daily schedule, deduplicates the signal, and delivers a single weekly digest worth opening. It costs electricity to run, surfaces the things that actually matter, and the founders who built it during 2025 are running rings around the ones still on Google Alerts. Here is the architecture.
What the old version got wrong
Google Alerts and its descendants (Mention, Brand24, Awario) sent volume. A startup tracking five competitors got 50-200 notifications a day. Most were noise: blog posts that mentioned the company in passing, recycled press releases, social posts about unrelated namesakes. The real signal (a pricing change, a stack rewrite, a key hire, a product launch) drowned in the noise.
The fix is not better filters. The fix is having the agent read each item and decide whether it matters. LLMs in 2026 do this reliably enough on a small, fixed label set (five categories plus noise) that the old volume-driven pattern is obsolete.
What actually matters
Five categories of competitor activity worth tracking. Everything else is noise.
| Category | What you watch | Frequency to check |
|---|---|---|
| Pricing changes | Pricing page diffs | Daily |
| Product launches | Changelog, release notes, blog posts | Daily |
| Hiring signals | Careers page, LinkedIn job posts | Weekly |
| Funding events | Press releases, Crunchbase, filings | Weekly |
| Strategic positioning | Homepage messaging, dev docs, demo content | Monthly |
The frequencies match the rate of actual change. Pricing pages are the highest-leverage category because they update frequently and tell you the most about positioning. Strategic-positioning changes are the lowest-leverage signal day-to-day but the highest-leverage signal in retrospect: a competitor rewriting its homepage messaging is often the first public sign of a pivot.
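In the article the cadence lives in n8n cron flows. The same decision is sketched below in plain Python so the table maps directly to logic. The category keys are assumed to match the classifier's labels. "Monthly" is approximated as 30 days, and the function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional

# Check intervals from the table above; "monthly" is approximated as 30 days.
CHECK_INTERVAL = {
    "pricing": timedelta(days=1),
    "product_launch": timedelta(days=1),
    "hiring": timedelta(weeks=1),
    "funding": timedelta(weeks=1),
    "positioning": timedelta(days=30),
}


def is_due(category: str, last_checked: Optional[datetime], now: datetime) -> bool:
    """True if a source in this category should be crawled again."""
    if last_checked is None:  # never crawled before: always due
        return True
    return now - last_checked >= CHECK_INTERVAL[category]


def due_sources(sources, now):
    """sources: iterable of (url, category, last_checked) tuples; returns URLs to crawl."""
    return [url for url, category, last in sources if is_due(category, last, now)]
```

Here is an example. Suppose the input holds a pricing page crawled exactly a day ago, a careers page crawled three days ago, and a homepage that has never been crawled. The function returns the pricing page and the homepage. The careers page waits until its weekly slot.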
The architecture
Four stages, an n8n-driven cron-and-queue architecture you can host anywhere.
[Stage 1] Schedulers (n8n cron flows)
- Daily: pricing pages, changelogs, blog feeds
- Weekly: careers pages, LinkedIn job listings, funding databases
- Monthly: homepage / docs / positioning content
[Stage 2] Crawl + diff
- Apify MCP server: structured scraping
- Firecrawl: deep crawls of multi-page resources
- Stored as dated snapshots in Postgres
- Diff computed against last snapshot
[Stage 3] Classifier + summarizer (small tool-calling model)
- Classifies each diff into one of the five categories, or noise
- Summarizes the diff in one sentence
- Scores it for signal strength (0-3)
- Stores in the weekly digest queue
[Stage 4] Weekly digest generator
- Runs Sunday evening
- Filters to signal score >= 2
- Produces a single markdown email
- Sent via Listmonk to a private list (just me, mostly)
Four stages, two human-readable outputs (the daily Slack alerts for high-signal events, the weekly digest), one digest worth opening.
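Stages 3 and 4 together decide where each classified diff ends up. Below is a minimal routing sketch. It assumes a strength-3 item goes to Slack and also appears in the weekly digest; the text implies this but does not state it. The class and field names are hypothetical, and the actual Slack and Listmonk calls are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClassifiedDiff:
    competitor: str
    url: str
    category: str         # pricing, product_launch, hiring, funding, positioning, noise
    summary: str          # the classifier's one-sentence summary
    signal_strength: int  # 0-3, as defined in the classifier prompt


@dataclass
class Outputs:
    slack_alerts: List[ClassifiedDiff] = field(default_factory=list)  # strength 3, sent same day
    digest_queue: List[ClassifiedDiff] = field(default_factory=list)  # strength >= 2, Sunday digest
    noise_log: List[ClassifiedDiff] = field(default_factory=list)     # strength 0-1, kept for reference


def route(item: ClassifiedDiff, out: Outputs) -> None:
    """Send one classified diff to every output it qualifies for."""
    if item.signal_strength >= 3:
        out.slack_alerts.append(item)
    if item.signal_strength >= 2:
        out.digest_queue.append(item)
    else:
        out.noise_log.append(item)
```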
The classifier prompt
The classifier prompt is the load-bearing piece. It decides what to surface and what to ignore.
You are a competitive intelligence classifier.
Given a diff between two snapshots of a competitor's website, output:
{
"category": one of [pricing, product_launch, hiring, funding, positioning, noise],
"summary": one-sentence neutral description of what changed,
"signal_strength": 0 (noise), 1 (worth knowing), 2 (worth a digest entry), 3 (worth a Slack alert)
}
Decision rules:
- "noise" includes: typo fixes, image swaps, logo changes, navigation reordering, copy polishing
- "noise" gets signal_strength 0
- Pricing changes are at minimum signal_strength 2
- Product launches with named new features are signal_strength 3
- Hiring signals are signal_strength 1 unless 5+ senior roles posted in one week (then 2)
- Anything ambiguous: prefer the lower score
The signal-strength scale is non-negotiable. Without it the agent will surface everything it sees and you are back to Google Alerts. The hard cutoff at signal_strength >= 2 for the digest is what makes the digest readable.
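Several of the decision rules are deterministic, so they can also be enforced in code as a backstop when the model drifts. The sketch below assumes two things. First, the classifier replies with strict JSON containing the three keys from the prompt. Second, the caller counts senior roles posted this week and passes the number in. Neither assumption comes from the article, and the function name is hypothetical. The "named new features" rule for product launches needs judgment, so it stays with the model.

```python
import json

CATEGORIES = {"pricing", "product_launch", "hiring", "funding", "positioning", "noise"}


def parse_and_enforce(raw: str, senior_roles_this_week: int = 0) -> dict:
    """Parse the classifier's JSON reply and re-apply the hard decision rules."""
    result = json.loads(raw)
    category = result["category"]
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    score = max(0, min(3, int(result["signal_strength"])))  # clamp to the 0-3 scale
    if category == "noise":
        score = 0                    # noise always scores 0
    elif category == "pricing":
        score = max(score, 2)        # pricing changes are at least 2
    elif category == "hiring":
        score = 2 if senior_roles_this_week >= 5 else 1  # 5+ senior roles in a week -> 2
    result["signal_strength"] = score
    return result
```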
The diff approach
Stage 2 is where most teams over-engineer. The simple version that works: store the entire HTML of the page in Postgres on every crawl, with a timestamp. Diff against the previous snapshot using HTML-aware diff (htmldiff or similar). Pass the diff to the classifier with the URL, the previous snapshot timestamp, and the current snapshot timestamp.
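The simple version fits in a few lines. In this sketch SQLite stands in for Postgres, so it runs without a server. A line-based `difflib.unified_diff` from the Python standard library replaces an HTML-aware differ like htmldiff. The table layout and the payload keys are assumptions.

```python
import difflib
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    # SQLite stands in for Postgres; one row per crawl, full HTML kept.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS snapshots ("
        " url TEXT NOT NULL, taken_at TEXT NOT NULL, html TEXT NOT NULL)"
    )
    return conn


def save_and_diff(conn, url, html, taken_at=None):
    """Store a snapshot; return a classifier payload, or None if there is nothing to report."""
    taken_at = taken_at or datetime.now(timezone.utc).isoformat()
    prev = conn.execute(
        "SELECT taken_at, html FROM snapshots WHERE url = ? ORDER BY taken_at DESC LIMIT 1",
        (url,),
    ).fetchone()
    conn.execute("INSERT INTO snapshots VALUES (?, ?, ?)", (url, taken_at, html))
    conn.commit()
    if prev is None:
        return None  # first crawl: no previous snapshot to compare against
    prev_at, prev_html = prev
    diff = list(difflib.unified_diff(
        prev_html.splitlines(), html.splitlines(),
        fromfile=prev_at, tofile=taken_at, lineterm="",
    ))
    if not diff:
        return None  # page unchanged since the last crawl
    return {"url": url, "previous_snapshot": prev_at,
            "current_snapshot": taken_at, "diff": "\n".join(diff)}
```

The payload carries exactly what the paragraph lists: the URL, both snapshot timestamps, and the diff. ISO-8601 timestamps sort correctly as strings, which is why `ORDER BY taken_at` works here.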
Two refinements worth adding:
Selector-scoped diffs. A pricing page usually wraps its tiers in a single container (a .pricing-table element, say). Scope the diff to that selector instead of the whole page. This filters out cookie banners, A/B test variations, and other non-meaningful changes.
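To scope the diff, pull out only that container's text before diffing. The sketch below uses the standard library's `html.parser` and matches on a single class name rather than a full CSS selector. It also assumes the container's markup is well-formed: unclosed non-void tags such as a bare `<p>` would throw off the depth count. A real HTML parser with CSS-selector support is the sturdier choice. `.pricing-table` is only an example class, and the function names are hypothetical.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ClassScopeExtractor(HTMLParser):
    """Collects text inside elements whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0    # > 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # nested element inside the target
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def scoped_text(html, target_class="pricing-table"):
    """Text of the target container, one chunk per line, ready for a line diff."""
    parser = ClassScopeExtractor(target_class)
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

When a cookie banner changes outside the container, the scoped text stays identical, so the diff is empty.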
Visual diff for design-heavy changes. When messaging changes shape (homepage hero, new section blocks, brand refresh), text diffs miss it. A monthly screenshot diff with a vision model (DeepSeek V4 Lite handles this well) catches what the text diff drops.
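A vision-model call per page per month is cheap but not free. One way to keep it targeted is a changed-pixel gate in front of the call. This gate is an added suggestion, not something the article describes. The sketch assumes both screenshots are already decoded into same-size grayscale grids (lists of 0-255 rows), for example with an imaging library. The tolerance and threshold values are hypothetical starting points.

```python
def changed_ratio(before, after, tolerance=16):
    """Fraction of pixels whose grayscale value moved by more than `tolerance`."""
    if len(before) != len(after) or any(len(a) != len(b) for a, b in zip(before, after)):
        raise ValueError("screenshots must have identical dimensions")
    total = changed = 0
    for row_a, row_b in zip(before, after):
        for a, b in zip(row_a, row_b):
            total += 1
            if abs(a - b) > tolerance:  # ignore anti-aliasing and compression jitter
                changed += 1
    return changed / total if total else 0.0


def needs_vision_review(before, after, threshold=0.05):
    """Only send the pair to the vision model when a meaningful share of the page changed."""
    return changed_ratio(before, after) >= threshold
```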
Neither refinement is required to start. The basic HTML diff plus the classifier handles 80% of the value. Add complexity only when you hit specific gaps.
What the weekly digest looks like
The format that actually gets read, after iterating on it for two months:
# Competitive digest, week of May 13, 2026
## Pricing (2)
- **Vendor A** dropped enterprise tier from $999/mo to $749/mo on May 10.
Removed the seat-based add-on. [snapshot diff]
- **Vendor B** added a new $19/mo solo tier on May 12, distinct from the
existing $49 starter. [snapshot diff]
## Product launches (1)
- **Vendor C** shipped a new "AI agents" feature on May 11. Demo video
shows tool calling and a marketplace. Direct overlap with our roadmap.
[release notes]
## Hiring (1)
- **Vendor A** posted 7 backend engineering roles in the last 7 days,
4 of them senior. Either ramping for a launch or replacing a team.
[careers page diff]
## Other items in the noise floor: 47
[link to full noise log]
Four real signal items. One link to the noise log if curious. Read time: under five minutes. The previous (manual) version of this took a junior PM 6-8 hours per week and missed half of what mattered.
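Here is a sketch of the Sunday generator that renders the format above from the digest queue. It keeps signal strength 2 and up and groups items under per-category headings in a fixed order. Everything else is counted into the noise-floor line. The item field names (`link_label`, `link_url`) and the markdown link style are assumptions. Sending the result through Listmonk is not shown.

```python
from collections import defaultdict

# Section order and headings follow the example digest; the last two are assumed.
SECTIONS = [("pricing", "Pricing"), ("product_launch", "Product launches"),
            ("hiring", "Hiring"), ("funding", "Funding"), ("positioning", "Positioning")]


def render_digest(items, week_of, noise_log_url):
    """items: dicts with competitor, category, summary, signal_strength, link_label, link_url."""
    kept = [i for i in items if i["signal_strength"] >= 2 and i["category"] != "noise"]
    noise_count = len(items) - len(kept)
    by_cat = defaultdict(list)
    for item in kept:
        by_cat[item["category"]].append(item)
    lines = [f"# Competitive digest, week of {week_of}", ""]
    for key, title in SECTIONS:
        if not by_cat[key]:
            continue  # skip empty sections entirely
        lines.append(f"## {title} ({len(by_cat[key])})")
        for i in by_cat[key]:
            lines.append(f"- **{i['competitor']}** {i['summary']} [{i['link_label']}]({i['link_url']})")
        lines.append("")
    lines.append(f"## Other items in the noise floor: {noise_count}")
    lines.append(f"[link to full noise log]({noise_log_url})")
    return "\n".join(lines)
```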
What goes wrong
Three failure modes.
Cookie banners and consent popups. Half of all crawled pages now have AI-generated cookie banners that change wording every week. Filter these aggressively in stage 2, before the diff reaches the classifier. Otherwise the classifier wastes tokens explaining that "the cookie banner now says 'OK' instead of 'Accept'."
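One cheap stage-2 filter works on unified-diff lines like those `difflib` produces. It drops added or removed lines that match banner vocabulary, then skips the classifier call if no real change remains. The keyword pattern is hypothetical, so tune it to the banners your targets actually serve. Scoping the diff to a selector remains the stronger fix.

```python
import re

# Hypothetical keyword list; extend it with the wording your targets' banners use.
BANNER_PATTERN = re.compile(r"cookie|consent|gdpr|privacy preferences", re.IGNORECASE)


def _is_change(line):
    """True for added/removed content lines, False for headers, hunks, and context."""
    return line.startswith(("+", "-")) and not line.startswith(("+++", "---"))


def strip_banner_changes(diff_lines):
    """Drop added/removed diff lines that look like cookie or consent-banner copy."""
    return [l for l in diff_lines if not (_is_change(l) and BANNER_PATTERN.search(l))]


def has_real_changes(diff_lines):
    """Only call the classifier if something other than banner churn changed."""
    return any(_is_change(l) for l in diff_lines)
```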
Bot detection. Apify and Firecrawl handle most of this, but some competitors detect aggressive crawling and start serving you dummy pages. If your daily diff is suspiciously empty, hand-check from a different IP. The fix is rotating proxies (Apify supports this) or backing off to weekly cadence on the affected target.
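"Suspiciously empty" can be turned into a rough automatic trigger for that hand-check. The sketch below flags a target on either of two signals. The first is a run of empty diffs longer than a chosen streak. The second is a latest snapshot much smaller than its usual size, a common symptom of a block or placeholder page. Both heuristics and their defaults are assumptions, and a quiet changelog can legitimately go weeks without changes, so set the streak per target. A flag means "check by hand", not "you are blocked".

```python
from statistics import median


def looks_blocked(history, empty_streak=14, shrink_ratio=0.5):
    """Heuristic flag for a target that may be serving dummy pages.

    history: list of (snapshot_length_in_chars, diff_was_empty) tuples, oldest first.
    """
    if not history:
        return False
    recent = history[-empty_streak:]
    if len(recent) == empty_streak and all(empty for _, empty in recent):
        return True  # too many empty diffs in a row for this target
    earlier = [length for length, _ in history[:-1]]
    if earlier and history[-1][0] < shrink_ratio * median(earlier):
        return True  # latest page is far smaller than usual
    return False
```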
Classifier drift. After about three months, the classifier starts marking as noise things that should be signal, because the same edit pattern keeps repeating. Re-baseline the prompt quarterly, and add new examples of the edge cases that slipped through.
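Re-baselining is easier with a small labeled set of past edge cases that you re-run after every prompt change. In this minimal harness, `classify` is a placeholder for whatever function calls your model and returns the prompt's JSON as a dict. The example triples are yours to collect, and the function name is hypothetical.

```python
from typing import Callable, List, Tuple


def evaluate(classify: Callable[[str], dict], examples: List[Tuple[str, str, int]]):
    """Re-run the classifier on labeled edge cases and report disagreements.

    examples: (diff_text, expected_category, expected_signal_strength) triples.
    Returns (accuracy, misses), where each miss is (diff_text, expected, got).
    """
    misses = []
    for diff_text, want_cat, want_score in examples:
        got = classify(diff_text)
        got_pair = (got["category"], got["signal_strength"])
        if got_pair != (want_cat, want_score):
            misses.append((diff_text, (want_cat, want_score), got_pair))
    accuracy = 1 - len(misses) / len(examples) if examples else 1.0
    return accuracy, misses
```

A stub that labels anything mentioning a cookie as noise and everything else as pricing would miss a product-launch example. That miss would appear in the returned list, which is exactly the kind of edge case to add to the next prompt revision.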
The takeaway
Competitive intelligence in 2026 is a small agent reading the web on a schedule, classifying what matters, and delivering a digest worth opening. Build it once. Maintain the prompt quarterly. Read the digest on Sunday evening. The competitors still on Google Alerts are drowning in email volume. The competitors who built this two quarters ago are eating their lunch.
Local-First AI
If this was useful, the weekly notes go deeper. No drip sequences, no upsells.
n8n templates, cost teardowns, and what is actually working in 2026. Reply to opt out.