B2B SaaS Outbound

The v1 hero score. A graph-decomposed cold-leads + reply-triage workflow that finds VPs of Engineering at series A/B SaaS companies, drafts personalised openers, writes them to a pipeline for human review, and classifies inbound replies.

Two scores ship: B2B SaaS Outbound (the cold-leads pipeline) and Reply triage (the inbound classification). Each is played by its own orchestra on an independent cron; they coordinate through the shared pipeline.

This page describes the reality on the lab box as of v0.2.x. For the underlying conceptual model, see Scores. For the editor that authors / clones / edits these graphs, see Composer.

What the score does

Sunday night you sign up, define your ICP, connect Gmail + Apollo, and clone the template. Monday at 09:00 your customised score runs. By Wednesday you have replies. By Friday you have a booked demo. The Pipeline shows you exactly what Maestro did, separate from any other system you have.

That’s the bar. The architecture changed substantially between v1 and v0.2 (LLM-loop → graph orchestrator), but the operator experience is the same: configure, run, review.

Cold-leads-v2 graph

11 nodes, 16 edges, runs in ~22 seconds on the lab box for ~$0.04 per run.

[deterministic] apollo.find_leads (10 candidates)
        │
        ▼
   [llm] Shortlist (filter to top 3, no tools — pure reasoning)
        │
        ▼
[control:map_over] Per-lead loop (iterate over the 3 candidates)
        │            │
        │            └─→ (per iteration body, runs 3×):
        │                 [det] pipeline.find_contact (cheap dedup; halt iter if exists)
        │                          │
        │                          ▼
        │                 [det] apollo.enrich_one (single-record enrichment)
        │                          │
        │                          ▼
        │                 [det] pipeline.find_contact (by email; second dedup pass)
        │                          │
        │                          ├─→ [det] apollo.enrich_domain (company context)
        │                          │
        │                          └─→ [llm] Draft opener (subject + body, 1 call)
        │                                 │
        │                                 ▼
        │                          [det] pipeline.add_contact (stage=ready)
        │                                 │
        │                                 ▼
        │                          [det] pipeline.log_activity (kind=note, draft=true)
        │
        ▼
[deterministic] notify.send_event ("3 drafts ready for review")

Two LLM nodes (Shortlist + Draft opener), eight deterministic nodes, one control node. The LLM-loop predecessor of this score burned ~110K Anthropic tokens per run; this graph burns ~5.4K. The deterministic nodes coordinate via the orchestrator; the LLM nodes only fire when reasoning is genuinely needed.
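The node breakdown above can be checked with a small model of the graph. This is only a sketch: the `Node` class, the node ids, and the `BODY_IDS` set are hypothetical names chosen for illustration, not the orchestrator's real schema. Only the skill and agent names come from the diagram.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str    # hypothetical node id, not the real one
    kind: str  # "deterministic", "llm", or "control"
    ref: str   # skill or agent name as shown in the diagram


COLD_LEADS_V2 = [
    Node("find_leads", "deterministic", "apollo.find_leads"),
    Node("shortlist", "llm", "Shortlist"),
    Node("per_lead", "control", "map_over"),
    Node("dedup_1", "deterministic", "pipeline.find_contact"),
    Node("enrich_one", "deterministic", "apollo.enrich_one"),
    Node("dedup_2", "deterministic", "pipeline.find_contact"),
    Node("enrich_domain", "deterministic", "apollo.enrich_domain"),
    Node("draft", "llm", "Draft opener"),
    Node("add_contact", "deterministic", "pipeline.add_contact"),
    Node("log_activity", "deterministic", "pipeline.log_activity"),
    Node("notify", "deterministic", "notify.send_event"),
]

# Nodes that run inside the Per-lead loop body (once per iteration).
BODY_IDS = {
    "dedup_1", "enrich_one", "dedup_2", "enrich_domain",
    "draft", "add_contact", "log_activity",
}


def count_by_kind(nodes):
    """Return how many nodes of each kind the graph contains."""
    return dict(Counter(n.kind for n in nodes))


def llm_calls_per_run(nodes, body_ids, iterations):
    """LLM nodes in the loop body fire once per iteration, others once."""
    return sum(
        iterations if n.id in body_ids else 1
        for n in nodes
        if n.kind == "llm"
    )
```

With three shortlisted leads, `llm_calls_per_run(COLD_LEADS_V2, BODY_IDS, 3)` gives four LLM calls per run: one Shortlist call plus three Draft opener calls.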

Reply-triage-v2 graph

7 nodes, 10 edges, runs in ~14 seconds per inbox poll.

[deterministic] gmail.list_inbox (recent threads)
        │
        ▼
[control:map_over] Per-thread loop
        │
        └─→ (per thread, in parallel-ish):
             [det] pipeline.find_contact (skip thread if sender isn't a tracked contact)
                    │
                    ▼
             [det] gmail.read_thread (latest message body)
                    │
                    ▼
             [llm] Classify intent (1 LLM call → intent + new_stage + notify)
                    │
                    ├─→ [det] pipeline.log_activity (triaged + stage advance)
                    │
                    └─→ [det] notify.send_attention (only when intent is "interested" or "needs_review")

Single LLM call per thread classifies intent (interested / not_interested / out_of_office / wrong_person / unsubscribe / needs_review) and emits the new pipeline stage in the same JSON response. Only actionable intents fire a notification — silent classifications log to the pipeline without pinging the operator.
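A minimal sketch of how the classifier's JSON response might be handled, assuming the response carries `intent` and `new_stage` keys. The function name, the exact key names, and the fallback to `needs_review` on malformed output are assumptions for illustration. The real response also carries a notify field, which this sketch recomputes from the intent instead.

```python
import json

# The six intents listed above.
INTENTS = {
    "interested", "not_interested", "out_of_office",
    "wrong_person", "unsubscribe", "needs_review",
}
# Only these intents ping the operator.
NOTIFY_INTENTS = {"interested", "needs_review"}


def parse_classification(raw: str) -> dict:
    """Parse the classifier's JSON reply into intent, new_stage and notify."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    intent = data.get("intent")
    if not isinstance(intent, str) or intent not in INTENTS:
        # Assumption: anything unrecognised is routed to a human.
        intent = "needs_review"
    return {
        "intent": intent,
        "new_stage": data.get("new_stage"),
        "notify": intent in NOTIFY_INTENTS,
    }
```

Routing unparseable output to `needs_review` is one possible design choice: it keeps a bad model response visible to the operator rather than silently dropping the thread.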

Stage progression

The pipeline’s stages model the contact lifecycle:

new → enriching → ready → contacted → replied → triaged → booked
                                                disqualified
                                                unsubscribed

Cold-leads writes contacts at stage ready (draft prepared, awaiting human review). When the operator clicks Send via Gmail on the contact’s draft card, the activity flips to contacted and the stage advances. Reply-triage classifies each inbound reply and maps the intent to replied, disqualified, unsubscribed, or null; an out_of_office reply leaves the stage unchanged so the thread is handled on the next poll.

Event	Stage transition	Logged activity
cold-leads writes draft	`new` → `ready`	`note` (with `payload.draft: true`)
operator clicks Send via Gmail	`ready` → `contacted`	activity flips to `contacted`
reply-triage: `interested` or `needs_review`	`contacted` → `replied`	`triaged` + notification
reply-triage: `not_interested` / `wrong_person`	`contacted` → `disqualified`	`triaged` (silent)
reply-triage: `unsubscribe`	`contacted` → `unsubscribed`	`triaged` (silent)
reply-triage: `out_of_office`	unchanged	none (will reprocess on next run)
operator manually books a meeting	`replied` → `booked`	manual `note` (UI not yet wired)
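The reply-triage rows of the table can be read as a lookup from intent to target stage. The sketch below encodes exactly those rows; the names `INTENT_TO_STAGE` and `next_stage` are hypothetical, and leaving contacts at other stages untouched is an assumption, since the table only defines transitions out of `contacted`.

```python
# Intent -> target stage, taken from the reply-triage rows of the table.
# None means the stage is left unchanged.
INTENT_TO_STAGE = {
    "interested": "replied",
    "needs_review": "replied",
    "not_interested": "disqualified",
    "wrong_person": "disqualified",
    "unsubscribe": "unsubscribed",
    "out_of_office": None,
}


def next_stage(current: str, intent: str) -> str:
    """Return the contact's stage after a reply is classified."""
    if current != "contacted":
        # Assumption: the table only covers contacts at "contacted".
        return current
    target = INTENT_TO_STAGE[intent]
    return current if target is None else target
```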

Configuring for your install

Five steps. The first three each customise a different layer (agents, score graph, orchestra); the last two add credentials and switch the orchestras on.

1. Tune the agents (Composer or `/agents`)

The shipped score graphs reference three seeded agents (the LLM-node reasoning units):

Shortlist — filters Apollo’s 10 candidates down to top 3 fitting the ICP.
Draft opener — writes the cold-outreach email subject + body grounded in lead + company signal.
Classify intent — reply-triage’s classification step.

Each ships with example sender context, an example ICP, and the lab-box’s prompt language. For your install:

Open /agents in the dashboard.
Click Shortlist → edit the system prompt’s title list, seniority filter, and headcount range to match your ICP.
Click Draft opener → edit the sender persona block (your name, your company, your URL, your value prop). Edit the constraint block if you want different style guidelines.
Click Classify intent if you want to customise the intent → stage mapping (you usually don’t).

Save each. Each agent’s version bumps; next runs of cold-leads-v2 / reply-triage-v2 use the new prompts.

2. Tune the score graph (Composer)

Less common, but available when you want behaviour changes the prompts can’t express. Open /composer/score_cold_leads_v2, click Edit, and:

Drag in additional skills (e.g. web_research.search for richer pre-draft context).
Adjust per-node config (e.g. apollo.find_leads arguments to change per_page, geographic filter, exclusion lists).
Edit the Per-lead loop map_over’s body_node_ids to add or remove iteration steps.
Re-run dagre’s auto-layout via the Reset Layout button after edits.

Save commits a single version-bump to the score. Next run uses the new graph.

3. Configure the orchestra (`/orchestras/$id/settings`)

Each score is played by an orchestra — the cron-attached deployment of the score in your workspace. To customise:

Cold leads · SaaS — open the orchestra from the sidebar’s “Orchestras · N” list, click Settings. Adjust the cron (default 0 9 * * MON-FRI), update the description, change the tag.
Reply triage — same flow. Default cron */30 * * * * (every 30 minutes).

The settings page only handles orchestra metadata (name, cron, description). Score graph edits go through the Composer; agent edits go through /agents.
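To see what the two default cron expressions mean in practice, here is a small sketch that hard-codes just those two schedules instead of parsing cron syntax in general. The function names are hypothetical, and it uses naive datetimes because the scheduler's timezone is not stated here.

```python
from datetime import datetime, timedelta


def fires_cold_leads(dt: datetime) -> bool:
    """Match `0 9 * * MON-FRI`: 09:00 on weekdays."""
    return dt.minute == 0 and dt.hour == 9 and dt.weekday() < 5


def fires_reply_triage(dt: datetime) -> bool:
    """Match `*/30 * * * *`: minute 0 and minute 30 of every hour."""
    return dt.minute % 30 == 0


def count_fires(predicate, start: datetime, days: int) -> int:
    """Count matching minutes in the window [start, start + days)."""
    end = start + timedelta(days=days)
    count, t = 0, start
    while t < end:
        if predicate(t):
            count += 1
        t += timedelta(minutes=1)
    return count
```

Over a full week starting on a Monday, the cold-leads schedule fires 5 times; the reply-triage schedule fires 48 times per day.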

4. Add the secrets

The shipped scores reference these skills, each requiring credentials:

apollo → apollo_api_key (see docs/skills/apollo.md)
gmail → Google OAuth (google_oauth_client_id + google_oauth_client_secret, then connect via OAuth flow — see docs/skills/gmail.md)
pipeline and notify are built-in (no secrets)

Anthropic’s ANTHROPIC_API_KEY is configured at install time.
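As a pre-flight check, the requirements above can be expressed as a table of skill to secret names. This is a sketch only: how secrets are actually stored and looked up is not described here, so `missing_secrets` simply takes the set of secret names you have configured.

```python
# Secret names required per skill, as listed above.
REQUIRED_SECRETS = {
    "apollo": ["apollo_api_key"],
    "gmail": ["google_oauth_client_id", "google_oauth_client_secret"],
    "pipeline": [],  # built-in, no secrets
    "notify": [],    # built-in, no secrets
}


def missing_secrets(configured: set) -> dict:
    """Return {skill: [missing secret names]} for skills not fully configured."""
    report = {}
    for skill, names in REQUIRED_SECRETS.items():
        missing = [name for name in names if name not in configured]
        if missing:
            report[skill] = missing
    return report
```

The check covers only secret names; for Gmail, the OAuth connection flow still has to be completed separately.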

5. Enable the orchestras

Both orchestras ship enabled: false so a fresh install never auto-fires emails. Open each orchestra’s dashboard and click the Disabled toggle to flip it to Enabled.

Customising for your own audience

The most common path to a custom hero score: clone, then edit.

Open /composer. Click Clone on the B2B SaaS Outbound template row.
A workspace copy is created with is_template: false and source_score_id pointing at the original. It opens in edit mode.
The clone’s LLM-node agents are also cloned (Shortlist-copy, Draft opener-copy). Edit them on /agents for your ICP.
Make any graph edits in the Composer.
Save.
From the sidebar’s “+ New” button (Orchestras section), open the OrchestraBuilder and create an orchestra pointed at the clone. Disable the original orchestra if you want only the clone to fire on cron.

The original template stays untouched, so you can re-clone later if you mess up.
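The clone semantics described above can be sketched as a pure function over an immutable record. The `Score` class and `clone_score` function are hypothetical; only the `is_template` and `source_score_id` fields and the `-copy` agent suffix come from the steps above.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Score:
    id: str
    name: str
    is_template: bool
    source_score_id: Optional[str] = None
    agent_names: Tuple[str, ...] = ()


def clone_score(template: Score, new_id: str) -> Score:
    """Return a workspace copy; the template itself is left untouched."""
    return replace(
        template,
        id=new_id,
        is_template=False,
        source_score_id=template.id,
        # The LLM-node agents are cloned too, with a "-copy" suffix.
        agent_names=tuple(f"{name}-copy" for name in template.agent_names),
    )
```

Because the record is frozen and `replace` builds a new object, the template cannot be modified by accident, which mirrors the guarantee that you can always re-clone from the original.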

Cost estimate

Per cold-leads-v2 run drafting 3 openers:

Operation	Calls	Approx cost
`apollo.find_leads`	1	covered by Apollo plan
`apollo.enrich_one` (per-record)	~3	covered by Apollo plan (3 credits)
`apollo.enrich_domain`	~3	covered by Apollo plan (3 credits)
`compose.draft_personalized_opener` (Sonnet via Draft-opener agent)	3	~$0.025
`pipeline.*` writes	~10	free (Postgres)
Anthropic Shortlist call (Sonnet)	1	~$0.008
Anthropic Draft opener calls (Sonnet)	3	~$0.025
Total Anthropic	—	~$0.04

Total runtime: ~22 seconds end-to-end. The 9× cost reduction over the v1 LLM-loop predecessor comes from removing the per-step orchestration overhead — the v1 agent loop spent ~110K tokens “thinking about what to do next” between each tool call. The graph orchestrator just walks the graph; only the two LLM nodes burn tokens, and only when they’re genuinely reasoning.

Reply-triage-v2 runs cost ~$0.01–$0.02 each (smaller per-thread work, single LLM call per thread).

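At the default schedules (cold-leads weekday mornings, reply-triage every 30 min), monthly Anthropic spend lands around $2–5 before scaling sends. Reply-triage polls that find no threads from tracked contacts make no LLM call, so the figure depends on how many of the 48 daily polls actually classify a reply.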
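The monthly figure can be reproduced with back-of-envelope arithmetic. In the sketch below, the per-run costs and schedules come from this page, while `active_fraction` (the share of reply-triage polls that actually classify a thread and so incur an LLM call) and the 22 weekday runs per month are assumptions you should replace with your own numbers.

```python
def monthly_spend(
    cold_runs: int = 22,           # assumed weekday runs per month
    cold_cost: float = 0.04,       # ~$0.04 per cold-leads-v2 run
    polls_per_day: int = 48,       # reply-triage every 30 minutes
    days: int = 30,
    triage_cost: float = 0.015,    # midpoint of ~$0.01-$0.02 per run
    active_fraction: float = 0.1,  # assumed share of polls with a reply to classify
) -> float:
    """Estimate monthly Anthropic spend in dollars, rounded to cents."""
    cold = cold_runs * cold_cost
    triage = polls_per_day * days * active_fraction * triage_cost
    return round(cold + triage, 2)
```

With these defaults the estimate is about $3.04 a month. Setting `active_fraction=1.0` models every poll classifying a reply and gives about $22.48, so the share of polls that do real work dominates the total.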

Verifying the score works

After installing (and with the agents tuned for your ICP):

Open the Cold leads · SaaS orchestra dashboard. Click Run now.
Watch the run timeline stream. It should show ~10 deterministic step pills + 4 LLM step pills (one Shortlist call + ~3 Draft opener calls).
Bell icon in the topbar shows an unread event notification: “Cold leads run complete — drafts ready for review.”
Click into the SaaS pipeline. ~3 new contacts at stage ready.
Open one contact. Draft (subject + body) renders in a brass-bordered review card above the activity timeline. Read it.
Optional: edit the subject or body inline.
Click Send via Gmail. Within ~2 seconds the card disappears, the activity flips to contacted, the contact’s stage advances to contacted, and the email lands in your Gmail Sent folder.
Reply to that email yourself with “yes, would Tuesday at 2pm work?” Open the Reply triage orchestra dashboard, click Run now.
Bell icon pings: “Sarah Chen replied — needs response.”
Pipelines UI — the contact advanced to stage replied with a triaged activity logged carrying the classifier’s intent + confidence + explanation.

That’s the full hero-score loop, end-to-end, on real infrastructure.

What’s NOT in this score’s v1 scope

Auto-drafted replies. Reply-triage classifies and advances stages but doesn’t draft responses. The human writes the actual reply. Stays human in v1 to protect sender reputation; auto-replies require voice modeling + a feedback loop that doesn’t exist yet.
Paced sending across hours (gmail.queue_paced_send). The current Send via Gmail dispatches synchronously when the operator clicks. Pacing across a day requires a persistent send queue + scheduler.
Multi-pipeline scores. A single score writes to a single pipeline (pl_saas). To run “Cold leads · Healthcare” alongside SaaS, clone the score graph in the Composer, edit the agent prompts for the healthcare ICP, and create a second orchestra pointed at the clone.
Inline run-replay / step-through debugging on the canvas. Run details are on the orchestra dashboard, not the Composer canvas. Linking step rows back to the canvas’s nodes is future polish.
A/B variants of the score — fork via clone for now; first-class A/B (split-traffic across two graph variants) lands in v2+.

Scores — the conceptual model.
Composer — the editor surface for graph edits.
Agents — workspace LLM-node configs (the Shortlist / Draft opener / Classify intent agents).
Orchestras — cron deployments that play scores.
Pipelines — the data model the score reads + writes.
Apollo / Compose / Gmail / Pipeline — the four skills the graphs use.