01 · 2026 · Designer, sole implementer · shipped

Salesforce Org Assessment Pipeline

A senior-led pipeline that compresses 2–3 weeks of analyst discovery into a 1-week engagement, producing findings strong enough to turn what is usually a detail exercise into real architectural revision and advisory work.

Claude Code · Python · Salesforce CLI · Elements.cloud · Mermaid

Discovery time: 1 week (was 2–3 weeks)

Team needed: 1 senior (was a team plus weeks of synthesis)

Cost of inaction surfaced: 2.9M NOK/yr

Context

A senior consultant on a system-audit engagement spends most of the engagement doing detail work: running tools, reconciling their outputs, cross-checking findings, and packaging the result into a document. The architectural judgment the client is paying for happens in the gaps between those tasks.

This pipeline inverts that ratio. One person runs the detail work end-to-end in a week, and the architectural work gets the rest of the time.

The substrate underneath was built around one observation: in a typical assessment, every tool and every analyst pass produces output, and nothing reads across them. The synthesis happens twice: once to find the issues, then again to write the deliverable. When the engagement ends, the client gets a PDF that can't be re-run.

Problem

The villain: nothing composes. Every tool, skill, and analyst pass produces output, but the outputs never combine into one picture of the org. Five places this shows up:

No progressive accumulation. Each tool wrote its own report. Each analyst wrote their own notes. Nothing read across them. You could ask "what did this tool find?" but never "what do we know about this org?"
No orientation layer. Analysts jumped into detailed assessments before understanding the shape of the org, so priorities came out wrong.
Communication was an afterthought. Every deliverable was a fresh re-synthesis of scattered findings. The analytical work got done twice.
No execution specification. The framework described what to do, never how. That prevented delegation and made it impossible to track what had been checked.
AI enrichment was uncontrolled. Out-of-the-box enrichment treated every component equally, made hundreds of calls, and produced descriptive output instead of diagnostic output.

Each one keeps the senior in detail mode. None of them get fixed by trying harder. They get fixed by changing the substrate.

Shape of the solution

A layered pipeline where every layer has a single job and composes with the next. Structural harvest is free and fast. Orientation synthesizes the harvest into a 2-page org profile. Assessments run in parallel against the profile. Synthesis deduplicates across domains. Communication packages are a filtering operation over the accumulated findings register, not a new round of analysis.
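If every domain writes into one register, a communication package reduces to a query over it. A minimal sketch, assuming findings are dicts with hypothetical `domain`, `severity`, and `title` keys and a hypothetical five-level severity ranking; neither is taken from the pipeline's actual schema.

```python
from typing import Iterable, Optional

# Hypothetical severity ranking: lower number means more severe.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


def package(findings: Iterable[dict], min_severity: str,
            domains: Optional[set[str]] = None) -> list[dict]:
    """Select findings for a deliverable without re-running any analysis.

    Keeps findings at or above `min_severity`, optionally restricted to
    `domains`, ordered most severe first and then by title.
    """
    threshold = SEVERITY_RANK[min_severity]
    selected = [
        f for f in findings
        if SEVERITY_RANK[f["severity"]] <= threshold
        and (domains is None or f["domain"] in domains)
    ]
    return sorted(selected, key=lambda f: (SEVERITY_RANK[f["severity"]], f["title"]))
```

An executive memo might call `package(register, "high")` while a security annex calls `package(register, "info", {"security"})`; both read the same register, so neither one re-synthesizes anything.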

```mermaid
flowchart TD
  A[Project setup] --> B[SFDX retrieve]
  B --> C[hardis structural]
  B --> D[hardis linting]
  B --> E[hardis diagnostics]
  A --> F[Elements intelligence]
  C --> G[Org profile]
  D --> G
  F --> G
  G --> H[Process mapping]
  G --> I[Selective AI enrichment]
  G --> J[Metadata assessment]
  G --> K[Code assessment]
  F --> L[Field assessment]
  F --> M[Permission assessment]
  G --> N[MDD assessment]
  J --> O[Synthesis]
  K --> O
  L --> O
  M --> O
  N --> O
  O --> P[Deliverables]
```

The Salesforce instantiation. The shape transfers.
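Read as a dependency graph, the flowchart groups into execution waves: a step can start once everything upstream of it has finished. A minimal sketch using Kahn's algorithm, processed level by level, over edges transcribed from the diagram; the shortened step names are illustrative, and this only groups the diagram's steps, it is not the pipeline's scheduler.

```python
from collections import defaultdict

# Edges transcribed from the flowchart as (upstream, downstream); names shortened.
EDGES = [
    ("setup", "retrieve"), ("retrieve", "hardis-structural"),
    ("retrieve", "hardis-linting"), ("retrieve", "hardis-diagnostics"),
    ("setup", "elements"), ("hardis-structural", "org-profile"),
    ("hardis-linting", "org-profile"), ("elements", "org-profile"),
    ("org-profile", "process-mapping"), ("org-profile", "ai-enrichment"),
    ("org-profile", "metadata"), ("org-profile", "code"),
    ("elements", "field"), ("elements", "permission"), ("org-profile", "mdd"),
    ("metadata", "synthesis"), ("code", "synthesis"), ("field", "synthesis"),
    ("permission", "synthesis"), ("mdd", "synthesis"),
    ("synthesis", "deliverables"),
]


def waves(edges):
    """Group nodes into levels; every upstream node of a level sits in an earlier level."""
    downstream = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for up, down in edges:
        downstream[up].append(down)
        indegree[down] += 1
        nodes.update((up, down))
    current = sorted(n for n in nodes if indegree[n] == 0)
    result = []
    while current:
        result.append(current)
        nxt = []
        for node in current:
            for child in downstream[node]:
                indegree[child] -= 1
                if indegree[child] == 0:  # all upstream steps have now been placed
                    nxt.append(child)
        current = sorted(nxt)
    if sum(len(w) for w in result) != len(nodes):
        raise ValueError("cycle detected")
    return result
```

On these edges, field and permission assessment land in the same wave as the hardis runs, because in the diagram they depend only on Elements intelligence, while the metadata, code, and MDD assessments share the wave after the org profile.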

Key design decisions

1. depends_on vs enriched_by as first-class dependency types. Hard dependencies block execution. Soft dependencies improve quality but never block. Process mapping is slow and human-in-the-loop; blocking every assessment on it would stall the pipeline. Instead, every domain assessment produces technically valid findings first, then gets enriched with business context once mapping completes. This one distinction is the core of composability.
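A minimal sketch of the distinction, assuming a hypothetical `Step` record that carries both lists: only `depends_on` gates execution, while `enriched_by` is checked afterwards to decide whether a finished step should get an enrichment pass. The step names and the `needs_enrichment` rule are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    name: str
    depends_on: list[str] = field(default_factory=list)   # hard: blocks execution
    enriched_by: list[str] = field(default_factory=list)  # soft: improves quality, never blocks


def can_run(step: Step, completed: set[str]) -> bool:
    """Only hard dependencies gate execution."""
    return all(dep in completed for dep in step.depends_on)


def needs_enrichment(step: Step, completed: set[str], applied: set[str]) -> list[str]:
    """Soft dependencies that have finished but whose context has not been applied yet."""
    return [src for src in step.enriched_by if src in completed and src not in applied]


# Hypothetical example: a code assessment that benefits from, but does not wait for, process mapping.
code_assessment = Step("code-assessment", depends_on=["org-profile"], enriched_by=["process-mapping"])
done = {"org-profile"}
print(can_run(code_assessment, done))                  # True: mapping is not required to start
done.add("process-mapping")
print(needs_enrichment(code_assessment, done, set()))  # ['process-mapping']
```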

2. Model tiering instead of blanket AI enrichment. The default enrichment pattern in tools like this is one expensive call per item. On a few hundred components that's around $20 of compute and zero diagnostic value. I cut it. The pipeline runs cheap models for bulk description across the long tail and reserves the expensive model for diagnostic flags on the components orientation has identified as load-bearing. Compute cost falls roughly 85%. More importantly, the expensive calls now do diagnostic work the cheap pass can't.
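The routing rule itself is small. A minimal sketch, assuming orientation hands over a set of load-bearing component names; the tier names are hypothetical and the per-call prices are placeholders, not real model pricing (the 0.07 figure is picked only so that a blanket pass over 300 components lands near the "$20" mentioned above).

```python
from dataclasses import dataclass

# Placeholder per-call prices in USD: assumptions for illustration, not vendor pricing.
PRICE = {"cheap": 0.002, "expensive": 0.07}


@dataclass
class Call:
    component: str
    tier: str
    task: str


def plan_calls(components: list[str], load_bearing: set[str]) -> list[Call]:
    """Cheap description for the long tail; expensive diagnosis only for load-bearing parts."""
    calls = []
    for c in components:
        if c in load_bearing:
            calls.append(Call(c, "expensive", "diagnose"))
        else:
            calls.append(Call(c, "cheap", "describe"))
    return calls


def cost(calls: list[Call]) -> float:
    return sum(PRICE[call.tier] for call in calls)


def blanket_cost(components: list[str]) -> float:
    """Baseline pattern: one expensive call per component."""
    return PRICE["expensive"] * len(components)
```

With 30 of 300 components flagged as load-bearing, these placeholder prices put the tiered plan at about $2.64 against $21 for the blanket pass, roughly 87% less; the real ratio depends on actual model prices and on how much of the org orientation flags.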

3. Shared findings register over per-skill outputs. All skills write to a common JSON findings format with evidence arrays, process-context fields, and status. Cross-domain synthesis is a read operation over one file, not a manual reconciliation across seven report formats.
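A minimal sketch of why a shared format matters: once every skill emits the same record, cross-domain synthesis is a loop over one list. The field names (`domain`, `component`, `rule`, `severity`, `evidence`, `process_context`, `status`) are assumptions modelled on the description above, and merging on (component, rule) is one plausible dedup key, not necessarily the one the pipeline uses.

```python
import json
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical severity ranking: lower number means more severe.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3, "info": 4}


@dataclass
class Finding:
    id: str
    domain: str
    component: str
    rule: str
    severity: str
    evidence: list[str] = field(default_factory=list)
    process_context: Optional[str] = None  # filled in later by enrichment
    status: str = "open"


def load_register(text: str) -> list[Finding]:
    """Parse the shared register (a JSON array of findings)."""
    return [Finding(**row) for row in json.loads(text)]


def synthesize(findings: list[Finding]) -> list[dict]:
    """Merge findings that different domains raised about the same component and rule."""
    merged: dict[tuple[str, str], dict] = {}
    for f in findings:
        key = (f.component, f.rule)
        if key not in merged:
            merged[key] = {"component": f.component, "rule": f.rule,
                           "severity": f.severity, "domains": [], "evidence": []}
        entry = merged[key]
        if SEVERITY_RANK[f.severity] < SEVERITY_RANK[entry["severity"]]:
            entry["severity"] = f.severity  # keep the most severe rating
        if f.domain not in entry["domains"]:
            entry["domains"].append(f.domain)
        for e in f.evidence:
            if e not in entry["evidence"]:
                entry["evidence"].append(e)
    return list(merged.values())
```

Two domains flagging the same component for the same rule collapse into one synthesized finding that keeps the higher severity and both evidence trails.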

4. Runbooks are human-readable SOPs and agent-executable specs. Same markdown file with standard sections: Prerequisites, Steps, Verify, Outputs, Completion Criteria, Unblocks. A colleague can execute the step manually. An agent can parse the same file and dispatch. No separate "human" and "machine" documentation.
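Because the sections are standard, an agent needs only a small parser to treat a runbook as a spec. A minimal sketch, assuming each section is a level-two markdown heading (`## Steps` and so on) and steps are numbered lines; the heading level, the sample runbook, and the function names are assumptions for illustration.

```python
import re

REQUIRED = ["Prerequisites", "Steps", "Verify", "Outputs", "Completion Criteria", "Unblocks"]


def parse_runbook(markdown: str) -> dict[str, list[str]]:
    """Split a runbook into its standard sections, keeping non-empty lines per section."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current and line.strip():
            sections[current].append(line.strip())
    missing = [name for name in REQUIRED if name not in sections]
    if missing:
        raise ValueError(f"runbook missing sections: {missing}")
    return sections


def numbered_steps(sections: dict[str, list[str]]) -> list[str]:
    """Return the '1. do X' lines from Steps with the numbering stripped."""
    return [re.sub(r"^\d+\.\s*", "", s) for s in sections["Steps"] if re.match(r"^\d+\.", s)]


# Hypothetical sample runbook; the same text a colleague would follow by hand.
SAMPLE = """# Structural harvest
## Prerequisites
- Org retrieved with SFDX
## Steps
1. Run the structural report
2. Save the CSV under harvest/
## Verify
- CSV is non-empty
## Outputs
- harvest/structural-report.csv
## Completion Criteria
- State file updated
## Unblocks
- org-profile
"""
```

Rejecting a runbook with a missing section is one way to keep the human and agent paths from drifting apart: if a colleague can't follow it, neither can the parser.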

5. Orchestrator as convenience, not gatekeeper. An orchestrator skill reads the pipeline DAG and the assessment state file, determines unblocked steps, and dispatches. A human can bypass it and follow the runbooks manually. Both paths write to the same state file. The orchestrator never becomes a single point of failure.
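The orchestrator's core question is "which steps are unblocked right now?", and it can answer that from the DAG and the state file alone. A minimal sketch, assuming the DAG is a dict of hard dependencies and the state file has the `steps`/`status` shape shown in the Proof section; the step names are illustrative and `dispatch` is a hypothetical callback standing in for handing a step to an agent.

```python
import json
from typing import Callable

# Hypothetical hard-dependency DAG: step -> its depends_on list.
DAG = {
    "structural-harvest": [],
    "elements-intelligence": [],
    "org-profile": ["structural-harvest", "elements-intelligence"],
    "code-assessment": ["org-profile"],
    "field-assessment": ["elements-intelligence"],
    "synthesis": ["code-assessment", "field-assessment"],
}


def unblocked(dag: dict[str, list[str]], state: dict) -> list[str]:
    """Steps that are not completed yet and whose hard dependencies all are."""
    done = {
        name for name, entry in state.get("steps", {}).items()
        if entry.get("status") == "completed"
    }
    return sorted(
        step for step, deps in dag.items()
        if step not in done and all(dep in done for dep in deps)
    )


def run_once(dag: dict[str, list[str]], state_text: str,
             dispatch: Callable[[str], None]) -> list[str]:
    """One orchestrator pass: read the state file, dispatch every unblocked step."""
    ready = unblocked(dag, json.loads(state_text))
    for step in ready:
        dispatch(step)
    return ready
```

A colleague working from the runbooks writes to the same `steps` object, so the next pass simply sees those steps as completed; nothing here depends on who did the work. A real orchestrator would also need to avoid re-dispatching a step that is already running, which this sketch leaves out.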

Proof

Every runbook follows the same six-section shape, so a colleague can execute a step from the markdown while an agent parses the same file and dispatches the same step. Both write to a shared state file, which is the contract between executions:

```json
{
  "project": "nordic-broker",
  "steps": {
    "structural-harvest": {
      "status": "completed",
      "started_at": "2026-02-17T09:14:02Z",
      "completed_at": "2026-02-17T09:18:44Z",
      "executor": "joachim",
      "outputs": ["harvest/structural-report.csv"],
      "finding_counts": { "critical": 0, "warning": 47, "info": 113 }
    }
  }
}
```

The state file is short, about 30 lines for a typical engagement. Every step registers itself on completion, timestamps its outputs, and records finding counts by severity. Rerunning any step is idempotent.
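A minimal sketch of that registration step, producing an entry with the same fields as the example above. `register_completion`, `save_state`, and deriving `finding_counts` from a flat list of severities are assumptions for illustration. Strictly, a rerun changes `completed_at`; what stays stable is that the step keeps exactly one entry with the same outputs and counts.

```python
import json
from collections import Counter
from datetime import datetime, timezone

# Severity buckets as they appear in the state file example.
SEVERITIES = ("critical", "warning", "info")


def utc_now() -> str:
    """ISO-8601 UTC timestamp in the same format as the state file example."""
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def register_completion(state: dict, step: str, executor: str, started_at: str,
                        outputs: list[str], severities: list[str]) -> dict:
    """Record a completed step under its own key.

    Rerunning the same step overwrites that key rather than adding a new
    entry, so the state file does not grow with reruns.
    """
    counts = Counter(severities)
    state.setdefault("steps", {})[step] = {
        "status": "completed",
        "started_at": started_at,
        "completed_at": utc_now(),
        "executor": executor,
        "outputs": sorted(outputs),
        "finding_counts": {sev: counts.get(sev, 0) for sev in SEVERITIES},
    }
    return state


def save_state(state: dict, path: str) -> None:
    """Write the state file as indented JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
```

For the example above, registering `structural-harvest` with 47 warning and 113 info severities yields `finding_counts` of `{"critical": 0, "warning": 47, "info": 113}`, and registering it again still leaves a single `structural-harvest` entry.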

Where it landed

Recent engagement: a mid-market Nordic insurance broker. The full health review was delivered over two weeks of calendar time, roughly one week of focused senior work. The pipeline produced 72 findings across security, data quality, automation, integrations, adoption, performance, and governance (12 critical, 20 high). The executive memo quantified the cost of inaction at roughly 2.9M NOK/year and laid out a phased remediation with a 9–14 month payback.

The output moved leadership from asking "fix or replace?" to committing to a remediation pilot. The signal that mattered most was unscripted: during the findings preview, stakeholders surfaced problems they had been struggling with for years that nobody had named. One requirement a previous partner had told them couldn't be done turned out to be the next item in our migration roadmap. The pipeline's job is to make that conversation possible by 11am on day one, instead of by week six.

The hours the pipeline took off the senior's plate (manual triage, cross-tool reconciliation, deliverable formatting) were spent instead on the architectural work that produced the "fix or replace" answer. That trade is the case for the substrate.

Where this transfers

The Salesforce-specific bits are swappable. The substrate is shared state, composable skills, runbooks with named owners, and cost-aware model tiering. Anywhere a growing operation has accumulated a sprawl of tools and nobody can answer "what do we actually know about this system?", the same shape applies: a recurring multi-system audit and improvement loop one person can run end-to-end, where every output is documented well enough to survive that person leaving.

Substitute the harvest step (export configurations from whichever systems matter), substitute the assessment domains, and the shape carries. The shared register, the runbook contract, and the dual execution paths are the parts that actually do the work.