Policy Engine Overview

Goal: Evaluate organisation policies deterministically against scanner SBOMs, Concelier advisories, and Excititor VEX evidence, then publish effective findings that downstream services can trust.

This document introduces the v2 Policy Engine: how the service fits into Stella Ops, the artefacts it produces, the contracts it honours, and the guardrails that keep policy decisions reproducible across air-gapped and connected deployments.


1 · Role in the Platform

  • Purpose: Compose policy verdicts by reconciling SBOM inventory, advisory metadata, VEX statements, and organisation rules.
  • Form factor: Dedicated .NET 10 Minimal API host (StellaOps.Policy.Engine) plus worker orchestration. Policies are defined in stella-dsl@1 packs compiled to an intermediate representation (IR) with a stable SHA-256 digest.
  • Tenancy: All workloads run under Authority-enforced scopes (policy:*, findings:read, effective:write). Only the Policy Engine identity may materialise effective findings collections.
  • Consumption: Findings ledger, Console, CLI, and Notify read the published effective_finding_{policyId} materialisations and policy run ledger (policy_runs).
  • Offline parity: Bundled policies import/export alongside advisories and VEX. In sealed mode the engine degrades gracefully, annotating explanations whenever cached signals replace live lookups.

2 · High-Level Architecture

flowchart LR
    subgraph Inputs
        A[Scanner SBOMs<br/>Inventory & Usage]
        B[Concelier Advisories<br/>Canonical linksets]
        C[Excititor VEX<br/>Consensus status]
        D[Policy Packs<br/>stella-dsl@1]
    end
    subgraph PolicyEngine["StellaOps.Policy.Engine"]
        P1[DSL Compiler<br/>IR + Digest]
        P2[Joiners<br/>SBOM ↔ Advisory ↔ VEX]
        P3[Deterministic Evaluator<br/>Rule hits + scoring]
        P4[Materialisers<br/>effective findings]
        P5[Run Orchestrator<br/>Full & incremental]
    end
    subgraph Outputs
        O1[Effective Findings Collections]
        O2[Explain Traces<br/>Rule hit lineage]
        O3[Metrics & Traces<br/>policy_run_seconds,<br/>rules_fired_total]
        O4[Simulation/Preview Feeds<br/>CLI & Studio]
    end

    A --> P2
    B --> P2
    C --> P2
    D --> P1 --> P3
    P2 --> P3 --> P4 --> O1
    P3 --> O2
    P5 --> P3
    P3 --> O3
    P3 --> O4

3 · Core Concepts

| Concept | Description |
|---|---|
| Policy Pack | Versioned bundle of DSL documents, metadata, and checksum manifest. Packs import/export via CLI and Offline Kit bundles. |
| Policy Digest | SHA-256 of the canonical IR; used for caching, explain trace attribution, and audit proofs. |
| Effective Findings | Append-only Mongo collections (effective_finding_{policyId}) storing the latest verdict per finding, plus history sidecars. |
| Policy Run | Execution record persisted in policy_runs capturing inputs, run mode, timings, and determinism hash. |
| Explain Trace | Structured tree showing rule matches, data provenance, and scoring components for UI/CLI explain features. |
| Simulation | Dry-run evaluation that compares a candidate pack against the active pack and produces verdict diffs without persisting results. |
| Incident Mode | Elevated sampling/trace capture toggled automatically when SLOs breach; emits events for Notifier and Timeline Indexer. |

4 · Inputs & Pre-processing

4.1 SBOM Inventory

  • Source: Scanner.WebService publishes inventory/usage SBOMs plus BOM-Index (roaring bitmap) metadata.
  • Consumption: Policy joiners use the index to expand candidate components quickly, keeping warm-path evaluation within its < 5 s budget.
  • Schema: CycloneDX Protobuf + JSON views; Policy Engine reads canonical projections via shared SBOM adapters.
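
As a rough illustration of how a bitmap index can narrow the candidate set, the sketch below uses plain Python integers as bitsets in place of roaring bitmaps. The real BOM-Index layout is not described in this overview, so `build_index`, `affected_images`, and the sample PURLs are hypothetical.

```python
def build_index(sboms: dict[str, set[str]]) -> tuple[list[str], dict[str, int]]:
    """Return (sorted image ids, purl -> bitmask of images that contain it)."""
    images = sorted(sboms)  # stable bit positions across runs
    index: dict[str, int] = {}
    for pos, image in enumerate(images):
        for purl in sboms[image]:
            index[purl] = index.get(purl, 0) | (1 << pos)
    return images, index


def affected_images(images: list[str], index: dict[str, int], purls: list[str]) -> list[str]:
    """Union the bitmasks of the given PURLs and decode them back to image ids."""
    mask = 0
    for purl in purls:
        mask |= index.get(purl, 0)  # unknown PURLs contribute nothing
    return [image for pos, image in enumerate(images) if (mask >> pos) & 1]


# Illustrative data only.
sboms = {
    "img-a": {"pkg:npm/lodash@4.17.20", "pkg:npm/left-pad@1.3.0"},
    "img-b": {"pkg:npm/lodash@4.17.20"},
    "img-c": {"pkg:pypi/requests@2.31.0"},
}
images, index = build_index(sboms)
```

With an advisory that names `pkg:npm/lodash@4.17.20`, `affected_images(images, index, [...])` selects only `img-a` and `img-b`, so the joiner never touches `img-c`.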

4.2 Advisory Corpus

  • Source: Concelier exports canonical advisories with deterministic identifiers, linksets, and equivalence tables.
  • Contract: Policy Engine consumes only the content.raw, identifiers, and linkset fields per the Aggregation-Only Contract (AOC); derived precedence remains a policy concern.

4.3 VEX Evidence

  • Source: Excititor consensus service resolves OpenVEX / CSAF statements, preserving conflicts.
  • Usage: Policy rules can require specific VEX vendors or justification codes; evaluator records when cached evidence substitutes for live statements (sealed mode).

4.4 Policy Packs

  • Authored in Policy Studio or CLI, validated against the stella-dsl@1 schema.
  • Compiler performs canonicalisation (ordering, defaulting) before emitting IR and digest.
  • Packs bundle scoring profiles, allowlist metadata, and optional reachability weighting tables.
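
The canonicalise-then-digest step can be sketched as follows. This is a minimal illustration, not the compiler's actual behaviour: the pack shape, the `RULE_DEFAULTS` values, and the `sha256:` prefix are assumptions, and the real stella-dsl@1 IR is not specified here. The point is only that equivalent packs should serialise to identical bytes before hashing.

```python
import hashlib
import json

# Hypothetical defaults; the real stella-dsl@1 defaults are not given in this overview.
RULE_DEFAULTS = {"priority": 0, "quiet": False}


def canonicalise(pack: dict) -> dict:
    """Fill in defaults so omitted and explicit default values produce the same IR."""
    rules = [{**RULE_DEFAULTS, **rule} for rule in pack.get("rules", [])]
    return {"syntax": pack.get("syntax", "stella-dsl@1"), "rules": rules}


def policy_digest(ir: dict) -> str:
    """SHA-256 over canonical JSON: sorted keys, no insignificant whitespace, UTF-8."""
    encoded = json.dumps(
        ir, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(encoded).hexdigest()
```

Two packs that differ only in key order or in spelled-out defaults then share a digest, while any change to a rule's action produces a different one.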

5 · Evaluation Flow

  1. Run selection – Orchestrator accepts full, incremental, or simulate jobs. Incremental runs listen to change streams from Concelier, Excititor, and SBOM imports to scope re-evaluation.
  2. Input staging – Candidates fetched in deterministic batches; identity graph from Concelier strengthens PURL lookups.
  3. Rule execution – Evaluator walks rules in lexical order (first-match wins). Actions available: block, ignore, warn, defer, escalate, requireVex, each supporting quieting semantics where permitted.
  4. Scoring – PolicyScoringConfig applies severity, trust, reachability weights plus penalties (warnPenalty, ignorePenalty, quietPenalty).
  5. Verdict and explain – Engine constructs PolicyVerdict records with inputs, quiet flags, unknown confidence bands, and provenance markers; explain trees capture rule lineage.
  6. Materialisation – The latest verdict per finding is upserted into the effective findings collection, with prior versions kept in append-only history; each record is stamped with run identifier, policy digest, and tenant.
  7. Publishing – Completed run writes to policy_runs, emits metrics (policy_run_seconds, rules_fired_total, vex_overrides_total), and raises events for Console/Notify subscribers.
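
Steps 3 and 4 above can be sketched as a first-match-wins loop followed by a penalty adjustment. Everything named here (`Rule`, `Verdict`, `SEVERITY_WEIGHT`, `PENALTY`, `evaluate`) is hypothetical, the weights are invented for illustration, and the rule list is assumed to arrive already in the engine's evaluation order. Returning `None` when nothing matches is likewise an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    name: str
    action: str  # e.g. block, ignore, warn, defer, escalate, requireVex
    matches: Callable[[dict], bool]


@dataclass(frozen=True)
class Verdict:
    finding_id: str
    action: str
    rule: str
    score: float


# Invented numbers; the real PolicyScoringConfig schema is not shown in this overview.
SEVERITY_WEIGHT = {"critical": 90.0, "high": 75.0, "medium": 50.0, "low": 25.0}
PENALTY = {"warn": 15.0, "ignore": 35.0}  # stand-ins for warnPenalty / ignorePenalty


def evaluate(finding: dict, rules: list[Rule]) -> Optional[Verdict]:
    """Return the verdict of the first matching rule, or None if no rule matches."""
    for rule in rules:
        if rule.matches(finding):
            base = SEVERITY_WEIGHT.get(finding["severity"], 0.0)
            score = max(0.0, base - PENALTY.get(rule.action, 0.0))
            return Verdict(finding["id"], rule.action, rule.name, score)
    return None


rules = [
    Rule("10-vex-not-affected", "ignore", lambda f: f.get("vex") == "not_affected"),
    Rule("20-block-critical", "block", lambda f: f["severity"] == "critical"),
    Rule("30-warn-rest", "warn", lambda f: True),
]
```

A critical finding with a `not_affected` VEX status hits the first rule and never reaches the block rule, which is the behaviour that makes rule ordering part of the policy's meaning.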

6 · Run Modes

| Mode | Trigger | Scope | Persistence | Typical Use |
|---|---|---|---|---|
| Full | Manual CLI (stella policy run), scheduled nightly, or emergency rebaseline | Entire tenant | Writes effective findings and run record | After policy publish or major advisory/VEX import |
| Incremental | Change-stream queue driven by Concelier/Excititor/SBOM deltas | Only affected artefacts | Writes effective findings and run record | Continuous upkeep; ensures SLA ≤ 5 min from source change |
| Simulate | CLI/Studio preview, CI pipelines | Candidate subset (diff against baseline) | No materialisation; produces explain & diff payloads | Policy authoring, CI regression suites |

All modes are cancellation-aware and checkpoint progress for replay in case of deployment restarts.
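
To make the Simulate row concrete, here is a small sketch of a verdict diff between the active pack and a candidate pack. The payload shape (`added`, `removed`, `changed`) and the function name `diff_verdicts` are assumptions for illustration; the actual diff format is not defined in this overview. Nothing is persisted, and outputs are sorted so repeated runs yield identical payloads.

```python
def diff_verdicts(baseline: dict[str, str], candidate: dict[str, str]) -> dict[str, list]:
    """Compare finding_id -> action maps and report differences in sorted order."""
    added = sorted(fid for fid in candidate if fid not in baseline)
    removed = sorted(fid for fid in baseline if fid not in candidate)
    changed = sorted(
        (fid, baseline[fid], candidate[fid])
        for fid in baseline.keys() & candidate.keys()
        if baseline[fid] != candidate[fid]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Illustrative verdict maps only.
base = {"F1": "block", "F2": "warn", "F3": "ignore"}
cand = {"F1": "block", "F2": "block", "F4": "warn"}
diff = diff_verdicts(base, cand)
```

A CI gate could then fail the build whenever `diff["changed"]` contains an unexpected transition, such as a finding moving from `block` to `ignore`.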


7 · Outputs & Integrations

  • APIs – Minimal API exposes policy CRUD, run orchestration, explain fetches, and cursor-based listing of effective findings (see /docs/api/policy.md once published).
  • CLI – stella policy simulate/run/show commands surface JSON verdicts, exit codes, and diff summaries suitable for CI gating.
  • Console / Policy Studio – UI reads explain traces, policy metadata, approval workflow status, and simulation diffs to guide reviewers.
  • Findings Ledger – Effective findings feed downstream export, Notify, and risk scoring jobs.
  • Air-gap bundles – Offline Kit includes policy packs, scoring configs, and explain indexes; export commands generate DSSE-signed bundles for transfer.

8 · Determinism & Guardrails

  • Deterministic inputs – All joins rely on canonical linksets and equivalence tables; batches are sorted, and random/wall-clock APIs are blocked by static analysis plus runtime guards (ERR_POL_004).
  • Stable outputs – Canonical JSON serializers sort keys; digests recorded in run metadata enable reproducible diffs across machines.
  • Idempotent writes – Materialisers upsert using {policyId, findingId, tenant} keys and retain prior versions with append-only history.
  • Sandboxing – Policy evaluation executes in-process with timeouts; restart-only plug-ins guarantee no runtime DLL injection.
  • Compliance proof – Every run stores digest of inputs (policy, SBOM batch, advisory snapshot) so auditors can replay decisions offline.
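
The idempotent-write guardrail can be pictured with an in-memory stand-in for the collections. This is a sketch under stated assumptions: `EffectiveFindingsStore` and its document fields are hypothetical, and the real materialisers write to Mongo rather than Python dictionaries. It shows the {policyId, findingId, tenant} key, a no-op on replay, and append-only retention of superseded versions.

```python
class EffectiveFindingsStore:
    """In-memory stand-in for an effective findings collection plus its history sidecar."""

    def __init__(self) -> None:
        self.current: dict[tuple[str, str, str], dict] = {}  # latest verdict per key
        self.history: list[dict] = []  # superseded versions, only ever appended to

    def upsert(self, policy_id: str, finding_id: str, tenant: str,
               verdict: str, run_id: str, policy_digest: str) -> bool:
        """Write the verdict; return False when the write is a replay of the same result."""
        key = (policy_id, finding_id, tenant)
        previous = self.current.get(key)
        if (previous is not None
                and previous["verdict"] == verdict
                and previous["policy_digest"] == policy_digest):
            return False  # same outcome under the same policy: nothing to record
        if previous is not None:
            self.history.append(previous)  # keep the prior version for audit
        self.current[key] = {
            "verdict": verdict,
            "run_id": run_id,
            "policy_digest": policy_digest,
        }
        return True
```

Replaying a run after a restart therefore leaves both the current view and the history unchanged, while a genuine verdict change adds exactly one history entry.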

9 · Security, Tenancy & Offline Notes

  • Authority scopes: Gateway enforces policy:read, policy:write, policy:simulate, policy:runs, findings:read, effective:write. Service identities must present DPoP-bound tokens.
  • Tenant isolation: Collections partition by tenant identifier; cross-tenant queries require explicit admin scopes and return audit warnings.
  • Sealed mode: In air-gapped deployments the engine surfaces sealed=true hints in explain traces, warning about cached EPSS/KEV data and suggesting bundle refreshes (see docs/airgap/airgap-mode.md).
  • Observability: Structured logs carry correlation IDs matching orchestrator job IDs; metrics integrate with OpenTelemetry exporters; sampled rule-hit logs redact policy secrets.
  • Incident response: Incident mode can be forced via API, boosting trace retention and notifying Notifier through policy.incident.activated events.

10 · Working with Policy Packs

  1. Author in Policy Studio or edit DSL files locally. Validate with stella policy lint.
  2. Simulate against golden SBOM fixtures (stella policy simulate --sbom fixtures/*.json). Inspect explain traces for unexpected overrides.
  3. Publish via API or CLI; Authority enforces review/approval workflows (draft → review → approve → rollout).
  4. Monitor the subsequent incremental runs; if the determinism diff fails in CI, roll back the pack while investigating the digests.
  5. Bundle packs for offline sites with stella policy bundle export and distribute via Offline Kit.

11 · Compliance Checklist

  • [ ] Scopes enforced: Confirm gateway policy requires policy:* and effective:write scopes for all mutating endpoints.
  • [ ] Determinism guard active: Static analyzer blocks clock/RNG usage; CI determinism job diffing repeated runs passes.
  • [ ] Materialisation audit: Effective findings collections use append-only writers and retain history per policy run.
  • [ ] Explain availability: UI/CLI expose explain traces for every verdict; sealed-mode warnings display when cached evidence is used.
  • [ ] Offline parity: Policy bundles (import/export) tested in sealed environment; air-gap degradations documented for operators.
  • [ ] Observability wired: Metrics (policy_run_seconds, rules_fired_total, vex_overrides_total) and sampled rule hit logs emit to the shared telemetry pipeline with correlation IDs.
  • [ ] Documentation synced: API (/docs/api/policy.md), DSL grammar (/docs/policy/dsl.md), lifecycle (/docs/policy/lifecycle.md), and run modes (/docs/policy/runs.md) cross-link back to this overview.

Last updated: 2025-10-26 (Sprint 20).