VEX Evidence Playbook (Bench Repo Blueprint)

Status: Draft – aligns with the “provable vulnerability decisions” advisory (Nov 2025).
Owners: Policy Guild · VEX Lens Guild · CLI Guild · Docs Guild.

This playbook defines the public benchmark repository layout, artifact shapes, verification tooling, and metrics used to show that Stella Ops VEX decisions are reproducible, portable, and measurably better than baseline scanners on the metrics in §5. Treat it as the contract for every guild contributing artifacts to bench/.


1. Repository layout

bench/
  README.md                  # repo overview + quickstart
  findings/
    CVE-YYYY-NNNNN/          # one folder per advisory/product tuple
      evidence/
        reachability.json    # static+runtime call graph for the finding
        sbom.cdx.json        # CycloneDX slice containing the involved components
      decision.openvex.json  # OpenVEX statement (status + justification)
      decision.dsse.json     # DSSE envelope wrapping the OpenVEX payload
      rekor.txt              # optional Rekor UUID/index/checkpoint
      metadata.json          # producer info (policy rev, analyzer digests, CAS URIs)
  tools/
    verify.sh                # shell helper: dsse verify + optional rekor verification
    verify.py                # python verifier (offline) that recomputes digests
    compare.py               # baseline diff against Trivy/Syft/Grype/Snyk/Xray outputs
    replay.sh                # reruns reachability graphs via `stella replay`
  results/
    summary.csv              # FP reduction, MTTD, reproducibility metrics
    runs/2025-11-10/         # pinned scanner/policy versions + raw outputs
      stella/
        findings.json
        runtime-facts.ndjson
        reachability.manifest.json
      trivy/
        findings.json
      ...
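
A contributor can sanity-check a finding folder against this layout before opening a pull request. The following is a minimal sketch, not part of the shipped tooling: the function name missing_artifacts and the REQUIRED/OPTIONAL split are assumptions derived from the tree above (only rekor.txt is marked optional there).

```python
from pathlib import Path

# Relative paths expected in every bench/findings/<CVE> folder, per the layout above.
REQUIRED = (
    "evidence/reachability.json",
    "evidence/sbom.cdx.json",
    "decision.openvex.json",
    "decision.dsse.json",
    "metadata.json",
)
# rekor.txt is only present when the decision was logged to Rekor.
OPTIONAL = ("rekor.txt",)


def missing_artifacts(finding_dir: Path) -> list[str]:
    """Return the required relative paths that are absent from finding_dir."""
    return [rel for rel in REQUIRED if not (finding_dir / rel).is_file()]
```

An empty list means the folder is structurally complete; it says nothing about whether the contents verify.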

File contracts

  • reachability.json is the canonical export from cas://reachability/graphs/... with symbol IDs, call edges, runtime hits, analyzer fingerprints, and CAS references.
  • decision.openvex.json follows OpenVEX v1 with Stella Ops-specific status_notes, justification, impact_statement, and action_statement text.
  • decision.dsse.json is the DSSE envelope returned by Signer (see §3). Always include the PEM cert chain (keyless) or KMS key id.
  • rekor.txt captures {uuid, logIndex, checkpoint} from Attestor when the decision is logged to Rekor.
  • metadata.json binds the DSSE payload back to internal evidence: {policy_revision, reachability_graph_sha256, runtime_trace_sha256, evidence_cas_uri[], analyzer_versions[], createdBy, createdAt}.
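
The metadata.json contract can be checked mechanically. The sketch below is illustrative only: the field names come from the bullet above, but the value types (plain strings, lowercase hex digests without a "sha256:" prefix, JSON arrays for the two list fields) are assumptions, and metadata_problems is a hypothetical helper name.

```python
import re

# Assumption: digests are stored as 64 lowercase hex characters.
SHA256_HEX = re.compile(r"^[0-9a-f]{64}$")

STRING_FIELDS = ("policy_revision", "createdBy", "createdAt")
DIGEST_FIELDS = ("reachability_graph_sha256", "runtime_trace_sha256")
LIST_FIELDS = ("evidence_cas_uri", "analyzer_versions")


def metadata_problems(meta: dict) -> list[str]:
    """Return human-readable problems found in a parsed metadata.json."""
    problems = []
    # Every field named in the contract must be present.
    for key in STRING_FIELDS + DIGEST_FIELDS + LIST_FIELDS:
        if key not in meta:
            problems.append(f"missing field: {key}")
    # Digest fields must look like sha256 hex strings.
    for key in DIGEST_FIELDS:
        value = meta.get(key)
        if key in meta and not (isinstance(value, str) and SHA256_HEX.match(value)):
            problems.append(f"{key} is not a lowercase hex sha256")
    # Array-valued fields must be lists.
    for key in LIST_FIELDS:
        if key in meta and not isinstance(meta[key], list):
            problems.append(f"{key} must be a list")
    return problems
```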

2. Evidence production flow

  1. Scanner Worker
    • Generate reachability.json + sbom.cdx.json per prioritized CVE.
    • Store artifacts under CAS and surface URIs via ReachabilityReplayWriter.
  2. Policy Engine / VEXer
    • Evaluate reachability states + policy lattice to produce an OpenVEX statement.
    • Persist decision.openvex.json and forward it to Signer.
  3. Signer & Attestor
    • Sign the OpenVEX payload via DSSE (payloadType: application/vnd.in-toto+json) and return decision.dsse.json.
    • Optionally call Attestor to log the DSSE bundle to Rekor; write {uuid, logIndex, checkpoint} to rekor.txt.
  4. Bench harness
    • Collect SBOM slice, reachability proof, OpenVEX, DSSE, Rekor metadata, and companion metrics into bench/findings/CVE-....
    • Record tool versions + CAS digests under metadata.json.

All steps must be deterministic: repeated scans with the same inputs produce identical artifacts and digests.
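
Determinism depends on serializing JSON the same way every time before hashing. As a rough illustration (assuming "canonical" means sorted keys with compact separators and UTF-8 output, which is simpler than a full canonicalization scheme such as RFC 8785; the helper names are hypothetical):

```python
import hashlib
import json


def canonical_bytes(obj) -> bytes:
    """Serialize obj with sorted keys and no insignificant whitespace."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def sha256_hex(obj) -> str:
    """Digest of the canonical serialization; independent of key order."""
    return hashlib.sha256(canonical_bytes(obj)).hexdigest()
```

Two producers that build the same logical document in a different key order then arrive at the same digest, which is the property the reproducibility metric in §5 relies on.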


3. Signing & transparency requirements

Artifact                | Producer     | Format                             | Notes
------------------------+--------------+------------------------------------+-----------------------------------------------
Reachability evidence   | Scanner      | Canonical JSON (sorted keys)       | CAS URI recorded in metadata.
SBOM slice              | Scanner      | CycloneDX 1.6 JSON                 | Keep only components relevant to the finding.
OpenVEX decision        | Policy/VEXer | OpenVEX v1                         | One statement per (CVE, product) tuple.
DSSE bundle             | Signer       | DSSE envelope over OpenVEX payload | Include Fulcio cert or KMS key id.
Rekor record (optional) | Attestor     | Rekor UUID/index/checkpoint        | Store alongside DSSE for offline verification.

Signer must expose a predicate alias stella.ops/vexDecision@v1 (see Sprint task SIGN-VEX-401-018). Payload = OpenVEX JSON. Rekor logging reuses the existing Attestor /rekor/entries pipeline.
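
For readers new to DSSE: the signature covers the pre-authentication encoding (PAE) of the payload type and payload, not the raw payload bytes. The sketch below follows the PAE definition in the DSSE specification and builds an unsigned envelope skeleton; it does not perform any signing, and the function names are hypothetical rather than Signer's actual API.

```python
import base64


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes a signer signs."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(payload), payload)


def unsigned_envelope(payload_type: str, payload: bytes) -> dict:
    """Envelope skeleton; a real signer appends {keyid, sig} entries to signatures."""
    return {
        "payloadType": payload_type,
        "payload": base64.b64encode(payload).decode("ascii"),
        "signatures": [],
    }
```

A verifier recomputes pae(...) from the envelope's payloadType and decoded payload, then checks each signature against those bytes.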


4. Verification tooling

The repo ships two verifiers:

  1. tools/verify.sh (bash) wraps cosign verify-attestation / in-toto verify, Rekor inclusion checks (rekor-cli verify), and digest comparison.
  2. tools/verify.py is a pure-Python offline verifier for air-gapped environments:
    • Validates DSSE signature using the embedded Fulcio cert or configured root.
    • Recomputes sha256 over reachability.json, sbom.cdx.json, and decision.openvex.json to ensure the DSSE payload matches.
    • Optionally replays reachability by invoking stella replay --manifest ... --finding CVE-....
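
The digest-comparison step of the offline verifier might look roughly like the sketch below. It rests on several assumptions: the DSSE payload is standard base64 and decodes to the same JSON document as decision.openvex.json, and reachability_graph_sha256 in metadata.json is the sha256 of the reachability.json file bytes. Signature verification and replay are deliberately left out, and check_finding is a hypothetical name, not the actual verify.py interface.

```python
import base64
import hashlib
import json
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hex sha256 of a file's raw bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_json(path: Path):
    return json.loads(path.read_text(encoding="utf-8"))


def check_finding(finding_dir: Path) -> list[str]:
    """Return a list of mismatches; an empty list means the digests line up."""
    errors = []

    # 1. The signed payload must be the OpenVEX decision stored next to it.
    envelope = load_json(finding_dir / "decision.dsse.json")
    signed = json.loads(base64.b64decode(envelope["payload"]))
    if signed != load_json(finding_dir / "decision.openvex.json"):
        errors.append("DSSE payload does not match decision.openvex.json")

    # 2. The evidence digest recorded by the producer must match the file on disk.
    meta = load_json(finding_dir / "metadata.json")
    actual = file_sha256(finding_dir / "evidence" / "reachability.json")
    if actual != meta["reachability_graph_sha256"]:
        errors.append("reachability.json digest differs from metadata.json")

    return errors
```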

The new CLI verb (stella decision verify) should shell out to these helpers when --from bench is provided.


5. Metrics & comparison harness

tools/compare.py ingests raw outputs from Stella Ops and baseline scanners (Trivy, Syft, Grype, Snyk, Xray) stored under results/runs/<date>/<scanner>/findings.json. For each target it reports:

  • False-positive reduction (FPR) = 1 - (number of findings confirmed as true positives / number of baseline findings).
  • Mean time to decision (MTTD) = average wall-clock time between scan start and DSSE-signed OpenVEX emission.
  • Reproducibility score = 1 if re-running reachability produces identical digests for all artifacts, else 0; aggregated per run.
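
The three definitions above translate directly into code. This is a sketch under assumptions, not the compare.py implementation: the function names are hypothetical, a run is represented as a mapping from artifact name to digest, and MTTD takes per-finding durations in seconds.

```python
from statistics import mean


def fp_reduction(confirmed_true_positives: int, baseline_findings: int) -> float:
    """1 - (confirmed true positives / baseline findings)."""
    if baseline_findings <= 0:
        raise ValueError("baseline_findings must be positive")
    return 1 - confirmed_true_positives / baseline_findings


def mttd_seconds(durations: list[float]) -> float:
    """Mean wall-clock seconds from scan start to signed OpenVEX emission."""
    return mean(durations)


def reproducible(first_run: dict[str, str], second_run: dict[str, str]) -> int:
    """1 if both runs produced identical digests for all artifacts, else 0."""
    return 1 if first_run == second_run else 0
```

For example, a baseline scanner reporting 40 findings of which 10 are confirmed as true positives gives fp_reduction(10, 40) == 0.75.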

results/summary.csv columns:

target,cve,baseline_scanner,baseline_hits,stella_hits,fp_reduction,mttd_seconds,reproducible,rekor_uuid
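
Writing rows with a fixed column list keeps summary.csv stable across runs. A minimal sketch using the standard csv module; render_summary is a hypothetical helper and any row values passed to it in examples are made up for illustration.

```python
import csv
import io

# Column order as specified for results/summary.csv.
COLUMNS = [
    "target", "cve", "baseline_scanner", "baseline_hits", "stella_hits",
    "fp_reduction", "mttd_seconds", "reproducible", "rekor_uuid",
]


def render_summary(rows: list[dict]) -> str:
    """Render rows as CSV text with the header first and LF line endings."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    # DictWriter raises ValueError on unknown keys, which catches column typos.
    writer.writerows(rows)
    return buf.getvalue()
```

Leaving rekor_uuid empty is a natural way to represent decisions that were not logged to Rekor, though the playbook does not mandate a representation.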

Automate collection via a Makefile or a bench/run.sh pipeline (task BENCH-AUTO-401-019).


6. Publication & README checklist

bench/README.md must include:

  • High-level workflow diagram (scan → reachability → OpenVEX → DSSE → Rekor → bench).
  • Prerequisites (cosign, rekor-cli, stella CLI).
  • Quickstart commands:
    ./tools/verify.sh CVE-2023-12345 pkg:purl/example@1.2.3
    ./tools/compare.py --target sample/nginx --baseline trivy --run 2025-11-10
    
  • How to recreate a finding: stella replay --manifest results/runs/.../replay.yaml --finding CVE-....
  • Contribution guide (where to place new findings, how to update metrics, required metadata).

7. Implementation tasks (see Sprint 401+)

  • POLICY-VEX-401-010 — emit OpenVEX per finding and publish to bench repo.
  • SIGN-VEX-401-018 — add DSSE predicate + Rekor logging for decision payloads.
  • CLI-VEX-401-011 — new stella decision verbs (export, verify, compare).
  • BENCH-AUTO-401-019 — automation to populate bench/findings/**, run baseline scanners, and update results/summary.csv.
  • DOCS-VEX-401-012 — maintain this playbook + README templates, document verification workflow.

Update docs/implplan/SPRINT_0401_0001_0001_reachability_evidence_chain.md whenever these tasks move state.