component_architecture_scanner.md — Stella Ops Scanner (2025Q4)

Aligned with Epic 6 – Vulnerability Explorer and Epic 10 – Export Center.

Scope. Implementation‑ready architecture for the Scanner subsystem: WebService, Workers, analyzers, SBOM assembly (inventory & usage), per‑layer caching, three‑way diffs, artifact catalog (Mongo) with object storage (RustFS default, S3‑compatible fallback), attestation hand‑off, and scale/security posture. This document is the contract between the scanning plane and everything else (Policy, Excititor, Concelier, UI, CLI).


0) Mission & boundaries

Mission. Produce deterministic, explainable SBOMs and diffs for container images and filesystems, quickly and repeatedly, without guessing. Emit two views: Inventory (everything present) and Usage (entrypoint closure + actually linked libs). Attach attestations through Signer→Attestor→Rekor v2.

Boundaries.

  • Scanner does not produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
  • Scanner does not keep third‑party SBOM warehouses. It may bind to existing attestations for exact hashes.
  • Core analyzers are deterministic (no fuzzy identity). Optional heuristic plug‑ins (e.g., patch‑presence) run under explicit flags and never contaminate the core SBOM.

1) Solution & project layout

src/
 ├─ StellaOps.Scanner.WebService/            # REST control plane, catalog, diff, exports
 ├─ StellaOps.Scanner.Worker/                # queue consumer; executes analyzers
 ├─ StellaOps.Scanner.Models/                # DTOs, evidence, graph nodes, CDX/SPDX adapters
 ├─ StellaOps.Scanner.Storage/               # Mongo repositories; RustFS object client (default) + S3 fallback; ILM/GC
 ├─ StellaOps.Scanner.Queue/                 # queue abstraction (Redis/NATS/RabbitMQ)
 ├─ StellaOps.Scanner.Cache/                 # layer cache; file CAS; bloom/bitmap indexes
 ├─ StellaOps.Scanner.EntryTrace/            # ENTRYPOINT/CMD → terminal program resolver (shell AST)
 ├─ StellaOps.Scanner.Analyzers.OS.[Apk|Dpkg|Rpm]/
 ├─ StellaOps.Scanner.Analyzers.Lang.[Java|Node|Python|Go|DotNet|Rust]/
 ├─ StellaOps.Scanner.Analyzers.Native.[ELF|PE|MachO]/   # PE/Mach-O planned (M2)
 ├─ StellaOps.Scanner.Symbols.Native/                    # NEW – native symbol reader/demangler (Sprint 401)
 ├─ StellaOps.Scanner.CallGraph.Native/                  # NEW – function/call-edge builder + CAS emitter
 ├─ StellaOps.Scanner.Emit.CDX/              # CycloneDX (JSON + Protobuf)
 ├─ StellaOps.Scanner.Emit.SPDX/             # SPDX 3.0.1 JSON
 ├─ StellaOps.Scanner.Diff/                  # image→layer→component three‑way diff
 ├─ StellaOps.Scanner.Index/                 # BOM‑Index sidecar (purls + roaring bitmaps)
 ├─ StellaOps.Scanner.Tests.*                # unit/integration/e2e fixtures
 └─ Tools/
     ├─ StellaOps.Scanner.Sbomer.BuildXPlugin/   # BuildKit generator (image referrer SBOMs)
     └─ StellaOps.Scanner.Sbomer.DockerImage/    # CLI‑driven scanner container

Analyzer assemblies and buildx generators are packaged as restart-time plug-ins under plugins/scanner/** with manifests; services must restart to activate new plug-ins.

1.2 Native reachability upgrades (Nov 2026)

  • Stripped-binary pipeline: native analyzers must recover functions even without symbols (prolog patterns, xrefs, PLT/GOT, vtables). Emit a tool-agnostic neutral JSON (NJIF) with functions, CFG/CG, and evidence tags. Keep heuristics deterministic and record toolchain hashes in the scan manifest.
  • Synthetic roots: treat .preinit_array, .init_array, legacy .ctors, and _init as graph entrypoints; add roots for constructors in each DT_NEEDED dependency. Tag edges from these roots with phase=load for explainers.
  • Build-id capture: read .note.gnu.build-id for every ELF, store hex build-id alongside soname/path, propagate into SymbolID/code_id, and expose it to SBOM + runtime joiners. If missing, fall back to file hash and mark source accordingly.
  • PURL-resolved edges: annotate call edges with the callee purl and symbol_digest so graphs merge with SBOM components. See docs/reachability/purl-resolved-edges.md for schema rules and acceptance tests.
  • Unknowns emission: when symbol → purl mapping or edge targets remain unresolved, emit structured Unknowns to Signals (see docs/signals/unknowns-registry.md) instead of dropping evidence.
  • Hybrid attestation: emit graph-level DSSE for every richgraph-v1 (mandatory) and optional edge-bundle DSSE (≤512 edges) for runtime/init-root/contested edges or third-party provenance. Publish graph DSSE digests to Rekor by default; edge-bundle Rekor publish is policy-driven. CAS layout: cas://reachability/graphs/{blake3} for graph body, .../{blake3}.dsse for envelope, and cas://reachability/edges/{graph_hash}/{bundle_id}[.dsse] for bundles. Deterministic ordering before hashing/signing is required.
  • Deterministic call-graph manifest: capture analyzer versions, feed hashes, toolchain digests, and flags in a manifest stored alongside richgraph-v1; replaying with the same manifest MUST yield identical node/edge sets and hashes (see docs/reachability/lead.md).

1.1 Queue backbone (Redis / NATS)

StellaOps.Scanner.Queue exposes a transport-agnostic contract (IScanQueue/IScanQueueLease) used by the WebService producer and Worker consumers. Sprint 9 introduces two first-party transports:

  • Redis Streams (default). Uses consumer groups, deterministic idempotency keys (scanner:jobs:idemp:*), and supports lease claim (XCLAIM), renewal, exponential-backoff retries, and a scanner:jobs:dead stream for exhausted attempts.
  • NATS JetStream. Provisions the SCANNER_JOBS work-queue stream + durable consumer scanner-workers, publishes with MsgId for dedupe, applies backoff via NAK delays, and routes dead-lettered jobs to SCANNER_JOBS_DEAD.

Metrics are emitted via Meter counters (scanner_queue_enqueued_total, scanner_queue_retry_total, scanner_queue_deadletter_total), and ScannerQueueHealthCheck pings the active backend (Redis PING, NATS PING). Configuration is bound from scanner.queue:

scanner:
  queue:
    kind: redis # or nats
    redis:
      connectionString: "redis://queue:6379/0"
      streamName: "scanner:jobs"
    nats:
      url: "nats://queue:4222"
      stream: "SCANNER_JOBS"
      subject: "scanner.jobs"
      durableConsumer: "scanner-workers"
      deadLetterSubject: "scanner.jobs.dead"
    maxDeliveryAttempts: 5
    retryInitialBackoff: 00:00:05
    retryMaxBackoff: 00:02:00

The DI extension (AddScannerQueue) wires the selected transport, so future additions (e.g., RabbitMQ) only implement the same contract and register.
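The deterministic idempotency keys and bounded exponential backoff described above can be sketched as pure functions (illustrative Python; the actual derivation inside `StellaOps.Scanner.Queue` may differ, and `idempotency_key`/`retry_backoff` are hypothetical names — the key prefix and the 5 s/2 min defaults come from the text):

```python
import hashlib
from datetime import timedelta

def idempotency_key(image_digest: str, force: bool = False) -> str:
    """Derive a deterministic Redis key under scanner:jobs:idemp:* so
    duplicate submissions of the same job args collapse to one entry."""
    payload = f"{image_digest}|force={force}".encode()
    return "scanner:jobs:idemp:" + hashlib.sha256(payload).hexdigest()

def retry_backoff(attempt: int,
                  initial: timedelta = timedelta(seconds=5),
                  maximum: timedelta = timedelta(minutes=2)) -> timedelta:
    """Exponential backoff doubling per delivery attempt, capped at
    retryMaxBackoff; attempts beyond maxDeliveryAttempts dead-letter."""
    delay = initial * (2 ** (attempt - 1))
    return min(delay, maximum)
```

On NATS the same delay feeds the NAK redelivery hint; on Redis it gates when a pending entry becomes eligible for XCLAIM.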

Runtime form‑factor: two deployables

  • Scanner.WebService (stateless REST)
  • Scanner.Worker (N replicas; queue‑driven)

2) External dependencies

  • OCI registry with Referrers API (discover attached SBOMs/signatures).
  • RustFS (default, offline-first) for SBOM artifacts; optional S3/MinIO compatibility retained for migration; Object Lock semantics emulated via retention headers; ILM for TTL.
  • MongoDB for catalog, job state, diffs, ILM rules.
  • Queue (Redis Streams/NATS/RabbitMQ).
  • Authority (on‑prem OIDC) for OpToks (DPoP/mTLS).
  • Signer + Attestor (+ Fulcio/KMS + Rekor v2) for DSSE + transparency.

3) Contracts & data model

3.1 Evidence‑first component model

Nodes

  • Image, Layer, File
  • Component (purl?, name, version?, type, id — may be bin:{sha256})
  • Executable (ELF/PE/Mach‑O), Library (native or managed), EntryScript (shell/launcher)

Edges (all carry Evidence)

  • contains(Image|Layer → File)
  • installs(PackageDB → Component) (OS database row)
  • declares(InstalledMetadata → Component) (dist‑info, pom.properties, deps.json…)
  • links_to(Executable → Library) (ELF DT_NEEDED, PE imports)
  • calls(EntryScript → Program) (file:line from shell AST)
  • attests(Rekor → Component|Image) (SBOM/predicate binding)
  • bound_from_attestation(Component_attested → Component_observed) (hash equality proof)

Evidence

{ source: enum, locator: (path|offset|line), sha256?, method: enum, timestamp }

No confidences. Either a fact is proven with listed mechanisms, or it is not claimed.
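The evidence shape can be pinned down as an immutable record; note the deliberate absence of any confidence field. A sketch (illustrative Python; the enum members shown are an assumed subset of the real `source`/`method` enums):

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Source(str, Enum):                 # assumed subset
    OS_DB = "os-db"
    INSTALLED_METADATA = "installed-metadata"
    LINKER = "linker"
    ATTESTATION = "attestation"

class Method(str, Enum):                 # assumed subset
    FILE_PARSE = "file-parse"
    ELF_DYNAMIC = "elf-dynamic"
    REKOR_LOOKUP = "rekor-lookup"

@dataclass(frozen=True)                  # evidence is a fact, never mutated
class Evidence:
    source: Source
    locator: str                         # path, offset, or file:line
    method: Method
    timestamp: str                       # UTC ISO-8601
    sha256: Optional[str] = None         # content hash when applicable
```

Frozen, value-equal records make evidence trivially deduplicable and safe to hash into deterministic outputs.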

3.2 Catalog schema (Mongo)

  • artifacts

    { _id, type: layer-bom|image-bom|diff|index,
      format: cdx-json|cdx-pb|spdx-json,
      bytesSha256, size, rekor: { uuid,index,url }?,
      ttlClass, immutable, refCount, createdAt }
    
  • images { imageDigest, repo, tag?, arch, createdAt, lastSeen }

  • layers { layerDigest, mediaType, size, createdAt, lastSeen }

  • links { fromType, fromDigest, artifactId } // image/layer -> artifact

  • jobs { _id, kind, args, state, startedAt, heartbeatAt, endedAt, error }

  • lifecycleRules { ruleId, scope, ttlDays, retainIfReferenced, immutable }

  • ruby.packages { _id: scanId, imageDigest, generatedAtUtc, packages[] } // decoded RubyPackageInventory documents for CLI/Policy reuse

3.3 Object store layout (RustFS)

layers/<sha256>/sbom.cdx.json.zst
layers/<sha256>/sbom.spdx.json.zst
images/<imgDigest>/inventory.cdx.pb            # CycloneDX Protobuf
images/<imgDigest>/usage.cdx.pb
indexes/<imgDigest>/bom-index.bin              # purls + roaring bitmaps
diffs/<old>_<new>/diff.json.zst
attest/<artifactSha256>.dsse.json              # DSSE bundle (cert chain + Rekor proof)

RustFS exposes a deterministic HTTP API (PUT|GET|DELETE /api/v1/buckets/{bucket}/objects/{key}). Scanner clients tag immutable uploads with X-RustFS-Immutable: true and, when retention applies, X-RustFS-Retain-Seconds: <ttlSeconds>. Additional headers can be injected via scanner.artifactStore.headers to support custom auth or proxy requirements. Legacy MinIO/S3 deployments remain supported by setting scanner.artifactStore.driver = "s3" during phased migrations.


4) REST API (Scanner.WebService)

All under /api/v1/scanner. Auth: OpTok (DPoP/mTLS); RBAC scopes.

POST /scans                        { imageRef|digest, force?:bool } → { scanId }
GET  /scans/{id}                   → { status, imageDigest, artifacts[], rekor? }
GET  /sboms/{imageDigest}          ?format=cdx-json|cdx-pb|spdx-json&view=inventory|usage → bytes
GET  /scans/{id}/ruby-packages     → { scanId, imageDigest, generatedAt, packages[] }
GET  /diff?old=<digest>&new=<digest>&view=inventory|usage → diff.json
POST /exports                      { imageDigest, format, view, attest?:bool } → { artifactId, rekor? }
POST /reports                      { imageDigest, policyRevision? } → { reportId, rekor? }   # delegates to backend policy+vex
GET  /catalog/artifacts/{id}       → { meta }
GET  /healthz | /readyz | /metrics

Report events

When scanner.events.enabled = true, the WebService serialises the signed report (canonical JSON + DSSE envelope) with NotifyCanonicalJsonSerializer and publishes two Redis Stream entries (scanner.report.ready, scanner.scan.completed) to the configured stream (default stella.events). The stream fields carry the whole envelope plus lightweight headers (kind, tenant, ts) so Notify and UI timelines can consume the event bus without recomputing signatures. Publish timeouts and bounded stream length are controlled via scanner:events:publishTimeoutSeconds and scanner:events:maxStreamLength. If the queue driver is already Redis and no explicit events DSN is provided, the host reuses the queue connection and auto-enables event emission so deployments get live envelopes without extra wiring. Compose/Helm bundles expose the same knobs via the SCANNER__EVENTS__* environment variables for quick tuning.


5) Execution flow (Worker)

5.1 Acquire & verify

  1. Resolve image (prefer repo@sha256:…).
  2. (Optional) verify image signature per policy (cosign).
  3. Pull blobs, compute layer digests; record metadata.

5.2 Layer union FS

  • Apply whiteouts; materialize final filesystem; map file → first introducing layer.
  • Windows layers (MSI/SxS/GAC) planned in M2.
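The union step can be sketched over in-memory layer maps (illustrative Python; whiteout semantics — `.wh.<name>` deletions and `.wh..wh..opq` opaque directories — follow the OCI layer changeset rules, and `union_layers` is a hypothetical name):

```python
import posixpath

def union_layers(layers: list[dict[str, bytes]]) -> dict[str, tuple[bytes, int]]:
    """Union layer file maps (path -> content) lowest layer first,
    applying OCI whiteouts. Returns path -> (content, index of the
    layer that introduced the surviving version)."""
    fs: dict[str, tuple[bytes, int]] = {}
    for idx, layer in enumerate(layers):
        for path, content in sorted(layer.items()):   # whiteouts sort first
            d, name = posixpath.split(path)
            if name == ".wh..wh..opq":                # opaque dir marker
                prefix = d.rstrip("/") + "/"
                for p in [p for p in fs if p.startswith(prefix)]:
                    del fs[p]
            elif name.startswith(".wh."):             # single-file whiteout
                fs.pop(posixpath.join(d, name[4:]), None)
            else:
                fs[path] = (content, idx)
    return fs
```

The `(content, layer index)` pair is exactly the file → first-introducing-layer mapping the diff stage needs for layer attribution.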

5.3 Evidence harvest (parallel analyzers; deterministic only)

A) OS packages

  • apk: /lib/apk/db/installed
  • dpkg: /var/lib/dpkg/status, /var/lib/dpkg/info/*.list
  • rpm: /var/lib/rpm/Packages (via librpm or parser)
  • Record name, version (epoch/revision), arch, source package where present, and declared file lists.

Data flow note: Each OS analyzer now writes its canonical output into the shared ScanAnalysisStore under analysis.os.packages (raw results), analysis.os.fragments (per-analyzer layer fragments), and contributes to analysis.layers.fragments (the aggregated view consumed by emit/diff pipelines). Helpers in ScanAnalysisCompositionBuilder convert these fragments into SBOM composition requests and component graphs so the diff/emit stages no longer reach back into individual analyzer implementations.

B) Language ecosystems (installed state only)

  • Java: META-INF/maven/*/pom.properties, MANIFEST → pkg:maven/...
  • Node: node_modules/**/package.json → pkg:npm/...
  • Python: *.dist-info/{METADATA,RECORD} → pkg:pypi/...
  • Go: Go buildinfo in binaries → pkg:golang/...
  • .NET: *.deps.json + assembly metadata → pkg:nuget/...
  • Rust: crates only when explicitly present (embedded metadata or cargo/registry traces); otherwise binaries reported as bin:{sha256}.

Rule: We only report components proven on disk with authoritative metadata. Lockfiles are evidence only.

C) Native link graph

  • ELF: parse PT_INTERP, DT_NEEDED, RPATH/RUNPATH, GNU symbol versions; map SONAMEs to file paths; link executables → libs.
  • PE/Mach‑O (planned M2): import table, delay‑imports; version resources; code signatures.
  • Map libs back to OS packages if possible (via file lists); else emit bin:{sha256} components.
  • The exported metadata (stellaops.os.* properties, license list, source package) feeds policy scoring and export pipelines directly – Policy evaluates quiet rules against package provenance while Exporters forward the enriched fields into downstream JSON/Trivy payloads.
  • Reachability lattice: analyzers + runtime probes emit Evidence/Mitigation records (see docs/reachability/lattice.md). The lattice engine joins static path evidence, runtime hits (EventPipe/JFR), taint flows, environment gates, and mitigations into ReachDecision documents that feed VEX gating and event graph storage.
  • Sprint 401 introduces StellaOps.Scanner.Symbols.Native (DWARF/PDB reader + demangler) and StellaOps.Scanner.CallGraph.Native (function boundary detector + call-edge builder). These libraries feed FuncNode/CallEdge CAS bundles and enrich reachability graphs with {code_id, confidence, evidence} so Signals/Policy/UI can cite function-level justifications.

D) EntryTrace (ENTRYPOINT/CMD → terminal program)

  • Read image config; parse shell (POSIX/Bash subset) with AST: source/. includes; case/if; exec/command; run‑parts.
  • Resolve commands via PATH within the built rootfs; follow language launchers (Java/Node/Python) to identify the terminal program (ELF/JAR/venv script).
  • Record file:line and choices for each hop; output chain graph.
  • Unresolvable dynamic constructs are recorded as unknown edges with reasons (e.g., $FOO unresolved).

E) Attestation & SBOM bind (optional)

  • For each file hash or binary hash, query local cache of Rekor v2 indices; if an SBOM attestation is found for exact hash, bind it to the component (origin=attested).
  • For the image digest, likewise bind SBOM attestations (build‑time referrers).

5.4 Component normalization (exact only)

  • Create Component nodes only with deterministic identities: purl, or bin:{sha256} for unlabeled binaries.
  • Record origin (OS DB, installed metadata, linker, attestation).

5.5 SBOM assembly & emit

  • Per-layer SBOM fragments: components introduced by the layer (+ relationships).
  • Image SBOMs: merge fragments; refer back to them via CycloneDX BOM‑Link (or SPDX ExternalRef).
  • Emit both Inventory & Usage views.
  • When the native analyzer reports an ELF buildId, attach it to component metadata and surface it as stellaops:buildId in CycloneDX properties (and diff metadata). This keeps SBOM/diff output in lockstep with runtime events and the debug-store manifest.
  • Serialize CycloneDX JSON and CycloneDX Protobuf; optionally SPDX 3.0.1 JSON.
  • Build BOM‑Index sidecar: purl table + roaring bitmap; flag usedByEntrypoint components for fast backend joins.

The emitted buildId metadata is preserved in component hashes, diff payloads, and /policy/runtime responses so operators can pivot from SBOM entries → runtime events → debug/.build-id/<aa>/<rest>.debug within the Offline Kit or release bundle.

5.6 DSSE attestation (via Signer/Attestor)

  • WebService constructs predicate with image_digest, stellaops_version, license_id, policy_digest? (when emitting final reports), timestamps.
  • Calls Signer (requires OpTok + PoE); Signer verifies entitlement + scanner image integrity and returns DSSE bundle.
  • Attestor logs to Rekor v2; returns {uuid,index,proof} → stored in artifacts.rekor.
  • Operator enablement runbooks (toggles, env-var map, rollout guidance) live in operations/dsse-rekor-operator-guide.md per SCANNER-ENG-0015.

6) Three‑way diff (image → layer → component)

6.1 Keys & classification

  • Component key: purl when present; else bin:{sha256}.
  • Diff classes: added, removed, version_changed (upgraded|downgraded), metadata_changed (e.g., origin from attestation vs observed).
  • Layer attribution: for each change, resolve the introducing/removing layer.

6.2 Algorithm (outline)

A = components(imageOld, key)
B = components(imageNew, key)

added   = B \ A
removed = A \ B
changed = { k in A∩B : version(A[k]) != version(B[k]) || origin changed }

for each item in added/removed/changed:
   layer = attribute_to_layer(item, imageOld|imageNew)
   usageFlag = usedByEntrypoint(item, imageNew)
emit diff.json (grouped by layer with badges)

Diffs are stored as artifacts and feed UI and CLI.
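The outline above runs essentially as written once keys and versions are concrete; a runnable rendering (Python sketch — real keys are purls or `bin:{sha256}`, layer attribution and usage flags are applied afterwards, and the lexicographic version comparison here is a stand-in for ecosystem-aware ordering):

```python
def component_diff(old: dict[str, dict], new: dict[str, dict]) -> dict:
    """Core of the section 6.2 diff: inputs map component key ->
    {version, origin}; output groups keys into the documented classes."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = []
    for k in sorted(set(old) & set(new)):
        if old[k]["version"] != new[k]["version"]:
            # stand-in ordering; production uses ecosystem-aware comparers
            direction = "upgraded" if new[k]["version"] > old[k]["version"] else "downgraded"
            changed.append({"key": k, "class": "version_changed", "direction": direction})
        elif old[k].get("origin") != new[k].get("origin"):
            changed.append({"key": k, "class": "metadata_changed"})
    return {"added": added, "removed": removed, "changed": changed}
```

Sorting every group before emission keeps `diff.json` byte-stable for identical inputs, matching the determinism requirement in section 12.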


7) Build‑time SBOMs (fast CI path)

Scanner.Sbomer.BuildXPlugin can act as a BuildKit generator:

  • During docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer, run analyzers on the build context/output; attach SBOMs as OCI referrers to the built image.
  • Optionally request Signer/Attestor to produce Stella Ops‑verified attestation immediately; else, Scanner.WebService can verify and re‑attest post‑push.
  • Scanner.WebService trusts build‑time SBOMs per policy, enabling no‑rescan for unchanged bases.

8) Configuration (YAML)

scanner:
  queue:
    kind: redis
    url: "redis://queue:6379/0"
  mongo:
    uri: "mongodb://mongo/scanner"
  s3:
    endpoint: "http://minio:9000"
    bucket: "stellaops"
    objectLock: "governance"   # or 'compliance'
  analyzers:
    os: { apk: true, dpkg: true, rpm: true }
    lang: { java: true, node: true, python: true, go: true, dotnet: true, rust: true }
    native: { elf: true, pe: false, macho: false }    # PE/Mach-O in M2
    entryTrace: { enabled: true, shellMaxDepth: 64, followRunParts: true }
  emit:
    cdx: { json: true, protobuf: true }
    spdx: { json: true }
    compress: "zstd"
  rekor:
    url: "https://rekor-v2.internal"
  signer:
    url: "https://signer.internal"
  limits:
    maxParallel: 8
    perRegistryConcurrency: 2
  policyHints:
    verifyImageSignature: false
    trustBuildTimeSboms: true

9) Scale & performance

  • Parallelism: per‑analyzer concurrency; bounded directory walkers; file CAS dedupe by sha256.

  • Distributed locks per layer digest to prevent duplicate work across Workers.

  • Registry throttles: per‑host concurrency budgets; exponential backoff on 429/5xx.

  • Targets:

    • Build‑time: P95 ≤ 3–5 s on warmed bases (CI generator).
    • Post‑build delta: P95 ≤ 10 s for 200 MB images with cache hit.
    • Emit: CycloneDX Protobuf ≤ 150 ms for 5k components; JSON ≤ 500 ms.
    • Diff: ≤ 200 ms for 5k vs 5k components.

10) Security posture

  • AuthN: Authority‑issued short OpToks (DPoP/mTLS).
  • AuthZ: scopes (scanner.scan, scanner.export, scanner.catalog.read).
  • mTLS to Signer/Attestor; only Signer can sign.
  • No network fetches during analysis (except registry pulls and optional Rekor index reads).
  • Sandboxing: non‑root containers; read‑only FS; seccomp profiles; disable execution of scanned content.
  • Release integrity: all first‑party images are cosign‑signed; Workers/WebService self‑verify at startup.

11) Observability & audit

  • Metrics:

    • scanner.jobs_inflight, scanner.scan_latency_seconds
    • scanner.layer_cache_hits_total, scanner.file_cas_hits_total
    • scanner.artifact_bytes_total{format}
    • scanner.attestation_latency_seconds, scanner.rekor_failures_total
    • scanner_analyzer_golang_heuristic_total{indicator,version_hint} — increments whenever the Go analyzer falls back to heuristics (build-id or runtime markers). Grafana panel: sum by (indicator) (rate(scanner_analyzer_golang_heuristic_total[5m])); alert when the rate is ≥ 1 for 15 minutes to highlight unexpected stripped binaries.
  • Tracing: spans for acquire→union→analyzers→compose→emit→sign→log.

  • Audit logs: DSSE requests log license_id, image_digest, artifactSha256, policy_digest?, Rekor UUID on success.


12) Testing matrix

  • Determinism: given same image + analyzers → byte‑identical CDX Protobuf; JSON normalized.
  • OS packages: ground‑truth images per distro; compare to package DB.
  • Lang ecosystems: sample images per ecosystem (Java/Node/Python/Go/.NET/Rust) with installed metadata; negative tests w/ lockfile‑only.
  • Native & EntryTrace: ELF graph correctness; shell AST cases (includes, run‑parts, exec, case/if).
  • Diff: layer attribution against synthetic two‑image sequences.
  • Performance: cold vs warm cache; large node_modules and site‑packages.
  • Security: ensure no code execution from image; fuzz parser inputs; path traversal resistance on layer extract.

13) Failure modes & degradations

  • Missing OS DB (files exist, DB removed): record files; do not fabricate package components; emit bin:{sha256} where unavoidable; flag in evidence.
  • Unreadable metadata (corrupt dist‑info): record file evidence; skip component creation; annotate.
  • Dynamic shell constructs: mark unresolved edges with reasons (env var unknown) and continue; Usage view may be partial.
  • Registry rate limits: honor backoff; queue job retries with jitter.
  • Signer refusal (license/plan/version): scan completes; artifact produced; no attestation; WebService marks result as unverified.

14) Optional plug‑ins (off by default)

  • Patch‑presence detector (signature‑based backport checks). Reads curated function‑level signatures from advisories; inspects binaries for patched code snippets to lower false‑positives for backported fixes. Runs as a sidecar analyzer that annotates components; never overrides core identities.
  • Runtime probes (with Zastava): when allowed, compare /proc/<pid>/maps (DSOs actually loaded) with the static Usage view for precision.

15) DevOps & operations

  • HA: WebService horizontal scale; Workers autoscale by queue depth & CPU; distributed locks on layers.
  • Retention: ILM rules per artifact class (short, default, compliance); Object Lock for compliance artifacts (reports, signed SBOMs).
  • Upgrades: bump cache schema when analyzer outputs change; WebService triggers refresh of dependent artifacts.
  • Backups: Mongo (daily dumps); RustFS snapshots (filesystem-level rsync/ZFS) or S3 versioning when legacy driver enabled; Rekor v2 DB snapshots.

16) CLI & UI touch points

  • CLI: stellaops scan <ref>, stellaops diff --old --new, stellaops export, stellaops verify attestation <bundle|url>.
  • UI: Scan detail shows Inventory/Usage toggles, Diff by Layer, Attestation badge (verified/unverified), Rekor link, and EntryTrace chain with file:line breadcrumbs.

17) Roadmap (Scanner)

  • M2: Windows containers (MSI/SxS/GAC analyzers), PE/Mach‑O native analyzer, deeper Rust metadata.
  • M2: Buildx generator GA (certified external registries), cross‑registry trust policies.
  • M3: Patch‑presence plug‑in GA (opt‑in), cross‑image corpus clustering (evidence‑only; not identity).
  • M3: Advanced EntryTrace (POSIX shell features breadth, busybox detection).

Appendix A — EntryTrace resolution (pseudo)

ResolveEntrypoint(ImageConfig cfg, RootFs fs):
  cmd = Normalize(cfg.ENTRYPOINT, cfg.CMD)
  stack = [ Script(cmd, path=FindOnPath(cmd[0], fs)) ]
  visited = set()

  while stack not empty and depth < MAX:
    cur = stack.pop()
    if cur in visited: continue
    visited.add(cur)

    if IsShellScript(cur.path):
       ast = ParseShell(cur.path)
       foreach directive in ast:
         if directive is Source include:
            p = ResolveInclude(include.path, cur.env, fs)
            stack.push(Script(p))
         if directive is Exec call:
            p = ResolveExec(call.argv[0], cur.env, fs)
            stack.push(Program(p, argv=call.argv))
         if directive is Interpreter (python -m / node / java -jar):
            term = ResolveInterpreterTarget(call, fs)
            stack.push(Program(term))
    else:
       return Terminal(cur.path)

  return Unknown(reason)

Appendix A.1 — EntryTrace Explainability

EntryTrace emits structured diagnostics and metrics so operators can quickly understand why resolution succeeded or degraded:

| Reason | Description | Typical mitigation |
| --- | --- | --- |
| CommandNotFound | A command referenced in the script cannot be located in the layered root filesystem or PATH. | Ensure binaries exist in the image or extend PATH hints. |
| MissingFile | source/./run-parts targets are missing. | Bundle the script or guard the include. |
| DynamicEnvironmentReference | Path depends on $VARS that are unknown at scan time. | Provide defaults via scan metadata or accept partial usage. |
| RecursionLimitReached | Nested includes exceeded the analyzer depth limit (default 64). | Flatten indirection or increase the limit in options. |
| RunPartsEmpty | run-parts directory contained no executable entries. | Remove empty directories or ignore if intentional. |
| JarNotFound / ModuleNotFound | Java/Python targets missing, preventing interpreter tracing. | Ship the jar/module with the image or adjust the launcher. |

Diagnostics drive two metrics published by EntryTraceMetrics:

  • entrytrace_resolutions_total{outcome} — resolution attempts segmented by outcome (resolved, partially_resolved, unresolved).
  • entrytrace_unresolved_total{reason} — diagnostic counts keyed by reason.

Structured logs include entrytrace.path, entrytrace.command, entrytrace.reason, and entrytrace.depth, all correlated with scan/job IDs. Timestamps are normalized to UTC (microsecond precision) to keep DSSE attestations and UI traces explainable.

Appendix B — BOM‑Index sidecar

struct Header { magic, version, imageDigest, createdAt }
vector<string> purls
map<purlIndex, roaring_bitmap> components
optional map<purlIndex, roaring_bitmap> usedByEntrypoint
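The sidecar layout above can be sketched as a serializer (illustrative Python; plain sorted uint32 posting lists stand in for roaring bitmaps, the layout is a readable approximation rather than the wire format, and `write_bom_index` is a hypothetical name):

```python
import struct

MAGIC = b"BOMI"  # assumed magic value for illustration

def write_bom_index(image_digest: str,
                    purls: list[str],
                    file_ids: dict[int, list[int]]) -> bytes:
    """Serialize a miniature BOM-Index: header, purl table, then one
    posting list per purl index (file_ids maps purl index -> file ids)."""
    out = [MAGIC, struct.pack("<I", 1)]                    # magic + version
    digest = image_digest.encode()
    out.append(struct.pack("<H", len(digest)) + digest)    # header: imageDigest
    out.append(struct.pack("<I", len(purls)))              # purl table
    for p in purls:
        b = p.encode()
        out.append(struct.pack("<H", len(b)) + b)
    for i in range(len(purls)):                            # posting lists
        ids = sorted(file_ids.get(i, []))                  # sort => determinism
        out.append(struct.pack("<I", len(ids)) + struct.pack(f"<{len(ids)}I", *ids))
    return b"".join(out)
```

Sorting ids before packing is what makes the sidecar byte-identical across runs; the `usedByEntrypoint` map would serialize the same way as an optional trailing section.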