Advisory AI architecture

Captures the retrieval, guardrail, and inference packaging requirements defined in the Advisory AI implementation plan and related module guides. Configuration knobs (inference modes, guardrails, cache/queue budgets) now live in docs/policy/assistant-parameters.md per DOCS-AIAI-31-006.

1) Goals

  • Summarise advisories/VEX evidence into operator-ready briefs with citations.
  • Explain conflicting statements with provenance and trust weights (using VEX Lens & Excititor data).
  • Suggest remediation plans aligned with Offline Kit deployment models and scheduler follow-ups.
  • Operate deterministically where possible; cache generated artefacts with digests for audit.

2) Pipeline overview

                       +---------------------+
   Concelier/VEX Lens  |  Evidence Retriever |
   Policy Engine ----> |  (vector + keyword) | ---> Context Pack (JSON)
   Zastava runtime     +---------------------+
                               |
                               v
                        +-------------+
                        | Prompt      |
                        | Assembler   |
                        +-------------+
                               |
                               v
                        +-------------+
                        | Guarded LLM |
                        | (local/host)|
                        +-------------+
                               |
                               v
                         +-----------------+
                         | Citation &      |
                         | Validation      |
                         +-----------------+
                               |
                               v
                        +----------------+
                        | Output cache   |
                        | (hash, bundle) |
                        +----------------+

3) Retrieval & context

  • Hybrid search: vector embeddings (SBERT-compatible) + keyword filters for advisory IDs, PURLs, CVEs.

  • Context packs include:

    • Advisory raw excerpts with highlighted sections and source URLs.
    • VEX statements (normalized tuples + trust metadata).
    • Policy explain traces for the affected finding.
    • Runtime/impact hints from Zastava (exposure, entrypoints).
    • Export-ready remediation data (fixed versions, patches).
  • SBOM context retriever (AIAI-31-002) hydrates:

    • Version timelines (first/last observed, status, fix availability).
    • Dependency paths (runtime vs build/test, deduped by coordinate chain).
    • Tenant environment flags (prod/stage toggles) with optional blast radius summary.
    • Service-side clamps: max 500 timeline entries, 200 dependency paths, with client-provided toggles for env/blast data.
    • AddSbomContextHttpClient(...) registers the typed HTTP client that calls /v1/sbom/context, while NullSbomContextClient remains the safe default for environments that have not yet exposed the SBOM service.

    Sample configuration (wire real SBOM base URL + API key):

    // Typed HTTP client for /v1/sbom/context (omit this registration to keep the
    // NullSbomContextClient default).
    services.AddSbomContextHttpClient(options =>
    {
        options.BaseAddress = new Uri("https://sbom-service.internal");
        options.Endpoint = "/v1/sbom/context";
        options.ApiKey = configuration["SBOM_SERVICE_API_KEY"];
        options.UserAgent = "stellaops-advisoryai/1.0";
        options.Tenant = configuration["TENANT_ID"];
    });

    // Registers the advisory pipeline (summary/conflict/remediation orchestration).
    services.AddAdvisoryPipeline();

    After configuration, issue a smoke request (e.g., ISbomContextRetriever.RetrieveAsync) during deployment validation to confirm end-to-end connectivity and credentials before enabling Advisory AI endpoints.
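
    A minimal smoke-check sketch (the RetrieveAsync request parameters are defined by the module contract and are not reproduced here; serviceProvider stands in for the host's service provider):

    // Resolve the retriever from the host's service provider and issue one request.
    var retriever = serviceProvider.GetRequiredService<ISbomContextRetriever>();
    // var context = await retriever.RetrieveAsync(/* known advisory/artifact pair */, CancellationToken.None);
    // Inspect the returned metadata (counts, provenance keys) to confirm connectivity and credentials.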

Retriever requests and results are trimmed/normalized before hashing; metadata (counts, provenance keys) is returned for downstream guardrails. Unit coverage ensures deterministic ordering and flag handling.

All context references include content_hash and source_id, enabling verifiable citations.

4) Guardrails

  • Prompt templates enforce structure: summary, conflicts, remediation, references.
  • Response validator ensures:
    • No hallucinated advisories (every fact must map to input context).
    • Citations follow [n] indexing referencing actual sources.
    • Remediation suggestions only cite policy-approved sources (fixed versions, vendor hotfixes).
  • Moderation/PII filters prevent leaking secrets; responses failing validation are rejected and logged.
  • Pre-flight guardrails redact secrets (AWS keys, generic API tokens, PEM blobs), block “ignore previous instructions”-style prompt injection attempts, enforce citation presence, and cap prompt payload length (default 16 kB). Guardrail outcomes and redaction counts surface via advisory_guardrail_blocks / advisory_outputs_stored metrics.
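
A minimal sketch of that pre-flight order (length clamp, injection block, secret redaction); the regexes, helper name, and tuple shape are illustrative rather than the shipped implementation:

    using System.Text.RegularExpressions;

    static (bool Blocked, string Prompt, int Redactions) PreflightGuardrails(string prompt)
    {
        // Character-length clamp standing in for the documented 16 kB payload cap.
        if (prompt.Length > 16 * 1024)
            return (true, prompt, 0);

        // Block "ignore previous instructions"-style prompt injection attempts.
        if (Regex.IsMatch(prompt, "ignore (all )?previous instructions", RegexOptions.IgnoreCase))
            return (true, prompt, 0);

        // Redact secrets (AWS keys, PEM blobs) and count redactions for the metrics above.
        var redactions = 0;
        prompt = Regex.Replace(prompt, "AKIA[0-9A-Z]{16}", _ => { redactions++; return "[REDACTED:AWS_KEY]"; });
        prompt = Regex.Replace(prompt,
            "-----BEGIN [A-Z ]*PRIVATE KEY-----[\\s\\S]*?-----END [A-Z ]*PRIVATE KEY-----",
            _ => { redactions++; return "[REDACTED:PEM]"; });

        return (false, prompt, redactions);
    }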

5) Deterministic tooling

  • Version comparators — offline semantic version + RPM EVR parsers with range evaluators. Supports chained constraints (>=, <=, !=) used by remediation advice and blast radius calculations; see the sketch after this list.
    • Registered via AddAdvisoryDeterministicToolset for reuse across orchestrator, CLI, and services.
  • Orchestration pipeline — see orchestration-pipeline.md for prerequisites, task breakdown, and cross-guild responsibilities before wiring the execution flows.
  • Planned extensions — NEVRA/EVR comparators, ecosystem-specific normalisers, dependency chain scorers (AIAI-31-003 scope).
  • Exposed via internal interfaces to allow orchestrator/toolchain reuse; all helpers stay side-effect free and deterministic for golden testing.
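
A hypothetical shape for the comparator helpers registered by AddAdvisoryDeterministicToolset; the interface and method names are illustrative, not the shipped contract:

    public interface IVersionRangeEvaluator
    {
        // Evaluates chained constraints such as ">=2.4.0 <3.0.0 !=2.5.1" against a
        // semantic version or RPM EVR string; pure function, no I/O, deterministic.
        bool Satisfies(string version, string constraintExpression);
    }

    // Example: decide whether an installed version already contains the fix.
    // bool patched = evaluator.Satisfies("2.6.3", ">=2.6.0");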

6) Output persistence

  • Cached artefacts are stored in advisory_ai_outputs with the following fields (sketched after this list):
    • output_hash (sha256 of JSON response).
    • input_digest (hash of context pack).
    • summary, conflicts, remediation, citations.
    • generated_at, model_id, profile (Sovereign/FIPS etc.).
    • signatures (optional DSSE if run in deterministic mode).
  • Offline bundle format contains summary.md, citations.json, context_manifest.json, signatures/.
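
A hypothetical C# shape for an advisory_ai_outputs record; property names mirror the fields above, but the actual storage model may differ:

    using System;
    using System.Collections.Generic;

    public sealed record AdvisoryAiOutput(
        string OutputHash,                  // sha256 of the JSON response
        string InputDigest,                 // hash of the context pack
        string Summary,
        string Conflicts,
        string Remediation,
        IReadOnlyList<string> Citations,
        DateTimeOffset GeneratedAt,
        string ModelId,
        string Profile,                     // Sovereign/FIPS etc.
        string? Signatures);                // optional DSSE when run in deterministic mode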

7) Profiles & sovereignty

  • Profiles: default, fips-local (FIPS-compliant local model), gost-local, cloud-openai (optional, disabled by default). Each profile defines allowed models, key management, and telemetry endpoints.
  • CryptoProfile/RootPack integration: generated artefacts can be signed using configured CryptoProfile to satisfy procurement/trust requirements.

8) APIs

  • POST /api/v1/advisory/{task} — executes the Summary/Conflict/Remediation pipeline (task = summary | conflict | remediation). Requests accept {advisoryKey, artifactId?, policyVersion?, profile, preferredSections?, forceRefresh} and return sanitized prompt payloads, citations, guardrail metadata, provenance hash, and cache hints.
  • GET /api/v1/advisory/outputs/{cacheKey}?taskType=SUMMARY&profile=default — retrieves cached artefacts for downstream consumers (Console, CLI, Export Center). Guardrail state and provenance hash accompany results.

All endpoints accept a profile parameter (default fips-local) and return output_hash, input_digest, and citations for verification.
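
Illustrative client call for the summary task (host name and request values are placeholders; field names follow the request shape above):

    using System;
    using System.Net.Http;
    using System.Net.Http.Json;

    using var client = new HttpClient { BaseAddress = new Uri("https://advisory-ai.internal") };
    var response = await client.PostAsJsonAsync("/api/v1/advisory/summary", new
    {
        advisoryKey = "CVE-2025-12345",
        profile = "fips-local",
        forceRefresh = false
    });
    response.EnsureSuccessStatusCode();
    var payload = await response.Content.ReadAsStringAsync(); // contains output_hash, input_digest, citations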

9) Observability

  • Metrics: advisory_ai_requests_total{profile,type}, advisory_ai_latency_seconds, advisory_ai_validation_failures_total (an emission sketch follows this list).
  • Logs: include output_hash, input_digest, profile, model_id, tenant, artifacts. Sensitive context is not logged.
  • Traces: spans for retrieval, prompt assembly, model inference, validation, cache write.
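
These instruments map naturally onto .NET metrics; a minimal emission sketch (the meter name and sample values are illustrative):

    using System.Collections.Generic;
    using System.Diagnostics.Metrics;

    var meter = new Meter("StellaOps.AdvisoryAI");
    var requests = meter.CreateCounter<long>("advisory_ai_requests_total");
    var latency = meter.CreateHistogram<double>("advisory_ai_latency_seconds");
    var validationFailures = meter.CreateCounter<long>("advisory_ai_validation_failures_total");

    // Tag dimensions follow the documented {profile,type} labels.
    requests.Add(1,
        new KeyValuePair<string, object?>("profile", "fips-local"),
        new KeyValuePair<string, object?>("type", "summary"));
    latency.Record(2.4);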

10) Operational controls

  • Feature flags per tenant (ai.summary.enabled, ai.remediation.enabled).
  • Rate limits (per tenant, per profile) enforced by Orchestrator to prevent runaway usage.
  • Offline/air-gapped deployments run local models packaged with Offline Kit; model weights validated via manifest digests.

11) Hosting surfaces

  • WebService — exposes /v1/advisory-ai/pipeline/{task} to materialise plans and enqueue execution messages.
  • Worker — background service draining the advisory pipeline queue (file-backed stub) pending integration with shared transport.
  • Both hosts register AddAdvisoryAiCore, which wires the SBOM context client, deterministic toolset, pipeline orchestrator, and queue metrics.
  • SBOM base address + tenant metadata are configured via AdvisoryAI:SbomBaseAddress and propagated through AddSbomContext.
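
Host bootstrap sketch shared by both surfaces; AddAdvisoryAiCore and the configuration key come from this guide, while the parameterless overload and the surrounding bootstrap code are assumptions:

    var builder = WebApplication.CreateBuilder(args);

    // Wires the SBOM context client, deterministic toolset, pipeline orchestrator,
    // and queue metrics (per this guide); the overload shown is assumed.
    builder.Services.AddAdvisoryAiCore();

    // SBOM base address + tenant metadata are read from configuration and
    // propagated through AddSbomContext.
    var sbomBaseAddress = builder.Configuration["AdvisoryAI:SbomBaseAddress"];

    var app = builder.Build();
    app.Run();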

12) QA harness & determinism (Sprint 110 refresh)

  • Injection fixtures: src/AdvisoryAI/__Tests/StellaOps.AdvisoryAI.Tests/TestData/guardrail-injection-cases.json now enumerates both blocked and allow-listed prompts (redactions, citation checks, prompt-length clamps) while the legacy prompt-injection-fixtures.txt file continues to supply quick block-only payloads. AdvisoryGuardrailInjectionTests consumes both datasets so guardrail regressions surface with metadata (blocked phrase counts, redaction counters, citation enforcement) instead of single-signal failures.
  • Golden prompts: summary-prompt.json now pairs with conflict-prompt.json; AdvisoryPromptAssemblerTests loads both to enforce deterministic JSON payloads across task types and to verify that vector preview truncation (600 characters + ellipsis) keeps prompts under the documented perf ceiling.
  • Plan determinism: AdvisoryPipelineOrchestratorTests shuffles structured/vector/SBOM inputs and asserts that cache keys + metadata remain stable, proving that seeded plan caches stay deterministic even when retrievers emit out-of-order results.
  • Execution telemetry: AdvisoryPipelineExecutorTests exercises partial citation coverage (target ≥0.5 when only half the structured chunks are cited) so advisory_ai_citation_coverage_ratio reflects real guardrail quality.
  • Plan cache stability: AdvisoryPlanCacheTests now seeds the in-memory cache with a fake time provider to confirm TTL refresh when plans are replaced, guaranteeing reproducible eviction under air-gapped runs.

13) Deployment profiles, scaling, and remote inference

  • Local inference containers. advisory-ai-web exposes the API/plan cache endpoints while advisory-ai-worker drains the queue and executes prompts. Both containers mount the same RWX volume that hosts three deterministic paths: /var/lib/advisory-ai/queue, /var/lib/advisory-ai/plans, /var/lib/advisory-ai/outputs. Compose bundles create named volumes (advisory-ai-{queue,plans,outputs}) and the Helm chart mounts the stellaops-advisory-ai-data PVC so web + worker remain in lockstep.
  • Remote inference toggle. Set AdvisoryAI:Inference:Mode (env: ADVISORYAI__AdvisoryAI__Inference__Mode) to Remote when prompts should be executed by an external inference tier. Provide AdvisoryAI:Inference:Remote:BaseAddress and, optionally, ...:ApiKey; see the configuration sketch after this list. When remote calls fail, the executor falls back to the sanitized prompt and sets inference.fallback_* metadata so CLI/Console surface a warning.
  • Scalability. Start with 1 web replica + 1 worker for up to ~10 requests/minute. For higher throughput, scale advisory-ai-worker horizontally; each worker is CPU-bound (2 vCPU / 4 GiB RAM recommended) while the web front end is I/O-bound (1 vCPU / 1 GiB). Because the queue/plan/output stores are content-addressed files, ensure the shared volume delivers ≥500 IOPS and <5 ms latency; otherwise queue draining will lag and depth will grow.
  • Offline & air-gapped stance. The Compose/Helm manifests avoid external network calls by default and the Offline Kit now publishes the advisory-ai-web and advisory-ai-worker images alongside their SBOMs/provenance. Operators can rehydrate the RWX volume from the kit to pre-prime cache directories before enabling the service.
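
The remote-inference keys can be read the usual .NET configuration way; a sketch (the full ApiKey path is an assumption expanding the elided ...:ApiKey above; surrounding code is illustrative):

    var mode = configuration["AdvisoryAI:Inference:Mode"];                        // "Remote" routes prompts to the external tier
    var remoteBaseAddress = configuration["AdvisoryAI:Inference:Remote:BaseAddress"];
    var remoteApiKey = configuration["AdvisoryAI:Inference:Remote:ApiKey"];       // optional; full key path assumed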