Surface.Env Design (Epic: SURFACE-SHARING)

Status: Draft v1.0 — aligns with tasks SURFACE-ENV-01..05, SCANNER-ENV-01..03, ZASTAVA-ENV-01..02, OPS-ENV-01.

Audience: Scanner Worker/WebService engineers, Zastava engineers, DevOps/Ops teams.

1. Goals

Surface.Env centralises configuration discovery for every component that touches the shared Scanner “surface” (cache, manifests, secrets). The library replaces ad-hoc environment lookups with a deterministic, validated contract that:

  1. Works identically across Scanner Worker, Scanner WebService, BuildX plug-ins, Zastava Observer/Webhook, and future consumers (Scheduler planners, CLI runners).
  2. Supports both connected and air-gapped deployments with clear defaults.
  3. Records configuration intent (tenant isolation, cache limits, TLS, feature flags) so Surface.Validation can enforce preconditions before any work executes.

2. Architecture Overview

+-----------------------+
| Host (Worker/WebSvc)  |
|  - IConfiguration     |
|  - ILogger            |
|                       |
|  +-----------------+  |
|  | SurfaceEnv      |  |   loads env vars / config file
|  |  - Provider     |--+------------------------------+
|  |  - Validators   |                                 |
|  +-----------------+                                 |
|            |                                         |
|            | IResolvedSurfaceConfiguration           |
|            v                                         v
|  Surface.FS / Surface.Secrets / Surface.Validation consumers
+-------------------------------------------------------------

Surface.Env exposes ISurfaceEnvironment, whose Settings property returns an immutable SurfaceEnvironmentSettings record. Hosts call SurfaceEnvironmentBuilder.Build() during startup, passing optional configuration overrides (for example, Helm chart values). The builder resolves environment variables, applies defaults, and executes Surface.Validation rules before handing settings to downstream services.

3. Configuration Schema

3.1 Common keys

| Variable | Description | Default | Notes |
|----------|-------------|---------|-------|
| SCANNER_SURFACE_FS_ENDPOINT | Base URI for Surface.FS / RustFS / S3-compatible store. | required | Throws SurfaceEnvironmentException when RequireSurfaceEndpoint = true. When disabled (tests), builder falls back to https://surface.invalid so validation can fail fast. Also binds Surface:Fs:Endpoint from IConfiguration. |
| SCANNER_SURFACE_FS_BUCKET | Bucket/container used for manifests and artefacts. | surface-cache | Must be unique per tenant; validators enforce non-empty value. |
| SCANNER_SURFACE_FS_REGION | Optional region for S3-compatible stores. | null | Needed only when the backing store requires it (AWS/GCS). |
| SCANNER_SURFACE_CACHE_ROOT | Local directory for warm caches. | <temp>/stellaops/surface | Directory is created if missing. Override to /var/lib/stellaops/surface (or another fast SSD) in production. |
| SCANNER_SURFACE_CACHE_QUOTA_MB | Soft limit for on-disk cache usage. | 4096 | Enforced range 64–262144 MB; validation emits SURFACE_ENV_CACHE_QUOTA_INVALID outside the range. |
| SCANNER_SURFACE_PREFETCH_ENABLED | Enables manifest prefetch threads. | false | Workers honour this before analyzer execution. |
| SCANNER_SURFACE_TENANT | Tenant namespace used by cache + secret resolvers. | TenantResolver(...) or "default" | Default resolver may pull from Authority claims; you can override via env for multi-tenant pools. |
| SCANNER_SURFACE_FEATURES | Comma-separated feature switches. | "" | Compared against SurfaceEnvironmentOptions.KnownFeatureFlags; unknown flags raise warnings. |
| SCANNER_SURFACE_TLS_CERT_PATH | Path to PEM/PKCS#12 file for client auth. | null | When present, SurfaceEnvironmentBuilder loads the certificate into SurfaceTlsConfiguration. |
| SCANNER_SURFACE_TLS_KEY_PATH | Optional private-key path when cert/key are stored separately. | null | Stored in SurfaceTlsConfiguration for hosts that need to hydrate the key themselves. |
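
The handling of SCANNER_SURFACE_FEATURES described above can be illustrated with a minimal sketch. FeatureFlagSketch is a hypothetical helper, not part of the Surface.Env API, and it assumes that unknown flags are kept in the result and only reported separately; the real builder may normalise or compare flags differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Surface.Env.
public static class FeatureFlagSketch
{
    // Splits a comma-separated feature string and lists the flags that are
    // absent from the known set. Assumption: unknown flags are kept, not dropped.
    public static (IReadOnlyList<string> Flags, IReadOnlyList<string> Unknown) Parse(
        string? raw, ISet<string> knownFlags)
    {
        var flags = new List<string>();
        var unknown = new List<string>();

        // Trim each entry and drop empty ones (for example "a, ,b").
        var parts = (raw ?? string.Empty).Split(
            ',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries);

        foreach (var part in parts)
        {
            if (flags.Contains(part))
            {
                continue; // ignore exact duplicates
            }

            flags.Add(part);
            if (!knownFlags.Contains(part))
            {
                unknown.Add(part); // a caller would log a warning per entry
            }
        }

        return (flags, unknown);
    }
}
```

A caller would pass the raw variable value together with a set built from KnownFeatureFlags and log one warning for each entry in Unknown.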

3.2 Secrets provider keys

| Variable | Description | Notes |
|----------|-------------|-------|
| SCANNER_SURFACE_SECRETS_PROVIDER | Provider ID (kubernetes, file, inline, future back-ends). | Defaults to kubernetes; validators reject unknown values via SURFACE_SECRET_PROVIDER_UNKNOWN. |
| SCANNER_SURFACE_SECRETS_ROOT | Path or base namespace for the provider. | Required for the file provider (e.g., /etc/stellaops/secrets). |
| SCANNER_SURFACE_SECRETS_NAMESPACE | Kubernetes namespace used by the secrets provider. | Mandatory when provider = kubernetes. |
| SCANNER_SURFACE_SECRETS_FALLBACK_PROVIDER | Optional secondary provider ID. | Enables tiered lookups (e.g., kubernetes → inline) without changing code. |
| SCANNER_SURFACE_SECRETS_ALLOW_INLINE | Allows returning inline secrets (useful for tests). | Defaults to false; production deployments should keep this disabled. |
| SCANNER_SURFACE_SECRETS_TENANT | Tenant override for secret lookups. | Defaults to SCANNER_SURFACE_TENANT or the tenant resolver result. |
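
The provider rules in this table can be sketched as a small check. SecretsRuleSketch is a hypothetical stand-in, not the real SurfaceSecretsValidator; it assumes case-insensitive provider names, returns issue codes only, and leaves severity and the ALLOW_INLINE switch out of scope.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical rule check for illustration; the real validator may differ.
public static class SecretsRuleSketch
{
    // Provider IDs named in the table above.
    private static readonly HashSet<string> KnownProviders =
        new(StringComparer.OrdinalIgnoreCase) { "kubernetes", "file", "inline" };

    // Returns the issue codes raised for one provider/root/namespace combination.
    public static IReadOnlyList<string> Check(string? provider, string? root, string? @namespace)
    {
        var codes = new List<string>();

        // The provider defaults to kubernetes when the variable is unset.
        var effective = string.IsNullOrWhiteSpace(provider) ? "kubernetes" : provider.Trim();

        if (!KnownProviders.Contains(effective))
        {
            codes.Add("SURFACE_SECRET_PROVIDER_UNKNOWN");
            return codes;
        }

        var isKubernetes = string.Equals(effective, "kubernetes", StringComparison.OrdinalIgnoreCase);
        var isFile = string.Equals(effective, "file", StringComparison.OrdinalIgnoreCase);

        // The kubernetes provider needs a namespace; the file provider needs a root.
        if (isKubernetes && string.IsNullOrWhiteSpace(@namespace))
        {
            codes.Add("SURFACE_SECRET_CONFIGURATION_MISSING");
        }

        if (isFile && string.IsNullOrWhiteSpace(root))
        {
            codes.Add("SURFACE_SECRET_CONFIGURATION_MISSING");
        }

        return codes;
    }
}
```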

3.3 Component-specific prefixes

SurfaceEnvironmentOptions.Prefixes controls the order in which prefixes are probed. The tables above show each variable with the default SCANNER prefix; the builder combines the underlying suffix (e.g., SURFACE_FS_ENDPOINT) with each configured prefix in turn (ZASTAVA_SURFACE_FS_ENDPOINT, then SCANNER_SURFACE_FS_ENDPOINT) and finally probes the bare suffix (SURFACE_FS_ENDPOINT). Configure prefixes per host so that local overrides win while the global scanner defaults remain available:

| Component | Suggested prefixes (first match wins) | Notes |
|-----------|---------------------------------------|-------|
| Scanner.Worker / WebService | SCANNER | Default – already added by AddSurfaceEnvironment. |
| Zastava Observer/Webhook (planned) | ZASTAVA, SCANNER | Call options.AddPrefix("ZASTAVA") before relying on ZASTAVA_* overrides. |
| Future CLI / BuildX plug-ins | CLI, SCANNER | Allows per-user overrides without breaking shared env files. |

This approach means operators can define a single env file (SCANNER_*) and only override the handful of settings that diverge for a specific component by introducing an additional prefix.
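
To make the probing order concrete, the sketch below lists the variable names tried for one suffix. PrefixProbeSketch is a hypothetical helper, not part of Surface.Env; it assumes prefixes are joined to the suffix with an underscore, as the variable names in this section suggest.

```csharp
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of Surface.Env.
public static class PrefixProbeSketch
{
    // Returns the variable names to probe for a suffix, in "first match wins" order.
    public static IReadOnlyList<string> CandidateNames(IEnumerable<string> prefixes, string suffix)
    {
        var names = new List<string>();

        foreach (var prefix in prefixes)
        {
            names.Add($"{prefix}_{suffix}"); // e.g. ZASTAVA_SURFACE_FS_ENDPOINT
        }

        names.Add(suffix); // the bare suffix is probed last
        return names;
    }
}
```

With the prefixes ZASTAVA and SCANNER and the suffix SURFACE_FS_ENDPOINT, the list contains ZASTAVA_SURFACE_FS_ENDPOINT, SCANNER_SURFACE_FS_ENDPOINT, and SURFACE_FS_ENDPOINT, in that order.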

3.4 Configuration precedence

The builder resolves every suffix using the following precedence:

  1. Environment variables using the configured prefixes (e.g., ZASTAVA_SURFACE_FS_ENDPOINT, then SCANNER_SURFACE_FS_ENDPOINT, then the bare SURFACE_FS_ENDPOINT).
  2. Configuration values under the Surface:* section (for example Surface:Fs:Endpoint, Surface:Cache:Root in appsettings.json or Helm values).
  3. Hard-coded defaults baked into SurfaceEnvironmentBuilder (temporary directory, surface-cache bucket, etc.).

SurfaceEnvironmentOptions.RequireSurfaceEndpoint controls whether a missing endpoint results in an exception (default: true). Other values fall back to the default listed in § 3.1/3.2 and are further validated by the Surface.Validation pipeline.
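
The three-tier precedence can be sketched as a single lookup function. PrecedenceSketch and its delegate parameters are hypothetical and stand in for the builder's internals; the sketch assumes that empty or whitespace-only values count as absent, which the real builder may treat differently.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical resolver for illustration; not the actual builder logic.
public static class PrecedenceSketch
{
    // Resolves one setting: environment names first (in the order given),
    // then the configuration key, then the hard-coded default.
    // Source records which tier supplied the value.
    public static (string Value, string Source) Resolve(
        IEnumerable<string> envNames,
        string configKey,
        string defaultValue,
        Func<string, string?> readEnv,
        Func<string, string?> readConfig)
    {
        foreach (var name in envNames)
        {
            var fromEnv = readEnv(name);
            if (!string.IsNullOrWhiteSpace(fromEnv))
            {
                return (fromEnv, name);
            }
        }

        var fromConfig = readConfig(configKey);
        if (!string.IsNullOrWhiteSpace(fromConfig))
        {
            return (fromConfig, configKey);
        }

        return (defaultValue, "default");
    }
}
```

In a host, readEnv could be Environment.GetEnvironmentVariable and readConfig could wrap the IConfiguration indexer (key => configuration[key]). Returning the source alongside the value is one way to feed a RawVariables-style diagnostics snapshot.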

4. API Surface

public interface ISurfaceEnvironment
{
    SurfaceEnvironmentSettings Settings { get; }
    IReadOnlyDictionary<string, string> RawVariables { get; }
}

public sealed record SurfaceEnvironmentSettings(
    Uri SurfaceFsEndpoint,
    string SurfaceFsBucket,
    string? SurfaceFsRegion,
    DirectoryInfo CacheRoot,
    int CacheQuotaMegabytes,
    bool PrefetchEnabled,
    IReadOnlyCollection<string> FeatureFlags,
    SurfaceSecretsConfiguration Secrets,
    string Tenant,
    SurfaceTlsConfiguration Tls)
{
    public DateTimeOffset CreatedAtUtc { get; init; }
}

public sealed record SurfaceSecretsConfiguration(
    string Provider,
    string Tenant,
    string? Root,
    string? Namespace,
    string? FallbackProvider,
    bool AllowInline);

public sealed record SurfaceTlsConfiguration(
    string? CertificatePath,
    string? PrivateKeyPath,
    X509Certificate2Collection? ClientCertificates);

ISurfaceEnvironment.RawVariables captures the exact env/config keys that produced the snapshot so operators can export them in diagnostics bundles.

SurfaceEnvironmentOptions configures how the snapshot is built:

  • ComponentName – used in logs/validation output.
  • Prefixes – ordered list of env prefixes (see § 3.3). Defaults to ["SCANNER"].
  • RequireSurfaceEndpoint – throw when no endpoint is provided (default true).
  • TenantResolver – delegate invoked when SCANNER_SURFACE_TENANT is absent.
  • KnownFeatureFlags – recognised feature switches; unexpected values raise warnings.

Example registration:

builder.Services.AddSurfaceEnvironment(options =>
{
    options.ComponentName = "Scanner.Worker";
    options.AddPrefix("ZASTAVA"); // optional future override
    options.KnownFeatureFlags.Add("validation");
    options.TenantResolver = sp => sp.GetRequiredService<ITenantContext>().TenantId;
});

Consumers access ISurfaceEnvironment.Settings and pass the record into Surface.FS, Surface.Secrets, cache, and validation helpers. The implementation memoises the resolved settings, so repeated access is cheap.

5. Validation

SurfaceEnvironmentBuilder only throws SurfaceEnvironmentException for malformed inputs (non-integer quota, invalid URI, missing required variable when RequireSurfaceEndpoint = true). The richer validation pipeline lives in StellaOps.Scanner.Surface.Validation and runs via services.AddSurfaceValidation():

  1. SurfaceEndpointValidator – checks for a non-placeholder endpoint and bucket (SURFACE_ENV_MISSING_ENDPOINT, SURFACE_FS_BUCKET_MISSING).
  2. SurfaceCacheValidator – verifies the cache directory exists/is writable and that the quota is positive (SURFACE_ENV_CACHE_DIR_UNWRITABLE, SURFACE_ENV_CACHE_QUOTA_INVALID).
  3. SurfaceSecretsValidator – validates provider names, required namespace/root fields, and tenant presence (SURFACE_SECRET_PROVIDER_UNKNOWN, SURFACE_SECRET_CONFIGURATION_MISSING, SURFACE_ENV_TENANT_MISSING).

Validators emit SurfaceValidationIssue instances with codes defined in SurfaceValidationIssueCodes. LoggingSurfaceValidationReporter writes structured log entries (Info/Warning/Error) using the component name, issue code, and remediation hint. Hosts fail startup if any issue has Error severity; warnings allow startup but surface actionable hints.
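
As an illustration of how a validator and the startup gate fit together, the sketch below checks the cache quota against the 64–262144 MB range from § 3.1 and decides whether startup must fail. IssueSeverity, ValidationIssue, and CacheQuotaSketch are hypothetical stand-ins for the real Surface.Validation types, and treating an out-of-range quota as Error severity is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Surface.Validation types.
public enum IssueSeverity { Info, Warning, Error }

public sealed record ValidationIssue(string Code, IssueSeverity Severity, string Hint);

public static class CacheQuotaSketch
{
    // Range taken from the SCANNER_SURFACE_CACHE_QUOTA_MB row in § 3.1.
    public const int MinQuotaMb = 64;
    public const int MaxQuotaMb = 262144;

    // Emits one issue when the quota is out of range.
    // Assumption: this issue carries Error severity.
    public static IReadOnlyList<ValidationIssue> Validate(int quotaMb)
    {
        var issues = new List<ValidationIssue>();

        if (quotaMb < MinQuotaMb || quotaMb > MaxQuotaMb)
        {
            issues.Add(new ValidationIssue(
                "SURFACE_ENV_CACHE_QUOTA_INVALID",
                IssueSeverity.Error,
                $"Set SCANNER_SURFACE_CACHE_QUOTA_MB between {MinQuotaMb} and {MaxQuotaMb}."));
        }

        return issues;
    }

    // Startup is blocked only by Error-severity issues; warnings pass through.
    public static bool BlocksStartup(IEnumerable<ValidationIssue> issues) =>
        issues.Any(issue => issue.Severity == IssueSeverity.Error);
}
```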

6. Integration Guidance

  • Scanner Worker: register AddSurfaceEnvironment, AddSurfaceValidation, AddSurfaceFileCache, and AddSurfaceSecrets before analyzer/services (see src/Scanner/StellaOps.Scanner.Worker/Program.cs). SurfaceCacheOptionsConfigurator already binds the cache root from ISurfaceEnvironment.
  • Scanner WebService: identical wiring, plus SurfacePointerService/ScannerSurfaceSecretConfigurator reuse the resolved settings (Program.cs demonstrates the pattern).
  • Zastava Observer/Webhook: will reuse the same helper once the service adds AddSurfaceEnvironment(options => options.AddPrefix("ZASTAVA")) so per-component overrides function without diverging defaults.
  • Scheduler / CLI / BuildX (future): treat ISurfaceEnvironment as read-only input; secret lookup, cache plumbing, and validation happen before any work is enqueued.

Readiness probes should invoke ISurfaceValidatorRunner (registered by AddSurfaceValidation) and fail the endpoint when any issue with Error severity is returned. The Scanner Worker/WebService hosted services already run the validators on startup; other consumers should follow the same pattern.

6.1 Validation output

LoggingSurfaceValidationReporter produces log entries such as:

Surface validation issue for component Scanner.Worker: SURFACE_ENV_MISSING_ENDPOINT - Surface FS endpoint is missing or invalid. Hint: Set SCANNER_SURFACE_FS_ENDPOINT to the RustFS/S3 endpoint.

Treat SurfaceValidationIssueCodes.* with severity Error as hard blockers (readiness must fail). Warning entries flag configuration drift (for example, missing namespaces) but allow startup so staging/offline runs can proceed. The codes appear in both the structured log state and the reporter payload, making it easy to alert on them.

7. Security & Observability

  • Surface.Env never logs raw values; only suffix names and issue codes appear in logs. RawVariables is intended for diagnostics bundles and should be treated as sensitive metadata.
  • TLS certificates are loaded into memory and not re-serialised; only the configured paths are exposed to downstream services.
  • To emit metrics, register a custom ISurfaceValidationReporter (e.g., wrapping Prometheus counters) in addition to the logging reporter.
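
A metrics reporter along the lines of the last bullet might look like the sketch below. The ISurfaceValidationReporter signature is not given in this document, so IIssueReporterSketch is a hypothetical stand-in, and the in-memory counter takes the place of a real metrics client such as a Prometheus counter. Only component names and issue codes are recorded, never raw values.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical stand-in; the real ISurfaceValidationReporter may differ.
public interface IIssueReporterSketch
{
    void Report(string component, string issueCode);
}

// Counts issues per component and code; a metrics client would replace the dictionary.
public sealed class CountingReporter : IIssueReporterSketch
{
    private readonly ConcurrentDictionary<string, long> _counts = new();

    public void Report(string component, string issueCode) =>
        _counts.AddOrUpdate($"{component}:{issueCode}", 1, (_, current) => current + 1);

    public long CountFor(string component, string issueCode) =>
        _counts.TryGetValue($"{component}:{issueCode}", out var count) ? count : 0;
}

// Fans each issue out to several reporters, e.g. logging plus metrics.
public sealed class CompositeReporter : IIssueReporterSketch
{
    private readonly IReadOnlyList<IIssueReporterSketch> _inner;

    public CompositeReporter(params IIssueReporterSketch[] inner) => _inner = inner;

    public void Report(string component, string issueCode)
    {
        foreach (var reporter in _inner)
        {
            reporter.Report(component, issueCode);
        }
    }
}
```

The composite keeps the logging reporter and the metrics reporter independent, so either can be swapped without touching the validators.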

8. Offline & Air-Gap Support

  • Defaults assume no public network access; point SCANNER_SURFACE_FS_ENDPOINT at an internal RustFS/S3 mirror.
  • Offline bundles must capture an env file (Ops track this under the Offline Kit tasks) so operators can seed SCANNER_* values before first boot.
  • Keep docs/modules/devops/runbooks/zastava-deployment.md in sync so Zastava deployments reuse the same env contract.

9. Testing Strategy

  • Unit tests for each resolver/validator.
  • Integration tests for Worker & Observer verifying that missing configuration causes deterministic failures.
  • Golden tests for configuration precedence (component overrides, defaults).

10. Open Questions / Future Work

  • Dynamic refresh of environment (watch ConfigMap) is out of scope for v1.
  • Evaluate adding support for environment discovery via IConfiguration only (no env vars) for Windows service deployments.

11. References

  • Surface.FS Design (docs/modules/scanner/design/surface-fs.md)
  • Surface.Secrets Design (docs/modules/scanner/design/surface-secrets.md)
  • Surface.Validation Design (docs/modules/scanner/design/surface-validation.md)
  • AirGap mode overview (docs/airgap/airgap-mode.md)