17 · Security Hardening Guide — Stella Ops

(v2.0 — 12 Jul 2025)

Audience — Site‑reliability and platform teams deploying the open‑source Core in production or restricted networks.


0 Table of Contents

  1. Threat model (summary)
  2. Host‑OS baseline checklist
  3. Container & runtime hardening
  4. Network‑plane guidance
  5. Secrets & key management
  6. Image, SBOM & plug‑in supply‑chain controls
  7. Logging, monitoring & audit
  8. Update & patch strategy
  9. Incident‑response workflow
  10. Pen‑testing & continuous assurance
  11. Vulnerability disclosure & contact
  12. Change log

1 Threat model (summary)

| Asset | Threats | Mitigations |
| --- | --- | --- |
| SBOMs & scan results | Disclosure, tamper | TLS‑in‑transit, read‑only Valkey volume, RBAC, Cosign‑verified plug‑ins |
| Backend container | RCE, code‑injection | Distroless image, non‑root UID, read‑only FS, seccomp + CAP_DROP:ALL |
| Update artefacts | Supply‑chain attack | Cosign‑signed images & SBOMs, enforced by admission controller |
| Admin credentials | Phishing, brute force | OAuth 2.0 with 12‑h token TTL, optional mTLS |

2 Host‑OS baseline checklist

| Item | Recommended setting |
| --- | --- |
| OS | Ubuntu 22.04 LTS (kernel ≥ 5.15) or Alma 9 |
| Patches | unattended‑upgrades or vendor‑equivalent enabled |
| Filesystem | noexec,nosuid on /tmp, /var/tmp |
| Docker Engine | v24.*, API socket root‑owned (0660) |
| Auditd | Watch /etc/docker, /usr/bin/docker* and Compose files |
| Time sync | chrony or systemd‑timesyncd |
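
The filesystem row can be spot-checked with a small script. The following is a minimal sketch, assuming a Linux host where /tmp and /var/tmp are separate mounts listed in /proc/mounts; the helper name has_mount_opts is hypothetical and not part of Stella Ops.

```shell
#!/usr/bin/env bash
# Sketch: check that a mount point carries every required mount option.
# Usage: has_mount_opts MOUNTPOINT "opt1 opt2" [MOUNTS_FILE]
has_mount_opts() {
  local mountpoint=$1 required=$2 mounts_file=${3:-/proc/mounts}
  local opts opt
  # Field 2 of /proc/mounts is the mount point, field 4 the option list.
  # Keep the last match, because later mounts shadow earlier ones.
  opts=$(awk -v mp="$mountpoint" '$2 == mp { o = $4 } END { print o }' "$mounts_file")
  [ -n "$opts" ] || return 1            # not a separate mount at all
  for opt in $required; do
    case ",$opts," in
      *",$opt,"*) ;;                    # option present
      *) return 1 ;;                    # option missing
    esac
  done
}

# Example run against the live host (commented out so sourcing has no side effects):
# for mp in /tmp /var/tmp; do
#   has_mount_opts "$mp" "noexec nosuid" || echo "FAIL: $mp lacks noexec,nosuid"
# done
# stat -c '%U %a' /var/run/docker.sock   # GNU stat; the table above expects: root 660
```

A non-zero exit status means the mount point is either not a dedicated mount or lacks one of the listed options, so the check fails closed.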

3 Container & runtime hardening

3.1 Docker Compose reference (compose-core.yml)

services:
  backend:
    image: registry.stella-ops.org/stella-ops/stella-ops:<PINNED_TAG_OR_DIGEST>
    user: "101:101"              # non‑root
    read_only: true
    security_opt:
      - "no-new-privileges:true"
      - "seccomp:./seccomp-backend.json"
    cap_drop: [ALL]
    tmpfs:
      - /tmp:size=64m,exec
    environment:
      - ASPNETCORE_URLS=https://+:8080
      - TLSPROVIDER=OpenSslGost
    depends_on: [valkey]
    networks: [core-net]
    healthcheck:
      test: ["CMD", "wget", "-qO-", "https://localhost:8080/health"]
      interval: 30s
      timeout: 5s
      retries: 5

  valkey:
    image: valkey/valkey:8.0-alpine
    command: ["valkey-server", "--requirepass", "${VALKEY_PASS}", "--rename-command", "FLUSHALL", ""]
    user: "valkey"
    read_only: true
    cap_drop: [ALL]
    tmpfs:
      - /data
    networks: [core-net]

networks:
  core-net:
    driver: bridge

No dedicated “Valkey” or “PostgreSQL” sub-nets are declared; the single bridge network suffices for the default stack.

3.2 Kubernetes deployment highlights

  • Use a separate NetworkPolicy that only allows egress from the backend to Valkey (Redis-compatible) on port 6379.
  • Set securityContext with runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false, and drop all capabilities.
  • Define a PodDisruptionBudget with minAvailable: 1.
  • Optionally add a CosignVerified=true label enforced by an admission controller (e.g. Kyverno or Connaisseur).
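
The egress restriction could look roughly like the manifest printed by the sketch below. The default namespace (stella-ops), the policy name, and the app=backend / app=valkey pod labels are assumptions for illustration; substitute whatever your deployment actually uses.

```shell
#!/usr/bin/env bash
# Sketch: print a NetworkPolicy that limits backend egress to Valkey on 6379.
# Namespace, policy name and the app=backend / app=valkey labels are hypothetical.
render_backend_egress_policy() {
  local namespace=${1:-stella-ops}
  cat <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-egress-valkey
  namespace: ${namespace}
spec:
  podSelector:
    matchLabels:
      app: backend
  policyTypes:
    - Egress
  egress:
    - to:
        - podSelector:
            matchLabels:
              app: valkey
      ports:
        - protocol: TCP
          port: 6379
EOF
}

# Usage: render_backend_egress_policy my-namespace | kubectl apply -f -
```

A policy this strict also blocks DNS lookups from the backend pods. If the backend resolves Valkey by service name, an additional egress rule for the cluster DNS service (port 53, UDP and TCP) is likely needed.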

4 Network‑plane guidance

| Plane | Recommendation |
| --- | --- |
| North‑south | Terminate TLS 1.2+ (OpenSSL‑GOST default). Use Let's Encrypt or an internal CA. |
| East-west | Compose bridge or K8s ClusterIP only; no public Valkey/PostgreSQL ports. |
| Ingress controller | Limit methods to GET, POST, PATCH (no TRACE). |
| Rate‑limits | 40 rps default; tune ScannerPool.Workers and ingress limit‑req to match. |

5 Secrets & key management

| Secret | Storage | Rotation |
| --- | --- | --- |
| Client‑JWT (offline) | /var/lib/stella/tokens/client.jwt (owner root, mode 0600) | 30 days; provided by each OUK |
| VALKEY_PASS | Docker/K8s secret | 90 days |
| OAuth signing key | /keys/jwt.pem (read‑only mount) | 180 days |
| Cosign public key | /keys/cosign.pub baked into image | Change on every major release |
| Trivy DB mirror token (if remote) | Secret + read‑only | 30 days |

Never bake secrets into images; always inject at runtime.

Operational tip: schedule a cron job that reminds ops 5 days before client.jwt expires. The backend also emits the Prometheus metric stella_quota_token_days_remaining, which can drive the same alert.
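
A cron-driven reminder might be built around a helper like the one below. This is a sketch only: it assumes client.jwt is a standard compact JWT whose payload carries a numeric exp claim (seconds since the epoch), which the guide does not confirm, and the function name jwt_days_remaining is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the whole days left before a JWT's "exp" claim.
# Assumes a compact JWT (header.payload.signature) with a numeric exp claim.
# Usage: jwt_days_remaining TOKEN_FILE [NOW_EPOCH]
jwt_days_remaining() {
  local token_file=$1 now=${2:-$(date +%s)}
  local payload exp
  # Take the payload segment and map base64url characters back to base64.
  payload=$(cut -d. -f2 "$token_file" | tr -d '\n' | tr '_-' '/+')
  # Restore the padding that base64url encoding strips.
  case $(( ${#payload} % 4 )) in
    2) payload="$payload==" ;;
    3) payload="$payload=" ;;
  esac
  exp=$(printf '%s' "$payload" | base64 -d 2>/dev/null |
        sed -n 's/.*"exp"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p')
  [ -n "$exp" ] || return 2             # no exp claim found
  echo $(( (exp - now) / 86400 ))
}

# Example cron body (commented out; the token path comes from the table above):
# days=$(jwt_days_remaining /var/lib/stella/tokens/client.jwt) || exit 1
# [ "$days" -gt 5 ] || echo "client.jwt expires in $days day(s)"
```

The signature is not verified here; the script only reads the expiry so that a reminder can be sent, and the backend remains the authority on whether the token is valid.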

6 Image, SBOM & plug‑in supply‑chain controls

  • Images — Pull by digest not latest; verify:
cosign verify ghcr.io/stellaops/backend@sha256:<DIGEST> \
  --key https://stella-ops.org/keys/cosign.pub
  • SBOM — Each release ships an SPDX file; store alongside images for audit.
  • Third‑party plug‑ins — Place in /plugins/; backend will:
      • Validate the Cosign signature.
      • Check the [StellaPluginVersion("major.minor")] attribute.
      • Refuse to start if either check fails while Security.DisablePluginUnsigned=false (the default).
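
The "pull by digest, then verify" rule for images can be enforced before deployment with a pre-flight check. The sketch below assumes cosign and Docker Compose v2 are installed; the function names are hypothetical, and the key path is taken from the secrets table in section 5.

```shell
#!/usr/bin/env bash
# Sketch: reject image references that are not digest-pinned, then verify
# the pinned ones with cosign. Function names are hypothetical.

# Succeeds only for references ending in @sha256:<64 hex characters>.
is_digest_pinned() {
  [[ $1 =~ @sha256:[0-9a-f]{64}$ ]]
}

# Reads image references on stdin, one per line.
# Usage: verify_pinned_images COSIGN_PUBLIC_KEY
verify_pinned_images() {
  local key=$1 ref rc=0
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    if ! is_digest_pinned "$ref"; then
      echo "NOT PINNED: $ref" >&2
      rc=1
      continue
    fi
    # </dev/null keeps cosign from consuming the list being read by the loop.
    cosign verify --key "$key" "$ref" >/dev/null </dev/null ||
      { echo "UNVERIFIED: $ref" >&2; rc=1; }
  done
  return "$rc"
}

# Usage:
# docker compose -f compose-core.yml config --images | verify_pinned_images /keys/cosign.pub
```

Run against the reference Compose file as written, this would flag valkey/valkey:8.0-alpine, which is pinned by tag rather than digest.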

7 Logging, monitoring & audit

| Control | Implementation |
| --- | --- |
| Log format | Serilog JSON; ship via Fluent‑Bit to ELK or Loki |
| Metrics | Prometheus /metrics endpoint; default Grafana dashboard in infra/ |
| Audit events | Valkey (Redis-compatible) stream audit; export daily to SIEM |
| Alert rules | Feed age ≥ 48 h, P95 wall‑time > 5 s, Valkey used memory > 75 % |
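
One way to approach the daily audit export is to read a single UTC day from the stream with XRANGE. The sketch below assumes the stream key is literally audit, that entries use Valkey's auto-generated (timestamp-based) IDs, and that valkey-cli can reach the server with its default host and port; the function names and the output path are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: dump one UTC day of the "audit" stream for hand-off to a SIEM.
# Assumes auto-generated stream IDs, whose first part is a millisecond timestamp.

# Print "START_MS END_MS" for the UTC day that contains EPOCH (seconds).
utc_day_window_ms() {
  local epoch=$1
  local start=$(( epoch - epoch % 86400 ))
  echo "$(( start * 1000 )) $(( (start + 86400) * 1000 - 1 ))"
}

# Usage: export_audit_day EPOCH_SECONDS OUTPUT_FILE
export_audit_day() {
  local epoch=$1 out=$2 start end
  read -r start end < <(utc_day_window_ms "$epoch")
  # XRANGE accepts bare millisecond timestamps as range bounds.
  valkey-cli -a "$VALKEY_PASS" --no-auth-warning XRANGE audit "$start" "$end" > "$out"
}

# Usage (yesterday's events, hypothetical output path):
# export_audit_day "$(( $(date +%s) - 86400 ))" /var/log/stella/audit-export.txt
```

Exporting closed days only (yesterday rather than today) keeps each export complete and avoids overlapping ranges between runs.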

7.1 Concelier authorization audits

  • Enable the Authority integration for Concelier (authority.enabled=true). Keep authority.allowAnonymousFallback set to true only during migration and plan to disable it before 2025-12-31 UTC so the /jobs* surface always demands a bearer token.
  • Store the Authority client secret using Docker/Kubernetes secrets and point authority.clientSecretFile at the mounted path; the value is read at startup and never logged.
  • Watch the Concelier.Authorization.Audit logger. Each entry contains the HTTP status, subject, client ID, scopes, remote IP, and a boolean bypass flag showing whether a network bypass CIDR allowed the request. Configure your SIEM to alert when unauthenticated requests (status=401) appear with bypass=true, or when unexpected scopes invoke job triggers. Detailed monitoring and response guidance lives in docs/modules/concelier/operations/authority-audit-runbook.md.
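
Before wiring a SIEM rule, the 401-with-bypass condition can be tried out locally against shipped log lines. This is a sketch under stated assumptions: it expects one JSON object per line and uses the hypothetical property names status and bypass, since the actual Serilog property names are not given here. It requires jq.

```shell
#!/usr/bin/env bash
# Sketch: keep only audit entries where an unauthenticated request (401)
# was let through a bypass CIDR. Property names "status" and "bypass" are
# hypothetical; adjust them to the real Concelier.Authorization.Audit fields.
# Reads JSON lines on stdin and skips lines that are not valid JSON objects.
suspicious_audit_entries() {
  jq -cR 'fromjson? | select(type == "object" and .status == 401 and .bypass == true)'
}

# Usage (hypothetical log file):
# suspicious_audit_entries < concelier-audit.jsonl
```

Any output from this filter corresponds to the alert condition described in the last bullet above, so an empty result is the expected steady state.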

8 Update & patch strategy

| Layer | Cadence | Method |
| --- | --- | --- |
| Backend & CLI images | Monthly or CVE‑driven | docker pull + docker compose up -d |
| Trivy DB | 24 h | Scheduler in Concelier (vulnerability ingest/merge/export service), configurable via Concelier scheduler options |
| Docker Engine | Vendor LTS | Distro package manager |
| Host OS | Security repos enabled | unattended‑upgrades |

9 Incident‑response workflow

  • Detect — PagerDuty alert from Prometheus or SIEM.
  • Contain — Stop affected Backend container; isolate Valkey RDB snapshot.
  • Eradicate — Pull verified images, redeploy, rotate secrets.
  • Recover — Restore RDB, replay SBOMs if history lost.
  • Review — Post‑mortem within 72 h; create follow‑up issues.
  • Escalate P1 incidents to <security@stella-ops.org> (24 × 7).

10 Pen‑testing & continuous assurance

| Control | Frequency | Tool/Runner |
| --- | --- | --- |
| OWASP ZAP baseline | Each merge to main | GitHub Action zap-baseline-scan |
| Dependency scanning | Per pull request | Trivy FS + Dependabot |
| External red‑team | Annual or pre‑GA | CREST‑accredited third‑party |

11 Vulnerability disclosure & contact

  • Preferred channel: security@stella-ops.org (GPG key on website).
  • Coordinated disclosure reward: public credit and swag (no monetary bounty at this time).

12 Change log

| Version | Date | Notes |
| --- | --- | --- |
| v2.0 | 2025‑07‑12 | Full overhaul: host‑OS baseline, supply‑chain signing, removal of unnecessary sub‑nets, role‑based contact e‑mail, K8s guidance. |
| v1.1 | 2025‑07‑09 | Minor fence fixes. |
| v1.0 | 2025‑07‑09 | Original draft. |