19 · Test‑Suite Overview — Stella Ops

(v2.0 — 12 Jul 2025)

Purpose — Describe the multi‑layer automated‑test strategy that guards Stella Ops’ five‑second performance promise, security posture and API stability, and show how each layer maps to CI gates and release criteria.


0 Table of Contents

  1. Test‑pyramid at a glance
  2. Layer definitions & tooling
  3. Repository layout
  4. CI workflows
  5. Quality gates & budgets
  6. Evidence retention
  7. Developer quick‑start
  8. Flaky‑test triage & escalation
  9. Change log

1 Test‑pyramid at a glance

| Layer | Framework(s) | Scope | CI frequency |
|---|---|---|---|
| Unit | xUnit + FluentAssertions | Pure C# methods, guard clauses, mapping | Every PR |
| Mutation | Stryker.NET | Critical algorithm branches | Nightly |
| Static analysis | CodeQL, Semgrep | OWASP, injection, secrets | Every PR |
| Integration | Testcontainers + xUnit | Redis, Trivy exec, plug‑in hot‑load | Every PR |
| Quota / throttle | Testcontainers + clock mock | 333‑scan counter, 5 s and 60 s Retry‑After headers | Every PR |
| End‑to‑end (UI) | Playwright C# | Login, scan list, mute flow | Merge → main |
| Performance | Hyperfine + K6 | P95 latency, 40 rps throughput | Nightly |
| Security (DAST) | OWASP ZAP baseline | TLS headers, auth, XSS | Nightly + RC |
| Chaos / resilience | Pumba & Toxiproxy | Redis latency, container kill | Weekly |
| Compliance smoke | Spectral + JSON Schema | SBOM & API payloads | Every PR |
| Token validity | xUnit + clock mock | Expiry warning, OUK update refresh, /token/offline flow | Every PR |

2 Layer definitions & tooling

2.1 Unit

2.2 Mutation

2.3 Integration

2.4 Quota / throttle
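
The quota / throttle layer replaces the wall clock with a mock so that Retry‑After values can be asserted without real sleeps. The Python sketch below only illustrates that pattern: FakeClock, QuotaCounter, the soft_limit parameter and the 24‑hour window are hypothetical, and mapping 5 s to a soft threshold and 60 s to the 333‑scan hard limit is an assumption, not the documented product rule.

```python
from typing import Callable, Optional


class FakeClock:
    """Manually advanced clock standing in for a clock mock (hypothetical helper)."""

    def __init__(self, start: float = 0.0) -> None:
        self.now = start

    def advance(self, seconds: float) -> None:
        self.now += seconds

    def __call__(self) -> float:
        return self.now


class QuotaCounter:
    """Hypothetical scan quota. Assumptions: scans past soft_limit get a 5 s
    Retry-After, scans past hard_limit (333) get 60 s, and the counter resets
    once per window (assumed to be 24 h)."""

    def __init__(self, clock: Callable[[], float], *, soft_limit: int,
                 hard_limit: int = 333, window_s: float = 86_400.0,
                 soft_retry_s: int = 5, hard_retry_s: int = 60) -> None:
        self.clock = clock
        self.soft_limit = soft_limit
        self.hard_limit = hard_limit
        self.window_s = window_s
        self.soft_retry_s = soft_retry_s
        self.hard_retry_s = hard_retry_s
        self.window_start = clock()
        self.count = 0

    def request_scan(self) -> Optional[int]:
        """Return None if the scan may start now, else a Retry-After value in seconds."""
        now = self.clock()
        if now - self.window_start >= self.window_s:
            # A new window has started: reset the counter.
            self.window_start = now
            self.count = 0
        self.count += 1
        if self.count > self.hard_limit:
            return self.hard_retry_s
        if self.count > self.soft_limit:
            return self.soft_retry_s
        return None


# Usage: the test advances the fake clock instead of sleeping.
clock = FakeClock()
quota = QuotaCounter(clock, soft_limit=200)  # 200 is an arbitrary example value
answers = [quota.request_scan() for _ in range(334)]
assert answers[199] is None and answers[200] == 5 and answers[333] == 60
clock.advance(86_400)
assert quota.request_scan() is None  # the window rolled over
```

Injecting the clock as a plain callable keeps the production code free of test hooks: a real deployment would pass time.monotonic, the test passes FakeClock.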

2.5 End‑to‑End

2.6 Performance

2.7 Security (DAST + SAST)

2.8 Chaos / Resilience


3 Repository layout

```
tests/
├─ unit/                 # *.Unit.csproj
├─ mutation/stryker.conf.json
├─ integration/          # *.Integration.csproj
│   └─ fixtures/
├─ e2e/
│   ├─ api/pytest/       # test_*.py
│   └─ ui/playwright/    # *.spec.ts
├─ perf/
│   ├─ compose-perf.yml
│   ├─ hyperfine/
│   └─ k6/
├─ security/
│   ├─ zap-baseline.conf
│   └─ semgrep/
└─ chaos/
    ├─ toxiproxy/
    └─ pumba/
```

Tests mirror the module namespaces; each src project owns a matching test project.

4 CI workflows

| Workflow file | Trigger | Stages |
|---|---|---|
| ci.yml | Push / PR | Lint → Unit → Static analysis → Integration |
| e2e.yml | Merge → main | Compose stack → API + UI Playwright |
| perf.yml | Nightly | Hyperfine + K6; update Grafana JSON |
| security.yml | Nightly | ZAP baseline, Trivy FS, CodeQL |
| mutation.yml | Nightly | Stryker.NET; comment on PR if score < threshold |
| chaos.yml | Weekly (cron) | Toxiproxy + Pumba scenarios |
| release.yml | Tag | Run all of the above + evidence bundling |

Failure policy: any red gate blocks the merge; nightly failures are posted to #stella-ci.
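
The mutation.yml threshold check and the ≥ 60 % mutation gate both come down to a single mutation score. The sketch below follows the definition in the Stryker documentation as we understand it (detected = killed + timed‑out mutants; valid = detected + survived + no‑coverage; compile errors and ignored mutants excluded). The function names are hypothetical; in CI the counts would come from Stryker.NET's own report.

```python
def mutation_score(killed: int, timeout: int, survived: int, no_coverage: int) -> float:
    """Percentage of valid mutants that the test suite detected."""
    detected = killed + timeout
    valid = detected + survived + no_coverage
    if valid == 0:
        raise ValueError("no valid mutants to score")
    return 100.0 * detected / valid


def passes_mutation_gate(score: float, threshold: float = 60.0) -> bool:
    """Gate from the quality-gates table: the score must reach the threshold."""
    return score >= threshold


score = mutation_score(killed=120, timeout=6, survived=70, no_coverage=14)
print(f"{score:.1f} %", passes_mutation_gate(score))  # 60.0 % True
```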

5 Quality gates & budgets

| Metric | Threshold | Source | Maps to KPI |
|---|---|---|---|
| Line coverage | ≥ 80 % | Unit, Integration | Maintainability |
| Mutation score | ≥ 60 % | Stryker | Defect escape |
| P95 SBOM‑first | ≤ 5 s | Hyperfine | Product promise |
| P95 QUOTA_WAIT (soft) | ≤ 10 s | Hyperfine + clock mock | Predictable throttling |
| Hard wait‑wall accuracy | 60 ± 1 s | Hyperfine | Compliance with spec |
| P95 image‑unpack | ≤ 10 s | Hyperfine | SRS FR‑IMG‑1 |
| /scan error rate | 0 | K6 | Reliability |
| ZAP High alerts | 0 | ZAP JSON | Security NFR |
| Trivy Critical CVEs in release SBOM | 0 | Trivy FS | NFR‑SEC‑1 |
| Offline token expiry warning lead time | ≥ 7 days | Token tests | |
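
The offline‑token gate can be verified without waiting days by stepping a simulated clock toward expiry. In this sketch should_warn and first_warning_lead are hypothetical helpers, not Stella Ops APIs, and the assumed rule is that the warning appears once the token is within seven days of expiry.

```python
from datetime import datetime, timedelta, timezone

WARNING_LEAD = timedelta(days=7)  # gate from the quality-gates table


def should_warn(expires_at: datetime, now: datetime, lead: timedelta = WARNING_LEAD) -> bool:
    """Assumed rule: warn once the token is within `lead` of expiry, or past it."""
    return now >= expires_at - lead


def first_warning_lead(expires_at: datetime, start: datetime,
                       step: timedelta = timedelta(hours=1)) -> timedelta:
    """Step a simulated clock from `start`; report how early the warning first appeared."""
    now = start
    while now < expires_at:
        if should_warn(expires_at, now):
            return expires_at - now
        now += step
    return timedelta(0)  # never warned before expiry


expires = datetime(2025, 9, 1, tzinfo=timezone.utc)
lead = first_warning_lead(expires, start=expires - timedelta(days=30))
assert lead >= timedelta(days=7), f"warning lead time too short: {lead}"
```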

Coverage and performance budgets live in tests/budgets/*.json; CI fails the run whenever a measured value regresses past its budget.
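
The budget files are not reproduced in this overview, so the following sketch assumes a simple schema in which each tests/budgets/*.json file maps a metric name to a "max" and/or "min" bound. The metric names and the nearest‑rank P95 are illustrative assumptions; Hyperfine and K6 may compute percentiles differently.

```python
import json
from pathlib import Path


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile (integer ceiling avoids float rounding)."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def load_budgets(budget_dir: str = "tests/budgets") -> dict:
    """Merge every budget file; assumed shape: {"metric": {"max": x, "min": y}}."""
    budgets: dict = {}
    for path in sorted(Path(budget_dir).glob("*.json")):
        budgets.update(json.loads(path.read_text(encoding="utf-8")))
    return budgets


def violations(measured: dict, budgets: dict) -> list[str]:
    """Return one message per broken budget; an empty list means the gate passes."""
    failures = []
    for name, bounds in budgets.items():
        value = measured.get(name)
        if value is None:
            failures.append(f"{name}: no measurement")
            continue
        if "max" in bounds and value > bounds["max"]:
            failures.append(f"{name}: {value} > max {bounds['max']}")
        if "min" in bounds and value < bounds["min"]:
            failures.append(f"{name}: {value} < min {bounds['min']}")
    return failures


# Hypothetical metric names; a CI step would exit non-zero when the list is non-empty.
budgets = {"p95_sbom_first_s": {"max": 5.0}, "mutation_score_pct": {"min": 60.0}}
measured = {"p95_sbom_first_s": p95([4.1, 4.3, 4.0, 5.6, 4.2]), "mutation_score_pct": 62.5}
print(violations(measured, budgets))  # ['p95_sbom_first_s: 5.6 > max 5.0']
```

Reporting a missing measurement as a failure, rather than skipping it, keeps a silently broken benchmark from passing the gate.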

6 Evidence retention

| Artefact | Retention | Storage |
|---|---|---|
| Hyperfine & K6 CSV | 18 months | GitHub artefacts → S3 |
| Mutation reports | 6 months | S3 |
| ZAP & Trivy SARIF | 18 months | GitHub Security tab |
| Playwright videos | Last 50 builds | MinIO |
| Test logs (JUnit/Allure) | 12 months | S3, lifecycle policy |
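
Where retention is enforced by a clean‑up job rather than a storage lifecycle rule, the retention table can be encoded as data. The keys, function names and the whole‑calendar‑month rule below are hypothetical; only the periods and the 50‑build window come from the table.

```python
from datetime import date

# Retention periods in months, taken from the table (keys are hypothetical).
RETENTION_MONTHS = {
    "perf_csv": 18,
    "mutation_report": 6,
    "sarif": 18,
    "test_log": 12,
}
PLAYWRIGHT_KEEP_BUILDS = 50


def months_elapsed(created: date, today: date) -> int:
    """Whole calendar months between two dates."""
    months = (today.year - created.year) * 12 + (today.month - created.month)
    if today.day < created.day:
        months -= 1  # the current month is not complete yet
    return months


def is_expired(kind: str, created: date, today: date) -> bool:
    return months_elapsed(created, today) >= RETENTION_MONTHS[kind]


def videos_to_prune(build_numbers: list[int]) -> list[int]:
    """Keep videos for the newest 50 builds; return the older build numbers."""
    newest_first = sorted(build_numbers, reverse=True)
    return sorted(newest_first[PLAYWRIGHT_KEEP_BUILDS:])
```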

7 Developer quick‑start

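Bring up the full stack for e2e on a laptop:

```bash
docker compose -f tests/e2e/compose-core.yml up -d
```

Run unit and integration tests with coverage:

```bash
dotnet test --collect:"XPlat Code Coverage"
```

Run the e2e suites from the repository root (each subshell returns to the root afterwards):

```bash
# API e2e
(cd tests/e2e/api && pytest -q)

# UI e2e: install Node dependencies and Playwright browsers, then run the specs
(cd tests/e2e/ui && npm install && npx playwright install && npm test)
```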

8 Flaky‑test triage & escalation

Label the failing test flaky and open a GitHub Discussion, then escalate as follows:

  1. After 3 consecutive nightly failures, ops@stella-ops.org is paged automatically.
  2. Root‑cause the failure within the next sprint, or quarantine the test behind a feature flag for at most 2 weeks.
  3. Token‑expiry tests cannot be quarantined, because they guard offline operability.
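
The escalation rules for flaky tests are simple enough to encode in a triage bot. This sketch is hypothetical (function names and the "token-expiry" category label are illustrative): it counts trailing nightly failures, decides when to page, refuses quarantine for token‑expiry tests, and flags quarantines older than two weeks.

```python
from datetime import date, timedelta

PAGE_AFTER_CONSECUTIVE_FAILURES = 3
MAX_QUARANTINE = timedelta(weeks=2)
NEVER_QUARANTINE = {"token-expiry"}  # hypothetical category label


def trailing_failures(nightly_passed: list[bool]) -> int:
    """Count consecutive failures at the end of a chronological run history."""
    count = 0
    for passed in reversed(nightly_passed):
        if passed:
            break
        count += 1
    return count


def should_page(nightly_passed: list[bool]) -> bool:
    return trailing_failures(nightly_passed) >= PAGE_AFTER_CONSECUTIVE_FAILURES


def may_quarantine(category: str) -> bool:
    return category not in NEVER_QUARANTINE


def quarantine_overdue(since: date, today: date) -> bool:
    """A quarantine older than two weeks must be resolved, not extended."""
    return today - since > MAX_QUARANTINE
```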

9 Change log

| Version | Date | Notes |
|---|---|---|
| v2.0 | 2025‑07‑12 | Full overhaul: mutation tests, CodeQL/Semgrep, chaos layer, role‑based escalation, perf/security budgets aligned with SRS. |
| v1.0 | 2025‑07‑09 | Original minimal overview |

(End of Test‑Suite Overview v2.0)