19 · Test‑Suite Overview — Stella Ops

(v2.0 — 12 Jul 2025)

Purpose — Describe the multi‑layer automated‑test strategy that guards Stella Ops’ five‑second performance promise, security posture and API stability, and show how each layer maps to CI gates and release criteria.

0 Table of Contents

1 Test‑pyramid at a glance
2 Layer definitions & tooling
3 Directory & naming conventions
4 CI workflows & failure policy
5 Quality gates & coverage budgets
6 Evidence retention & auditability
7 Local developer quick‑start
8 Flaky‑test triage & escalation
9 Change log

1 Test‑pyramid at a glance

Layer	Framework(s)	Scope	CI frequency
Unit	xUnit + FluentAssertions	Pure C# methods, guard clauses, mapping	Every PR
Mutation	Stryker.NET	Critical algorithm branches	Nightly
Static analysis	CodeQL, Semgrep	OWASP, injection, secrets	Every PR
Integration	Testcontainers + xUnit	Redis, Trivy exec, plug‑in hot‑load	Every PR
Quota / throttle	Testcontainers + Clock‑mock	333‑scan counter, 5 s & 60 s retry‑after headers	Every PR
End‑to‑End (UI)	Playwright (TypeScript)	Login, scan list, mute flow	Merge→main
Performance	Hyperfine + K6	P95 latency, 40 rps throughput	Nightly
Security DAST	OWASP ZAP baseline	TLS headers, auth, XSS	Nightly + RC
Chaos / Resilience	Pumba & Toxiproxy	Redis latency, container kill	Weekly
Compliance smoke	Spectral + JSON‑Schema	SBOM & API payloads	Every PR
Token validity	xUnit + ClockMock	Expiry warning, OUK update refresh, `/token/offline` flow	Every PR

2 Layer definitions & tooling

2.1 Unit

Target ≥ 80 % line and ≥ 60 % branch coverage (coverlet + ReportGenerator).
Naming: Method_ShouldExpected_WhenCondition.
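
The naming rule can be checked mechanically, for example in a lint step. The sketch below is only an illustration: the regular expression assumes each of the three segments is PascalCase, which the convention above implies but does not state, and `follows_convention` is a hypothetical helper name.

```python
import re

# Assumption: Method, Expected and Condition are PascalCase identifiers.
# The document only gives the overall shape Method_ShouldExpected_WhenCondition.
TEST_NAME = re.compile(
    r"[A-Z][A-Za-z0-9]*_Should[A-Z][A-Za-z0-9]*_When[A-Z][A-Za-z0-9]*"
)


def follows_convention(name: str) -> bool:
    """Return True if a test method name matches the three-part convention."""
    return TEST_NAME.fullmatch(name) is not None
```

Under these assumptions, `Parse_ShouldThrow_WhenInputIsNull` is accepted and `ParseThrowsOnNull` is rejected.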

2.2 Mutation

Stryker.NET runs only on projects tagged critical‑logic=true in Directory.Build.props.
Threshold: ≥ 60 % mutation score; red build < 55 %.
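
A minimal sketch of how the two thresholds could translate into a gate result. The 60 % target and the 55 % red line come from the text above; the intermediate "warn" state for scores between them is an assumption, as are the names `Gate` and `mutation_gate`.

```python
from enum import Enum


class Gate(Enum):
    GREEN = "green"
    WARN = "warn"   # assumption: the document does not name the 55-60 % band
    RED = "red"


def mutation_gate(score_pct: float,
                  target: float = 60.0,
                  red_below: float = 55.0) -> Gate:
    """Map a mutation score (in percent) to a gate colour."""
    if score_pct >= target:
        return Gate.GREEN
    if score_pct < red_below:
        return Gate.RED
    return Gate.WARN
```

With these defaults a score of 57.5 % would warn without failing the build, while 54.9 % would turn it red.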

2.3 Integration

RedisTestcontainer, TrivyServerTestcontainer, TestcontainersNetwork for realistic wiring.
Each test cleans keys and volumes; parallelisable.
Quota & throttle tests (new): spin up a Redis container, fix the system clock to just before UTC midnight, then hammer /scan with a stub token to validate:
1. Counter hits 200 → header X‑Stella‑Quota‑Remaining: 133; banner socket event emitted; a 5 s delay is added.
2. Counter hits 333 → a 60 s delay is added.
3. At UTC midnight rollover key expires → counter resets to 0.
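
The three behaviours above can be modelled without Redis to make the expected numbers explicit. This is a sketch, not the service's implementation: `QuotaCounter` is a hypothetical in-memory stand-in for the Redis key, it expects timezone-aware datetimes, and it assumes the 5 s delay applies from the 200th scan onward and the 60 s delay from the 333rd.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

DAILY_LIMIT = 333       # scans per UTC day, from the text
SOFT_THRESHOLD = 200    # counter value that triggers the banner and soft delay
SOFT_DELAY_S = 5
HARD_DELAY_S = 60


@dataclass
class QuotaCounter:
    """Hypothetical in-memory stand-in for the per-day Redis counter."""
    counts: dict = field(default_factory=dict)

    def used(self, now: datetime) -> int:
        """Scans recorded for the UTC day that contains `now`."""
        return self.counts.get(now.astimezone(timezone.utc).date(), 0)

    def record_scan(self, now: datetime) -> dict:
        day = now.astimezone(timezone.utc).date()
        # Keeping only the current day mimics key expiry at midnight rollover.
        self.counts = {day: self.counts.get(day, 0) + 1}
        used = self.counts[day]
        if used >= DAILY_LIMIT:
            delay = HARD_DELAY_S
        elif used >= SOFT_THRESHOLD:
            delay = SOFT_DELAY_S
        else:
            delay = 0
        return {
            "X-Stella-Quota-Remaining": max(DAILY_LIMIT - used, 0),
            "delay_s": delay,
            "banner": used == SOFT_THRESHOLD,  # emitted once, on the 200th scan
        }
```

Passing the clock in as an argument plays the role of the clock mock: a test can record scans at 23:59:59 UTC and then query the counter at 00:00:01 without sleeping.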

2.3.1 Quota / throttle layer (explicit)

Uses the same fixture but runs in isolation to keep CI time predictable.
Fails the pipeline if any of the three numbered behaviours in 2.3 mis‑fires.

2.4 End‑to‑End

API suite asserts presence of X‑Stella‑Quota‑Remaining on every successful /scan.
API suite uses async httpx for accurate latency numbers.
UI suite uses Playwright headless Chromium; Lighthouse a11y snapshot recorded.
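
The header assertion in the API suite might look like the helper below. It is a sketch that uses only the standard library so it stays runnable on its own; in the real suite the mapping would presumably be the `headers` attribute of an httpx response. The helper name `quota_remaining` and the range check against the 333-scan limit are assumptions.

```python
from typing import Mapping

HEADER = "X-Stella-Quota-Remaining"
DAILY_LIMIT = 333  # assumption: remaining quota can never exceed the daily limit


def quota_remaining(headers: Mapping[str, str]) -> int:
    """Return the remaining quota from response headers.

    Raises AssertionError if the header is missing or out of range,
    and ValueError if its value is not an integer.
    """
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    assert HEADER.lower() in lowered, f"{HEADER} missing from response"
    remaining = int(lowered[HEADER.lower()])
    assert 0 <= remaining <= DAILY_LIMIT, f"{HEADER} out of range: {remaining}"
    return remaining
```

A pytest case would call this on every successful /scan response and could additionally check that the value decreases between consecutive scans.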

2.5 Performance

Hyperfine measures CLI workflows (SBOM_LOCAL, SBOM_REMOTE, IMAGE_WARM).
K6 hits /scan at 40 rps for 3 min; checks P95 ≤ 5 s and error‑rate = 0.
PHASE QUOTA_WAIT benchmark:
- ≤ 5 s median for first 30 blocked requests (soft back‑off).
- 60 s (± 1 s) wall‑clock time for the hard wait‑wall.

2.6 Security (DAST + SAST)

SAST: CodeQL (GitHub native) + Semgrep OSS ruleset.
DAST: ZAP baseline spider + passive rules; fails on High risk alerts.

2.7 Chaos / Resilience

Pumba randomly kills Trivy side‑car; test asserts queue retry.
Toxiproxy injects 150 ms latency on Redis; perf budget still ≤ 6 s.

3 Repository layout

tests/
├─ unit/                 # *.Unit.csproj
├─ mutation/stryker.conf.json
├─ integration/          # *.Integration.csproj
│   └─ fixtures/
├─ e2e/
│   ├─ api/pytest/       # test_*.py
│   └─ ui/playwright/    # *.spec.ts
├─ perf/
│   ├─ compose-perf.yml
│   ├─ hyperfine/
│   └─ k6/
├─ security/
│   ├─ zap-baseline.conf
│   └─ semgrep/
└─ chaos/
    ├─ toxiproxy/
    └─ pumba/

Tests mirror the module namespaces; each src project owns a matching test project.

4 CI workflows

File	Trigger	Stages
ci.yml	Push / PR	Lint → Unit → Static analysis → Integration
e2e.yml	Merge→main	Compose stack → API (pytest) + UI (Playwright)
perf.yml	Nightly	Hyperfine + K6; update Grafana JSON
security.yml	Nightly	ZAP baseline, Trivy FS, CodeQL
mutation.yml	Nightly	Stryker.NET; comment PR if < threshold
chaos.yml	Weekly (cron)	Toxiproxy + Pumba scenarios
release.yml	Tag	Run all above + evidence bundling

Failure policy: any red gate blocks the merge; nightly failures ping #stella-ci.

5 Quality gates & budgets

Metric	Threshold	Source	Maps to KPI
Line coverage	≥ 80 %	Unit, Integration	Maintainability
Mutation score	≥ 60 %	Stryker	Defect escape
P95 SBOM‑first	≤ 5 s	Hyperfine	Product promise
P95 QUOTA_WAIT (soft)	≤ 10 s	Hyperfine + Clock‑mock	Predictable throttling
Hard wait‑wall accuracy	60 ± 1 s	Hyperfine	Compliance with spec
P95 image‑unpack	≤ 10 s	Hyperfine	SRS FR‑IMG‑1
/scan error‑rate	0	K6	Reliability
ZAP High alerts	0	ZAP JSON	Security NFR
Trivy Critical CVEs in release SBOM	0	Trivy FS	NFR‑SEC‑1
Offline token expiry warning lead‑time	≥ 7 days	Token tests

Coverage & perf budgets live in tests/budgets/*.json; CI actions fail on regression.
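
A regression check against such budget files could be sketched as follows. The JSON layout (metric name mapped to an `op` and a `limit`) and the metric names are hypothetical, since the schema of tests/budgets/*.json is not shown here; the percentile uses the nearest-rank method, which may differ from what Hyperfine or K6 report.

```python
def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def check_budgets(measured: dict, budgets: dict) -> list:
    """Return one message per violated or unmeasured budget.

    `budgets` maps a metric name to {"op": "<=" or ">=", "limit": number}
    (hypothetical layout).
    """
    failures = []
    for name, rule in budgets.items():
        value = measured.get(name)
        if value is None:
            failures.append(f"{name}: no measurement")
            continue
        if rule["op"] == "<=":
            ok = value <= rule["limit"]
        else:
            ok = value >= rule["limit"]
        if not ok:
            failures.append(f"{name}: {value} violates {rule['op']} {rule['limit']}")
    return failures
```

A CI step would load the budget file with `json.load`, build `measured` from the run's reports, and exit non-zero if the returned list is not empty.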

6 Evidence retention

Artefact	Retention	Storage
Hyperfine & K6 CSV	18 months	GitHub artefacts → S3
Mutation reports	6 months	S3
ZAP & Trivy SARIF	18 months	GitHub Security tab
Playwright videos	Last 50 builds	MinIO
Test logs (JUnit/Allure)	12 months	S3, lifecycle policy

7 Developer quick‑start

# Bring up full stack for e2e on a laptop
docker compose -f tests/e2e/compose-core.yml up -d

# Run unit + integration
dotnet test --collect:"XPlat Code Coverage"

# API e2e
cd tests/e2e/api
pytest -q

# UI e2e
cd tests/e2e/ui
npx playwright install
npm test

8 Flaky‑test triage & escalation

Label the failing test `flaky` and open a GitHub Discussion. After 3 consecutive nightly failures, ops@stella-ops.org is paged automatically. Find the root cause within the next sprint, or quarantine the test behind a feature flag (max 2 weeks). Token‑expiry tests cannot be quarantined because they guard offline operability.
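
The escalation rules can be expressed as two small predicates. This is a sketch under stated assumptions: the function names are hypothetical, nightly history is a list of booleans with `True` meaning a pass, and token-expiry tests are recognised by a `TokenExpiry` substring in the test name, which is purely illustrative.

```python
from datetime import date, timedelta

PAGE_AFTER_CONSECUTIVE = 3          # nightly failures before paging, from the text
MAX_QUARANTINE = timedelta(weeks=2)  # quarantine ceiling, from the text


def should_page(nightly_results: list) -> bool:
    """True when the three most recent nightly runs all failed (True = pass)."""
    recent = nightly_results[-PAGE_AFTER_CONSECUTIVE:]
    return len(recent) == PAGE_AFTER_CONSECUTIVE and not any(recent)


def may_quarantine(test_name: str, quarantined_on: date, today: date) -> bool:
    """True if the test may be (or remain) quarantined on `today`."""
    if "TokenExpiry" in test_name:  # hypothetical marker for token-expiry tests
        return False
    return today - quarantined_on <= MAX_QUARANTINE
```

For example, a history of pass, fail, fail, fail triggers a page, and a test quarantined on 1 July must be fixed or restored by 15 July.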

9 Change log

Version	Date	Notes
v2.0	2025‑07‑12	Full overhaul: mutation tests, CodeQL/Semgrep, chaos layer, role‑based escalation, perf/security budgets aligned with SRS.
v1.0	2025‑07‑09	Original minimal overview

(End of Test‑Suite Overview v2.0)
