19 · Test‑Suite Overview — Stella Ops
(v2.0 — 12 Jul 2025)
Purpose — Describe the multi‑layer automated‑test strategy that guards Stella Ops’ five‑second performance promise, security posture and API stability, and show how each layer maps to CI gates and release criteria.
0 Table of Contents
- Test‑pyramid at a glance
- Layer definitions & tooling
- Directory & naming conventions
- CI workflows & failure policy
- Quality gates & coverage budgets
- Evidence retention & auditability
- Local developer quick‑start
- Flaky‑test triage & escalation
- Change log
1 Test‑pyramid at a glance
Layer | Framework(s) | Scope | CI frequency |
---|---|---|---|
Unit | xUnit + FluentAssertions | Pure C# methods, guard clauses, mapping | Every PR |
Mutation | Stryker.NET | Critical algorithm branches | Nightly |
Static analysis | CodeQL, Semgrep | OWASP, injection, secrets | Every PR |
Integration | Testcontainers + xUnit | Redis, Trivy exec, plug‑in hot‑load | Every PR |
Quota / throttle | Testcontainers + Clock‑mock | 333‑scan counter, 5 s & 60 s retry‑after headers | Every PR |
End‑to‑End (UI) | Playwright C# | Login, scan list, mute flow | Merge→main |
Performance | Hyperfine + K6 | P95 latency, 40 rps throughput | Nightly |
Security DAST | OWASP ZAP baseline | TLS headers, auth, XSS | Nightly + RC |
Chaos / Resilience | Pumba & Toxiproxy | Redis latency, container kill | Weekly |
Compliance smoke | Spectral + JSON‑Schema | SBOM & API payloads | Every PR |
Token validity | xUnit + ClockMock | Expiry warning, OUK update refresh, /token/offline flow | Every PR |
2 Layer definitions & tooling
2.1 Unit
- Target ≥ 80 % line and ≥ 60 % branch coverage (`coverlet` + ReportGenerator).
- Naming: `Method_ShouldExpected_WhenCondition`.
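The naming pattern above can be checked mechanically. The following is a minimal sketch in Python (the language of the API e2e suite), assuming each of the three segments is a PascalCase identifier. The regex and the helper name `follows_convention` are illustrative and not part of the Stella Ops tooling.

```python
import re

# Assumed reading of Method_ShouldExpected_WhenCondition: three
# underscore-separated PascalCase segments, the second starting with
# "Should" and the third with "When".
_NAME = re.compile(
    r"[A-Z][A-Za-z0-9]*_Should[A-Z][A-Za-z0-9]*_When[A-Z][A-Za-z0-9]*"
)


def follows_convention(test_name: str) -> bool:
    """Return True if test_name matches the assumed unit-test naming pattern."""
    return _NAME.fullmatch(test_name) is not None
```

For example, `Parse_ShouldThrow_WhenInputIsNull` is accepted, while `Parse_ShouldThrow` is rejected because the `When` segment is missing.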
2.2 Mutation
- Stryker.NET runs only on projects tagged `critical‑logic=true` in `Directory.Build.props`.
- Threshold: ≥ 60 % mutation score; red build < 55 %.
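The two thresholds leave a band between 55 % and 60 % that the text does not name. A small sketch of the gate logic, assuming that band is reported as a warning rather than a failure (the `"warn"` outcome and the function name are assumptions):

```python
def mutation_gate(score_percent: float) -> str:
    """Classify a mutation score against the thresholds stated above."""
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("mutation score must be a percentage between 0 and 100")
    if score_percent >= 60.0:
        return "pass"
    if score_percent < 55.0:
        return "red"
    # Assumption: the source does not name the 55-60 % band; "warn" is illustrative.
    return "warn"
```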
2.3 Integration
- `RedisTestcontainer`, `TrivyServerTestcontainer`, `TestcontainersNetwork` for realistic wiring.
- Each test cleans keys and volumes; parallelisable.
- Quota & throttle tests (new): spin up a Redis container, fix the system clock to just before UTC midnight, and hammer `/scan` with a stub token to validate:
  - Counter hits 200 → header `X‑Stella‑Quota‑Remaining: 133`; banner socket event emitted; a 5 s delay is added.
  - Counter hits 333 → a 60 s delay is added.
  - At UTC midnight rollover the key expires → counter resets to 0.
2.3.1 Quota / throttle layer (explicit)
- Uses the same fixture but runs in isolation to keep CI time predictable.
- Fails the pipeline if any of the four behaviours above mis‑fires.
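The expected quota behaviour can be written down as a small reference model that a test could compare the service against. This is a sketch only: it replaces Redis with an in-memory dictionary, assumes the 5 s delay applies from the 200th scan up to the 332nd and the 60 s delay from the 333rd onward, and the class and field names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

DAILY_LIMIT = 333      # from the text: 333-scan counter
SOFT_THRESHOLD = 200   # from the text: 5 s delay once the counter hits 200
SOFT_DELAY_S = 5
HARD_DELAY_S = 60


class QuotaModel:
    """In-memory stand-in for the Redis counter (illustrative only)."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def record_scan(self, now: datetime) -> dict:
        # One key per UTC day, so a new day starts from 0, mimicking key expiry.
        day = now.astimezone(timezone.utc).date().isoformat()
        count = self._counts.get(day, 0) + 1
        self._counts = {day: count}  # drop keys from earlier days
        if count >= DAILY_LIMIT:
            delay = HARD_DELAY_S
        elif count >= SOFT_THRESHOLD:
            delay = SOFT_DELAY_S
        else:
            delay = 0
        return {
            "count": count,
            "remaining": max(DAILY_LIMIT - count, 0),
            "delay_s": delay,
        }


# Fix the clock just before UTC midnight, as the tests above do.
model = QuotaModel()
before_midnight = datetime(2025, 7, 12, 23, 59, tzinfo=timezone.utc)
results = [model.record_scan(before_midnight) for _ in range(DAILY_LIMIT)]
after_midnight = model.record_scan(before_midnight + timedelta(minutes=2))
```

Here `results[199]` describes the 200th scan (133 remaining, 5 s delay), `results[332]` the 333rd (0 remaining, 60 s delay), and `after_midnight` shows the counter starting again at 1.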
2.4 End‑to‑End
- API suite asserts presence of `X‑Stella‑Quota‑Remaining` on every successful `/scan`.
- API suite uses async httpx for accurate latency numbers.
- UI suite uses Playwright headless Chromium; Lighthouse a11y snapshot recorded.
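The header assertion in the API suite might look like the helper below. It is a sketch that works on any header mapping, so it does not depend on a particular HTTP client; the function name is hypothetical, and the header name is written with plain ASCII hyphens on the assumption that this is how it appears on the wire.

```python
from typing import Mapping

QUOTA_HEADER = "X-Stella-Quota-Remaining"


def quota_remaining(headers: Mapping[str, str]) -> int:
    """Return the remaining quota from response headers; raise if absent or invalid."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    value = lowered.get(QUOTA_HEADER.lower())
    assert value is not None, f"{QUOTA_HEADER} header missing"
    remaining = int(value)  # raises ValueError if the value is not an integer
    assert remaining >= 0, f"{QUOTA_HEADER} must be non-negative, got {remaining}"
    return remaining
```

With httpx, the `headers` attribute of a response is itself a mapping and could be passed in directly after a successful `/scan` call.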
2.5 Performance
- Hyperfine measures CLI workflows (`SBOM_LOCAL`, `SBOM_REMOTE`, `IMAGE_WARM`).
- K6 hits `/scan` at 40 rps for 3 min; checks P95 ≤ 5 s and error‑rate = 0.
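At 40 rps for 3 minutes the K6 run issues roughly 7 200 requests, and the gate passes only if the 95th-percentile latency stays within 5 s and no request fails. The sketch below illustrates that check on raw latency samples. It uses the nearest-rank percentile definition, which is an assumption: K6 computes its own percentiles and may interpolate differently.

```python
import math


def p95(samples: list[float]) -> float:
    """Nearest-rank 95th percentile of a non-empty list of latencies."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(0.95 * len(ordered))  # 1-based nearest-rank index
    return ordered[rank - 1]


def within_budget(latencies_s: list[float], errors: int, budget_s: float = 5.0) -> bool:
    """True when P95 latency is within budget and no request failed."""
    return p95(latencies_s) <= budget_s and errors == 0
```

A run where 5 % of requests take 9 s still passes, because the 95th-percentile sample is one of the fast ones; at 6 % slow requests the gate fails.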
2.6 Security (DAST + SAST)
- PHASE QUOTA_WAIT benchmark:
- ≤ 5 s median for first 30 blocked requests (soft back‑off).
- Exactly 60 s wall for hard wait‑wall.
- SAST: CodeQL (GitHub native) + Semgrep OSS ruleset.
- DAST: ZAP baseline spider + passive rules; fails on High risk alerts.
2.7 Chaos / Resilience
- Pumba randomly kills Trivy side‑car; test asserts queue retry.
- Toxiproxy injects 150 ms latency on Redis; perf budget still ≤ 6 s.
3 Repository layout
tests/
├─ unit/ # *.Unit.csproj
├─ mutation/stryker.conf.json
├─ integration/ # *.Integration.csproj
│ └─ fixtures/
├─ e2e/
│ ├─ api/pytest/ # test_*.py
│ └─ ui/playwright/ # *.spec.ts
├─ perf/
│ ├─ compose-perf.yml
│ ├─ hyperfine/
│ └─ k6/
├─ security/
│ ├─ zap-baseline.conf
│ └─ semgrep/
└─ chaos/
    ├─ toxiproxy/
    └─ pumba/
Tests mirror the module namespaces; each src project owns a matching test project.
4 CI workflows
File | Trigger | Stages |
---|---|---|
ci.yml | Push / PR | Lint → Unit → Static analysis → Integration |
e2e.yml | Merge→main | Compose stack → API+UI Playwright |
perf.yml | Nightly | Hyperfine + K6; update Grafana JSON |
security.yml | Nightly | ZAP baseline, Trivy FS, CodeQL |
mutation.yml | Nightly | Stryker.NET; comment PR if < threshold |
chaos.yml | Weekly (cron) | Toxiproxy + Pumba scenarios |
release.yml | Tag | Run all above + evidence bundling |
Failure policy: any Red gate blocks merge; nightly failures ping #stella-ci.
5 Quality gates & budgets
Metric | Threshold | Source | Maps to KPI |
---|---|---|---|
Line coverage | ≥ 80 % | Unit, Integration | Maintainability |
Mutation score | ≥ 60 % | Stryker | Defect escape |
P95 SBOM‑first | ≤ 5 s | Hyperfine | Product promise |
P95 QUOTA_WAIT (soft) | ≤ 10 s | Hyperfine + Clock‑mock | Predictable throttling |
Hard wait‑wall accuracy | 60 ± 1 s | Hyperfine | Compliance with spec |
P95 image‑unpack | ≤ 10 s | Hyperfine | SRS FR‑IMG‑1 |
/scan error‑rate | 0 | K6 | Reliability |
ZAP High alerts | 0 | ZAP JSON | Security NFR |
Trivy Critical CVEs in release SBOM | 0 | Trivy FS | NFR‑SEC‑1 |
Offline token expiry warning lead‑time | ≥ 7 days | Token tests | |
Coverage & perf budgets live in tests/budgets/*.json; CI actions fail on regression.
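A budget check of this kind could be sketched as follows. The schema of `tests/budgets/*.json` is not shown in this document, so the `kind`/`limit` layout, the metric names, and the function name are all hypothetical; only the threshold values come from the table above.

```python
import json


def find_regressions(budgets: dict, measured: dict) -> list[str]:
    """Return a message for every metric that breaks its budget."""
    failures = []
    for name, rule in budgets.items():
        if name not in measured:
            failures.append(f"{name}: no measurement")
            continue
        value, limit = measured[name], rule["limit"]
        # "min": the value must stay at or above the limit (e.g. coverage);
        # "max": the value must stay at or below the limit (e.g. latency).
        ok = value >= limit if rule["kind"] == "min" else value <= limit
        if not ok:
            failures.append(f"{name}: {value} violates {rule['kind']} {limit}")
    return failures


# Hypothetical budget file content using thresholds from the table above.
budgets = json.loads("""
{
  "line_coverage_pct": {"kind": "min", "limit": 80},
  "p95_sbom_first_s":  {"kind": "max", "limit": 5},
  "scan_error_rate":   {"kind": "max", "limit": 0}
}
""")
measured = {"line_coverage_pct": 82.5, "p95_sbom_first_s": 5.4, "scan_error_rate": 0}
failures = find_regressions(budgets, measured)
```

A CI step would exit non-zero whenever `failures` is non-empty; in this example only the P95 budget is exceeded.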
6 Evidence retention
Artefact | Retention | Storage |
---|---|---|
Hyperfine & K6 CSV | 18 months | GitHub artefacts → S3 |
Mutation reports | 6 months | S3 |
ZAP & Trivy SARIF | 18 months | GitHub Security tab |
Playwright videos | Last 50 builds | MinIO |
Test logs (JUnit/Allure) | 12 months | S3, lifecycle policy |
7 Developer quick‑start
# Bring up full stack for e2e on a laptop
docker compose -f tests/e2e/compose-core.yml up -d
# Run unit + integration
dotnet test --collect:"XPlat Code Coverage"
# API e2e (subshell, so the working directory is restored afterwards)
(cd tests/e2e/api && pytest -q)
# UI e2e
(cd tests/e2e/ui && npx playwright install && npm test)
8 Flaky‑test triage & escalation
- Label the failing test `flaky` and open a GitHub Discussion.
- After 3 consecutive nightly failures, auto‑page ops@stella-ops.org.
- Root‑cause within the next sprint, or quarantine behind a feature flag (max 2 weeks).
- Token‑expiry tests cannot be quarantined: they guard offline operability.
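The escalation rules can be summarised as a small decision function. This is an illustrative sketch, not the project's automation: the `FlakyTest` record, its fields, and the action strings are hypothetical, while the thresholds (3 consecutive failures, 2 weeks) come from the policy above.

```python
from dataclasses import dataclass

PAGE_AFTER_FAILURES = 3    # consecutive nightly failures before paging
MAX_QUARANTINE_DAYS = 14   # "max 2 weeks"


@dataclass
class FlakyTest:
    name: str
    consecutive_nightly_failures: int
    days_in_quarantine: int = 0
    guards_token_expiry: bool = False


def triage_actions(test: FlakyTest) -> list[str]:
    """List the actions the policy calls for, in order."""
    actions = ["label:flaky", "open-discussion"]
    if test.consecutive_nightly_failures >= PAGE_AFTER_FAILURES:
        actions.append("page:ops@stella-ops.org")
    if test.guards_token_expiry:
        # Token-expiry tests must keep running; quarantine is not an option.
        actions.append("no-quarantine")
    elif test.days_in_quarantine > MAX_QUARANTINE_DAYS:
        actions.append("quarantine-expired")
    return actions
```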
9 Change log
Version | Date | Notes |
---|---|---|
v2.0 | 2025‑07‑12 | Full overhaul: mutation tests, CodeQL/Semgrep, chaos layer, role‑based escalation, perf/security budgets aligned with SRS. |
v1.0 | 2025‑07‑09 | Original minimal overview |
(End of Test‑Suite Overview v2.0)