tests/stress/live_accuracy.mjs: classroom-scale accuracy + latency
test that targets the deployed server (single-session, sid=main).
Logs in as admin via /admin/login, resets the session, joins N
students serially over HTTP, opens N student WebSockets in batches
of 8 (250ms apart) plus the instructor WS, then drives every
question through the admin "next" command. Each student picks
uniformly random A-D, sends the submit, waits for the submit_ack,
and records the round-trip latency. After session_ended, the script
verifies that every student whose pick == correct got score > 0,
every other submission got score == 0, and reports p50/p95/p99
ack latency. First live run: 50 students, 100 submits, 100% acks,
100% accuracy match, p99 555ms (≈intercontinental RTT to HK).
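
The end-of-session check and the latency report can be sketched as
follows. The real script is JavaScript; this is a Python illustration
only. The record keys (pick, correct, score) and the nearest-rank
percentile method are assumptions, not taken from live_accuracy.mjs.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with >= q% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def accuracy_mismatches(submissions):
    """Return submissions whose score disagrees with pick == correct.

    Each submission is a dict with hypothetical keys pick, correct, score.
    """
    bad = []
    for sub in submissions:
        if sub["pick"] == sub["correct"]:
            ok = sub["score"] > 0   # correct pick must earn points
        else:
            ok = sub["score"] == 0  # any other pick must earn none
        if not ok:
            bad.append(sub)
    return bad


def latency_report(ack_latencies_ms):
    """Summarise submit -> submit_ack round trips in milliseconds."""
    return {f"p{q}": percentile(ack_latencies_ms, q) for q in (50, 95, 99)}
```

Nearest-rank always returns an observed sample, which keeps the p99
of a 100-submit run tied to a real round trip rather than an
interpolated value.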

tests/stress/live_loop.sh: tmux-friendly loop that runs the live
test every 60s and appends a JSONL summary line per cycle to
runs/live_summary.jsonl. Mirrors the morning's api_stress run_loop
shape so per-cycle aggregates are easy to scrape.
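
A rough sketch of why one JSON object per line is easy to scrape. The
loop itself is a shell script; this Python helper and the field names
(cycle, p99_ms) are hypothetical and only illustrate the JSONL shape.

```python
import json


def append_summary(path, summary):
    """Append one cycle's summary as a single JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(summary) + "\n")


def read_summaries(path):
    """Load every non-empty line of a JSONL file as a dict."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def worst_p99(summaries):
    """Highest p99 across cycles (p99_ms is an assumed field name)."""
    return max(s["p99_ms"] for s in summaries)
```

Because each cycle is an independent line, a crashed cycle loses at
most its own record and earlier lines stay parseable.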

app/rate_limit.py: tiny in-memory token bucket. Takes a capacity (in
tokens) and a refill rate (in tokens/minute), keyed by client IP via
X-Forwarded-For (with a fallback to request.client.host). State is
process-local, which is fine because admin login is the only caller.
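
A minimal sketch of a per-key token bucket of this kind. The class
name, method names, and the injectable clock are assumptions, not the
contents of app/rate_limit.py. Taking the leftmost X-Forwarded-For
entry is also an assumption about how the key is derived.

```python
import time


class TokenBucket:
    """Per-key token bucket held in process memory (hypothetical API)."""

    def __init__(self, capacity, refill_per_minute, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_minute / 60.0
        self.clock = clock
        self._state = {}  # key -> (tokens, time of last update)

    def allow(self, key):
        """Consume one token for key; return False when the bucket is empty."""
        now = self.clock()
        tokens, last = self._state.get(key, (float(self.capacity), now))
        # Refill for the elapsed time, never beyond capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_per_second)
        if tokens >= 1.0:
            self._state[key] = (tokens - 1.0, now)
            return True
        self._state[key] = (tokens, now)
        return False


def client_key(headers, peer_host):
    """Leftmost X-Forwarded-For entry, else the socket peer address.

    Assumes headers is a dict with lowercase keys.
    """
    first = headers.get("x-forwarded-for", "").split(",")[0].strip()
    return first or peer_host
```

With capacity=10 and refill_per_minute=10, a client can burst 10
attempts and then regains one attempt roughly every 6 seconds.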

POST /admin/login: rate-limited at 10 attempts/minute/IP. Generous
for the legit instructor (who succeeds in 1-2 tries) and prohibitive
for brute force from a single attacker IP. Student endpoints
deliberately NOT rate-limited because campus students share NAT
gateways and IP-level limits would false-positive a whole class.
The bucket is per-app-instance (instantiated inside the router
factory), so test apps each get a fresh one and tests don't poison
each other.

Two suites under tests/stress/, plus a tmux-friendly run_loop.sh
runner. Both boot a fresh uvicorn on an isolated DB per cycle and
log JSON line summaries to runs/.

api_stress.mjs covers WS-level scenarios that the existing pytest
suite does not exercise: 20-student happy path, late joiners with
correct remaining_ms, mid-question disconnect, browser-sleep + wake
to a different question_idx, cookie tampering and cross-session
cookie reuse, duplicate student_id, bad submit (out-of-order, wrong
idx, resubmit no-op), close-boundary race with auto-close, malformed
JSON fuzz, and flaky reconnect.

ui_stress.mjs drives the same flows in a real Chromium context via
playwright: happy UI, sleep/wake by closing+reopening a context with
the persisted cookie, document.cookie tampering attempt, and two
browser contexts joining with the same student_id.

Findings will be summarised in runs/summary.jsonl over time. One known
issue surfaces from the fuzz scenario: in app/room.py, student_ws calls
receive_json inside its only try/except, which catches only
WebSocketDisconnect. A malformed frame therefore raises JSONDecodeError
out of the handler and kills that client's WS handler. Other clients
are unaffected.
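
One possible way to contain the failure, sketched without Starlette
so that it runs standalone. Disconnected stands in for
WebSocketDisconnect, FakeSocket for the real WebSocket, and the error
payload is hypothetical. This is not the code in app/room.py.

```python
import asyncio
import json


class Disconnected(Exception):
    """Stand-in for WebSocketDisconnect in this sketch."""


class FakeSocket:
    """Replays queued text frames, then reports a disconnect."""

    def __init__(self, frames):
        self._frames = list(frames)
        self.sent = []

    async def receive_text(self):
        if not self._frames:
            raise Disconnected()
        return self._frames.pop(0)

    async def send_json(self, payload):
        self.sent.append(payload)


async def student_loop(ws, handle):
    """Read frames until disconnect; reject bad JSON without exiting."""
    try:
        while True:
            raw = await ws.receive_text()
            try:
                msg = json.loads(raw)
            except json.JSONDecodeError:
                # Hypothetical error payload; the connection stays open.
                await ws.send_json({"type": "error", "reason": "bad_json"})
                continue
            await handle(msg)
    except Disconnected:
        pass  # normal end of the connection
```

Decoding inside the loop keeps one malformed frame from ending the
handler, so later valid frames from the same client are still
processed.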