tests/stress/live_accuracy.mjs: classroom-scale accuracy + latency
test that targets the deployed server (single-session, sid=main).
Logs in as admin via /admin/login, resets the session, joins N
students serially over HTTP, opens N student WebSockets in batches
of 8 (250ms apart) plus the instructor WS, then drives every
question through the admin "next" command. Each student picks
uniformly random A-D, sends the submit, waits for the submit_ack,
and records the round-trip latency. After session_ended, the script
verifies that every student whose pick == correct got score > 0,
every other submission got score == 0, and reports p50/p95/p99
ack latency. First live run: 50 students, 100 submits, 100% acks,
100% accuracy match, p99 555ms (≈intercontinental RTT to HK).
tests/stress/live_loop.sh: tmux-friendly loop that runs the live
test every 60s and appends a JSONL summary line per cycle to
runs/live_summary.jsonl. Mirrors the morning's api_stress run_loop
shape so per-cycle aggregates are easy to scrape.
app/rate_limit.py: tiny in-memory token bucket. Capacity + refill
in tokens/minute, keyed by client IP via X-Forwarded-For (with a
fallback to request.client.host). Process-local state — admin
login is the only user.
POST /admin/login: rate-limited at 10 attempts/minute/IP. Generous
for the legit instructor (who succeeds in 1-2 tries) and prohibitive
for brute force from a single attacker IP. Student endpoints
deliberately NOT rate-limited because campus students share NAT
gateways and IP-level limits would false-positive a whole class.
The bucket is per-app-instance (instantiated inside the router
factory), so test apps each get a fresh one and tests don't poison
each other.
Per-question score is now a float in [0.0, 1.0] snapped to a 21-level
0.05 grid, replacing the previous 0-1000 integer scale. Easier to read
on a leaderboard, ties become acceptable rather than vanishingly rare,
and small clock-skew differences no longer split rankings.
DB schema: score is REAL now (SQLite type affinity is loose enough that
existing rows still read fine, but new inserts go in as floats).
Frontend: added fmtScore() helpers in admin.js and quiz.js to render
two decimal places consistently (0.85, 1.20, 5.00) so float-arithmetic
sums never display as 0.8500000000000001.
Tests: linear_decay/flat/exponential_decay assertions updated; added
a snap-to-grid invariant test.
Three coupled fixes from the first manual test pass:
1. Stale signed cookie no longer 500s. `rooms.me()` now raises KeyError
when the participant row is gone (the previous code deref'd None and
threw TypeError, caught by quiz.js's catch-all as 'link expired').
`/api/session/{sid}/me` translates KeyError into 401 + delete_cookie,
so the client falls back to the join form cleanly.
Returning a JSONResponse directly because `raise HTTPException`
discards Response.delete_cookie mutations (FastAPI middleware
composes a fresh response on exception).
2. Reset is now a soft restart from the student's perspective. Before
closing each student WS in `RoomManager.reset`, the server now sends
a `{"type": "session_reset"}` message. The student SPA tears down
local state and re-runs boot(); /me returns 401 (now that the
participant is gone) and the join form renders without the user
having to manually reload. The WS close handler suppresses its
"Disconnected" screen during a reset to avoid a flash.
3. "You" highlight on the student leaderboard is now matched by id, not
by name. `RoomManager.leaderboard()` accepts an optional
`you_student_id` and stamps `is_you: true` on the matching entry only.
No other students' ids leak over the wire (we still don't include
`student_id` in the public top5 payload). quiz.js's renderBoard
prefers `r.is_you` when any row is marked, falling back to name match
for backward compatibility.
41/41 tests pass. Two new tests cover (a) the 401 + cookie-clear path
after reset and (b) `is_you` marking only the requesting student.
Backend simplification:
- The server now loads ONE pool JSON from $QUIZ_POOL_PATH at startup and
upserts a single canonical session. The session id comes from the pool
JSON's optional "session_id" field, falling back to $QUIZ_SESSION_ID.
- The multi-quiz / multi-session CRUD API is gone:
DELETED GET/POST /admin/api/quizzes
DELETED POST /admin/api/quizzes/upload
DELETED GET/POST /admin/api/sessions
DELETED GET /admin/login (HTML stub)
DELETED GET /admin/api/sessions/{sid}/csv (replaced by /admin/api/csv)
Replaced with a single-session control surface:
GET /admin/ — serves admin.html unconditionally
GET /admin/api/state — admin-gated; pool meta + state + QR + join URL
POST /admin/api/reset — admin-gated; wipe submissions + back to lobby
POST /admin/logout — clear admin cookie
GET /admin/api/csv — single-session results
WS /ws/instructor/{sid} — kept; new commands "next" + "reset"
- Instructor "Next" button is now a single state-driving command
(RoomManager.advance_to_next): from lobby it opens Q0; from question_open
it closes the current Q and opens the next; from question_closed it
opens the next; if past the last question it ends the session.
- New RoomManager.reset wipes submissions, participants, and per-question
state, then broadcasts a clean lobby.
- Student GET / now redirects to /?sid=<canonical> when no sid is given,
so the QR / share URL is fully deterministic.
Frontend rewrite (functional baseline; visual polish to follow):
- /admin/ is now a single SPA: GET /admin/api/state decides login form
vs dashboard. No separate /admin/login URL bar.
- Admin dashboard is state-driven with one primary action per state.
QR code, join URL, and live participant list are always visible on the
left so the operator can leave the page on a projector.
- Student answer buttons are big and tappable; reveal screen highlights
correct/wrong choice + shows score, total, and rank.
- Static admin/student SPAs share a CSS palette with light/dark support.
Tests rewritten around the single canonical session id.
The auto-bootstrapped session lets each test fixture skip the old
quiz/session creation dance. 39/39 tests pass.
Cleanup:
- Deleted CODEX_PROMPT.md, IMPLEMENTATION_REPORT.md, NOTES.md, SPEC.md,
static/observer.html (obsolete codex-build artifacts and the unused
observer view).
- .gitignore now blocks /pool.json (the runtime file the operator drops
on the server) and the leftover .codex_done / codex_run.log / etc.
- bootstrap.sh seeds /opt/quiz/pool.json from examples/pool_example.json
on first deploy so a fresh box reaches a usable state without manual
intervention; .env now includes QUIZ_POOL_PATH.
Two suites under tests/stress/, plus a tmux-friendly run_loop.sh
runner. Both boot a fresh uvicorn on an isolated DB per cycle and
log JSON line summaries to runs/.
api_stress.mjs covers WS-level scenarios that the existing pytest
suite does not exercise: 20-student happy path, late joiners with
correct remaining_ms, mid-question disconnect, browser-sleep + wake
to a different question_idx, cookie tampering and cross-session
cookie reuse, duplicate student_id, bad submit (out-of-order, wrong
idx, resubmit no-op), close-boundary race with auto-close, malformed
JSON fuzz, and flaky reconnect.
ui_stress.mjs drives the same flows in a real Chromium context via
playwright: happy UI, sleep/wake by closing+reopening a context with
the persisted cookie, document.cookie tampering attempt, and two
browser contexts joining with the same student_id.
Findings will be summarised in runs/summary.jsonl over time. One known
issue surfaces from the fuzz scenario: app/room.py student_ws's
receive_json call propagates JSONDecodeError out of the only
try/except (which catches WebSocketDisconnect), killing that client's
WS handler. Other clients are unaffected.