Setpoint: Local-First AI Coaching System

Featured Project · Apr 2026 – Present

A local-first AI coaching system built for a single user (me): a FastAPI process serving a Next.js PWA over LAN, with $0 hosting, no cloud database, and no auth (the network is the perimeter). The interesting engineering sits on three axes. First, a 25-tag movement-pattern taxonomy makes the injury filter anatomy-correct rather than name-correct: when an elbow injury is toggled on, tricep extensions get swapped because the medial epicondyle is the flexor-pronator origin, but biceps curls stay untouched because the biceps brachii inserts at the radial tuberosity and doesn't load that joint. Second, a hybrid BM25 + cosine retrieval layer runs over a curated 27-article knowledge base, chunked into ~229 sections, embedded with sentence-transformers/all-MiniLM-L6-v2 (384-dim, on-device, free), and stored as BLOBs in SQLite. It originally targeted sqlite-vec, then pivoted to a numpy linear scan when Python 3.13's framework build couldn't load extensions; the scan is sub-millisecond at this corpus size. Third, a deliberate local-first architecture (laptop-as-server, phone-as-client, Bonjour discovery, offline write-queue) trades cloud convenience for $0 cost, no vendor risk, and full data ownership. Built in five sprints, with Claude as architect-collaborator on every design call, including the moment we found the work was being silently written to a shell sandbox overlay that didn't persist and salvaged 27 articles by HTTP-dumping the running backend before rebuilding on real disk.

→ 25-tag movement-pattern engine · hybrid BM25 + cosine RAG · 27 KB articles · LAN-served PWA · $0 hosting

Hybrid BM25 + Cosine RAG | 25 Movement-Pattern Tags | 27 KB Articles · 229 Chunks | LAN-Served PWA | $0 Hosting

Python | FastAPI | Next.js 14 | SQLite + FTS5 | PWA · Local-First | MiniLM embeddings (on-device) | Hybrid BM25 + Cosine RAG | OpenRouter (Claude / Gemini) | Open-Meteo

View on GitHub

Pipeline Architecture

Phone opens the PWA over LAN · FastAPI serves from the laptop · the pattern-based filter recomputes the week's plan against active injuries · the RAG coach retrieves cited KB passages via hybrid BM25 + cosine · the response is grounded in live user state and KB context, all running on one machine, $0 hosted.

What’s actually happening at each stage

Each stage is explained twice: first for the finance reader, then for the engineer.

1. Pattern-Based Injury Engine: Anatomy First, Not Names

Finance lens

Most fitness apps swap exercises by name. A real injury filter has to swap by anatomy. The medial epicondyle is the origin of the flexor-pronator group, so tricep extensions and gripping under load aggravate it, but biceps curls don't load that joint at all because the biceps brachii inserts at the radial tuberosity. The right answer is to swap the right movements and leave the rest alone. A 25-tag movement-pattern taxonomy makes the filter anatomy-correct rather than name-correct, and the entire week's plan recomputes the moment an injury is toggled on or off.

Engineering lens

Each exercise is tagged into one of 25 movement patterns (grip_load, valgus_elbow, tricep_extension_extreme, deep_knee_flexion_load, etc.). Each of the 5 injury types declares a {pattern: severity} map plus per-pattern swap targets, reduction cues, and force-drop overrides. The filter scores each plan exercise against the union of active injuries with strict drop > swap > reduce > cue precedence. Active injuries persist in SQLite and toggle live via POST /injuries/{name}. The filter re-runs on every /programme/today and /programme/week fetch, so toggling an injury recomputes the plan instantly and untoggling it reverts just as fast.
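
A minimal sketch of that precedence rule, not the project's actual code. The four pattern names listed above come from this page; everything else (the exercise names, the elbow_flexion tag, the injury names, the severities, and the idea that a severity is simply the action name) is an assumption made for illustration.

```python
# Strongest action first, per the stated precedence: drop > swap > reduce > cue.
PRECEDENCE = ("drop", "swap", "reduce", "cue")

# Hypothetical tagging: one movement pattern per exercise.
EXERCISE_PATTERN = {
    "overhead_tricep_extension": "tricep_extension_extreme",
    "farmer_carry": "grip_load",
    "biceps_curl": "elbow_flexion",  # assumed tag name, not from the source
    "back_squat": "deep_knee_flexion_load",
}

# Hypothetical {pattern: severity} maps; the severities are illustrative only.
INJURIES = {
    "medial_elbow": {
        "tricep_extension_extreme": "swap",
        "grip_load": "reduce",
        "valgus_elbow": "drop",
    },
    "wrist": {"grip_load": "cue"},
}


def action_for(exercise, active_injuries):
    """Return the strongest action any active injury demands, or 'keep'."""
    pattern = EXERCISE_PATTERN[exercise]
    hits = [
        INJURIES[name][pattern]
        for name in active_injuries
        if pattern in INJURIES[name]
    ]
    return min(hits, key=PRECEDENCE.index) if hits else "keep"


def filter_plan(plan, active_injuries):
    """Recompute a whole plan; with no active injuries every entry is 'keep'."""
    return {exercise: action_for(exercise, active_injuries) for exercise in plan}
```

Because the filter is a pure function of the plan and the active-injury set, re-running it on every fetch is what makes the revert free: untoggle the injury and the same call returns the original plan.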

2. Hybrid BM25 + Cosine RAG over a Curated KB

Finance lens

When the coach answers a question, it cites which articles it pulled from. The knowledge base is curated: articles across training, nutrition, recovery, and programming, chunked by heading. Retrieval is hybrid: keyword search catches literal phrasing, semantic search catches conceptual matches, and the two are fused so the right passage surfaces whether the user typed the exact term or paraphrased it.

Engineering lens

27 wiki articles chunked into ~229 sections, embedded with sentence-transformers/all-MiniLM-L6-v2 (384-dim, on-device, free), and stored as BLOBs in SQLite. Retrieval fuses FTS5 BM25 and numpy cosine rankings via a weighted reciprocal-rank score: α/(60 + bm25_rank) + (1 − α)/(60 + vec_rank). I originally targeted sqlite-vec, but Python 3.13's framework build can't load extensions, so I pivoted to a numpy linear scan, which is sub-millisecond over the ~229 chunks at this corpus size. The coach prompt is a single template combining user_context (live state: today's macros, last six sets, weight EMA, readiness, active injuries) with the retrieved chunks, routed through OpenRouter to Gemini or Claude, with [topic §heading] citations enforced by the system prompt.
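
The scan and the fusion formula can be sketched together. This is an illustration under stated assumptions, not the project's code: it uses the standard-library array module in place of numpy so it runs without dependencies, toy 3-dim vectors in place of the 384-dim embeddings, a hypothetical chunks(id, embedding) table, α = 0.5, and ranks starting at 1. The BM25 ranking is passed in as a plain list; in the real system it would come from an FTS5 query.

```python
import math
import sqlite3
from array import array


def to_blob(vec):
    """Pack floats as float32 bytes, the layout a BLOB column would hold."""
    return array("f", vec).tobytes()


def from_blob(blob):
    vec = array("f")
    vec.frombytes(blob)
    return vec


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_rank(conn, query_vec):
    """Linear scan: score every stored chunk against the query, best first."""
    rows = conn.execute("SELECT id, embedding FROM chunks").fetchall()
    scored = sorted((-cosine(query_vec, from_blob(blob)), cid) for cid, blob in rows)
    return [cid for _, cid in scored]


def rrf_fuse(bm25_ids, vec_ids, alpha=0.5, k=60):
    """Score = alpha/(k + bm25_rank) + (1 - alpha)/(k + vec_rank)."""
    scores = {}
    for rank, cid in enumerate(bm25_ids, start=1):
        scores[cid] = scores.get(cid, 0.0) + alpha / (k + rank)
    for rank, cid in enumerate(vec_ids, start=1):
        scores[cid] = scores.get(cid, 0.0) + (1 - alpha) / (k + rank)
    # A chunk missing from one list simply gets no contribution from it.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical schema and toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, embedding BLOB)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?)",
    [
        ("a", to_blob([1.0, 0.0, 0.0])),
        ("b", to_blob([0.0, 1.0, 0.0])),
        ("c", to_blob([0.9, 0.1, 0.0])),
    ],
)
vec_ids = vector_rank(conn, [1.0, 0.0, 0.0])
fused = rrf_fuse(["b", "a", "c"], vec_ids)
```

Fusing on rank rather than raw score sidesteps the fact that BM25 scores and cosine similarities live on different scales; the constant 60 damps the gap between adjacent ranks so neither list dominates on a single top hit.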

3. Research-Cited RAMP Warmup Generator

Finance lens

Every training day surfaces a structured warmup card before the workout: cardio, dynamic mobility (no static stretches, because they measurably cut force output), activation work for the muscles that protect today's primary lift, and a specific %-of-working ramp on the first compound. Each section carries a one-line rationale and a citation, so the user can see why each block is there rather than treating the warmup as ritual.

Engineering lens

WARMUP_BLOCKS is a dict keyed by workout name, each value a list of section objects (section, duration, rationale, items). The PreSession response carries an optional sections[] field; the React PreSessionCard renders the structured form when it is present and falls back to the flat list otherwise. Mobility recommendations are dynamic-only, because the Behm 2016 meta-analysis shows static stretching ≥30 s pre-lift impairs force output by 5–8%. The specific ramp uses %-of-working sets per Israetel/Nippard guidance and stays sub-maximal, because post-activation potentiation (Wilson 2013) requires it.
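
A rough sketch of that shape, with the fallback logic mirrored in Python rather than React. The field names follow the description above; the workout name, drills, rationales, the assumption that duration is in minutes, the pre_session, render, and ramp_sets helpers, and the ramp percentages are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WarmupSection:
    section: str
    duration: int      # assumed to be minutes
    rationale: str
    items: list


# Hypothetical entry; the workout name and contents are illustrative only.
WARMUP_BLOCKS = {
    "push_a": [
        WarmupSection("cardio", 5, "Raise core temperature", ["easy row"]),
        WarmupSection("mobility", 4, "Dynamic only, no static holds",
                      ["arm circles", "band dislocates"]),
        WarmupSection("activation", 3, "Prime the scapular stabilisers",
                      ["band pull-aparts"]),
    ],
}


@dataclass
class PreSession:
    workout: str
    items: list                      # flat list, always present
    sections: Optional[list] = None  # structured form, when one is defined


def pre_session(workout, flat_items):
    """Attach structured sections if the workout has them, else leave None."""
    return PreSession(workout, flat_items, WARMUP_BLOCKS.get(workout))


def render(resp):
    """Structured form when sections are present, flat list otherwise."""
    if resp.sections:
        return [f"{s.section} ({s.duration} min): {', '.join(s.items)}"
                for s in resp.sections]
    return list(resp.items)


def ramp_sets(working_kg, percents=(0.4, 0.6, 0.8)):
    """Sub-maximal ramp on the first compound; the percentages are assumed."""
    if any(p >= 1.0 for p in percents):
        raise ValueError("ramp sets must stay below working weight")
    return [round(working_kg * p, 1) for p in percents]
```

Keeping sections optional means older clients and workouts without a structured block keep working unchanged, which is the point of the flat-list fallback.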

4. Phase-Aware Programming + Weather-Driven Cardio

Finance lens

Two recurring decisions are automated. First, phase-correct programming: a 12-week plan has three phases, training and rest days have different targets, and today's numbers are computed from today's date with no manual reconfiguration. Second, outdoor cardio scheduling: the system scores the next 7 days of weather and surfaces the best contiguous 4-hour block per day, with a calorie estimate calibrated to the actual ride hardware (a heavy bikeshare bike burns 10–15% more than a personal road bike at the same speed).

Engineering lens

PHASES is an array of (weeks_tuple, train_kcal, rest_kcal, protein_g) rows. current_phase(today) returns the right row by week-of-cycle; macros_for_today does a deterministic 30/25/45 split (protein-locked, fat 25% of kcal, rest as carbs), and the Today summary auto-flips at midnight. The weather scorer hits Open-Meteo's free hourly forecast (cached for 30 min) and scores each daylight hour 0–100 (peak at 18°C, penalised for precip > 10%, wind > 12 km/h, and temperature drift); a sliding window then finds the best contiguous 4-hour block per day. The calorie estimate uses the ACSM formula kcal/min = MET × 3.5 × kg / 200, calibrated for the actual hardware in use.
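
The window search and the calorie formula can be sketched as follows. This is a minimal illustration, not the project's implementation: it takes already-computed hourly scores as input (the scoring weights are not given here, so none are invented), and the function names, the hardware_factor parameter, and the sample numbers are assumptions.

```python
def best_block(hour_scores, width=4):
    """Find the contiguous block of `width` hours with the highest total score.

    hour_scores: list of (hour, score) pairs for consecutive daylight hours
    of a single day. Returns (start_hour, end_hour, mean_score), or None if
    the day has fewer than `width` scored hours.
    """
    if len(hour_scores) < width:
        return None
    window = sum(score for _, score in hour_scores[:width])
    best_sum, best_start = window, 0
    for i in range(width, len(hour_scores)):
        # Slide by one hour: add the incoming score, drop the outgoing one.
        window += hour_scores[i][1] - hour_scores[i - width][1]
        if window > best_sum:
            best_sum, best_start = window, i - width + 1
    start_hour = hour_scores[best_start][0]
    return start_hour, start_hour + width, best_sum / width


def ride_kcal(met, weight_kg, minutes, hardware_factor=1.0):
    """ACSM estimate: kcal/min = MET * 3.5 * kg / 200.

    hardware_factor is a hypothetical multiplier for heavier bikes
    (for example 1.10 for the low end of the 10-15% range above).
    """
    return met * 3.5 * weight_kg / 200 * minutes * hardware_factor


# Illustrative scores for one day, 08:00 to 14:00.
day = [(8, 40), (9, 60), (10, 80), (11, 90), (12, 85), (13, 70), (14, 50)]
block = best_block(day)
```

With these sample scores the four windows total 270, 315, 325, and 295, so the 10:00 to 14:00 block wins with a mean of 81.25. The sliding update keeps the search linear in the number of hours, although at one day of hourly data a brute-force sum would be just as fast.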

5. Local-First: Single-User, $0 Hosted

Finance lens

The whole system runs on a laptop. The phone connects over the same WiFi to log workouts, meals, and weights. No cloud, no $20/month database, no app store. The PWA installs to the home screen and looks like a native app, but the perimeter is the network, not auth. If the WiFi works, the system works. The original architectural plan was Supabase + cloud hosting; the design call ($0 cost is the hard constraint, accepting that the laptop can't stay on 24/7) landed on this laptop-as-server / phone-as-client design instead.

Engineering lens

Single FastAPI process bound to 0.0.0.0:8003, two SQLite databases (KB + app data), Bonjour (macbook.local) for phone discovery, and a localStorage offline write-queue that auto-flushes on a /health ping every 30 s and on online/focus events. The PWA ships a manifest.webmanifest so the phone can "Add to Home Screen." Mid-build catastrophe: the work was being silently written to a shell sandbox overlay that didn't persist. I caught it, salvaged 27 wiki articles by HTTP-dumping the running backend to disk, rebuilt on real disk, and shipped. It is the kind of failure mode you only learn about by hitting it.

Methodology notes

Anatomy-aware injury filter built from a 25-tag movement-pattern taxonomy: exercises are scored against active injuries by pattern, not name, so tricep extensions get swapped but biceps curls stay untouched (the biceps brachii inserts at the radial tuberosity and doesn't load the medial epicondyle). The 5 injury types each declare {pattern: severity} maps with strict drop > swap > reduce > cue precedence; the entire week's plan recomputes on injury toggle.

Hybrid BM25 + cosine retrieval over a curated 27-article KB chunked into ~229 sections, embedded with sentence-transformers/all-MiniLM-L6-v2 (384-dim, on-device, free) and stored as BLOBs in SQLite, with reciprocal-rank fusion. Originally targeted sqlite-vec; pivoted to a numpy linear scan when Python 3.13's framework build couldn't load extensions. The scan is sub-millisecond at this corpus size. [topic §heading] citations are enforced by the system prompt.

Research-cited RAMP warmups (Behm 2016 on static-stretch force impairment, Wilson 2013 on PAP, Bishop 2003 on warm-up physiology, Cools/Reinold on scapular rehab). Mobility is dynamic-only because static stretches ≥30 s impair force output by 5–8%; ramp sets stay sub-maximal because PAP requires it.

Local-first by design: a single FastAPI process on the laptop bound to 0.0.0.0, two SQLite DBs, Bonjour discovery for the phone, and an offline write-queue on the PWA that flushes on /health pings and online/focus events. $0 hosting, no cloud database, no auth: the network is the perimeter.