Cap Table Reconciler: Pre-Valuation Data Hygiene

Featured Project · May 2026

A Flask + HTMX tool that ingests a messy client cap-table .xlsx, runs a deterministic eight-rule gap-detection checklist, and produces a structured liquidation waterfall an analyst can hand to an OPM, Backsolve, or DCF. Built solo over one week ahead of a private-markets interview at Qapita. The posture is expert-led, not algorithm-only: when the file is too messy to defensibly compute on, the tool refuses and writes the punch list. That refusal, and the punch list, is the deliverable.

→Eight checklist rules · five fixtures with public-source provenance · live-formula .xlsx export (IF + SUMPRODUCT against named ranges) · text-PDF side-letter intake · 331 tests collected · public stress-test fixture documenting the holder-level limit honestly

Eight Checklist Rules · Five Curated Fixtures + Stress Test · Live-Formula .xlsx Export · PDF Side-Letter Intake · 331 Tests Collected

Python, Flask, HTMX, openpyxl, pdfplumber, pydantic v2, Cap-Table Cleanup, Liquidation Waterfall, Breakpoint Engine, NVCA · VIMA · SEA precedent

See it in action

Home → messy Pelaut fixture → waterfall + live sensitivity slider → static .xlsx export.

Pipeline Architecture

A single .xlsx goes in · the parser normalizes what it can and reports what it dropped · the checklist surfaces eight categories of gap, each citing the source field · the analyst adjudicates in-session · the waterfall recomputes against the resolved cap table · a structured bundle exports (JSON, static .xlsx, live-formula .xlsx, audit-memo skeleton).

What’s actually happening at each stage

Each stage is explained twice, first for the finance reader, then for the engineer.

1. Zero Valuation Judgments: by Design

Finance lens

The premise: most of an analyst's hours on a private-company valuation aren't spent on judgment. They're spent reconciling what the client sent: a stale option pool, a SAFE that should have converted at the last round, a side letter with terms nobody transcribed, an anti-dilution column that says "see charter." Cleaning that file before any math runs is the work the tool does. Valuation, OPM allocation, DLOM, and 409A are all explicitly out of scope. The tool is the input layer to the analyst's workflow, not a replacement for it.

Engineering lens

No LLM in the calculation pipeline. No probabilistic scoring of findings. Every checklist rule was authored by hand against a documented kind of real-world cap-table mess. Each finding carries a fields_referenced tuple pointing at the exact source cell, so any reviewer can verify the flag by opening the workbook. Side letters are stored verbatim; open Q-prefixed lines surface for in-session adjudication; no model is allowed to "interpret" charter terms.
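A minimal sketch of what a finding with a fields_referenced tuple could look like, and how a reviewer might check it against the workbook. The Finding shape, the "Sheet!Cell" address format, and the verify helper are illustrative assumptions, not the project's actual code; a standard-library dataclass stands in for the pydantic v2 model so the sketch runs without dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical finding record; field names other than
    fields_referenced are assumptions for illustration."""
    rule_id: str
    severity: str                       # "blocker" | "warning" | "info"
    message: str
    fields_referenced: tuple[str, ...]  # assumed "Sheet!Cell" addresses


def verify(finding: Finding, cells: dict[str, object]) -> dict[str, object]:
    """Return the raw source value behind every referenced cell, so a
    reviewer can confirm the flag without trusting the tool."""
    return {ref: cells.get(ref) for ref in finding.fields_referenced}


# Hypothetical workbook contents keyed by address.
cells = {"Cap Table!F7": "see charter"}
finding = Finding(
    rule_id="anti-dilution-variant-blank",
    severity="blocker",
    message="Series A anti-dilution variant is not recorded",
    fields_referenced=("Cap Table!F7",),
)
evidence = verify(finding, cells)
```

Because the finding carries only addresses, the evidence is always re-read from the source file rather than copied into the finding itself.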

2. The Eight Checklist Rules

Finance lens

Anti-dilution variant blank · full-ratchet documentation missing · unconverted SAFE past trigger · stale option pool · off-table warrant · side-letter scope unresolved · participating-with-cap missing the cap multiple · dual-class voting differential not recorded. Each was chosen because it is a documented practitioner gripe in NVCA Model Charter commentary, Singapore VIMA, or worked-example precedent. Findings sort blocker → warning → info; blockers prevent the waterfall from running until adjudicated.

Engineering lens

Each rule is a pure function over the parsed CapTable model (pydantic v2). Findings with a deterministic resolution shape (SAFE conversion share count, AD variant picker, warrant inclusion toggle, side-letter scope, pool date refresh) have inline HTMX-driven resolve forms. On submit, the cap table updates in memory, the waterfall re-runs, and the findings re-evaluate. No full-page reload, no model deciding what counts.
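The rule-as-pure-function pattern, the blocker → warning → info sort, and the blocker gate can be sketched as follows. Everything here is a hedged illustration: the ShareClass fields, the rule id, and the choice of "blocker" severity for this rule are assumptions, and dataclasses stand in for the pydantic v2 CapTable model.

```python
from dataclasses import dataclass, replace
from typing import NamedTuple, Optional

SEVERITY_ORDER = {"blocker": 0, "warning": 1, "info": 2}


class Finding(NamedTuple):
    rule_id: str
    severity: str
    message: str
    fields_referenced: tuple


@dataclass(frozen=True)
class ShareClass:
    # Hypothetical subset of a parsed share class.
    name: str
    participation: str            # e.g. "participating_with_cap"
    cap_multiple: Optional[float]
    source_cell: str              # where the cap multiple should live


def rule_missing_cap_multiple(classes):
    """Pure rule: reads the parsed classes, mutates nothing."""
    return [
        Finding(
            "participating-cap-missing",
            "blocker",  # assumed severity
            f"{c.name}: participating-with-cap but no cap multiple",
            (c.source_cell,),
        )
        for c in classes
        if c.participation == "participating_with_cap" and c.cap_multiple is None
    ]


def run_checklist(classes, rules):
    """Evaluate every rule and sort blocker -> warning -> info."""
    findings = [f for rule in rules for f in rule(classes)]
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])


def waterfall_allowed(findings):
    """The waterfall runs only once no blocker remains."""
    return not any(f.severity == "blocker" for f in findings)


classes = [ShareClass("Series A", "participating_with_cap", None, "Cap Table!H4")]
before = run_checklist(classes, [rule_missing_cap_multiple])

# Analyst adjudication: a new in-memory object, then a full re-evaluation.
resolved = [replace(classes[0], cap_multiple=3.0)]
after = run_checklist(resolved, [rule_missing_cap_multiple])
```

Re-running the whole checklist after each resolution is cheap when every rule is a pure function, and it avoids tracking which findings a given edit might have invalidated.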

3. Waterfall Engine: Breakpoints, Allocation Matrix, Live Sensitivity

Finance lens

The waterfall produces the breakpoint set an OPM Backsolve consumes: LP-cleared thresholds, conversion thresholds, participation-cap thresholds, pure-conversion thresholds. Handles non-participating, participating-uncapped, and participating-with-cap including the senior-above-capped case (a Delaware double-cap fixture stress-tests this path). A cumulative-payout chart renders the per-class slope changes; the tranche allocation matrix shows the fraction of a marginal dollar going to each class within each tranche.
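For the simplest case, one non-participating preferred class ahead of common, the breakpoints and the tranche allocation matrix can be worked out by hand. The sketch below is an illustration under that single-class assumption, not the engine in src/waterfall.py; function names and the example figures are hypothetical.

```python
def breakpoints(pref_shares, common_shares, lp_amount):
    """Two breakpoints for one non-participating preferred class:
    the exit value where the LP is fully paid, and the exit value
    where converting to common beats keeping the LP."""
    total = pref_shares + common_shares
    lp_cleared = lp_amount
    # Conversion is worthwhile once exit * pref/total >= lp_amount.
    conversion = lp_amount * total / pref_shares
    return [lp_cleared, conversion]


def allocation_matrix(pref_shares, common_shares):
    """Fraction of each marginal dollar per class, one row per tranche."""
    total = pref_shares + common_shares
    return [
        {"preferred": 1.0, "common": 0.0},   # below LP: all to preferred
        {"preferred": 0.0, "common": 1.0},   # LP paid, not yet converted
        {"preferred": pref_shares / total,   # converted: pro rata
         "common": common_shares / total},
    ]


def payout(exit_value, points, matrix):
    """Walk the tranches and accumulate each class's share."""
    edges = [0.0, *points, float("inf")]
    result = {"preferred": 0.0, "common": 0.0}
    for low, high, fractions in zip(edges, edges[1:], matrix):
        tranche = max(0.0, min(exit_value, high) - low)
        for holder, fraction in fractions.items():
            result[holder] += tranche * fraction
    return result


# Hypothetical figures: 1M preferred, 3M common, $2M liquidation preference.
points = breakpoints(1_000_000, 3_000_000, 2_000_000)
matrix = allocation_matrix(1_000_000, 3_000_000)
at_20m = payout(20_000_000, points, matrix)
# points -> [2000000, 8000000.0]
# at_20m -> {"preferred": 5000000.0, "common": 15000000.0}
```

At a $20M exit the preferred holder ends up with exactly 25% of proceeds, which matches its as-converted ownership and confirms the conversion threshold is placed correctly.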

Engineering lens

A what-if scenario panel lets the analyst override share counts and LP multiples; the chart redraws and the breakpoint table populates with deltas within ~400ms. The override is in-memory only: the saved cap table doesn't change. The same engine drives both the in-browser view and the exported live-formula .xlsx, where every breakpoint cell carries an IF/SUMPRODUCT formula against named input ranges. Edit a share count in Excel and the chart and breakpoints update natively, without Python.
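One way to keep a what-if override in memory while leaving the saved cap table untouched is to copy an immutable record and diff the recomputed breakpoints. This is a sketch under assumptions: the Inputs fields, the two-breakpoint formula (single non-participating preferred class), and the what_if helper are hypothetical, with a frozen dataclass standing in for the pydantic model.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Inputs:
    # Hypothetical scenario inputs.
    pref_shares: int
    common_shares: int
    invested: float
    lp_multiple: float


def breakpoints(inputs):
    """Breakpoints for one non-participating preferred class."""
    lp = inputs.invested * inputs.lp_multiple
    total = inputs.pref_shares + inputs.common_shares
    return {"lp_cleared": lp, "conversion": lp * total / inputs.pref_shares}


def what_if(saved, **overrides):
    """Recompute on a copy and return per-breakpoint deltas.
    The frozen dataclass guarantees `saved` cannot be mutated."""
    trial = replace(saved, **overrides)
    base, new = breakpoints(saved), breakpoints(trial)
    return {name: new[name] - base[name] for name in base}


saved = Inputs(pref_shares=1_000_000, common_shares=3_000_000,
               invested=2_000_000.0, lp_multiple=1.0)
deltas = what_if(saved, lp_multiple=2.0)
# deltas -> {"lp_cleared": 2000000.0, "conversion": 8000000.0}
```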

4. Side-Letter PDF Intake: Without Hallucinating

Finance lens

Real cap-table cleanup runs into side letters constantly: MFN clauses with undefined scope, redemption rights whose enforceability is in dispute, IPO vetoes at specific valuation thresholds. The temptation is to "extract structured terms" via an LLM and apply them as overrides. That's exactly what an audit-defensible tool shouldn't do. Instead, the PDF is parsed text-only with pdfplumber; the analyst records counsel's adjudicated answer into a textarea that gets appended to the side-letter body. Image-only scanned PDFs return an empty-text warning rather than a fabricated transcription.

Engineering lens

pdfplumber-based extraction. Heuristic title detection (first non-blank line < 120 chars without a trailing period). Open-question detection scans for lines starting with Q:, Question:, or TODO:; these populate an unresolved_questions array that the checklist surfaces as a side-letter-scope-unresolved finding. The analyst's adjudication clears the finding and appends to the canonical side-letter body so the audit memo cites both the source PDF and the in-session resolution.
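The two heuristics above operate on plain text, so they can be sketched without the PDF layer. The sketch assumes the text has already been extracted (for example by pdfplumber); the function names, the case-sensitive prefix match, and the shape of the returned record are assumptions rather than the project's actual interface.

```python
OPEN_PREFIXES = ("Q:", "Question:", "TODO:")


def detect_title(text):
    """First non-blank line, accepted as a title only if it is
    shorter than 120 characters and has no trailing period."""
    for line in text.splitlines():
        candidate = line.strip()
        if not candidate:
            continue
        if len(candidate) < 120 and not candidate.endswith("."):
            return candidate
        return None  # only the first non-blank line is considered
    return None


def unresolved_questions(text):
    """Lines that start with an open-question prefix (case-sensitive here)."""
    return [
        line.strip()
        for line in text.splitlines()
        if line.strip().startswith(OPEN_PREFIXES)
    ]


def intake(text):
    """Build a side-letter record from extracted text. Empty text
    (e.g. an image-only scan) yields a warning, never a guess."""
    if not text.strip():
        return {"title": None, "body": "", "unresolved_questions": [],
                "warning": "empty-text"}
    return {"title": detect_title(text), "body": text,
            "unresolved_questions": unresolved_questions(text),
            "warning": None}


sample = (
    "\nSide Letter re: Series A MFN\n"
    "The Company agrees to the following.\n"
    "Q: Does the MFN cover later SAFEs?\n"
    "  TODO: confirm redemption date with counsel\n"
)
record = intake(sample)
```

The body is stored verbatim; nothing in this path rewrites or interprets the side letter's terms.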

5. Honest About the Limits: Including the One That Hurts

Finance lens

A sixth fixture (Sundar Foods, fictional Indian D2C Series B-1) is included as a stress test, not a curated demo. It's built the way Qapita's actual customer base sends cap tables: holder-level rows (one per shareholder, not per class), tab names that don't match the parser's whitelist ("Conv. Instruments", "Co. Info"), prose in numeric columns, dates like "14th Feb '25", and off-charter terms living only in side-letter PDFs. The engine intentionally degrades on it: only the ESOP rows survive validation. The result is a 72-warning parse report rather than a fabricated waterfall.

Engineering lens

That report is the deliverable on a real-world file: the audit-defensible record of what an analyst would email back to the CFO before any cleanup work can begin. The roadmap fix (holder-level rollup pass, broader tab-name whitelist, fuller class-type alias map covering "Equity Common" / "CCPS Series B-1") is real and known. It is not shipped yet, on purpose, until I've had a conversation about which customer-file patterns to prioritize. Visible-honest beats invisible-broken.

Methodology notes

Eight deterministic gap-detection rules (src/checklist.py), each a pure function over the parsed pydantic CapTable, each finding citing a fields_referenced path back to the source cell.

Breakpoint engine (src/waterfall.py) covering non-participating, participating-uncapped, and participating-with-cap (including the senior-above-capped case from the NVCA worked examples).

Live-formula .xlsx exporter (src/formula_workbook.py): chart and breakpoint formulas remain valid for input edits (share counts, prices, LP multiples); structural changes (LP type change, seniority swap, participation cap added) require a fresh export. The validity caveat is shipped with the file.

Holder-level cap tables (one row per shareholder rather than per class, common in CFO-built Indian Excel) degrade to a structured warning report rather than a clean ingest. Documented in sample_uploads/sundar_foods/ as a public stress test, not buried as a roadmap line.

What this isn’t (yet)

The honest limits. A page called “honest” with no limitations would be a credibility own-goal.

Holder-level cap-table layout

A typical Indian CFO Excel pattern, one row per shareholder rather than one row per share class, degrades to a 70+ warning report rather than a clean ingest. Duplicate class names across holder rows are rejected at the pydantic preferred-must-have-LP validator. Stress-tested publicly in sample_uploads/sundar_foods/.

Tab-name detection is exact-match

The parser whitelists specific tab names ("Cap Table", "Capitalization", "Convertibles", "Company", "Side Letters"). Non-standard names like "Conv. Instruments" or "Co. Info" are skipped silently. Broader fuzzy-matching is a roadmap item.
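Exact-match detection is easy to picture as a set lookup. The whitelist below is the one listed above; the partition_tabs helper and its return shape are hypothetical, written only to show why a near-miss name falls through.

```python
TAB_WHITELIST = frozenset(
    {"Cap Table", "Capitalization", "Convertibles", "Company", "Side Letters"}
)


def partition_tabs(sheet_names):
    """Split sheet names into recognized and skipped, using exact,
    case-sensitive matching with no trimming or fuzzy comparison."""
    recognized = [name for name in sheet_names if name in TAB_WHITELIST]
    skipped = [name for name in sheet_names if name not in TAB_WHITELIST]
    return recognized, skipped


recognized, skipped = partition_tabs(
    ["Cap Table", "Conv. Instruments", "Co. Info"]
)
# recognized -> ["Cap Table"]
# skipped    -> ["Conv. Instruments", "Co. Info"]
```

Returning the skipped names alongside the recognized ones keeps the door open for listing them in a parse report once fuzzy matching is on the table.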

Off-charter side-letter terms are flagged, not applied

A 2x LP override or full-ratchet override that lives only in a side letter is surfaced as a finding for analyst adjudication; the charter values on the cap table are not auto-overridden. This is intentional: applying off-charter terms silently is exactly the failure mode the tool was built to avoid.

No SAFE / convertible-note auto-conversion

Cap vs discount, pre- vs post-money, MFN logic: there are too many opinions for the conversion to be implicit. The analyst converts these to structured share classes during the mapping step. The parser ingests SAFEs and notes as a flat list and the checklist flags ones past their trigger; the math is the analyst's call.

Demo-grade application surface

In-memory + SQLite session, single Flask process, no auth, local-only at http://localhost:5050. The engine is the deliverable; the surrounding web app is illustrative. Production hardening (multi-tenancy, audit log persistence, file encryption at rest) is out of scope for the prototype.