Append-only. Scoring-logic or weight changes must land with an entry here (rule 10).
seed v0.3: Stage A skeleton + harvest layer (2026-06-29)
Sprint 1 + CAN-07: method package & seed import
- Frozen ontology v0.2 in pydantic v2; context entities have no score field (structural).
- Deterministic, domain-isolated scorer; published missing-data penalty (factor 0.5);
weights in `scenarios.yaml` (3 placeholder weightings).
- ER stub: no auto-merge < 0.95; Aggarwal != Nielsen.
- Imported seeds: 573 books / 162 papers / 183 persons / 132 orgs / 90 platforms;
250 descriptions; 573 categorized_as + 172 authored_by edges; Trap conflict-flagged.
Sprint 2: harvest layer (CAN-09 / CAN-10 / CAN-12)
- `data/raw/` write-once store with per-source sha256 manifest (rule 6).
- OpenAlex harvester: live fetch into the write-once cache; metrics derived from the
cached snapshot (offline-deterministic); no match / offline => declared gap, never imputed.
- Manual CSV-drop path (WorldCat / Open Syllabus): each row carries provenance or fails.
- `canon.harvest.assemble`: derives + validates + dedupes metrics (highest confidence wins)
into `data/resolved/metrics.json` with a `coverage.json` report.
- Scorer gains `--corpus` mode: ranks real works that have harvested evidence; works
without evidence are an honestly-declared coverage gap, not a fabricated zero.
Sprint 3: release builder, audit package, adversarial review (CAN-15/16/17)
- `canon.release`: frozen release under `data/releases/<version>/`: Top-50 per
(domain, scenario), full per-work breakdowns, divergence summary, a `Release`
governance record with a deterministic `corpus_hash` (date is metadata, not hashed),
coverage.json, and REPRODUCE.md. `--verify` rebuilds and asserts bit-identical (rule 3).
- `canon.redteam`: adversarial-review harness: reproducibility, provenance completeness,
domain isolation, no-imputation, conflict-flag surfacing, declared coverage, ranking
sanity, divergence honesty → `reports/red_team_findings.md` + a GATE-A verdict.
- **Pilot release `pilot-v0.1`: GATE A PASS (substantive)**: 0 blocking findings; reproducible.
- Second independent signal derived from the SAME cached OpenAlex snapshot (no new network):
`sustained_readership` = citations in 2023-2025 (recent momentum), distinct from all-time
`citation_count`. Assembled 174 metrics (88 citation_count + 86 sustained_readership).
- With two signals the three scenarios now produce **different orderings**
(`scenario_divergence: observed`). The method's central claim is demonstrated, not merely asserted.
- Adversarial loop ran the full two iterations: iteration 1 flagged a stale single-metric
`ranking_sanity` check (false positive); fixed to assert composite-score monotonicity; iteration 2 clean.
- Still-declared gaps: `library_holdings` / `syllabus_adoptions` await WorldCat / Open Syllabus CSV
drops; ~74 papers await the next OpenAlex daily-budget window.
Stage C: public site (CAN-21..25)
- `canon.export_site`: static generator (no framework, no JS deps, no tracker code) emits
`site/` from the release JSON + seeds: Canon-50 (3 scenario views), per-work breakdown
pages (the trust surface: every metric + provenance + missing-data penalty), papers shelf,
method, challenges, changelog, and a downloadable audit package under `site/audit/`.
- The approved homepage (`site/index.html`) is wired to the live pages; its Canon-50 teaser
is GENERATED (top-3 injected by the builder, idempotent) so the manifesto never carries
hand-typed ranking data that can drift.
- 59 pages, 621 internal links (0 broken), verified rendering in-browser. Deploy target:
`apparens.nl/ai-canon/` (Cloudflare Pages, static).
Design pass: align to apparens.nl + house style
- Generated chrome rewritten to mirror `apparens-design-system.css`: deep-blue fixed nav with
the white Apparens logo + serif wordmark, white body, orange `#B8430A`, DM Serif + DM Sans.
- The homepage is now GENERATED too (page_home), in the same design, so the whole site is
visually consistent and rebuilds from one place. Its Canon-50 teaser is the live top-3.
- House style: no em-dashes in any site copy (enforced by a test).
Acceptance audit response (decisions 1 to 6)
- **Library shipped** (`library.html`): all 573 candidate books, filterable by category / language /
provenance, descriptions where written and "Description pending" otherwise, conflict-of-interest flag
shown inline, labelled candidacy not canonical. Books are curated and browsable but not yet scored.
- **Context shelves shipped**: `voices.html` (183), `organizations.html` (132), `platforms.html` (90),
grouped by category, alphabetical within category, labelled "described, never ranked" (no score).
- Nav reordered so the Library leads: the reference library is the primary surface, the ranking is one view.
- Verbatim positioning line on the home page; verbatim humility clause on the Canon-50 and every per-work
page; significance lines added to the papers shelf.
- Open data: the audit page now offers JSON and CSV for the full corpus (books + papers + context).
- **Declared deferrals** (stated on the method page, not silently stubbed): per-ecosystem normalization
(rule 5) activates only when more than one ecosystem enters a scored domain, so the site makes no
worldwide / present-tense multilingual claim and the Chinese spine (28 works) is a declared gap; a fuller
longevity proxy (holdings over time, editions, availability); and book scoring. The pilot ranks papers
only, behind honest framing, and that scored view passed GATE A.
- House style enforced at the render boundary: even verbatim seed text shows no em-dashes; a test fails
the build on any em-dash in generated HTML.
- **Self-contained audit bundle (decision 3):** `canon.release` now emits `audit-bundle.zip`, a
byte-deterministic archive carrying the pipeline code, weights, pinned data snapshot, release outputs,
and a one-command `reproduce.sh`. Verified: extracted into a clean directory with no repo, it rebuilds
the release and reports corpus_hash MATCH. This is what makes the package archival and time-invariant.
Security hardening to the app's bar (v1.2, the [S##] guardrails)
Derived from the AI Control Index app's posture and adapted for a static site.
- Strict CSP in `site/_headers`: `default-src 'none'`, no `unsafe-inline` / `unsafe-eval`,
plus X-Content-Type-Options, Referrer-Policy, X-Frame-Options DENY, COOP, CORP,
Permissions-Policy, HSTS. [S5]
- All CSS and JS externalized to `site/assets/` so the strict CSP holds; no inline script or
style remains in any page. [S6]
- Self-hosted the DM fonts (reused the owner's licensed woff2): zero third-party requests,
no Google Fonts. [S7]
- Output safety: `esc()` escapes quotes too; `safe_url()` scheme-sanitizes data-derived hrefs
(javascript:/data: collapse to `#`); adversarial XSS fixtures prove hostile titles,
descriptions, and URLs cannot become markup or script. [S8]
- `scripts/static-gate.sh` runs all guardrails [S0]-[S13]; CI runs the gate; [S12] fails the
build if ARCHITECTURE.md and the checks drift. ARCHITECTURE.md added with the [S##] system.
- [S13] Accessibility: a full axe-core pass across all 14 page types is clean (0 violations).
Fixed body links to be underlined (distinguishable without color) and footer text contrast
(a global `p` rule was rendering footer text dark-on-dark); promoted a heading to fix order.
A static a11y lint (lang, single h1, img alt, heading order) keeps it from regressing in CI.
- 39 tests (8 security). The question behind this: if a million experts probe it, does it hold.
Published
- Public repository: https://github.com/Apparens/ai-canon (MIT code + CC BY 4.0 corpus/method).
- Zenodo DOI minted via the GitHub release integration. Concept DOI (always latest):
**10.5281/zenodo.21042034**. The method note, README badge, CITATION.cff, and the site Method
page all cite it.
Not yet
Book metric harvesting (title collisions: deferred), CN verification toward 60-90,
more harvested metrics (next OpenAlex daily window + WorldCat/Open Syllabus drops),
deploy the site to Cloudflare Pages.