Every machine-readable dataset in the Alexanarch corpus — sizes, record counts, what each carries, and which UI surfaces consume it. The canonical machine-readable catalog is /api/index.json; this page is its human-readable companion. Counts in teal are fetched live from the index on page load.
Primary registries
Canonical sources of truth. Editing these is the only way to change what the site shows.
data/registry.json · 5.97 MB · 881 deposits
The canonical deposit registry. Each entry: bibliographic metadata, canonical v2 AXN, content hash, full-text path, entities[] (subject/predicate/object/type/evidence_status triples — the graph's source), wiki_article, Phase C references_concepts[] + references_concept_count, legacy AXN aliases, and glyphic_canary.
Curated concept layer. Each concept: term, definition, defined_in (founder deposit, present on 7,097 of 7,173 concepts), entity_triples[], type taxonomy (specification/extracted/structural/empirical/theoretical/formal/genre/foundational/method), engagement type, and Phase C referenced_in[] + reference_count (present on 2,120 concepts).
Refined from: data/lexical-minting-registry.json
Bridged with: data/semantic-addresses.json (348 of 7,173 concepts also targeted by canonical queries)
data/lexical-minting-registry.json · 3.52 MB · 12,032 raw terms
Broader pre-curation surface. Every term minted, coined, or formally extracted across the corpus — before noise-filtering and curation into the entity-index. 7,045 terms overlap with entity-index, 4,987 are LMR-only (raw), 128 are entity-index-only.
Query addresses — canonical queries posed to the composition layer (AI Overview, Google Search) with their observation status. Conceptually a field-set on lexical entities: 100% of the 348 unique refers_to targets exact-match into entity-index. Each address: canonical_query, is_quoted, refers_to[], type, battery_membership[], sources[], observations[], observation_class.
AI Overview Capture Registry — 176 captured queries with match status (mt): EXACT MATCH / BROAD MATCH / ADOPTION / ZERO RESULT / null. Each entry: section, slug, query, date, source-format, status, description, image refs. Reconciled into semantic-addresses via slug.
Legacy Zenodo DOI → Alexanarch AXN resolution table. Allows users coming in from old CHA DOI links to find the canonical successor record. Each entry: Zenodo DOI, target AXN, deposit_number, title, status (active / migrated / tombstoned).
Consumed by: no UI surface yet; planned at /resolve/
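The lookup this table enables can be sketched as follows. This is a minimal illustration, not the site's resolver: the key names `zenodo_doi`, `target_axn`, and `status` are assumed spellings of the fields listed above, the sample entry is a placeholder, and the table is assumed to load as a flat list of entries.

```python
from typing import Optional

# Common ways a legacy DOI arrives in a link or citation.
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/", "doi:")


def normalize_doi(raw: str) -> str:
    """Lower-case a DOI and strip a leading resolver prefix, if any."""
    doi = raw.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def resolve_legacy_doi(entries: list, raw_doi: str) -> Optional[dict]:
    """Return the table entry for a legacy DOI, or None if it is unknown.

    `entries` is assumed to be a list of dicts with a `zenodo_doi` key
    (hypothetical key name).
    """
    wanted = normalize_doi(raw_doi)
    for entry in entries:
        if normalize_doi(entry["zenodo_doi"]) == wanted:
            return entry
    return None


# Placeholder entry for illustration only; not a real record.
example_table = [
    {"zenodo_doi": "10.5281/zenodo.0000000",
     "target_axn": "AXN:PLACEHOLDER",
     "status": "migrated"},
]
hit = resolve_legacy_doi(example_table, "https://doi.org/10.5281/ZENODO.0000000")
```

A caller would then branch on the entry's status (active / migrated / tombstoned) to decide what to show the visitor.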
Full DataCite metadata snapshot of all 1,817 CHA-minted DOIs. The empirical foundation for the audit in EA-MPAI-DOI-IMPERMANENCE-01 v2.0 (#868). Methodology replicable via DataCite API at https://api.datacite.org/dois/{doi}.
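To replicate the snapshot for a single DOI, a request against the endpoint above might look like the sketch below. The helper names are illustrative, the DOI in the usage line is a placeholder, and the `Accept` header is an assumption about the preferred response type, so treat this as a starting point rather than the audit's actual script.

```python
import json
import urllib.parse
import urllib.request

DATACITE_BASE = "https://api.datacite.org/dois/"


def datacite_url(doi: str) -> str:
    """Build the DataCite REST URL for one DOI (the slash is kept as-is)."""
    return DATACITE_BASE + urllib.parse.quote(doi, safe="/")


def fetch_doi_record(doi: str, timeout: float = 30.0) -> dict:
    """Fetch and decode the JSON document DataCite returns for `doi`.

    Raises urllib.error.HTTPError if the DOI is unknown to DataCite.
    """
    request = urllib.request.Request(
        datacite_url(doi),
        headers={"Accept": "application/vnd.api+json"},  # assumed header
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Placeholder DOI for illustration; no network call is made here.
example_url = datacite_url("10.5281/zenodo.0000000")
```

Looping `fetch_doi_record` over the DOI list, with polite rate limiting, would rebuild a comparable snapshot.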
Single source of truth. Lists all protocols, schemas, registries, derived surfaces, scripts — each with content_sha256, canonical_path, and referenced_by. New instances run bootstrap_familiarization.py to verify nothing has drifted before any work.
Alexanarch Identifier protocol. Format: AXN:<HEX>.<FAMILY>.<6 EMOJI> where the six-emoji suffix is derived from the first 6 bytes of SHA-256 of canonical bytes, mapped through 256 curated emoji. v1 (4-emoji) aliases preserved in deposit legacy_axn / axn_history.
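The suffix derivation can be sketched as below. Two things here are assumptions: that each of the six digest bytes indexes the glyph table directly, and the table itself, which is a stand-in built from a contiguous Unicode range because the curated 256-emoji list is not reproduced on this page. Real suffixes will therefore differ from what this sketch prints.

```python
import hashlib

# Stand-in for the curated table: 256 consecutive code points.
# The real AXN glyph table is curated and is NOT this range.
PLACEHOLDER_GLYPHS = [chr(0x1F300 + i) for i in range(256)]


def axn_suffix(canonical_bytes: bytes, glyphs=PLACEHOLDER_GLYPHS) -> str:
    """Map the first 6 bytes of SHA-256(canonical_bytes) to 6 glyphs."""
    if len(glyphs) != 256:
        raise ValueError("glyph table must have exactly 256 entries")
    digest = hashlib.sha256(canonical_bytes).digest()
    # Assumption: byte value b selects glyphs[b].
    return "".join(glyphs[b] for b in digest[:6])


def format_axn(hex_id: str, family: str, canonical_bytes: bytes) -> str:
    """Assemble AXN:<HEX>.<FAMILY>.<6 EMOJI> from its parts."""
    return f"AXN:{hex_id}.{family}.{axn_suffix(canonical_bytes)}"
```

Because the suffix depends only on the canonical bytes, any change to those bytes yields a different suffix with high probability, which is what lets the identifier double as an integrity check.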
Every deposit's canonical text is stored at data/texts/AXN-<HEX>-text.md. The hex maps via the registry's hex field to a canonical deposit_number. Body SHA-256 anchors each text into the AXN.
data/texts/AXN-*-text.md · ~25.5 MB total · 881 files
One Markdown file per deposit. The full text body. Source for citation extraction, concept backlinks, and wiki article generation.
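The path convention and the hex-to-deposit mapping can be sketched as follows. The `hex` and `deposit_number` field names come from the description above; treating the registry as a flat list of entries, and hashing the raw file bytes for the body SHA-256, are assumptions.

```python
import hashlib
from pathlib import PurePosixPath

TEXT_DIR = PurePosixPath("data/texts")


def text_path(hex_id: str) -> PurePosixPath:
    """Return the canonical text path for a deposit's hex identifier."""
    return TEXT_DIR / f"AXN-{hex_id}-text.md"


def deposit_number_by_hex(entries: list) -> dict:
    """Index registry entries so a hex resolves to its deposit_number."""
    return {entry["hex"]: entry["deposit_number"] for entry in entries}


def body_sha256(body: bytes) -> str:
    """Hex digest of the text body (assumed: raw bytes, no normalization)."""
    return hashlib.sha256(body).hexdigest()
```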
Verifies every protocol/schema content hash matches what api/index.json claims. New instances run this with --strict at session start. Receipt appended to data/instance-familiarization.log.
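The core of such a check is small. The sketch below is not the script itself: it assumes the index exposes a list under a hypothetical `artifacts` key, with each item carrying the `canonical_path` and `content_sha256` fields named above.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path) -> str:
    """Stream a file through SHA-256 and return the hex digest."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_drift(index: dict, root: Path) -> list:
    """Return canonical paths whose on-disk hash differs from the index.

    A missing file counts as drift. The "artifacts" key is hypothetical.
    """
    drifted = []
    for item in index.get("artifacts", []):
        target = root / item["canonical_path"]
        if not target.is_file() or sha256_file(target) != item["content_sha256"]:
            drifted.append(item["canonical_path"])
    return drifted
```

A strict mode would typically exit non-zero whenever `find_drift` returns a non-empty list, so a session cannot proceed on stale protocol files.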
The only supported path for modifying a protocol JSON. Recomputes hash, updates index, appends to change_log atomically. Direct hand-editing produces drift.
scripts/axn_lib.py · canonical AXN derivation
256-entry AXN_GLYPHS table + cluster catalog. Derives v2 6-emoji suffix from first 6 bytes of SHA-256.
scripts/regenerate_surfaces.py · idempotent
Brings every derived surface (browse, browse-index, chunks, sitemap, SHA256SUMS) into agreement with data/registry.json. Run after every registry change.
scripts/validate_deposit.py · CI-enforced
Validates the registry against the deposit protocol. Rule families: PV/REQ/AXN/CONS/SUR/IDX. Runs on every commit via .github/workflows/validate-registry.yml.
scripts/citation_extractor.py · enrichment
Scans deposit texts for AXN refs, EA-* IDs, #N references, and DOIs. Writes new edges to data/citation-graph.json.
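The four reference kinds could be matched along these lines. Every pattern below is an assumption about how the identifiers appear in running text (for example, that an AXN can be recognized by its `AXN:<HEX>.<FAMILY>` prefix without matching the emoji suffix); the script's actual patterns are not shown on this page, and writing the edges to the graph file is left out.

```python
import re

# Illustrative patterns only; the real extractor may differ.
AXN_RE = re.compile(r"AXN:[0-9A-Fa-f]+\.[A-Za-z0-9]+")
EA_RE = re.compile(r"\bEA-[A-Z0-9]+(?:-[A-Z0-9]+)*")
DEPOSIT_RE = re.compile(r"(?<!\w)#(\d+)\b")
DOI_RE = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")


def extract_references(text: str) -> dict:
    """Collect AXN prefixes, EA-* IDs, #N deposit numbers, and DOIs."""
    return {
        "axns": AXN_RE.findall(text),
        "ea_ids": EA_RE.findall(text),
        "deposit_numbers": [int(n) for n in DEPOSIT_RE.findall(text)],
        # Trim sentence punctuation that the greedy DOI match picks up.
        "dois": [doi.rstrip(".,;:)") for doi in DOI_RE.findall(text)],
    }


# Sample sentence with placeholder AXN and DOI values.
sample = ("See EA-MPAI-DOI-IMPERMANENCE-01 v2.0 (#868), AXN:00FF.EA "
          "and 10.5281/zenodo.0000000.")
refs = extract_references(sample)
```

Each extracted reference, paired with the deposit it was found in, is a candidate edge for the citation graph.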
scripts/concept_backlink.py · enrichment
Scans every deposit text for every entity-index concept. Writes referenced_in[] and reference_count onto entity-index concepts and references_concepts[] onto registry deposits.
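The two-way write can be illustrated with a naive pass like the one below. It assumes whole-term, case-insensitive matching and that `reference_count` counts deposits rather than occurrences; neither is confirmed by this page. Texts are keyed by deposit number for the example.

```python
import re


def backlink_concepts(concepts: list, texts: dict) -> dict:
    """Attach backlinks to concepts and return per-deposit concept lists.

    `concepts` is a list of dicts with a "term" key; each dict is updated
    in place with referenced_in and reference_count. `texts` maps a
    deposit number to its full text. Returns deposit number -> terms,
    which corresponds to references_concepts[] on the deposit side.
    """
    references_concepts = {number: [] for number in texts}
    for concept in concepts:
        # Whole-term match: no word character directly before or after.
        pattern = re.compile(
            r"(?<!\w)" + re.escape(concept["term"]) + r"(?!\w)",
            re.IGNORECASE,
        )
        hits = sorted(n for n, body in texts.items() if pattern.search(body))
        concept["referenced_in"] = hits
        concept["reference_count"] = len(hits)  # assumed: count of deposits
        for number in hits:
            references_concepts[number].append(concept["term"])
    return references_concepts
```

This loop compiles one pattern per concept and scans every text with it, which is fine for illustration; at thousands of concepts a multi-pattern matcher would likely be a better fit.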
scripts/backfill_axn_compliance.py · historical
One-time migration. Backfilled 13 pre-v2 AXNs from the 4-emoji form to the canonical 6-emoji form, preserving the v1 forms in legacy_axn and axn_history.