How we publish.
Two non-negotiable rules. Five chained stages. Every artifact is a public PR.
Section 1 · 2 non-negotiable rules
The two rules.
Rule #1 Specs come from a hand-curated YAML catalog.
The LLM cannot emit numbers. Every numeric token in MDX prose must trace back to content/_specs/<slug>.yaml or the build fails.
CI evidence
- validator · scripts/validate-specs.ts
- doctrine · content/_specs/_README.md
- schema · content/_specs/_SCHEMA.md
- CI step · .github/workflows/content-gate.yml · validate:specs
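Below is a minimal sketch of how a Rule #1 check could work. It makes three assumptions. The spec YAML has already been parsed into a plain object, for example with a YAML library. A "numeric token" is a run of digits with an optional decimal part. Frontmatter and fenced code do not count as prose. SpecValue, findUntracedNumbers and the regex are hypothetical names, and this is not the contents of scripts/validate-specs.ts.

```typescript
// Hypothetical sketch of a Rule #1 check; not the contents of scripts/validate-specs.ts.
type SpecValue = string | number | boolean | null | SpecValue[] | { [key: string]: SpecValue };

// Assumed definition of a "numeric token": digits with an optional decimal part.
const NUMERIC_TOKEN = /\d+(?:\.\d+)?/g;

/** Collect every number that appears anywhere in a parsed spec YAML document. */
function collectCatalogNumbers(value: SpecValue, out: Set<number> = new Set<number>()): Set<number> {
  if (typeof value === "number") {
    out.add(value);
  } else if (typeof value === "string") {
    // Numbers embedded in strings ("RTX 4090") also count as catalog numbers.
    for (const token of value.match(NUMERIC_TOKEN) ?? []) out.add(Number(token));
  } else if (Array.isArray(value)) {
    for (const item of value) collectCatalogNumbers(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const item of Object.values(value)) collectCatalogNumbers(item, out);
  }
  return out;
}

/** Remove the leading frontmatter block and fenced code so only prose remains. */
function proseOnly(mdx: string): string {
  return mdx.replace(/^---\n[\s\S]*?\n---\n/, "").replace(/`{3}[\s\S]*?`{3}/g, "");
}

/** Return every numeric token in the prose that the spec catalog cannot account for. */
function findUntracedNumbers(mdx: string, spec: SpecValue): string[] {
  const allowed = collectCatalogNumbers(spec);
  const tokens: string[] = proseOnly(mdx).match(NUMERIC_TOKEN) ?? [];
  return tokens.filter((token) => !allowed.has(Number(token)));
}

const spec: SpecValue = { gpu: "RTX 4090", vram_gb: 24, tdp_w: 450 };
const mdx = "---\ntitle: Rig\n---\nThe card has 24 GB and draws 450 W, about 600 W at peak.\n";
const untraced = findUntracedNumbers(mdx, spec);
if (untraced.length > 0) console.log(`untraced numbers: ${untraced.join(", ")}`); // untraced numbers: 600
```

In a real gate, a non-empty result would fail the build instead of only logging.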
Rule #2 The LLM is forbidden from voice-bearing sections.
Voice-bearing headings (Verdict, Intro, Recommendation) may only appear in documents whose frontmatter author is on the human allowlist. The LLM does not write voice, and the validator catches anything that slips past review.
CI evidence
- validator · scripts/validate-authorship.ts
- CI step · .github/workflows/content-gate.yml · validate:authorship
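Rule #2 can be sketched the same way. The sketch assumes a simple author: line in the frontmatter and Markdown-style # headings. HUMAN_AUTHORS, its entries and voiceViolations are hypothetical placeholders, not what scripts/validate-authorship.ts contains.

```typescript
// Hypothetical sketch of a Rule #2 check; the allowlist entries and the
// frontmatter shape are placeholders, not scripts/validate-authorship.ts.
const HUMAN_AUTHORS = new Set(["alice", "bob"]);
const VOICE_HEADINGS = new Set(["verdict", "intro", "recommendation"]);

/** Read a simple `author: name` line from the leading frontmatter block. */
function frontmatterAuthor(mdx: string): string | undefined {
  const frontmatter = /^---\n([\s\S]*?)\n---/.exec(mdx)?.[1] ?? "";
  return /^author:\s*["']?([^"'\n]+?)["']?\s*$/m.exec(frontmatter)?.[1];
}

/** List every voice-bearing heading in a document whose author is not on the allowlist. */
function voiceViolations(mdx: string): string[] {
  const author = frontmatterAuthor(mdx);
  if (author !== undefined && HUMAN_AUTHORS.has(author)) return [];
  const violations: string[] = [];
  for (const line of mdx.split("\n")) {
    const heading = /^#{1,6}\s+(.+?)\s*$/.exec(line);
    if (heading && VOICE_HEADINGS.has(heading[1].toLowerCase())) {
      violations.push(`"${heading[1]}" heading needs a human author (author: ${author ?? "missing"})`);
    }
  }
  return violations;
}

const draft = "---\nauthor: llm-bot\n---\n## Specs\n## Verdict\nBuy it.\n";
for (const message of voiceViolations(draft)) console.error(message);
```

A missing author is treated the same as a non-human one, so the check fails closed.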
Section 2 · 5 stages
The pipeline, stage by stage.
Every stage runs on a Vercel cron. Each stage writes only its declared output and has one hard boundary, listed under Never, that it does not cross.
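One way to make "writes only its declared output" enforceable in code is a typed stage manifest that each cron handler consults before writing. The cron strings are one reading of the schedule table below. Vercel evaluates cron expressions in UTC, and running the hourly job at minute 0 is an assumption. The write-target names are illustrative, including where topic_label lives and the github_pr label.

```typescript
// Hypothetical manifest: cron strings are one reading of the schedule table
// (Vercel runs cron jobs in UTC); write targets are illustrative names.
type StageName = "catch" | "cluster" | "strip" | "quantify" | "gate";

interface StageSpec {
  cron: string;
  writes: readonly string[];
}

const STAGES: Record<StageName, StageSpec> = {
  catch: { cron: "0 * * * *", writes: ["mcs_signal_candidate"] },
  cluster: { cron: "0 9 * * *", writes: ["mcs_signal_cluster", "mcs_signal_cluster_member"] },
  strip: { cron: "0 10 * * *", writes: ["mcs_signal_cluster.topic_label", "mcs_signal_claim"] },
  quantify: { cron: "0 11 * * *", writes: ["mcs_signal_claim.verdict"] },
  gate: { cron: "0 11 * * *", writes: ["github_pr"] }, // a PR, not a table row
};

/** Throw before any write that falls outside the stage's declared targets. */
function assertWritable(stage: StageName, target: string): void {
  const { writes } = STAGES[stage];
  if (!writes.includes(target)) {
    throw new Error(`${stage} may not write ${target} (declared: ${writes.join(", ")})`);
  }
}

assertWritable("cluster", "mcs_signal_cluster_member"); // allowed, returns silently
try {
  assertWritable("quantify", "mcs_signal_candidate");
} catch (err) {
  console.error((err as Error).message);
}
```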
Every exit is a public PR · CI gates · auto-merge on green
| Stage | Cron | Reads | Writes | Never |
|---|---|---|---|---|
| 1. Catch | hourly | HN Algolia, Phoronix RSS, NVIDIA Developer Blog RSS, r/LocalLLaMA, arXiv cs.AR | mcs_signal_candidate (deduped by source + source_id) | persists Reddit body text; only title, score, permalink, url, and created_utc are kept (ToS perimeter). |
| 2. Cluster | 09:00 UTC daily | unprocessed mcs_signal_candidate rows in a 48-hour window | mcs_signal_cluster + mcs_signal_cluster_member (3-shingle Jaccard ≥ 0.45) | uses an LLM or embeddings; clustering is pure JS and deterministic, with a lead-only centroid. |
| 3. Strip | 10:00 UTC daily | mature clusters (closed_at NOT NULL OR member_count ≥ 3) | a heuristic topic_label + mcs_signal_claim rows (kind: spec, benchmark, event, or price) | writes voice: no Verdict, Intro, or Recommendation prose. Subject vocabulary is allowlisted. |
| 4. Quantify | 11:00 UTC daily (paired with Gate) | unverified mcs_signal_claim rows + the spec catalog | a verdict on each claim: confirmed, conflict, novel, or weak | writes copy; verdicts are advisory inputs to Gate, never published prose. |
| 5. Gate | 11:00 UTC daily (paired with Quantify) | ready clusters meeting the threshold (≥ 1 conflict OR ≥ 3 novels OR ≥ 3 cluster members) | a single GitHub PR per topic: changelog stub, conflict YAML, novel-spec stub | merges. CI runs the gates; on green the PR auto-merges, on red it stays open. |
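The Cluster stage's rule fits in a short sketch: 3-shingle Jaccard ≥ 0.45, a lead-only centroid, deterministic, and no LLM or embeddings. The sketch makes three assumptions. Shingles are word-level 3-grams over lower-cased titles. Titles shorter than three words become a single shingle. "Lead-only centroid" means each new candidate is compared only with the first member of each existing cluster. Selecting the 48-hour window is left out. Candidate, Cluster and clusterCandidates are hypothetical names.

```typescript
// Hypothetical sketch: word-level 3-shingles over lower-cased titles, one greedy
// pass, each candidate compared only with a cluster's lead (no LLM, no embeddings).
interface Candidate {
  id: string;
  title: string;
  createdAt: number; // epoch milliseconds
}

interface Cluster {
  lead: Candidate;
  leadShingles: Set<string>;
  members: Candidate[];
}

const JACCARD_THRESHOLD = 0.45;

/** Word k-shingles; titles shorter than k words become a single shingle (assumption). */
function shingles(text: string, k = 3): Set<string> {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  const out = new Set<string>();
  if (words.length < k) {
    if (words.length > 0) out.add(words.join(" "));
    return out;
  }
  for (let i = 0; i + k <= words.length; i++) out.add(words.slice(i, i + k).join(" "));
  return out;
}

/** |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty. */
function jaccard(a: Set<string>, b: Set<string>): number {
  let shared = 0;
  a.forEach((s) => {
    if (b.has(s)) shared++;
  });
  const union = a.size + b.size - shared;
  return union === 0 ? 0 : shared / union;
}

function clusterCandidates(candidates: Candidate[]): Cluster[] {
  // Fixed ordering (time, then id) so the same rows always produce the same clusters.
  const ordered = [...candidates].sort(
    (x, y) => x.createdAt - y.createdAt || (x.id < y.id ? -1 : x.id > y.id ? 1 : 0),
  );
  const clusters: Cluster[] = [];
  for (const candidate of ordered) {
    const own = shingles(candidate.title);
    let best: Cluster | undefined;
    let bestScore = -1;
    for (const cluster of clusters) {
      // Lead-only centroid: later members never influence the comparison.
      const score = jaccard(own, cluster.leadShingles);
      if (score >= JACCARD_THRESHOLD && score > bestScore) {
        best = cluster;
        bestScore = score;
      }
    }
    if (best) best.members.push(candidate);
    else clusters.push({ lead: candidate, leadShingles: own, members: [candidate] });
  }
  return clusters;
}

const rows: Candidate[] = [
  { id: "hn-1", title: "llama.cpp release adds faster CUDA kernels", createdAt: 1 },
  { id: "reddit-9", title: "llama.cpp release adds faster CUDA kernels for older cards", createdAt: 2 },
  { id: "arxiv-3", title: "Sparse attention on small GPUs", createdAt: 3 },
];
// Two clusters: hn-1 with reddit-9 (5 shared of 8 shingles = 0.625), and arxiv-3 alone.
console.log(clusterCandidates(rows).map((c) => c.members.map((m) => m.id)));
```

Because only the lead is compared, a cluster cannot drift as members join. The trade-off is that an item resembling a later member more than the lead starts a new cluster.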
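The Strip input rule and the Gate threshold from the table reduce to small predicates. This sketch assumes rows expose closed_at and member_count as named in the table and that verdicts arrive as plain strings. How a cluster becomes "ready" is not modeled, and the function names are hypothetical.

```typescript
// Hypothetical row shapes: closed_at and member_count come from the table,
// everything else here is an illustrative assumption.
interface ClusterRow {
  closed_at: string | null;
  member_count: number;
}

type Verdict = "confirmed" | "conflict" | "novel" | "weak";

/** Strip input: closed_at NOT NULL OR member_count >= 3. */
function isMature(cluster: ClusterRow): boolean {
  return cluster.closed_at !== null || cluster.member_count >= 3;
}

/** Gate threshold: >= 1 conflict OR >= 3 novels OR >= 3 cluster members. */
function meetsGateThreshold(cluster: ClusterRow, verdicts: readonly Verdict[]): boolean {
  const count = (v: Verdict) => verdicts.filter((x) => x === v).length;
  return count("conflict") >= 1 || count("novel") >= 3 || cluster.member_count >= 3;
}

const open: ClusterRow = { closed_at: null, member_count: 2 };
console.log(isMature(open)); // false
console.log(meetsGateThreshold(open, ["novel", "novel", "weak"])); // false
console.log(meetsGateThreshold(open, ["conflict"])); // true
```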
Section 3 · Topology
One picture.
  Phoronix ─┐
    NVIDIA ─┤
  r/LocalL ─┤
        HN ─┼──→ Catch ──→ Cluster ──→ Strip ──→ Quantify ──→ Gate ──→ PUBLIC PR
     arXiv ─┤      │          │          │           │          │          │
TechCrunch ─┤      │          │          │           │          │          ▼
  Verge AI ─┘      │          │          │           │          │     CI · gates
                   │          │          │           │          │          │
                   │          │          │           │          │     auto-merge
                   └──────────┴──────────┴───────────┴──────────┘          │
                      read-only admin views (/feed, /clusters)             ▼
                                                                        /signal
                                                                      /changelog
Five sources fan in. Five stages chain. The only way out is a public PR; CI auto-merges on green.
Section 4 · The boundary
What the LLM is and isn't allowed to do.
| Allowed | Forbidden | Enforced by |
|---|---|---|
| Recommend rigs from the catalog (The Specifier). | Emit numbers anywhere in MDX prose. | scripts/validate-specs.ts (Rule #1) |
| Summarize an article in a PR description. | Write a verdict, intro, or recommendation paragraph. | scripts/validate-authorship.ts (Rule #2) |
| Render specs from YAML the human committed. | Persist Reddit body text into mcs_signal_candidate.raw_payload. | lib/catch/sources/reddit-localllama.ts perimeter (ToS, operational) |
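The Reddit perimeter in the third row, together with the Catch stage's dedupe on source + source_id, can be sketched as follows. The sketch assumes Reddit's listing JSON field names (the post object under children[].data). It uses the post id as source_id, which is an assumption, as is the "reddit" source label. CandidateRow, toCandidateRow and dedupe are hypothetical names, not the code in lib/catch/sources/reddit-localllama.ts.

```typescript
// Hypothetical sketch; field names follow Reddit's listing JSON (children[].data).
// Using the post id as source_id and the "reddit" label are assumptions.
interface RedditPost {
  id: string;
  title: string;
  score: number;
  permalink: string;
  url: string;
  created_utc: number;
  selftext?: string; // body text: must never be persisted
  [field: string]: unknown;
}

interface CandidateRow {
  source: string;
  source_id: string;
  raw_payload: Pick<RedditPost, "title" | "score" | "permalink" | "url" | "created_utc">;
}

/** Copy only the allowlisted fields; never spread the post, so new fields cannot leak. */
function toCandidateRow(post: RedditPost): CandidateRow {
  const { title, score, permalink, url, created_utc } = post;
  return { source: "reddit", source_id: post.id, raw_payload: { title, score, permalink, url, created_utc } };
}

/** Keep the first row for each (source, source_id) pair. */
function dedupe(rows: CandidateRow[], seen: Set<string> = new Set<string>()): CandidateRow[] {
  return rows.filter((row) => {
    const key = `${row.source}:${row.source_id}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
  });
}

const post: RedditPost = {
  id: "abc123",
  title: "Which GPU for a home rig?",
  score: 42,
  permalink: "/r/LocalLLaMA/comments/abc123/which_gpu_for_a_home_rig/",
  url: "https://www.reddit.com/r/LocalLLaMA/comments/abc123/which_gpu_for_a_home_rig/",
  created_utc: 1700000000,
  selftext: "Long body text that must not be stored.",
};
const kept = dedupe([toCandidateRow(post), toCandidateRow(post)]);
console.log(kept.length, "selftext" in kept[0].raw_payload); // 1 false
```

Copying named fields, rather than spreading the post and deleting selftext, keeps the perimeter an allowlist: a field Reddit adds later is dropped by default.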