_Replace this title in your voice._
_Replace this subtitle in your voice._
Section 1 · 3 non-negotiable rules
The three rules.
Rule #1 · Specs come from a hand-curated YAML catalog.
The LLM cannot emit numbers. Every numeric token in MDX prose must trace back to content/_specs/<slug>.yaml or the build fails.
CI evidence
- validator · scripts/validate-specs.ts
- doctrine · content/_specs/_README.md
- schema · content/_specs/_SCHEMA.md
- CI step · .github/workflows/content-gate.yml · validate:specs
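A minimal sketch of the kind of check Rule #1 describes. It assumes the YAML catalog has already been parsed into a plain object (so no YAML library appears here), treats a "numeric token" as digits with optional thousands commas and an optional decimal part, and assumes frontmatter and code are excluded from "prose". The function names and the example catalog values are hypothetical; the real scripts/validate-specs.ts may tokenize and match differently.

```typescript
// Hypothetical Rule #1-style check: every number in MDX prose must appear in the catalog.

type SpecValue =
  | string
  | number
  | boolean
  | null
  | SpecValue[]
  | { [key: string]: SpecValue };

// Find tokens like "24", "1.5" or "1,008" and normalize them
// ("1,008" -> "1008", "1.50" -> "1.5") so prose and YAML compare equal.
function extractNumericTokens(text: string): string[] {
  const matches = text.match(/\d+(?:,\d{3})*(?:\.\d+)?/g) || [];
  return matches.map((m) => String(Number(m.replace(/,/g, ""))));
}

// Collect every number in a parsed catalog entry, including numbers
// embedded in strings such as "1,008 GB/s".
function collectCatalogNumbers(value: SpecValue, out = new Set<string>()): Set<string> {
  if (typeof value === "number") {
    out.add(String(value));
  } else if (typeof value === "string") {
    for (const token of extractNumericTokens(value)) out.add(token);
  } else if (Array.isArray(value)) {
    for (const item of value) collectCatalogNumbers(item, out);
  } else if (value !== null && typeof value === "object") {
    for (const key of Object.keys(value)) collectCatalogNumbers(value[key], out);
  }
  return out;
}

// Keep only prose: drop leading frontmatter, fenced code, and inline code (assumption).
function proseOnly(mdx: string): string {
  return mdx
    .replace(/^---\n[\s\S]*?\n---\n?/, "")
    .replace(/`{3}[\s\S]*?`{3}/g, "")
    .replace(/`[^`]*`/g, "");
}

// Return the numeric tokens in the prose that the catalog cannot account for.
function findUntracedNumbers(mdx: string, catalog: SpecValue): string[] {
  const allowed = collectCatalogNumbers(catalog);
  return extractNumericTokens(proseOnly(mdx)).filter((t) => !allowed.has(t));
}

// Usage with illustrative values (not from the real catalog).
const exampleCatalog: SpecValue = { vram_gb: 24, bandwidth: "1,008 GB/s" };
const untraced = findUntracedNumbers("It ships with 24 GB at 1,008 GB/s and draws 450 W.", exampleCatalog);
console.log(untraced); // untraced is ["450"]; a non-empty result should fail the build
```

Comparing normalized strings keeps the check strict about values while tolerating formatting differences such as thousands separators.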
Rule #2 · No auto-publish.
The pipeline opens draft pull requests; humans merge them. Drift Tracker, Catch, and Gate never auto-publish. This is enforced operationally, not by a CI gate.
CI evidence
- Gate opens drafts · lib/github/draftPr.ts · draft: true
- Drift drafter · app/api/drift/draft/route.ts
- pipeline-dryrun · scripts/pipeline-dryrun.ts
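One way a helper such as lib/github/draftPr.ts can make draft: true non-negotiable is to encode it in the type and pin it when the payload is built. A hypothetical sketch: the field names follow the parameters of GitHub's "Create a pull request" REST endpoint (owner, repo, title, head, base, body, draft), but the interfaces and functions are illustrative and no network call is made.

```typescript
// Hypothetical sketch: make "draft: true" impossible to forget or override.

interface DraftPullRequest {
  owner: string;
  repo: string;
  title: string;
  head: string; // branch holding the generated changes
  base: string; // target branch, e.g. "main"
  body: string;
  draft: true; // literal type: `draft: false` does not type-check
}

// Callers supply everything except the draft flag.
type DraftPrInput = Omit<DraftPullRequest, "draft">;

function buildDraftPr(input: DraftPrInput): DraftPullRequest {
  // `draft` is set after the spread, so a stray `draft: false` passed in
  // by an untyped caller is overwritten.
  return { ...input, draft: true };
}

// Runtime guard for payloads assembled outside the typed helper.
function assertDraft(payload: { draft?: boolean }): void {
  if (payload.draft !== true) {
    throw new Error("Refusing to open a non-draft PR (Rule #2)");
  }
}

// Usage with placeholder repository names.
const pr = buildDraftPr({
  owner: "example-org",
  repo: "example-repo",
  title: "Signal: example topic",
  head: "signal/example-topic",
  base: "main",
  body: "Changelog stub for human review.",
});
assertDraft(pr);
console.log(pr.draft); // true
```

The literal type rejects `draft: false` at compile time, and the runtime guard covers callers the type checker never sees. Neither replaces branch protection, which is what stops a merge to main.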
Rule #3 · The LLM is forbidden from voice-bearing sections.
Voice-bearing headings (Verdict, Intro, Recommendation) may only appear in documents whose frontmatter author is on the human allowlist. The LLM does not write voice, and the validator catches anything that slips past review.
CI evidence
- validator · scripts/validate-authorship.ts
- CI step · .github/workflows/content-gate.yml · validate:authorship
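A minimal sketch of a Rule #3-style check, assuming single-line YAML frontmatter with an author: key and ATX-style (#) Markdown headings. The allowlist contents and function names are hypothetical; the real scripts/validate-authorship.ts may parse MDX with a proper parser.

```typescript
// Hypothetical sketch of a Rule #3-style authorship check.

const VOICE_HEADINGS = new Set(["verdict", "intro", "recommendation"]);

// Read a single-line `author:` value from leading YAML frontmatter, if any.
function frontmatterAuthor(mdx: string): string | null {
  const fm = mdx.match(/^---\n([\s\S]*?)\n---/);
  if (!fm) return null;
  const line = fm[1].match(/^author:[ \t]*["']?([^"'\n]+?)["']?[ \t]*$/m);
  return line ? line[1] : null;
}

// List voice-bearing headings found in a document whose author is not on
// the human allowlist. An empty array means the document passes.
function voiceViolations(mdx: string, humanAllowlist: Set<string>): string[] {
  const author = frontmatterAuthor(mdx);
  if (author !== null && humanAllowlist.has(author)) return [];
  return mdx
    .split("\n")
    .map((l) => l.match(/^#{1,6}\s+(.+?)\s*$/))
    .filter((m): m is RegExpMatchArray => m !== null)
    .map((m) => m[1])
    .filter((heading) => VOICE_HEADINGS.has(heading.toLowerCase()));
}

// Usage with a hypothetical allowlist.
const humans = new Set(["jane"]);
console.log(voiceViolations("---\nauthor: pipeline-bot\n---\n## Verdict\nText", humans)); // ["Verdict"]
console.log(voiceViolations("---\nauthor: jane\n---\n## Verdict\nText", humans)); // []
```

A document with no frontmatter author is treated as not allowlisted, so a missing author fails closed rather than open.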
Section 2 · 5 stages
The pipeline, stage by stage.
Every stage runs on a Vercel cron. Each stage reads only its declared inputs, writes only its declared outputs, and has one boundary it never crosses (the Never column below).
| Stage | Cron | Reads | Writes | Never |
|---|---|---|---|---|
| 1. Catch | 08:00 UTC daily | HN Algolia, Phoronix RSS, NVIDIA Developer Blog RSS, r/LocalLLaMA, arXiv cs.AR | mcs_signal_candidate (deduped by source + source_id) | persists Reddit body text — only title, score, permalink, url, created_utc (ToS perimeter). |
| 2. Cluster | 09:00 UTC daily | unprocessed mcs_signal_candidate rows in a 48-hour window | mcs_signal_cluster + mcs_signal_cluster_member (3-shingle Jaccard ≥ 0.45) | uses an LLM or embeddings — pure JS, lead-only centroid, deterministic. |
| 3. Strip | 10:00 UTC daily | mature clusters (closed_at NOT NULL OR member_count ≥ 3) | a heuristic topic_label + mcs_signal_claim rows (kind: spec / benchmark / event / price) | writes voice: no Verdict, Intro, or Recommendation prose. Subject vocab is allowlisted. |
| 4. Quantify | 11:00 UTC daily (paired with Gate) | unverified mcs_signal_claim rows + the spec catalog | a verdict on each claim: confirmed / conflict / novel / weak | auto-publishes; verdicts are advisory inputs to Gate, never copy. |
| 5. Gate | 11:00 UTC daily (paired with Quantify) | ready clusters meeting threshold (≥1 conflict OR ≥3 novels OR ≥3 cluster members) | a single draft GitHub PR per topic — changelog stub, conflict YAML, novel-spec stub | marks a PR ready — every PR opens as draft: true. Humans merge. |
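Stage 2's clustering rule is specific enough to sketch. A minimal version, assuming word-level 3-shingles over lowercased titles and reading "lead-only centroid" as: each cluster is compared only against the shingle set of its first (lead) member. The 0.45 threshold comes from the table; the types, function names, and sample titles are illustrative.

```typescript
// Sketch of deterministic Jaccard clustering in the spirit of Stage 2 (assumptions noted above).

const JACCARD_THRESHOLD = 0.45;

interface Candidate { id: string; title: string; }
interface Cluster { lead: Candidate; leadShingles: Set<string>; members: Candidate[]; }

// Word 3-grams of the lowercased text; titles shorter than k become one shingle.
function shingles(text: string, k = 3): Set<string> {
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  const out = new Set<string>();
  if (words.length < k) {
    if (words.length > 0) out.add(words.join(" "));
    return out;
  }
  for (let i = 0; i + k <= words.length; i++) out.add(words.slice(i, i + k).join(" "));
  return out;
}

// |A ∩ B| / |A ∪ B|, defined as 0 for two empty sets.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let inter = 0;
  a.forEach((s) => { if (b.has(s)) inter++; });
  return inter / (a.size + b.size - inter);
}

// Sort by id so the same input always produces the same clusters.
function clusterCandidates(candidates: Candidate[]): Cluster[] {
  const clusters: Cluster[] = [];
  const sorted = [...candidates].sort((x, y) => (x.id < y.id ? -1 : x.id > y.id ? 1 : 0));
  for (const c of sorted) {
    const sh = shingles(c.title);
    // Compare only against each cluster's lead, never against later members.
    const home = clusters.find((cl) => jaccard(sh, cl.leadShingles) >= JACCARD_THRESHOLD);
    if (home) home.members.push(c);
    else clusters.push({ lead: c, leadShingles: sh, members: [c] });
  }
  return clusters;
}

// Usage with made-up titles.
const demo = clusterCandidates([
  { id: "c1", title: "Kernel scheduler patch lands upstream" },
  { id: "a2", title: "Vendor announces new accelerator with more bandwidth" },
  { id: "a1", title: "Vendor announces new accelerator with more memory" },
]);
console.log(demo.map((cl) => cl.members.map((m) => m.id))); // members: [["a1", "a2"], ["c1"]]
```

Because candidates are sorted and each cluster keeps a fixed lead, the output depends only on the input rows, with no embeddings or model calls involved.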
Section 3 · Topology
One picture.
Phoronix ─┐
NVIDIA ───┤
r/LocalL ─┼──→ Catch ──→ Cluster ──→ Strip ──→ Quantify ──→ Gate ──→ DRAFT PR
HN ───────┤      │          │          │           │          │          │
arXiv ────┘      │          │          │           │          │          ▼
                 │          │          │           │          │     human merge
                 └──────────┴──────────┴───────────┴──────────┘          │
                 read-only admin views (/feed, /clusters)                ▼
                                                                      /signal
                                                                    /changelog
Five sources fan in. Five stages chain. The only way out is a draft PR a human merges.
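Movement toward that single exit is governed by two thresholds from the stage table: Strip reads only mature clusters, and Gate drafts a PR only for clusters that clear its bar. A sketch of both as pure predicates; the field names follow the table, the Verdict type mirrors Quantify's four outcomes, and the row shape itself is a hypothetical simplification.

```typescript
// Sketch of the Strip and Gate thresholds as pure predicates.

type Verdict = "confirmed" | "conflict" | "novel" | "weak";

interface ClusterRow {
  closed_at: string | null; // null while the cluster is still open
  member_count: number;
}

// Stage 3 (Strip) reads a cluster only once it is mature.
function isMature(cluster: ClusterRow): boolean {
  return cluster.closed_at !== null || cluster.member_count >= 3;
}

// Stage 5 (Gate) drafts a PR when any one threshold is met:
// at least 1 conflict, at least 3 novels, or at least 3 cluster members.
function meetsGateThreshold(cluster: ClusterRow, verdicts: Verdict[]): boolean {
  const conflicts = verdicts.filter((v) => v === "conflict").length;
  const novels = verdicts.filter((v) => v === "novel").length;
  return conflicts >= 1 || novels >= 3 || cluster.member_count >= 3;
}

// Usage: an open two-member cluster with two novels is not enough; one conflict is.
const openPair: ClusterRow = { closed_at: null, member_count: 2 };
console.log(isMature(openPair)); // false
console.log(meetsGateThreshold(openPair, ["novel", "novel", "weak"])); // false
console.log(meetsGateThreshold(openPair, ["conflict"])); // true
```

Keeping the thresholds as side-effect-free functions makes them easy to exercise in a dry run before any cron fires.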
Section 4 · The boundary
What the LLM is and isn't allowed to do.
| Allowed | Forbidden | Enforced by |
|---|---|---|
| Recommend rigs from the catalog (The Specifier). | Emit numbers anywhere in MDX prose. | scripts/validate-specs.ts (Rule #1) |
| Summarize an article in a draft PR description. | Write a verdict, intro, or recommendation paragraph. | scripts/validate-authorship.ts (Rule #3) |
| Propose a cluster topic-label heuristic for review. | Decide a publish date or merge a PR. | lib/github/draftPr.ts forces draft: true (Rule #2, operational) |
| Extract claims into structured rows for human review. | Open a non-draft PR or merge to main. | GitHub branch protection + lib/github/draftPr.ts (Rule #2, operational) |
| Render specs from YAML the human committed. | Persist Reddit body text into mcs_signal_candidate.raw_payload. | lib/catch/sources/reddit-localllama.ts perimeter (ToS, operational) |
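The Reddit row is enforced by a perimeter in the source adapter rather than by CI. A hypothetical sketch of that idea: copy an explicit allowlist of fields (title, score, permalink, url, created_utc, as listed for Stage 1) out of each post instead of storing the whole object, so the post body (Reddit's selftext field) never reaches raw_payload. The interfaces and function name are assumptions, not the contents of lib/catch/sources/reddit-localllama.ts.

```typescript
// Hypothetical ToS perimeter: project each Reddit post onto an allowlist of fields.

interface RedditPostData {
  title: string;
  score: number;
  permalink: string;
  url: string;
  created_utc: number;
  selftext?: string; // post body: must never be persisted
  [key: string]: unknown; // everything else Reddit returns
}

interface RedditPerimeterPayload {
  title: string;
  score: number;
  permalink: string;
  url: string;
  created_utc: number;
}

function toPerimeterPayload(post: RedditPostData): RedditPerimeterPayload {
  // Build a fresh object field by field; spreading `post` would copy selftext.
  return {
    title: post.title,
    score: post.score,
    permalink: post.permalink,
    url: post.url,
    created_utc: post.created_utc,
  };
}

// Usage with placeholder values.
const stored = toPerimeterPayload({
  title: "Example post",
  score: 42,
  permalink: "/r/LocalLLaMA/comments/abc123/example_post/",
  url: "https://example.com/article",
  created_utc: 1700000000,
  selftext: "Body text that must not be persisted.",
});
console.log(Object.keys(stored)); // keys: title, score, permalink, url, created_utc (no selftext)
```

An allowlist fails safe: a field Reddit adds later is dropped by default instead of leaking into storage.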