Grand Problem GP-00001
Open-weight model reaches closed-frontier parity on SWE-bench Verified
Train a ≤72B open-weight model, via a fully-published recipe under $500K compute, to ≥70% on SWE-bench Verified at ≤$0.50/task. The first Grand Problem on Omenion — centered on the single most commercially significant gap between open and closed AI.
Messy statement: human-readable · not machine-checkable
V1

The thesis here is direct. Closed-frontier models have pulled ahead of open-weight models on agentic coding by roughly 15–25 points on SWE-bench Verified and adjacent long-horizon benchmarks. This gap is the reason most production AI deployments route through closed APIs despite the cost and lock-in. Closing it in the open would be the single most valuable event in the open-source AI ecosystem in 2026.

GP-00001 is that closure, structured as a contract. An open-weight base model of at most 72 billion active parameters, under a permissive license, is fine-tuned using a fully published training recipe. Total training compute is capped at $500,000 at list price on standard cloud. The resulting checkpoint is evaluated on a sealed slice of SWE-bench Verified and must resolve at least 70% of issues while holding inference cost at or below $0.50 per task.

The verifier is a reproduction harness, not a score upload. The protocol re-runs the published training recipe on a fresh sample of the specified base checkpoint, evaluates against a held-out slice, and confirms that the claimed numbers reproduce within ±2%. This is deliberately more expensive than a normal axiom verification; it is the price of the claim being worth anything.

The Grand Problem decomposes into eight leaf axioms. Seven are independently valuable and pay out on their own: a verified SWE-style dataset, an RL reward function for multi-file edits, tool-use reliability at 50 steps, 4× speculative decoding, bounded test-time search, large-repo retrieval, and self-verification calibration. The eighth is the integration axiom (AX-00008), which gates the $1.5M GP-gate bounty and requires the end-to-end reproduction. Formalizer attribution on the whole tree is 8%, perpetual, routed on every downstream solve.
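The contract thresholds stated above can be written down as a simple checklist. The following is a minimal sketch, not the Omenion verifier: the `Submission` record, its field names, and the `failed_criteria` helper are hypothetical, and only the numeric limits (72B parameters, $500,000 training compute, 70% resolve rate, $0.50 per task) come from the statement itself.

```python
from dataclasses import dataclass

# Limits taken from the GP-00001 statement.
MAX_PARAMS_B = 72.0              # billions of active parameters
MAX_TRAIN_COST_USD = 500_000.0   # list-price training compute cap
MIN_RESOLVE_RATE = 0.70          # fraction of issues resolved
MAX_COST_PER_TASK_USD = 0.50     # inference cost per task


@dataclass(frozen=True)
class Submission:
    """Hypothetical record of a claimed result (field names are assumptions)."""
    params_b: float
    train_cost_usd: float
    resolved: int
    total_tasks: int
    inference_cost_usd: float    # total inference spend over the evaluated slice
    recipe_published: bool
    permissive_license: bool


def failed_criteria(s: Submission) -> list[str]:
    """Return the contract terms a submission misses; an empty list means none."""
    if s.total_tasks <= 0:
        return ["empty evaluation slice"]
    failures = []
    if s.params_b > MAX_PARAMS_B:
        failures.append("model exceeds 72B active parameters")
    if not s.permissive_license:
        failures.append("base model license is not permissive")
    if not s.recipe_published:
        failures.append("training recipe is not fully published")
    if s.train_cost_usd > MAX_TRAIN_COST_USD:
        failures.append("training compute exceeds $500,000")
    if s.resolved / s.total_tasks < MIN_RESOLVE_RATE:
        failures.append("resolve rate below 70%")
    if s.inference_cost_usd / s.total_tasks > MAX_COST_PER_TASK_USD:
        failures.append("inference cost above $0.50 per task")
    return failures


if __name__ == "__main__":
    claim = Submission(72.0, 480_000.0, 350, 500, 240.0, True, True)
    print(failed_criteria(claim))
```

Passing this checklist is necessary but not sufficient under the statement: the claim still has to survive the reproduction run.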
Sponsor profile is intentionally left open for a consortium structure — a foundation anchor plus cross-stakes from the three or four open-weight labs (Qwen, Kimi, DeepSeek, and peers) that stand to benefit most from the ecosystem effect. Any single sponsor can take the slot as well.
Integration rule: this is what closes the GP
Training reproduction: re-run training on a fresh base-model sample; evaluate on the held-out slice; confirm that the claimed numbers reproduce within ±2%.
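The statement does not say whether "±2%" means two percentage points or two percent of the claimed value. The sketch below shows both readings so the difference is visible; the function name `within_tolerance` and its parameters are hypothetical, and neither reading is confirmed by the source.

```python
def within_tolerance(claimed: float, reproduced: float,
                     tol: float = 0.02, relative: bool = False) -> bool:
    """Compare a claimed metric with its reproduced value.

    relative=False: tol is an absolute band (0.02 = two percentage points
                    when the metric is a fraction such as a resolve rate).
    relative=True:  tol is a fraction of the claimed value.
    Which reading the protocol intends is an assumption either way.
    """
    limit = tol * abs(claimed) if relative else tol
    return abs(claimed - reproduced) <= limit


if __name__ == "__main__":
    claimed_rate, reproduced_rate = 0.72, 0.705
    # Absolute reading: 1.5 points apart, inside a 2-point band.
    print(within_tolerance(claimed_rate, reproduced_rate))
    # Relative reading: the band is 2% of 0.72 = 0.0144, so 0.015 falls outside.
    print(within_tolerance(claimed_rate, reproduced_rate, relative=True))
```

The same claim passes under one reading and fails under the other, so a machine-checkable version of the rule would need to fix the interpretation.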