GRAND PROBLEM · OPEN SOURCE AI · LISTED · ACCEPTING SOLVES

Open-weight model reaches closed-frontier parity on SWE-bench Verified

Train a ≤72B open-weight model, via a fully-published recipe under $500K compute, to ≥70% on SWE-bench Verified at ≤$0.50/task. The first Grand Problem on Omenion — centered on the single most commercially significant gap between open and closed AI.

Sponsor · open sponsor

Listed · 2026-04-23

Tier · T1, T2

Total pot

$3.8M

escrowed · atomic release


Messy statement · human-readable · not machine-checkable

The thesis here is direct. Closed-frontier models have pulled ahead of open-weight models on agentic coding by roughly 15–25 points on SWE-bench Verified and adjacent long-horizon benchmarks. This gap is the reason most production AI deployments route through closed APIs despite the cost and lock-in. Closing it in the open would be the single most valuable event in the open-source AI ecosystem in 2026.
GP-00001 is that closure, structured as a contract. An open-weight base model of at most 72 billion active parameters, under a permissive license, is fine-tuned using a fully-published training recipe. Total training compute is capped at $500,000 at list price on standard cloud. The resulting checkpoint is evaluated on a sealed slice of SWE-bench Verified and must resolve at least 70% of issues while holding inference cost at or below $0.50 per task.
The verifier is a reproduction harness, not a score upload. The protocol re-runs the published training recipe on a fresh sample of the specified base checkpoint, evaluates against a held-out slice, and confirms that claimed numbers reproduce within ±2%. This is deliberately more expensive than a normal axiom verification; that is the price of the claim being worth anything.
The Grand Problem decomposes into eight leaf axioms. Seven are independently valuable and pay out standalone: a verified SWE-style dataset, an RL reward function for multi-file edits, tool-use reliability at 50 steps, 4× speculative decoding, bounded test-time search, large-repo retrieval, and self-verification calibration. The eighth is the integration axiom (AX-00008), which gates the $1.5M GP-gate bounty and requires the end-to-end reproduction. Formalizer attribution on the whole tree is 8%, perpetual, routed on every downstream solve.
The sponsor profile is intentionally left open for a consortium structure: a foundation anchor plus cross-stakes from the three or four open-weight labs (Qwen, Kimi, DeepSeek, and peers) that stand to benefit most from the ecosystem effect. Any single sponsor can take the slot as well.
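
The headline constraints in the statement can be read as four numeric checks. The sketch below is illustrative only: the `Submission` record and `gate_failures` function are hypothetical names, not part of any Omenion schema, and it assumes that "cost per task" means the mean inference spend over the evaluated slice, which the statement does not spell out.

```python
from dataclasses import dataclass

# Thresholds quoted in the statement.
MAX_PARAMS_B = 72.0                 # billions of parameters
MAX_TRAIN_COMPUTE_USD = 500_000.0   # list-price training compute cap
MIN_RESOLVE_RATE = 0.70             # fraction of issues resolved
MAX_COST_PER_TASK_USD = 0.50        # assumed to be a mean over tasks


@dataclass(frozen=True)
class Submission:
    """Hypothetical summary of one evaluated checkpoint."""
    params_b: float
    train_compute_usd: float
    issues_resolved: int
    issues_total: int
    inference_cost_usd: float  # total spend across the evaluated slice


def gate_failures(s: Submission) -> list[str]:
    """Return the name of every headline constraint the submission violates."""
    if s.issues_total <= 0:
        raise ValueError("issues_total must be positive")
    failures = []
    if s.params_b > MAX_PARAMS_B:
        failures.append("params")
    if s.train_compute_usd > MAX_TRAIN_COMPUTE_USD:
        failures.append("train_compute")
    if s.issues_resolved / s.issues_total < MIN_RESOLVE_RATE:
        failures.append("resolve_rate")
    if s.inference_cost_usd / s.issues_total > MAX_COST_PER_TASK_USD:
        failures.append("cost_per_task")
    return failures
```

An empty list means all four constraints hold. Passing this check is necessary but not sufficient, since the listing also requires the recipe to reproduce.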

Integration rule · this is what closes the GP

Training reproduction

Re-run training on fresh base-model sample; evaluate held-out slice; compare within ±2%.
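
The comparison step can be sketched as a tolerance check. The listing does not say whether "±2%" means two absolute percentage points or two percent relative to the claimed value, so the hypothetical helper below supports both readings and defaults to absolute points as an assumption.

```python
def within_tolerance(claimed: float, reproduced: float,
                     tol: float = 2.0, relative: bool = False) -> bool:
    """Check whether a reproduced number matches the claimed one.

    relative=False: tol is an absolute difference in the metric's own units
                    (e.g. percentage points of resolve rate).
    relative=True:  tol is a percentage of the claimed value.
    Which reading the verifier uses is an assumption, not stated in the listing.
    """
    diff = abs(reproduced - claimed)
    if relative:
        if claimed == 0:
            raise ValueError("relative tolerance is undefined for a zero claim")
        return diff / abs(claimed) * 100.0 <= tol
    return diff <= tol
```

The two readings can disagree. A claimed 72.0% that reproduces at 70.5% is 1.5 points off, which passes the absolute reading, but about 2.08% off in relative terms, which fails the relative one.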

When all children pass their verifiers and the integration axiom runs green on Training reproduction, the remaining pot releases atomically. Partial closes pay per-axiom only.
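
One reading of this rule, as a minimal sketch: each passed child pays its own bounty, and the integration bonus is added only when every child has passed and the integration axiom is green. The function name and the axiom keys are hypothetical, and the 8% formalizer attribution is not modeled because the listing does not say how it is routed.

```python
def released_usd(child_bounties: dict[str, int], passed: set[str],
                 integration_bonus: int, integration_green: bool) -> int:
    """Total amount released under the integration rule (one interpretation).

    child_bounties maps each child axiom to its standalone bounty;
    passed is the set of child axioms whose verifiers have passed.
    """
    known = set(child_bounties)
    unknown = passed - known
    if unknown:
        raise ValueError(f"unknown axioms: {sorted(unknown)}")
    # Partial closes pay per-axiom only.
    per_axiom = sum(child_bounties[a] for a in passed)
    # The bonus releases only when every child passed and integration is green.
    if passed == known and integration_green:
        return per_axiom + integration_bonus
    return per_axiom
```

With the listing's aggregate figures, a full close releases $2.3M in child bounties plus the $1.5M integration bonus, which matches the $3.8M total pot.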

Axiom tree · 8 nodes · 0 solved · 0 verifying · 8 open

Sum of child bounties $2.3M

Integration bonus $1.5M

GP-00001

Open-weight model reaches closed-frontier parity on SWE-bench Verified

$3.8M

INTEGRATION

AX-00008

GP-gate: full recipe reproduction to ≥70% SWE-bench Verified at ≤$0.50/task

Verified SWE-style dataset, 50,000 (issue, patch, test) tuples

Multi-file edit RL reward function correlating with issue resolution

Open-weight ≤72B model achieves ≥90% termination success on 50+ step tool tasks

4× speculative decoding throughput on 72B reasoning models, ≤0.5% quality drop

Bounded test-time search converts 40% → ≥65% on SWE-bench Verified at ≤10× cost

Single-pass file retrieval over 500K+ LOC repos, recall ≥85% at precision ≥60%

Self-verification confidence calibrated to test-pass probability, Brier ≤0.12

$70K

OPEN

ID · AXIOM · VERIFIER · BOUNTY · SUBMISSIONS · MEDIAN VERIFY · STATE