AI Agents Rebuild Minesweeper — Codex Wins

Key Takeaways:
  • Four AI coding agents (OpenAI Codex, Anthropic Claude Code, Google Gemini CLI, Mistral Vibe) were given a single-shot task to build a web version of Minesweeper with sound, mobile support, and a surprise feature.
  • OpenAI’s Codex (GPT-5) produced the strongest one-shot result thanks to chording support, mobile-friendly flagging, and retro sound; Claude Code delivered polished presentation and novel power-ups but lacked chording.
  • Mistral Vibe produced a working but limited build; Google’s Gemini CLI failed to produce a usable one-shot game in this test.

What we asked the AIs to build

Editors fed the same prompt to four LLM-based coding agents: create a full-featured web Minesweeper that matches the Windows classic, adds a fun surprise feature, includes sound effects, and supports mobile touch controls.

The results were produced as single-shot runs with no human debugging, then judged blind by a Minesweeper expert to assess playability, presentation, mobile controls, and the novelty feature.

How each agent performed

OpenAI Codex (GPT-5)

Best overall in this one-shot test. Codex implemented chording (the advanced mechanic of clicking a revealed number to clear its remaining neighbours once the matching number of flags has been placed), offered clear on-screen chording instructions, and added mobile-friendly long-press flagging.
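For readers unfamiliar with the mechanic, a chord opens every unflagged neighbour of a revealed number once the surrounding flag count matches that number. A minimal sketch of that rule, in TypeScript, might look like the following; the Cell shape and the reveal callback are illustrative assumptions, not Codex's actual output.

```typescript
// Hypothetical sketch of the chording rule, not code from any of the tested agents.
type Cell = { mine: boolean; revealed: boolean; flagged: boolean; adjacentMines: number };

// Collect the in-bounds neighbours of a cell.
function neighbours(board: Cell[][], r: number, c: number): [number, number][] {
  const out: [number, number][] = [];
  for (let dr = -1; dr <= 1; dr++) {
    for (let dc = -1; dc <= 1; dc++) {
      if (dr === 0 && dc === 0) continue;
      const nr = r + dr, nc = c + dc;
      if (board[nr]?.[nc] !== undefined) out.push([nr, nc]);
    }
  }
  return out;
}

// Chord on a revealed numbered cell: if the flags around it match its number,
// open every neighbour that is not flagged.
function chord(
  board: Cell[][],
  r: number,
  c: number,
  reveal: (r: number, c: number) => void
): void {
  const cell = board[r][c];
  if (!cell.revealed || cell.adjacentMines === 0) return;
  const ns = neighbours(board, r, c);
  const flags = ns.filter(([nr, nc]) => board[nr][nc].flagged).length;
  if (flags !== cell.adjacentMines) return;
  for (const [nr, nc] of ns) {
    const n = board[nr][nc];
    if (!n.flagged && !n.revealed) reveal(nr, nc); // still loses if a flag was misplaced
  }
}
```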

Presentation leaned retro with simple sounds and an emoticon-style smiley face. The “Lucky Sweep Bonus” surprise feature gave occasional safe tiles to reduce forced guessing.

Score: top performer for gameplay accuracy and mobile usability.

Anthropic Claude Code (Opus 4.5)

Fastest to produce working code and highly polished visually. Claude included a “Power Mode” with multiple power-ups (Shield, Blast, X‑Ray, Freeze) that change gameplay dynamics.

However, Claude’s builds lacked chording, and some UI elements (grid gaps, clipped board) reduced play efficiency. Power Mode made expert boards much easier and felt unbalanced.

Mistral Vibe

Delivered a functioning Minesweeper but missed key features: no chording, no sound effects, and a clumsy mobile flagging flow. The surprise feature was mainly celebratory (rainbow background) rather than gameplay-changing.
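For comparison, the long-press flagging that made Codex's build work on phones is usually wired up along these lines; the timing threshold and callback names below are illustrative assumptions, not taken from any of the generated games.

```typescript
// Hypothetical long-press-to-flag handler for touch screens.
const LONG_PRESS_MS = 400; // assumed threshold, not from any agent's output

function attachTouchControls(
  cellEl: HTMLElement,
  revealCell: () => void,
  toggleFlag: () => void
): void {
  let pressTimer: number | undefined;
  let longPressFired = false;

  cellEl.addEventListener("touchstart", (e) => {
    e.preventDefault(); // suppress the simulated mouse click / context menu
    longPressFired = false;
    pressTimer = window.setTimeout(() => {
      longPressFired = true;
      toggleFlag(); // holding the tile plants or removes a flag
    }, LONG_PRESS_MS);
  });

  cellEl.addEventListener("touchend", () => {
    window.clearTimeout(pressTimer);
    if (!longPressFired) revealCell(); // a quick tap opens the tile
  });

  // Cancel the pending flag if the finger moves (e.g. the player is scrolling).
  cellEl.addEventListener("touchmove", () => window.clearTimeout(pressTimer));
}
```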

Vibe ran more slowly than the top competitors, but the result showed promise for an open-weight model.

Google Gemini CLI

Failed in this single-shot test. Gemini produced incomplete playfields and got hung up on overcomplicated dependencies and custom sound-generation attempts.

The experiment was limited to the Gemini models available under the tested subscription; other tiers might behave differently, but the one-shot result here did not produce a working game.

Bottom line

Single-shot AI coding can produce playable projects, but results vary widely. OpenAI Codex led this task by best reproducing Minesweeper fundamentals. Claude Code excelled at polish and speed but sacrificed a key mechanic, while Mistral and Gemini lagged behind.

The takeaway: modern coding agents can accelerate development, but for reliable, production-quality code they still benefit from human review and iteration.
