o1 vs o3 vs o4: I Tested Them So You Don’t Have To
The “Best AI Model” might be just the one you’ve already ignored. I put OpenAI’s o3, o4-mini, Claude 3.7, Gemini 2.5, and 4o head-to-head. One clear winner. And it’s not the one you think.
Why new models?
Have you ever asked this question?
You’ve been using your favorite AI for a while and then… AI providers just decide to flood the market with “new ones”.
I’ve been trying to write this letter for weeks now.
Every time I sat down to analyze the latest “genius” AI model, three more would drop before I could finish typing.
Just look at the past couple of weeks:
Monday: Claude-3.7 rewrites the game.
A few days later: Gemini 2.5 Pro is the new standard.
A few days later: OpenAI’s o3 and o4 are going to smash everyone else. (They didn’t.)
A few days later: Actually, the new standard is GPT-4.1. Not 4.5.
...
Each announcement bigger than the last. Each claim more outrageous than before. The tech news cycle turned into an endless stream of "forget everything you knew about AI" headlines.
So I waited.
Finally, a ceasefire. Maybe.
A rare week without a new launch.
Just long enough to write this before it gets outdated (maybe by the time you read this).
Let’s talk about what’s really happening.
⚡ The AI Arms Race Just Leveled Up
A couple of weeks ago, OpenAI released two new reasoning models — gpt-o4-mini and gpt-o3 (for paid subscriptions).
To be very precise, these are not exactly LLMs (or models); for me, these new creations are more like agents (they auto-reflect and search the web, etc.) than just "a model" per se.
People are already calling them “genius-level.”
(which isn't completely absurd, by the way)
But if that’s 100% true, we should be riding superconducting hoverboards by next week.
I would call them "powerful" or "potent," not really "genius"...
Because every few weeks, someone drops a "Genius Model" — impressive for solving many search, information retrieval, and "moderate problems," and great for being your "therapist," but usually failing badly at solving more complex, multi-task problems or doing complete end-to-end tasks with an "acceptable" level of "trust."
Still… OpenAI and other AI providers are shipping like mad.
Let’s zoom out for a second:
OpenAI
| Date | Release |
|---|---|
| February 27, 2025 | GPT-4.5 research preview OpenAI |
| April 14, 2025 | GPT-4.1 series API-only models OpenAIOpenAI |
| April 16, 2025 | OpenAI o3 & o4-mini with full tool access OpenAIOpenAI |
| April 16, 2025 | OpenAI Codex CLI (command-line coding agent) OpenAI |
Anthropic
| Date | Release |
|---|---|
| February 24, 2025 | Claude 3.7 Sonnet hybrid reasoning model & Claude Code CLI preview Anthropic |
| May 1, 2025 | “Claude can now connect to your world” Integrations beta Anthropic |
| May 5, 2025 | Introducing Anthropic’s “AI for Science” program Anthropic |
| May 7, 2025 | Web search on the Anthropic API (feature preview) Anthropic |
| Date | Release |
|---|---|
| March 25, 2025 | Introducing Gemini 2.5 (Pro Experimental) thinking model blog.google |
| April 4, 2025 | Start building with Gemini 2.5 Pro (public preview in AI Studio) blog.google |
| April 8, 2025 | Deep Research capability on Gemini 2.5 Pro Experimental blog.google |
| April 17, 2025 | Gemini 2.5 Flash preview (fast, cost-efficient thinking model) blog.google |
Confused by the naming? You’re not alone.
We’re talking about o4, not 4o…
Keep up. It’s a full-time job now.
🪓 Tools Everywhere, Focus Nowhere
Let’s take coding — one of the top AI use cases.
The dev tool space is insane right now.
