May 10, 2025 · Edition #34

o1 vs o3 vs o4: I Tested Them So You Don’t Have To

The “Best AI Model” might be just the one you’ve already ignored. I put OpenAI’s o3, o4-mini, Claude 3.7, Gemini 2.5, and 4o head-to-head. One clear winner. And it’s not the one you think.

Why new models?

Have you ever asked this question?

You’ve been using your favorite AI for a while and then… AI providers just decide to flood the market with “new ones”.

I’ve been trying to write this letter for weeks now.

Every time I sat down to analyze the latest “genius” AI model, three more would drop before I could finish typing.

Just look at the past couple of weeks:

Monday: Claude 3.7 rewrites the game.

A few days later: Gemini 2.5 Pro is the new standard.

A few days later: OpenAI’s o3 and o4 are going to smash everyone else. (They didn’t.)

A few days later: Actually, the new standard is GPT-4.1. Not 4.5.

...

Each announcement bigger than the last. Each claim more outrageous than before. The tech news cycle turned into an endless stream of "forget everything you knew about AI" headlines.


So I waited.

Finally, a ceasefire. Maybe.

A rare week without a new launch.

Just long enough to write this before it gets outdated (which may well happen by the time you read it).

Let’s talk about what’s really happening.

⚡ The AI Arms Race Just Leveled Up

A couple of weeks ago, OpenAI released two new reasoning models: o4-mini and o3 (for paid subscriptions).

To be precise, I don't see these as plain LLMs. They reflect on their own output, search the web, and so on, so to me they behave more like agents than like "a model" per se.

People are already calling them “genius-level.”

(which isn't completely absurd, by the way)

But if that’s 100% true, we should be riding superconducting hoverboards by next week.

I would call them "powerful" or "potent," not really "genius"...

Because every few weeks, someone drops a "Genius Model." These models are impressive at search, information retrieval, and "moderate problems," and great at being your "therapist," but they usually fail badly at more complex, multi-task problems or at completing end-to-end tasks with an "acceptable" level of "trust."

Still… OpenAI and other AI providers are shipping like mad.

Let’s zoom out for a second:

OpenAI

Date	Release
February 27, 2025	GPT-4.5 research preview
April 14, 2025	GPT-4.1 series (API-only models)
April 16, 2025	OpenAI o3 & o4-mini with full tool access
April 16, 2025	OpenAI Codex CLI (command-line coding agent)

Anthropic

Date	Release
February 24, 2025	Claude 3.7 Sonnet hybrid reasoning model & Claude Code CLI preview
May 1, 2025	“Claude can now connect to your world” (Integrations beta)
May 5, 2025	Introducing Anthropic’s “AI for Science” program
May 7, 2025	Web search on the Anthropic API (feature preview)

Google

Date	Release
March 25, 2025	Introducing Gemini 2.5 (Pro Experimental) thinking model
April 4, 2025	Start building with Gemini 2.5 Pro (public preview in AI Studio)
April 8, 2025	Deep Research capability on Gemini 2.5 Pro Experimental
April 17, 2025	Gemini 2.5 Flash preview (fast, cost-efficient thinking model)

Confused by the naming? You’re not alone.

We’re talking about o4, not 4o…

Keep up. It’s a full-time job now.

🪓 Tools Everywhere, Focus Nowhere

Let’s take coding — one of the top AI use cases.

The dev tool space is insane right now.