The first ever PokerBattle.AI and review of language models in online poker

Why language models were seated at the table for the first time and what this experiment means for the industry
Until recently, talk about AI in poker came down to solvers and specialized bots. PokerBattle.AI was the first test aimed not at calculation engines but at language models: the same LLMs that now try to analyze hands the way live players do.
The result was revealing. The models are far from perfect, but they already reason within poker's structure. This is a first step toward AI in poker moving beyond pure theory and becoming a working analysis tool.
How PokerBattle.AI went
The organizers kept the experiment simple. It was set up so that every model faced identical conditions, as if they were all seated at the same table but unable to peek at their neighbors.
What exactly was given to the models:
☑️ hand description: positions, actions, bet sizes;
☑️ basic context: effective stacks, board structure;
☑️ ranges in general terms — without solver precision;
☑️ time for "thinking" — standard text response.
That is, the model had to decide for itself what to do (check, call, bet, raise, or fold) and, most importantly, explain why. That requirement made it possible to see how it "thinks."
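The setup described above can be sketched in a few lines of Python. This is only an illustration: the article does not publish the organizers' actual prompt format, so `HandState`, `build_prompt`, and `parse_action` are hypothetical names, and expressing stacks in big blinds is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical structure: the fields mirror the inputs listed in the article,
# not the organizers' real data format.
@dataclass
class HandState:
    position: str
    actions: List[str]
    effective_stack_bb: float  # assumption: stacks expressed in big blinds
    board: str
    range_note: str            # ranges "in general terms", no solver precision

def build_prompt(state: HandState) -> str:
    """Render the hand as plain text, the only input the model receives."""
    lines = [
        f"Position: {state.position}",
        f"Action so far: {', '.join(state.actions)}",
        f"Effective stack: {state.effective_stack_bb} bb",
        f"Board: {state.board}",
        f"Ranges (rough): {state.range_note}",
        "Choose one action (check, call, bet, raise, fold) and explain why.",
    ]
    return "\n".join(lines)

def parse_action(reply: str) -> Optional[str]:
    """Return the first legal action word found in the reply, or None."""
    match = re.search(r"\b(check|call|bet|raise|fold)\b", reply.lower())
    return match.group(1) if match else None
```

The parser is deliberately naive: it takes the first action word it sees, so a reply such as "I would not fold here" would be misread. A real harness would more likely ask the model for a structured answer.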
How the models were evaluated
The evaluation stayed close to real play, and decision quality was the basis.
| Parameter | What they evaluated |
|---|---|
| Value selection | whether the model correctly pressures weak ranges |
| Bluff component | understands where to pressure and where not |
| Fold equity | adequately assesses pressure strength |
| Sizing | chooses natural lines or goes to extremes |
| Action explanation | logic, absence of contradictions |
| Stability | whether the model behaves stably across spots |
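To make the fold-equity criterion concrete, here is a minimal sketch of the usual break-even arithmetic. It is standard poker math, not the organizers' scoring formula: it assumes a pure bluff with no equity when called, and the function name is made up for illustration. A bluff of size B into a pot P breaks even when the opponent folds B / (P + B) of the time.

```python
def breakeven_fold_frequency(pot: float, bet: float) -> float:
    """Fraction of the time the opponent must fold for a pure bluff to break even.

    Assumes the bluff has zero equity when called:
    EV = f * pot - (1 - f) * bet = 0  ->  f = bet / (pot + bet)
    """
    if pot < 0 or bet <= 0:
        raise ValueError("pot must be non-negative and bet must be positive")
    return bet / (pot + bet)

half_pot = breakeven_fold_frequency(pot=100, bet=50)    # one third of the time
full_pot = breakeven_fold_frequency(pot=100, bet=100)   # half of the time
```

A model that fires a pot-sized bluff into a range that folds far less than half the time is overestimating its fold equity, which is the kind of misjudgment this row of the table is about.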
Who played best and how the AIs looked at the virtual table
Once all decisions were compiled into a single matrix, the differences between the models were visible right away: not in how polished their responses were, but in how much EV their lines actually produced.

Winner — OpenAI o3 model
OpenAI o3 played PokerBattle.AI like a solid reg. By the numbers, it had a healthy, workable style: around 26% VPIP and 18% PFR. The model played 3,799 hands and finished with $136,691, roughly $36,691 above its starting stack. Over that sample, its results looked less like a run of lucky hits and more like a steady, careful realization of edge:
✔️ almost no major leaks;
✔️ solid play with deep stacks;
✔️ clear adaptation to opponents;
✔️ timely folds in borderline spots and pressure where opponent's range is obviously weaker.
In poker terms, OpenAI o3 played like a good TAG that simply doesn't give money away. It consistently made +EV decisions and took first place as a natural result.
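For readers less familiar with the stats quoted above, here is a rough sketch of how VPIP and PFR are usually computed. The `PreflopRecord` type and the sample data are invented for illustration, and tracking software differs on edge cases (walks, for example), so treat this as the basic definition rather than any tracker's exact method.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PreflopRecord:
    # True if the player called or raised preflop; posting a blind alone does not count.
    voluntarily_put_money_in: bool
    # True if the player raised preflop (a raise also counts toward VPIP).
    raised_preflop: bool

def vpip_pfr(records: List[PreflopRecord]) -> Tuple[float, float]:
    """Return (VPIP %, PFR %) over the given hands."""
    if not records:
        raise ValueError("need at least one hand")
    n = len(records)
    vpip = 100 * sum(r.voluntarily_put_money_in for r in records) / n
    pfr = 100 * sum(r.raised_preflop for r in records) / n
    return vpip, pfr

# Made-up 100-hand sample: 18 raises, 8 calls, 74 folds.
sample = (
    [PreflopRecord(True, True)] * 18
    + [PreflopRecord(True, False)] * 8
    + [PreflopRecord(False, False)] * 74
)
stats = vpip_pfr(sample)  # (26.0, 18.0), the same shape as the o3 profile
```

The gap between the two numbers is the share of hands played by calling rather than raising. A gap of about 8 points, as in this sample, is in line with the tight-aggressive label used above.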
Second place — Claude Sonnet 4.5
Claude turned out to be the "thinking" participant. It noticed nuances, explained context, and built long chains of reasoning. Claude Sonnet 4.5 ran almost neck and neck with the leader.
Over the 3,799-hand sample, the model finished with around $133,641, roughly $33,641 above its starting stack.
Claude's play looked like this:
✔️ less excessive aggression than OpenAI o3, but more stability;
✔️ good range defense, especially in borderline spots;
✔️ minimum errors under pressure.
Claude Sonnet 4.5 was not the star of the show, but it took second place for a simple reason: it consistently made good decisions and stayed out of spots where EV turns negative.
Third place — Grok
Grok took third place. Its style is looser, and at times it seemed to see the table from a slightly different angle. Over the 3,799-hand sample, it finished with about $128,796, or $28,796 above its starting stack. Its graph was uneven, with upswings and noticeable downswings, but the model recovered each time and stabilized.
From how Grok made decisions, several characteristic traits stand out:
✔️ wider bluff spectrum than competitors, sometimes unexpected;
✔️ aggression in spots where standard models would prefer pot control;
✔️ willingness to enter uncomfortable spots, giving edge against more straightforward AIs.
Third place is a logical result for a model that combines a technical base with unconventional thinking.
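The three results can be put side by side with some simple arithmetic. The sketch below uses only the figures quoted in the article. The $100,000 starting stack is not stated outright but is implied by them ($136,691 final minus $36,691 profit), and the helper name `summarize` is made up.

```python
HANDS_PLAYED = 3799
STARTING_STACK = 100_000  # implied by the article: final bankroll minus quoted profit

final_bankrolls = {
    "OpenAI o3": 136_691,
    "Claude Sonnet 4.5": 133_641,
    "Grok": 128_796,
}

def summarize(final: int, start: int = STARTING_STACK, hands: int = HANDS_PLAYED) -> dict:
    """Net profit, return on the starting stack, and dollars won per 100 hands."""
    profit = final - start
    return {
        "profit": profit,
        "return_pct": round(100 * profit / start, 2),
        "profit_per_100_hands": round(100 * profit / hands, 2),
    }

for name, final in final_bankrolls.items():
    print(name, summarize(final))
```

The article does not give blind levels, so the usual bb/100 win rate cannot be derived. Dollars per 100 hands is the closest comparable figure available from the published numbers.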
PokerBattle.AI participants
PokerBattle.AI gathered nine language models at one table, from industry giants to experimental systems still finding their style. Unlike typical for-fun shows, every model here played the same 3,799-hand sample (except LLAMA 4, which busted early), which kept the comparison as fair as possible.
Below is the final visual breakdown by participant, with ending bankrolls and winnings. It gives the overall picture of who really held up over the full sample and who crumbled under pressure.

Results
PokerBattle.AI turned out to be an honest stress test for language models: no hints, no soft mode, no artificial conditions. That is why the results were so revealing.
The main takeaway is that modern AIs already play like different reg archetypes:
✅ OpenAI o3 — disciplined aggressor;
✅ Claude — careful technician;
✅ Grok — creative LAG who doesn't fear pressure.
The middle group held on thanks to fundamental strategy, while the outsiders lost not because of "weak intelligence" but because of typical poker leaks, such as poor river play and overvaluing marginal spots.
Most importantly, the sample showed that AIs don't just know how to play: they are starting to differ in style and make human-like decisions. These are no longer solvers, but something closer to real opponents.
The latest news on poker, AI models, and big tournaments can always be found in the blog.
