Move over math and reasoning, it's time to benchmark AI using Super Mario Bros.

This system allowed AI models to control Mario by generating Python code.
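To give a rough sense of what "controlling Mario with Python" could look like, here is a minimal sketch, not the Hao AI Lab's actual code. It treats a model's reply as a short Python snippet and runs it against a stand-in controller. The `FakeController` class, its `press` method, the `run_model_code` helper, and the `model_reply` string are all hypothetical names invented for this illustration; a real setup would forward key presses to an emulator rather than record them in a list.

```python
# Hypothetical sketch: run model-generated Python against a stand-in controller.
# None of these names come from the GamingAgent codebase.


class FakeController:
    """Records button presses instead of sending them to an emulator."""

    def __init__(self):
        self.log = []

    def press(self, button, seconds):
        # A real controller would hold the button down for `seconds`.
        self.log.append((button, seconds))


def run_model_code(code, controller):
    """Execute a model-written snippet that is only given access to `press`."""
    # Only `press` is exposed to the snippet. This is NOT a real security sandbox.
    namespace = {"__builtins__": {}, "press": controller.press}
    exec(code, namespace)
    return controller.log


# Assumed example of what a model might reply with.
model_reply = """
press("right", 0.5)
press("jump", 0.2)
press("right", 0.3)
"""

actions = run_model_code(model_reply, FakeController())
print(actions)
```

The point of the sketch is only that the model's output is code, and the game sees whatever button presses that code produces.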

Even its predecessor, Claude 3.5, performed well.

Surprisingly, reasoning-heavy models like OpenAI’s GPT-4o and Google’s Gemini 1.5 Pro lagged behind.

Despite their reputation for strong reasoning abilities, they struggled with the game’s demands.

As it turns out, the key to excelling at Super Mario Bros. isn’t logical reasoning but timing.

Even a slight delay can send Mario tumbling into a pit.
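A back-of-the-envelope sketch shows why response time matters more than deliberation here. The numbers below (running speed, safety margin) are invented purely for illustration and are not measurements from the game or from the benchmark; `jump_succeeds` is a hypothetical helper. The model is deliberately simple: the agent decides to jump when Mario is a fixed margin away from the ledge, and Mario keeps running while the decision is being produced.

```python
# Hypothetical numbers chosen only to illustrate the timing argument.


def jump_succeeds(latency_s, speed_px_s=90.0, margin_px=16.0):
    """Return True if a delayed jump still starts before Mario runs off the ledge.

    The agent decides to jump when Mario is `margin_px` from the edge, but the
    jump only executes `latency_s` seconds later while Mario keeps running.
    """
    drift_px = speed_px_s * latency_s  # distance covered while "thinking"
    return drift_px <= margin_px


for latency in (0.05, 0.15, 0.5):
    outcome = "clears the pit" if jump_succeeds(latency) else "falls in"
    print(f"{latency:.2f}s delay: Mario {outcome}")
```

Under these assumed values, a model that takes half a second to respond has already let Mario run well past the ledge, no matter how sound its plan was.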

For those curious to experiment, the Hao AI Lab has open-sourced its GamingAgent framework on GitHub.

Featured on TechSpot