News
Our tech columnist tests LSAT puzzles and a writing challenge on GPT-4. Here’s what the artificial intelligence upgrade can — and can’t — do.
FrontierMath, a new benchmark from Epoch AI, challenges advanced AI systems with complex math problems, revealing how far AI still has to go before achieving true human-level reasoning.
I picked GPT-4o, the default model available to every ChatGPT user, as well as o3, OpenAI’s high-octane reasoning model designed to chew through math, code, and logic puzzles.
For thousands of years, mathematicians have adapted to the latest advances in logic and reasoning. Are they ready for artificial intelligence?