News

Our tech columnist tests LSAT puzzles and a writing challenge on GPT-4. Here’s what the artificial intelligence upgrade can — and can’t — do.
FrontierMath, a new benchmark from Epoch AI, challenges advanced AI systems with complex math problems, revealing how far AI still has to go before achieving true human-level reasoning.
I picked GPT-4o, the default model available to every ChatGPT user, as well as o3, OpenAI’s high-octane reasoning model, designed to chew through math, code, and puzzles with scalpel-like precision.
For thousands of years, mathematicians have adapted to the latest advances in logic and reasoning. Are they ready for artificial intelligence?