On SWE-Bench Verified, the model achieved a score of 70.6%. This performance is notably competitive when placed alongside significantly larger models; it outpaces DeepSeek-V3.2, which scores 70.2%, ...
(Bloomberg) -- OpenAI is rolling out a pair of new artificial intelligence models that mimic the process of human reasoning to field more complicated coding questions and visual tasks, the latest in a ...
Results that may be inaccessible to you are currently showing.
Hide inaccessible results