We conducted a two-phase evaluation. First, we assessed LLMs (GPT o4-mini and Gemini 2.5 Pro) on 1,000 synthetic clinical hematology/oncology vignettes with ...