From this paper, as covered in the NYT.
TL;DR: GPT-4 is better at diagnosis than doctors, when both are given a case report and asked to make a diagnosis.
I imagine the main drawback here is that doctors (or perhaps physician assistants) are still needed for the time being to compile the case reports, ask the right questions, and so on. But come on, this is a wild outcome! Doctors not only do worse, they do worse when given access to the tool that does better! Maybe the study could have added yet another arm of doctors primed with "this is what the chatbot thought when given all the info," if that would be more convincing. But given the gap between raw doctor diagnosis and LLM diagnosis, I'm skeptical that doctors will perform well at the closely related task of catching and correcting the LLM's misdiagnoses.
And this line is hilarious to me; some might call it cope:

"The LLM alone demonstrated higher performance than both physician groups, indicating the need for technology and workforce development to realize the potential of physician-artificial intelligence collaboration in clinical practice."
See also: "Superhuman performance of a large language model on the reasoning tasks of a physician."