Similar to the “Large Language Model Influence on Diagnostic Reasoning” study, but now with o1-preview.
A classic “LLMs are good” result. I know little about healthcare or diagnostics, but “trials evaluating AI in real clinical settings” sounds like the correct next step. I think a good study would compare the following arms (sketched in code after the list):
- a doctor
- a doctor with access to the LLM
- a doctor primed with the AI system diagnosis
- a doctor primed with the AI system diagnosis and access to the LLM
- a med student (with n years of school remaining)
- a med student with access to the LLM
- a med student primed with the AI system diagnosis
- a med student primed with the AI system diagnosis and access to the LLM
- the AI system
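To make the design concrete, here is a minimal sketch of those arms as a factorial cross of participant type, AI-priming, and LLM access, plus the standalone AI system. The names and data-structure framing are my own, not from any existing trial protocol.

```python
# Hypothetical enumeration of the proposed arms: 2 participant types x
# primed/not-primed x LLM access/no access, plus the AI system on its own
# (9 arms total). Purely illustrative; field names are made up.
from itertools import product

participants = ["doctor", "med_student"]  # med student with n years of school remaining

arms = [
    {"participant": p, "primed_with_ai_diagnosis": primed, "llm_access": llm}
    for p, primed, llm in product(participants, [False, True], [False, True])
]
arms.append(
    {"participant": "ai_system_alone", "primed_with_ai_diagnosis": None, "llm_access": None}
)

for arm in arms:
    print(arm)
```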
I’m mostly skeptical that “access to the LLM tool” will be more useful than priming with the AI’s diagnosis. I’d guess results like the following (a toy illustration of the overlapping-CI point comes after the list):
- doctors being only marginally higher than med students (confidence intervals overlapping, mean only ~3% higher), i.e. effectively on the same level
- access to the LLM tool alone being a similarly small boost (confidence intervals overlapping, mean only ~3% higher)
- priming getting results ~equivalent to the AI system
- priming plus access to the LLM being 4-5% better than the AI system alone
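To illustrate the “confidence intervals overlapping, mean only ~3% higher” guesses above, here is a purely hypothetical simulation. Every number in it (true means, spread, participants per arm) is an assumption I picked to show how a ~3-point gap can fail to separate at a plausible sample size; it is not data from any study.

```python
# Toy simulation of two arms whose true means differ by ~3 points but whose
# 95% confidence intervals overlap at a modest sample size. All numbers are
# assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n_per_arm = 25   # assumed participants per arm
sd = 15.0        # assumed spread of diagnostic-reasoning scores (0-100 scale)

def ci95(scores):
    """Mean and 95% CI half-width via the normal approximation."""
    m = scores.mean()
    half = 1.96 * scores.std(ddof=1) / np.sqrt(len(scores))
    return m, half

med_students = rng.normal(72, sd, n_per_arm)  # assumed true mean 72
doctors = rng.normal(75, sd, n_per_arm)       # assumed true mean 75 (~3 points higher)

for label, scores in [("med students", med_students), ("doctors", doctors)]:
    m, half = ci95(scores)
    print(f"{label}: mean {m:.1f}, 95% CI [{m - half:.1f}, {m + half:.1f}]")
```

Under these assumptions the two intervals overlap even though one mean is ~3 points higher, which is the shape of result I’d expect for the doctor vs. med student and LLM-access-only comparisons.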