Science of Evaluations

Improving how AI systems are measured and tested in academia, industry, and the public sector.

Reliability of LLMs as medical assistants for the general public: a randomized preregistered study

Published in Nature Medicine