
Dr Henry Tang, Merton Lecturer, is a named author of a paper published in Nature on 28 January 2026, the result of an AI project to which he recently contributed. The paper, titled “A benchmark of expert-level academic questions to assess AI capabilities”, accompanies the benchmark dataset created through Humanity’s Last Exam (HLE), an initiative designed to test the limits of artificial intelligence using expert-level questions.
The project addresses a growing issue in the field of AI evaluation: existing benchmark questions have become too easy for rapidly advancing AI models, limiting their usefulness in distinguishing high-performing systems. The HLE project sought to resolve this by compiling a new set of exceptionally challenging questions—difficult enough that any AI capable of solving them would demonstrate research‑expert‑level knowledge across a wide range of academic disciplines.
More than 1,000 researchers and experts worldwide submitted questions spanning over 100 subjects. These underwent a three‑stage selection process:
AI Evaluation, in which five leading AI models (as of late 2024) attempted each question, with only those that all models failed progressing to the next stage;
Expert Review, where specialists refined and assessed the questions and answers; and
Final Selection, conducted by a panel of experts and organisers.
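For readers curious how the first stage works in principle, the rule "keep only the questions that every model failed" can be sketched in a few lines of Python. This is an illustrative sketch, not the project's actual pipeline: the `Question` class, the function names, and the exact-match comparison of answers are all assumptions made for the example, and the real evaluation may well have graded answers differently.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record for a submitted question; not the HLE data format.
    text: str
    answer: str


def normalise(answer: str) -> str:
    # Assumption: answers are compared as trimmed, lower-cased strings.
    return answer.strip().lower()


def all_models_failed(question: Question, model_answers: list[str]) -> bool:
    # True only if no model produced the reference answer.
    reference = normalise(question.answer)
    return all(normalise(a) != reference for a in model_answers)


def ai_evaluation_stage(
    questions: list[Question],
    answers_by_question: dict[str, list[str]],
) -> list[Question]:
    # Keep a question only when every model's answer to it was wrong.
    return [
        q for q in questions
        if all_models_failed(q, answers_by_question[q.text])
    ]
```

Under this rule, a single correct answer from any one of the models is enough to remove a question from the pool, which is why the questions that survive are, by construction, beyond the reach of all the models tested.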
Of the more than 70,000 questions originally submitted, only 2,500 were accepted into the final benchmark, with the top 550 designated as winners and awarded prizes. All contributors were invited to join the accompanying publication as co‑authors.
Tang was the sole participant from Merton College, contributing several questions and receiving a prize for his submissions. One of his questions is featured among the six highlighted in the Nature paper and is the only one representing the Humanities.
Notably, the AI models struggled to interpret aspects of the Regina Tombstone, the classical source on which his question was based—an artefact he has previously used in undergraduate admissions discussions.
Tang emphasises that, regardless of differing opinions about the impact of AI on society and research, understanding the capabilities of advanced AI systems will remain essential. He hopes that the HLE benchmark will continue to be a valuable tool for building that understanding.