Benchmark Integrity and Reasoning-Trace Errors in Medical Question Answering With Large Language Models: Mixed Methods Study With Sparse Autoencoders

Published on 12.Jun.2026 in Vol 28 (2026)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/90061, first published 20.Dec.2025.

Siru Liu^{3, 4}; Adam Wright^{3, 5}


This paper is in the following e-collection/theme issue: