AA-Omniscience Benchmark: What Is It Actually Measuring?
In the last six months, the discourse surrounding LLM evaluation has shifted from "vibes-based" testing to a desperate scramble for quantitative metrics.