Why one benchmark wasn't enough: Interpreting Perplexity Sonar Pro and Gemini 2.5 Pro results
https://seo.edu.rs/blog/why-the-claim-web-search-cuts-hallucination-73-86-fails-when-you-do-the-math-10928
3 key factors when choosing an evaluation strategy for large language models

When you compare model claims such as "Perplexity Sonar Pro shows 37% citation errors" versus "Gemini 2.5 Pro reports 7.0% hallucination, improving on Gemini 2