++, having played with all three, agreed

lyapunova · on Aug 7, 2023

I'm curious, is there a standard benchmark anyone knows of that compares the "practical usefulness" of LLMs instead of trying to make them take some kind of useless IQ test?

e.g. how useful is this LLM for 1) code debugging, 2) (accurate) fact retrieval, 3) daily task planning?

rushingcreek · on Aug 7, 2023

Kagi did an evaluation a while back: https://blog.kagi.com/kagi-ai-search

lyapunova · on Aug 7, 2023

Thanks! I love Kagi's ethos!