
++, having played with all three, agreed


I'm curious, is there a standard benchmark anyone knows of that compares the "practical usefulness" of LLMs instead of making them take some kind of useless IQ test?

E.g., how useful is this LLM for 1) code debugging, 2) (accurate) fact retrieval, 3) daily task planning?


Kagi did an evaluation a while back: https://blog.kagi.com/kagi-ai-search


Thanks! I love Kagi's ethos!



