Not an expert in LLM benchmarks, but I generally think of benchmarks as being particularly good for measuring usefulness for certain use cases. Even if measuring LLMs is not as straightforward as, say, comparing read/write speeds across different SSDs, if a certain model's responses are consistently measured as being higher quality / more useful, surely that means something, right?

