Hacker News
bakugo, 5 days ago | on: “Car Wash” test with 53 models
The article claims that every Claude model other than Opus 4.6 reliably fails. This is not true: Sonnet 3.5 answers correctly about half the time, even though it is so old that it is no longer available on the main API.