I'd be interested to know how o1 compares. On many days, after I completed the AoC puzzles, I put the question into o1 and it seemed to do really well.
o1 got 20 out of 25 (or 19 out of 24, depending on how you want to count). The experimental setup is unclear (it's not obvious how much it was prompted), but it seems consistent with leaderboard times, where the problems solvable with LLMs had completion times flat-out impossible for humans.
An agent-type setup using Claude got 14 out of 25 (or, again, 13 out of 24).