
This is super big news if it’s real.

Basically, given an agent with an initial set of predefined actions and a goal, they’re saying “decompose this into steps and pick an action to achieve each step”. Pretty standard stuff.

Then they say, hey, if you can’t solve the problem with those actions (i.e. you failed repeatedly when attempting to solve it), write some arbitrary generic Python code and use that as your action for the next step.

Then save that as a new generic action, and slowly build up a library of actions to augment the initial set.
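The loop being described can be sketched roughly like this. To be clear, this is a toy illustration of the comment above, not the paper's implementation: all names are made up, and the "code generation" step is stubbed with a hard-coded snippet standing in for a real model call.

```python
def run_agent(steps, action_library, generate_code, max_retries=2):
    """Try known actions for each step; on repeated failure, fall back
    to generated code and save it as a new reusable action."""
    for step in steps:
        solved = False
        for _ in range(max_retries):
            action = action_library.get(step)
            if action is not None and action(step):
                solved = True
                break
        if not solved:
            # Fallback: synthesize a new action from generated code,
            # then keep it in the library for reuse on later tasks.
            src = generate_code(step)
            namespace = {}
            exec(src, namespace)
            new_action = namespace["action"]
            if new_action(step):
                action_library[step] = new_action
            else:
                raise RuntimeError(f"could not solve step: {step!r}")
    return action_library

# Stub standing in for the model's code generation.
def fake_codegen(step):
    return "def action(step):\n    return True  # pretend the code works\n"

library = {"fetch": lambda step: True}   # one predefined action
run_agent(["fetch", "parse"], library, fake_codegen)
print(sorted(library))                   # "parse" has been learned
```

The interesting (and, per the skepticism below, hard) part is entirely hidden inside `generate_code`: everything else is bookkeeping.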

The thing is, there’s no meaningful difference between “write code to solve this task” and “write code to implement this action”; if you can reliably generate code that performs arbitrary tasks without error, you’ve basically solved programming.

So… that would be quite a big deal.

That would be a real “Devin” that would actually be able to write arbitrary code to solve arbitrary problems.

…which makes me a bit sceptical.

Still, this seems to have at least worked reasonably well (as shown by its position on the GAIA leaderboard), so they seem to have done something that works, but I’m left wondering…

If you’ve figured out how to get an agent to write error-free deterministic code to perform arbitrary actions in a chain-of-thought process, why are you pissing around with accumulating a library of agent actions?

That’s all entirely irrelevant and unnecessary.

Just generate code for each step.

So… something seems a bit strange around this.

I’d love to see a log of the actual problem / action / code sequences.



Devin is real. What do you mean?

Anyway, this is pretty standard stuff already. In all my agent workflows the agents are able to write their own code and execute it before passing the result to the next agent. It doesn't need to be perfect, since you always have an agent validating the results and sending the task back if necessary.
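That generate / execute / validate loop can be sketched as follows. The "agents" here are stubbed as plain functions for illustration (in a real workflow each would be a model call, and execution would be sandboxed); all names are hypothetical.

```python
def run_with_validation(task, writer, executor, validator, max_rounds=3):
    """Writer drafts code, executor runs it, validator either accepts
    the result or sends feedback back for another attempt."""
    feedback = None
    for _ in range(max_rounds):
        code = writer(task, feedback)        # writer agent drafts code
        result = executor(code)              # execute the draft
        ok, feedback = validator(task, result)
        if ok:
            return result                    # accepted: pass downstream
    raise RuntimeError("validator rejected every attempt")

# Stubs: the first draft is wrong, the retry (with feedback) succeeds.
def writer(task, feedback):
    return "result = 41" if feedback is None else "result = 42"

def executor(code):
    ns = {}
    exec(code, ns)                           # NB: no sandbox in this toy
    return ns["result"]

def validator(task, result):
    return (result == 42, f"expected 42, got {result!r}")

print(run_with_validation("compute 42", writer, executor, validator))  # 42
```

The point of the pattern is exactly the one made above: the writer doesn't need to be right first time, because the validator closes the loop.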

I haven't read the paper beyond the synopsis, so I might be missing a key takeaway, and I presume it has a lot of additional layers.


As evidenced by the reaction to Devin, no, it’s not real.

There’s a limit, beyond which agent generated code is, in general, not reliable.

All of the people who claim otherwise (like the Devin videos) have been shown to be faking it [1] or cherry-picking.

Having agents generate arbitrary code to solve arbitrary problems is. Not. A. Solved. Problem.

Yet.

…no matter how many AI bros currently claim otherwise.

Being able to decompose complex problems into parts small enough to be solved by current models would be a big deal if it were real.

(Because, currently the SoTA can’t reliably do this; this should not be a remotely controversial claim to people familiar with this space)

So, tl;dr: extraordinary claims require extraordinary evidence, which is absent here as far as I can tell. They specifically call out in the paper that generated actions are overly specific and don’t always work; but as I said, it’s doing well on the leaderboard, so it’s clearly doing something that works, but there’s just noooooo way of seeing what.

[1] - https://www.zeniteq.com/blog/devins-demo-as-the-first-ai-sof...


> If you’ve figured out how to get an agent to write error free deterministic code to perform arbitrary actions in a chain of thought process

You don't have to have it be perfect, and the more you reuse pieces you know work, the less you have to build each time (reducing the places errors can creep in).

> Just generate code for each step.

We don't do this as humans; we build and reuse pieces.
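The reuse argument can be made concrete with a toy cache of verified actions: once a generated helper has passed its checks, later tasks reuse it instead of regenerating it, so each new task only adds new code. Names here are illustrative, not from any real system.

```python
verified = {}  # library of actions that have already passed their checks

def get_action(name, generate, check):
    """Reuse a verified action if we have one; otherwise generate one,
    verify it, and cache it for next time."""
    if name not in verified:
        candidate = generate(name)
        if not check(candidate):
            raise RuntimeError(f"generated action {name!r} failed checks")
        verified[name] = candidate
    return verified[name]

calls = []
def generate(name):
    calls.append(name)              # track how often we actually generate
    return lambda x: x * 2          # stub for model-generated code

check = lambda f: f(3) == 6         # a tiny verification step

double = get_action("double", generate, check)
double_again = get_action("double", generate, check)
assert double is double_again and calls == ["double"]  # generated once
```

Every reused piece is one fewer chance for freshly generated code to go wrong, which is the parent comment's point.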



