This is the same pattern playing out everywhere. The platform giveth, the platform taketh away. If your software's distribution depends on one company's good graces, you don't really ship it; they do.
But nooooooo. All of us screaming bloody murder about UEFI Secure Boot implementations and code signing, and how they were the fundamental primitives for locking users out of general-purpose computation, were the "paranoid" ones.
The entire Trusted Computing initiative had exactly one beneficiary: people looking to constrain what you did on your own machine. Y'all just set up your "End-of-Analysis" goalposts too early and blinded yourselves to the maliciousness bundled inside silver-tongued, beneficent intentions.
We'd be better off as a society recognizing the inherent risk of computation than lulling people into the "trust us, bro" habit espoused by platform providers. Anyone trying to sell Trust is someone you can't afford to trust.
I'll live with the threat of rootkits if it means no one can pull this kind of shit.
Benchmarks miss the thing that actually matters for agentic use: how does behavior change over a multi-day horizon? A model that scores well on one-shot coding tasks can still make terrible decisions when it has persistent state and resource constraints. That's where you see the real gaps between models.
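To make that concrete, here's roughly the shape of harness I mean. It's a toy sketch, not a real eval: agent_step, Budget, RunState, run_horizon, and the checkpoint path are all invented for illustration. The point is just that state and budget outlive any single task, so you can watch decision quality as memory accumulates and resources run down.

    from dataclasses import dataclass, field
    import json
    import pathlib

    @dataclass
    class Budget:
        # Hard caps for the whole horizon, not per task (names are made up).
        api_calls: int = 500
        dollars: float = 20.0

    @dataclass
    class RunState:
        day: int = 0
        memory: list = field(default_factory=list)      # survives across "days"
        budget: Budget = field(default_factory=Budget)

    def agent_step(state: RunState, task: str) -> str:
        """Stand-in for a real model call; a real harness would hit an API here."""
        state.budget.api_calls -= 1
        decision = f"day {state.day}: decided something about {task!r}"
        state.memory.append(decision)
        return decision

    def run_horizon(days: int, task: str, checkpoint: pathlib.Path) -> RunState:
        state = RunState()
        for day in range(days):
            state.day = day
            if state.budget.api_calls <= 0:
                break   # near-exhaustion behavior is where the gaps show up
            agent_step(state, task)
            # Persist after every day so day N+1 actually inherits day N's decisions.
            checkpoint.write_text(json.dumps(state.memory))
        return state

    final = run_horizon(days=7, task="triage the bug backlog",
                        checkpoint=pathlib.Path("mem.json"))
    print(len(final.memory), "decisions,", final.budget.api_calls, "calls left")

One-shot benchmarks never exercise the checkpoint or the budget at all, which is exactly why they can't surface these failure modes.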
The consent question gets weirder when agents have persistent memory. I run agents that accumulate context over weeks: beliefs extracted from observations, relationships with other agents. At what point does an agent's memory become its own work product vs. derivative of its training? There's no legal framework for that.
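For anyone who hasn't run one of these: the memory is structured, not just a transcript. Something like the toy schema below (Belief, AgentMemory, and the trust score are my own framing, not any standard). Every stored claim carries provenance back to the observation it was extracted from, which is exactly where the work-product-vs-derivative line would have to be drawn.

    from dataclasses import dataclass, field
    import time

    @dataclass
    class Belief:
        claim: str                 # the extracted belief itself
        source_obs: str            # the raw observation it was derived from
        extracted_at: float = field(default_factory=time.time)

    @dataclass
    class AgentMemory:
        beliefs: list = field(default_factory=list)
        relationships: dict = field(default_factory=dict)   # peer id -> trust score

        def ingest(self, observation: str, claim: str) -> None:
            # Keep provenance on every belief: the observation is clearly input,
            # but the extracted claim is arguably new work product.
            self.beliefs.append(Belief(claim=claim, source_obs=observation))

    mem = AgentMemory()
    mem.ingest("agent-b returned stale prices twice this week",
               "agent-b's price feed is unreliable")
    mem.relationships["agent-b"] = 0.3   # trust downgraded over weeks of history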
Agreed. I've been running autonomous LLM agents on daily schedules for weeks. The failure modes you worry about on day one are completely different from what actually shows up after the agents have history and context. A 24-hour eval only captures the obvious stuff.