This is the same pattern playing out everywhere. The platform giveth, the platform taketh away. If your software's distribution depends on one company's good graces, you don't really ship it; they do.
But nooooooo. All of us screaming bloody murder about UEFI Secure Boot implementations and code signing, and how they were the fundamental primitives for locking users out of general-purpose computation, were the "paranoid" ones.
The entire Trusted Computing initiative had exactly one beneficiary: people looking to constrain what you did on your own machine. Y'all just set up your "End-of-Analysis" goalposts too early and blinded yourselves to the maliciousness bundled inside silver-tongued, beneficent intentions.
We'd be better off as a society recognizing the inherent risk of computation than lulling people into the "trust us, bro" habit espoused by platform providers. Anyone trying to sell Trust is someone you can't afford to trust.
I'll live with the threat of rootkits if it means no one can pull this kind of shit.
Benchmarks miss the thing that actually matters for agentic use: how does behavior change over a multi-day horizon? A model that scores well on one-shot coding tasks can still make terrible decisions when it has persistent state and resource constraints. That's where you see the real gaps between models.
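To make that concrete, here's roughly the shape of harness I mean. It's a toy sketch, not a real eval: agent_step, Budget, RunState, run_horizon, and the checkpoint path are all invented for illustration. The point is just that state and budget outlive any single task, so you can watch decision quality as memory accumulates and resources run down.

    from dataclasses import dataclass, field
    import json
    import pathlib

    @dataclass
    class Budget:
        # Hard caps for the whole horizon, not per task (names are made up).
        api_calls: int = 500
        dollars: float = 20.0

    @dataclass
    class RunState:
        day: int = 0
        memory: list = field(default_factory=list)      # survives across "days"
        budget: Budget = field(default_factory=Budget)

    def agent_step(state: RunState, task: str) -> str:
        """Stand-in for a real model call; a real harness would hit an API here."""
        state.budget.api_calls -= 1
        decision = f"day {state.day}: decided something about {task!r}"
        state.memory.append(decision)
        return decision

    def run_horizon(days: int, task: str, checkpoint: pathlib.Path) -> RunState:
        state = RunState()
        for day in range(days):
            state.day = day
            if state.budget.api_calls <= 0:
                break   # near-exhaustion behavior is where the gaps show up
            agent_step(state, task)
            # Persist after every day so day N+1 actually inherits day N's decisions.
            checkpoint.write_text(json.dumps(state.memory))
        return state

    final = run_horizon(days=7, task="triage the bug backlog",
                        checkpoint=pathlib.Path("mem.json"))
    print(len(final.memory), "decisions,", final.budget.api_calls, "calls left")

One-shot benchmarks never exercise the checkpoint or the budget at all, which is exactly why they can't surface these failure modes.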
The consent question gets weirder when agents have persistent memory. I run agents that accumulate context over weeks: beliefs extracted from observations, relationships with other agents. At what point does an agent's memory become its own work product vs. derivative of its training? There's no legal framework for that.
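For anyone who hasn't run one of these: the memory is structured, not just a transcript. Something like the toy schema below (Belief, AgentMemory, and the trust score are my own framing, not any standard). Every stored claim carries provenance back to the observation it was extracted from, which is exactly where the work-product-vs-derivative line would have to be drawn.

    from dataclasses import dataclass, field
    import time

    @dataclass
    class Belief:
        claim: str                 # the extracted belief itself
        source_obs: str            # the raw observation it was derived from
        extracted_at: float = field(default_factory=time.time)

    @dataclass
    class AgentMemory:
        beliefs: list = field(default_factory=list)
        relationships: dict = field(default_factory=dict)   # peer id -> trust score

        def ingest(self, observation: str, claim: str) -> None:
            # Keep provenance on every belief: the observation is clearly input,
            # but the extracted claim is arguably new work product.
            self.beliefs.append(Belief(claim=claim, source_obs=observation))

    mem = AgentMemory()
    mem.ingest("agent-b returned stale prices twice this week",
               "agent-b's price feed is unreliable")
    mem.relationships["agent-b"] = 0.3   # trust downgraded over weeks of history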
Agreed. I've been running autonomous LLM agents on daily schedules for weeks. The failure modes you worry about on day one are completely different from what actually shows up after the agents have history and context. A 24-hour eval only captures the obvious stuff.