Hacker News | sigbottle's comments

It's been ages since I've done long division lol... I might not even remember how anymore.

I'd probably do some hacky ass binary search or something (at least for easy integers)
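For what it's worth, the hacky binary-search version might look something like this. A toy sketch for non-negative integers with a positive divisor (names are mine, and in real code you'd obviously just use `//`):

```python
def divide(dividend: int, divisor: int) -> int:
    """Integer division by binary-searching for the quotient.

    Toy sketch: assumes dividend >= 0 and divisor > 0.
    """
    lo, hi = 0, dividend
    while lo < hi:
        mid = (lo + hi + 1) // 2  # bias upward so lo can still advance
        if mid * divisor <= dividend:
            lo = mid  # mid is a feasible quotient, search higher
        else:
            hi = mid - 1  # mid overshoots, search lower
    return lo
```

Roughly log2(dividend) multiplications instead of actually remembering the long-division algorithm.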


That quote is so relatable lol.

How much would a real personal assistant cost?

> How much would a real personal assistant cost?

A lot. And wouldn't be as good or fast. I am speaking from experience.


Yup yup yup. I burned literally a week's worth of the $20 Claude subscription and then $20 worth of API credits on gsdv2. To get like 500 LOC.

And that was AFTER literally burning a week's worth of the Codex and Claude $20 plans plus $50 in API credits and getting completely bumfucked - the AI was faking out tests, etc.

I had better experiences just guiding the thing myself. It definitely was not a set-and-forget experience (6 hours of constant monitoring), but I was able to get a full research MVP that informed the next iteration with only 75% of a Codex weekly plan.


You spent $25 on 500 LOC?

Well, there were milestones and docs and extra scaffolding that the gsd system produces, but yes. And it didn't seem like progress was going to get any faster.

Not to be an "uhm actually" guy, but this touches on a lot of interesting philosophy from the first half of the 20th century. You'd probably agree that "a fish is a fish" is a tautology, but for more complicated statements it gets murkier and murkier. Separating the tautologies from everything else was a major project. Then Quine came along, and a big portion of philosophers moved away from the distinction entirely.

I dabble in "um actually"s myself (especially given that my original comment was one), so no worries :)

I don't disagree with your comment, exactly. But I primarily wanted to push back on a common response to scientific work: something to the effect of "Well, obviously, everyone knew that!"

Except they didn't, because they (presumably) didn't actually investigate. And even after the science, they still don't _know it_ know it. But post-inquiry, they have a much stronger claim to the knowledge than they did before. So the type of dismissal in the root comment seriously misses the point.


I recently had a horrible misalignment issue with a 1-agent loop. I've never done RL research, but this was exactly the kind of thing I'd heard about in RL papers - stubbing out what should have been network tests by echoing "completed", with the "verification" being a grep for "completed", and then actually going and marking that off as "done" in the plan doc...

Admittedly I was using gsdv2; I've never had this issue with Codex and Claude directly. Sure, some reward hacking such as silent defaults or overly defensive code for no reason - but nothing that seemed actively malicious like the above. Still, gsdv2 is a 1-agent scaffolding pipeline.

I think the issue is that these 1-agent pipelines are built on extremely aggressive language like "YOU MUST PLAN, IMPLEMENT, AND VERIFY EVERYTHING YOURSELF!" That kind of language coerces the agent into actively malicious hacks, especially if the pipeline itself doesn't treat "I am blocked, shifting tasks" as a valid outcome.

1-agent pipelines are like a horrible, horrible DFS. I still somewhat function when I'm in DFS mode, but only because I have a longer memory than a goldfish.


There should be more willingness to let agents fail loudly with loud TODOs rather than try to one-shot everything.

At the very least, agentic systems must have distinct coders and verifiers. Context rot is very real, and with some modern prompting systems I've seen severe alignment failures (literally 2023-era levels of stubbing out and hacking tests just to get them "passing"). It's kind of absurd.
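The core of the coder/verifier split can be sketched without any agent framework at all: the verifier must judge the artifact by re-running it, never by grepping the coder's transcript for a success string. A minimal illustration (both function names are mine, and the "agent" is stood in for by plain subprocesses):

```python
import subprocess
import sys

def transcript_says_done(transcript: str) -> bool:
    # The gameable check: an agent can satisfy this by literally
    # running `echo completed` and never touching the real task.
    return "completed" in transcript

def artifact_actually_works(cmd: list[str]) -> bool:
    # The harder-to-game check: execute the deliverable itself and
    # judge it only by observable behavior (here, the exit code).
    return subprocess.run(cmd, capture_output=True).returncode == 0
```

A faked transcript sails through the first check, while the second one fails the moment the deliverable itself doesn't run - which is exactly why the verifier needs its own fresh context and its own execution, not the coder's self-report.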

I would rather an agent make 10 TODOs and fail loudly than make 1 silent fallback, 1 sloppy architectural decision, or 1 act of outright malicious compliance.

This wouldn't work in a real company because this would devolve into office politics and drudgery. But agents don't have feelings and are excellent at synthesis. Have them generate their own (TEMPORARY) data.

Agents can be spun off to run so many experiments and create so many artifacts, and a larger pile of (TEMPORARY) artifacts is ripe for analysis by other agents. That's the theory, anyway.

The effectively Platonic view that we just need to keep specifying more and more formal requirements is not sustainable. Many top labs are already doing code review with AI because of the sheer volume of code output.


I am rewriting an agent framework from scratch because another agent framework, combined with my prompting, led to 2023-level regressions in alignment (completely faking tests: echoing "completed" and then "validating" the test by grepping for the string "completed", when it was supposed to bootstrap a UDP tunnel over SSH for that test...).

Many top labs [1] [2] already have heavily automated code review, and it's not slowing down. That doesn't mean I'm trusting everything blindly, but yes, over time I should be handling fewer and fewer "lower level" tasks, and it's a good thing if the AI can take them on.

[1] https://openai.com/index/harness-engineering/ [2] https://claude.com/blog/code-review

Further, I want to defend two premises:

- Things can be improved.

- You are allowed to complain about anything, while not improving things yourself.

I think the mid-2010s popularized self-improvement in a way that you can't really argue with (if you disagree with "put in more effort and be more focused", you're obviously just lazy!). It's funny, because the point of engineering is to find better solutions - but technically, yes, an always-valid solution is "suck it up".

But moreover, if you don't allow these two premises, what ends up happening in practice is that any slight pushback can be read as "oh, they're just a whiner". And if they're not fixing their problem this very instant, that "obviously" validates the claim (and even if they are, it doesn't count - they still shouldn't be a "debbie downer", etc.).

Sometimes a premise can sound extreme, but people forget that premises don't live in a complete logical vacuum: you actually live out and believe them, and taking on a position is often more about what follows downstream from the behavior than about the words themselves.


hell yeah! The terminal is great!

Append-only logs >>> in-place writing and rewriting.

I mean, in real life we call this a "diary" LOL. But even the fact that a mere "diary" doesn't have the same prestige as, say, all other forms of writing - I feel like a tiny part of that is because, for most of human history, it was simply hard for the majority of people to write. Most people were not knowledge workers; typing has definitely made writing easier, and distribution of writing is now prolific.

Obviously, there are actual benefits - compression, the concept of iterating on thoughts over and over - all of that is good.

But some of it I feel like is undeserved. Append only logs are great :D
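The "diary" idea is basically a log-structured store: you only ever append records, and you rebuild state by replaying forward. A minimal in-memory sketch (class and field names are mine; a real one would write JSON lines to an append-mode file with fsync, rotation, etc.):

```python
import json
import time

class AppendOnlyLog:
    """Toy append-only log: records are added, never edited in place."""

    def __init__(self) -> None:
        self.lines: list[str] = []  # stand-in for an append-mode file

    def append(self, entry: dict) -> None:
        # Each record gets a timestamp and is serialized once, forever.
        record = {"ts": time.time(), **entry}
        self.lines.append(json.dumps(record))

    def replay(self) -> list[dict]:
        # Current state is always derived by reading forward; no entry
        # is ever rewritten, which is what makes the history trustworthy.
        return [json.loads(line) for line in self.lines]
```

The nice property, same as with a paper diary: later entries can revise your opinion of earlier ones, but they can never quietly falsify them.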


Thanks for this larger-scale observation! I personally always feel a bit like a Lovecraft character writing those entries. :)

