
We build and run a multi-agent system. Today Cursor won. For a log analysis task — Cursor: 5 minutes. Our pipeline: 30 minutes.

Still a case for it:

1. Isolated contexts per role (CS vs. engineering) — agents don't bleed into each other

2. Hard permission boundaries per agent

3. Local models (Qwen) for cheap routine tasks

Multi-agent loses at debugging. But the structure has value.
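
A rough sketch of what that structure buys you (made-up names, not our actual code):

    # Rough sketch only: isolated per-agent context, hard permission boundary,
    # and a cheap local model for routine work. Names are illustrative.
    from dataclasses import dataclass, field

    @dataclass
    class Agent:
        role: str                # e.g. "cs" or "engineering"
        allowed_tools: set       # hard permission boundary per agent
        model: str               # "qwen-local" for routine tasks, API model otherwise
        context: list = field(default_factory=list)  # isolated per agent, never shared

        def call_tool(self, tool, args):
            if tool not in self.allowed_tools:
                raise PermissionError(f"{self.role} agent may not use {tool}")
            self.context.append({"tool": tool, "args": args})  # stays in this agent's context

    cs = Agent(role="cs", allowed_tools={"search_tickets"}, model="qwen-local")
    eng = Agent(role="engineering", allowed_tools={"read_logs", "run_query"}, model="claude")
    # cs.context and eng.context never mix, so one role's noise can't bleed into the other.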


Knowing fundamentals gives you deeper intuition about the technology, at every layer. When compilers appeared, you no longer needed to understand assembly and registers. But knowing how assembly and registers actually work makes you better at C. When Python came along, low-level languages felt unnecessary. But understanding C's memory management is what lets you understand Python's limitations. Now LLMs write the implementation. LLMs abstract away the code. But knowing how algorithms work, even in a high-level language like Python, is exactly how you catch LLM mistakes and inefficiencies.

Knowledge builds on knowledge. We learn basic math before advanced math for a reason. The pyramid keeps accumulating from what came before. Understanding the fundamentals still matters, I think.


The author of Claude Code himself mentioned this in a recent interview. If I recall correctly, he said that the best programmers he knows have an understanding of the "layer below the layer", which I think is a good way of putting it. You're a better C programmer if you understand assembly, and you're a better "vibe coder" if you can actually understand the LLM-generated code.


Right. At Opus 4.6 rates, once you're at 700k context, each tool call costs ~$1 in cache reads alone. 100 tool calls = $100+ before you even count outputs. 'Standard pricing' is doing a lot of work here lol
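
Back-of-envelope version of that math (the cache-read rate here is my assumption, not a quote of the price sheet):

    # Assumed cache-read rate; treat it as an assumption, not official pricing.
    cache_read_per_mtok = 1.50           # USD per million cached input tokens (assumed)
    context_tokens = 700_000
    per_call = context_tokens / 1_000_000 * cache_read_per_mtok
    print(per_call)                      # ~1.05 USD per tool call, cache reads only
    print(per_call * 100)                # ~105 USD for 100 tool calls, before output tokens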


Cache reads don’t count as input tokens you pay for lol.

https://www.claudecodecamp.com/p/how-prompt-caching-actually...


As a psychiatrist, this problem reminds me of something we studied for a long time: patients get worse in areas we aren't measuring, while the numbers we do record still look normal. We learned that checking results catches things that checking process cannot.


The post-it note analogy is good, but as a psychiatrist, I'd frame it differently: LLMs are essentially patients with anterograde amnesia.

They can reason brilliantly within a single conversation — just like an amnesic patient can hold an intelligent discussion — but the moment the session ends, everything is gone. No learning happened. No memory formed.

What's worse, even within a session, they degrade. Research shows that effective context utilization drops to <1% of the nominal window on some tasks (Paulsen 2025). Claude 3.5 Sonnet's 200K context has an effective window of ~4K on certain benchmarks. Du et al. (EMNLP 2025) found that context length alone causes 13-85% performance degradation — even when all irrelevant tokens are removed. Length itself is the poison.

This pattern is structurally identical to what I see in clinical practice every day. Anxiety fills working memory with background worry, hallucinations inject noise tokens, depressive rumination creates circular context that blocks updating. In every case, the treatment is the same: clear the context. Medication, sleep, or — for an LLM — a fresh session.

The industry keeps betting on bigger context windows, but that's expanding warehouse floor space while the desk stays the same size. The human brain solved this hundreds of millions of years ago: store everything in long-term memory, recall selectively when needed, consolidate during sleep, and actively forget what's no longer useful.
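
A toy version of that loop, just to make the analogy concrete (every name here is made up):

    import time

    class LongTermMemory:
        """Toy sketch: store long-term, recall selectively, consolidate, forget."""
        def __init__(self):
            self.items = {}  # key -> {"text", "last_used", "uses"}

        def store(self, key, text):
            self.items[key] = {"text": text, "last_used": time.time(), "uses": 0}

        def recall(self, query, k=3):
            # Selective recall: only a few relevant items go back into the prompt,
            # not the whole history. (Relevance here is just a dumb substring match.)
            hits = [v for v in self.items.values() if query in v["text"]]
            for v in hits:
                v["uses"] += 1
                v["last_used"] = time.time()
            return [v["text"] for v in hits[:k]]

        def consolidate(self, max_age_days=30):
            # "Sleep": keep what's been used, actively forget what's old and unused.
            cutoff = time.time() - max_age_days * 86400
            self.items = {k: v for k, v in self.items.items()
                          if v["uses"] > 0 or v["last_used"] > cutoff}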

We can build the smartest single model in the world — the greatest genius humanity has ever seen — but a genius with no memory and no sleep is still just an amnesic savant. The ceiling isn't intelligence. It's architecture.


I want to believe I'm reading an insightful comment from an actual human deeply familiar with both human cognition and how LLMs work, but this post is chock full of LLMisms.


Yeah, fair enough. I leaned on Claude to clean up my English. I normally write in Japanese. The clinical stuff is mine though, I run a psych clinic in Japan (link in profile). Should've just written it messier.


The only real way to unfuck your foreign language is to use it. Which does mean accepting you won't be perfect doing it.


Yep. It's the guy from the movie "Memento" doing your physics homework on a couple pages of legal paper. When he runs out of paper, he has to write a post-it note summarizing it all, then burn the papers, and his memory resets. You can only do so much with that.

If we can crack long memory we're most of the way there. But you need RL in addition to long memory or the model doesn't improve. Part of the genius of humans is their adaptability. Show them how to make coffee with one coffee machine, they adapt to pretty much every other coffee machine; that's not just memory, that's RL. (Or a simpler example: crows are more capable of learning and acting with memory than an LLM is)

Currently the only way around both of these is brute force (take in RL input from users/experiments, re-train the models constantly), and that's both very slow and error-prone (the flaws in models' thinking come from a lack of high-quality RL inputs). So without two major breakthroughs we're stuck tweaking what we've got.


The coffee machine example is interesting. That's procedural memory in neuroscience. You don't memorize each machine. You abstract the steps. Grind, filter, add grounds, pour water. Then you adapt to any machine.

LLMs can't form procedural memory on their own. But you can build it outside the model. Store abstracted procedures, inject them when needed. That's closer to how the brain actually works than trying to retrain the model every time.
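
Concretely, something like this (made-up names, just the shape of it):

    # Procedural memory kept outside the model: store the abstracted steps once,
    # inject them into the prompt whenever the task matches. Illustrative only.
    PROCEDURES = {
        "make_coffee": "1. grind beans  2. place filter  3. add grounds  4. pour hot water",
    }

    def build_prompt(task, user_request):
        steps = PROCEDURES.get(task)
        preamble = f"Known procedure for {task}:\n{steps}\n\n" if steps else ""
        # The model never relearns the steps; it only adapts them to the machine at hand.
        return preamble + user_request

    print(build_prompt("make_coffee", "This machine has a pod slot instead of a filter basket."))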


A lot of that seems to be the usual "you're training them wrong".

Sonnet 3.5 is old hat, and today's Sonnet 4.6 ships with an extra-long 1M context window. And it performs better on long-context tasks while it's at it.

There are also attempts to address long context attention performance on the architectural side - streaming, learned KV dropout, differential attention. All of which can allow LLMs to sustain longer sessions and leverage longer contexts better.

If we're comparing to wet meat, then the closest thing humans have to context is working memory. Which humans also get a limited amount of - but can use to do complex work by loading things in and out of it. Which LLMs can also be trained to do. Today's tools like file search and context compression are crude versions of that.


I know Sonnet 4.6 has a 1M context window. I use it every day. But in my experience with Claude Code and Cursor, performance clearly drops between 20k and 200k context. External memory is where the real fix is, not bigger windows.

