Same experience here! As an analogy, imagine the model knows both Arabic and Roman numeral representations, but in an alternate universe it has been trained so heavily on Roman numerals ("Bad Code") that it won't give you the Arabic ones ("Good Code") unless you prompt for them directly, even when they are clearly superior.
I also believe that overall repository code quality is important for AI agents - the more "beautiful" it is, the more the agent can mimic the "beauty".
This article resonates exactly with how I think about it as well. For example, at minfx.ai (a Neptune/wandb alternative), we cache time series that can contain millions of floats for fast access. Any engineer worth their title would never copy these; they would pass around pointers for access. Opus, when stuck in a place where passing the pointer was a bit harder (due to async and Rust lifetimes), would just make the copy rather than rearchitect, or at least stop and notify the user. Many such examples of 'lazy' and thus bad design.
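For the curious, here is a minimal sketch of the shape of the fix we wanted. The names (`SeriesCache`, `handle`) are hypothetical, not our actual API: the point is that `Arc` gives you cheap shared ownership that survives async and lifetime headaches, so there is no excuse for cloning millions of floats.

```rust
use std::sync::Arc;

// Hypothetical cached series: millions of floats behind a shared pointer.
pub struct SeriesCache {
    data: Arc<Vec<f64>>,
}

impl SeriesCache {
    pub fn new(points: Vec<f64>) -> Self {
        Self { data: Arc::new(points) }
    }

    // Cloning the Arc bumps a refcount and copies one pointer,
    // not the millions of floats behind it. The returned handle is
    // 'static-friendly, so it can be moved into async tasks freely.
    pub fn handle(&self) -> Arc<Vec<f64>> {
        Arc::clone(&self.data)
    }
}

fn main() {
    let cache = SeriesCache::new(vec![0.5; 1_000_000]);
    let h = cache.handle(); // cheap: shared ownership, no data copy
    // The "lazy" fix would instead be: let copy = (*cache.data).clone();
    assert_eq!(h.len(), 1_000_000);
    println!("{}", h.len());
}
```

The trade-off is interior immutability: an `Arc<Vec<f64>>` is read-only through the handle, which is exactly what you want for a cache.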
Definitely the combination of callgrind (valgrind --tool=callgrind) and kcachegrind, or the combination of HotSpot and perf.
I have toyed with Intel's VTune, but I found it very hard to get running, so it's discouraging before you even start. That said, if you need detailed information on caches etc., VTune is fantastic.
What I think the author is hoping for is some inspectable graph of the "whys" that can be a basis for further automation/analysis. That's interesting, but the line to actual code then becomes blurry. For instance, what about self-consistency across time? If it were just text, it would drift out of sync (like all doc text does). If it's code, then maybe you just had the wrong abstractions the whole time?
The way we solve the why/what separation (at minfx.ai) is by having a top-level PLAN.md document explaining why the commit was built, as well as regenerating the README.md files on the paths to every file touched by the commit.
Admittedly, this still leans more into the "what" rather than "why".
I will need to think about this more, hmm.
This helps us keep the codebase well-documented and LLM-token efficient at the same time. What also helps is that Rust forces you into a reasonable code structure with its pub/private modules, so things are naturally more encapsulated, which helps the documentation as well.
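As a tiny illustration of what I mean by encapsulation helping documentation (module and type names here are made up, not from our codebase): only the `pub` items are reachable from outside the module, so a README or an LLM's context only has to describe that public surface.

```rust
// Hypothetical module: the external surface is just `Cache`,
// `Cache::new`, and `Cache::len`. `Shard` is invisible outside,
// so no one outside this file needs to document or understand it.
mod cache {
    // Private implementation detail.
    struct Shard {
        points: Vec<f64>,
    }

    // Public API of the module.
    pub struct Cache {
        shards: Vec<Shard>,
    }

    impl Cache {
        pub fn new() -> Self {
            Cache { shards: Vec::new() }
        }

        // Total number of points across all shards.
        pub fn len(&self) -> usize {
            self.shards.iter().map(|s| s.points.len()).sum()
        }
    }
}

fn main() {
    let c = cache::Cache::new();
    assert_eq!(c.len(), 0);
    println!("{}", c.len());
}
```

The compiler enforces the boundary, so the docs can't silently rot into describing internals that callers were never supposed to touch.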
My experience (with minfx.ai) has been that it is very important to build a system that imposes lots of constraints on the code. The more constrained you can make it, the better, and Rust helps a lot here. Thanks to this, for the first time in my career, I feel like the bigger the system gets, /the easier/ it is to develop, because the AI can discover and reuse common components, while a human would struggle to find them, and to learn how to use them, in a large codebase. Very counter-intuitive!
That's still pretty bloated. That's big enough to fit an entire Android application from a few years ago (pre-AndroidX), or a simple Windows/Linux application. I'll agree that it's justified if you're optimizing for runtime performance rather than first load, which seems appropriate for your product, right?!
What is this 2 MB for? It would be interesting to hear about your WebAssembly performance story!
Regarding the website homepage itself: it weighs around 767.32 kB uncompressed in my testing, most of which is an unoptimized 200+ kB JPEG and some insanely large web fonts (which honestly are unnecessary; the website looks _pretty good_ without them and could load much faster).
We love wasm! You can get pretty far with it. We're building a new machine-learning experiment tracker with wasm on the front end. (If you know what Wandb or Neptune is, you should give us a try!)
As far as I know, we are the fastest on the market. Multithreaded support is a pain, though.