Not a fan of these kinds of arguments. The 'correct' token is entirely dependent on the dataset: an LLM could have perfect training loss on a given dataset, and that would still have no predictive power over its ability to 'answer' arbitrary prompts.
In natural language, many strings are equally valid; there are many ways to chain tokens together to arrive at the 'correct' answer to an in-sample prompt. A model with perfect loss will, for an ambiguous sequence of tokens, produce a distribution over next tokens that corresponds to the number of valid token paths in the corpus that follow each candidate next token.
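A toy sketch of that point (my own illustration, not from the thread, with a made-up mini-corpus): cross-entropy on a corpus is minimized exactly when the model's next-token distribution matches the empirical frequencies in that corpus, so "perfect loss" just means the model reproduces the dataset's own ambiguity, and says nothing about prompts outside it.

    import math
    from collections import Counter

    # Hypothetical corpus: the context "the cat" is followed by different
    # tokens in different documents.
    continuations = ["sat", "sat", "ran", "slept"]
    total = len(continuations)
    empirical = {tok: c / total for tok, c in Counter(continuations).items()}

    def corpus_loss(model_probs):
        # Average negative log-likelihood of the corpus continuations under the model.
        return -sum(empirical[t] * math.log(model_probs[t]) for t in empirical)

    # Matching the corpus frequencies gives the minimum possible loss
    # (the entropy of the data itself):
    print(corpus_loss(empirical))                                  # ~1.04 nats
    # A model that confidently picks only the most common token scores worse on loss,
    # even though greedy decoding from it would look "more decisive":
    print(corpus_loss({"sat": 0.98, "ran": 0.01, "slept": 0.01}))  # ~2.31 nats

Neither number tells you anything about what comes back for a prompt that never appears in the corpus, which is the point above.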
Compounding errors can certainly happen, but for many of the tokens upstream of the key tokens it's irrelevant. There are so many ways to phrase things that are equally correct; this is how language evolved (and continues to). Getting back to my first point, even if you assume an LLM with perfect loss on the training dataset, you can still get garbage back at test time, so I'm not sure thinking about 'compounding errors' is useful.
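For context, the compounding-error argument being pushed back on here is usually framed as: if each generated token independently has some chance of being "wrong", the probability that an n-token answer stays correct is (1 - e)^n, which decays exponentially. The rebuttal above is that most tokens admit many equally valid continuations, so only a handful of key tokens actually constrains correctness. A back-of-envelope comparison (numbers purely illustrative):

    # Naive compounding model: every one of 500 tokens can independently go wrong.
    per_token_error = 0.01
    print((1 - per_token_error) ** 500)   # ~0.007, looks hopeless

    # If only ~20 "key" tokens actually pin down correctness and the rest
    # admit many equally valid phrasings, the same per-token error rate
    # leaves the answer intact most of the time.
    print((1 - per_token_error) ** 20)    # ~0.82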
I suspect errors in LLM reasoning are more closely related to noisy training data, or an overabundance of low-quality training data. I've observed this in how the reasoning LLMs behave: given problems that are less common in the corpus (the internet and other digital assets) and require higher-order reasoning, they tend to fail, whereas advanced math or programming problems tend to go a bit better, likely because that input data is much cleaner.
But for something like "how do I change the fixture on this light?", I'll get back some kind of garbage from the SEO-verse. IMO the next step for LLMs is figuring out how to curate an extremely high-quality dataset at scale.
> ChatGPT is neat. For all we know we’re near a local maxima of what we’re capable of achieving without another completely new approach that will take 10 or 15 years to figure out. There’s no proof that the acceleration and capabilities we’ve seen over the last 2 to 3 years will continue like that.
Two issues here:
1) We are only about 10 years into the deep learning boom.
2) We've seen deep learning scale with compute over that whole 10-year span, not only over the last 2-3 years.
It could be that we've reached the end of the road for NLP; no one really knows. But generally we see breakthroughs in lockstep with big jumps in compute capability (typically GPU releases, occasionally architecture changes).
>... local maxima ... new approach that will take 10 or 15 years ...
I was listening to the recent interviews with Sam Altman and the Anthropic CEO, who are familiar with current research, and they don't sound like that at all. It's more "wow, we've got so much to build, AGI in a couple of years." (Though it seems to me a rather limited version of AGI: more "can code well" than "can fix your plumbing.")
Their future success is heavily tied to that set of opinions being correct, and to drumming up further investment. Even with the best will in the world, this type of quantitative opinion will be hugely positively biased.
They are CEOs; half the job is public cheerleading.
This will be the new "fusion in 10 years", but with the added downside of emitting a small country's worth of carbon per day while not actually getting us there.
Like, of course Sam Altman is going to talk about how close they are and how they need more money.
It's been a while since AMD had the top-tier offering, but it has been trading blows in the mid-tier segment the entire time. If you are just looking for a gaming card (i.e. not max AI performance), the AMD card is typically cheaper and less power-hungry than the equivalent Nvidia one.
But the fact that Nvidia cards command higher margins also reflects their better software stack, right? Nvidia “lets them” trade blows in the midrange, or, equivalently, Nvidia is reaping the reward of its software investments: even its midrange hardware commands a premium.
It was true with RDNA 2. RDNA 3 regressed on this a bit; supposedly a hardware hiccup prevented them from hitting the frequency and voltage targets they were hoping to reach.
In any case they're only slightly behind, not crazy far behind like Intel is.
Competitive with the H100 for inference: a 2-year-old product, and on just one half of the ML story. The H200 (and potentially the B100) is the appropriate comparison, since those are the parts being produced in volume.