More

jrop · 2026-04-13T21:03:21 1776114201

So far, it's just a flash-banner on their `/code` page. I don't see any announcement other than folks mentioning that they are getting email announcements about its release.

textrunmax · 2026-04-17T19:04:24 1776452664

There is no official blogpost. But there is a post at https://kimi-k2.org/blog/23-kimi-k2-6-code-preview, an unofficial community-run website about Kimi, which claims to have all the specifications:

  Total Parameters: 1 trillion (1T)
  Active Parameters: 32 billion (32B)
  Number of Experts: 384, with 8 experts activated per token
  Context Length: 256K tokens (upgraded from 128K in the original K2)
  Model Layers: 61 layers (including 1 dense layer)
  Attention Mechanism: MLA (Multi-head Latent Attention)
  Activation Function: SwiGLU
  Attention Hidden Dimension: 7168
  Vocabulary Size: 160K
  Training Data: 15.5 trillion tokens
  Knowledge Cutoff: April 2025
  License: Apache 2.0 (open-source, commercially usable)

There are no benchmarks yet for that specific model:

> While official benchmark data for K2.6 Code Preview has not yet been released, the K2 series' historical performance speaks to its strength: (…)

I could not find any info who is behind kimi-k2.org. There is no info on the site itself and no one has that site in its bio. There are some people and repos treating that site as official site of Moonshot AI, but it is not. The footer cleary says:

> kimi-k2.org is an unofficial resource site dedicated to Kimi K2, offering objective and comprehensive information along with practical use cases. The site is completely free to access with no login required. (…) Not affiliated with MoonshotAI. All trademarks belong to their respective owners.

Topfi · 2026-04-13T21:15:24 1776114924

It is accessible to paying subscribers and shown as K2.6-code-preview in the console [0] [1].

[0] https://www.kimi.com/code/console

[1] https://imgur.com/a/hljM9ZV

jrop · 2026-04-07T20:56:12 1775595372

Tommy Emmanuel apparently learned by transrcibing, famously thinking that both the bass line and guitar lines he was hearing were a singular "guitar part". Just by having his expectations (incorrectly) raised, he rose to the occasion and played both parts.

I forget where I heard this story -- it's probably either rather famous, or buried in an interview somewhere.

davemo · 2026-04-08T03:22:10 1775618530

Tommy drew a lot of inspiration from Chet Atkins who was really the pioneer of the bass+guitar "one hand band" style of playing. Tommy just improved on it a lot, adding more rhythmic elements, but to your point, yes, he was largely self-taught and driven to learn.

A good interview on his background: https://www.youtube.com/watch?v=py4T1qv9bnQ

PaulDavisThe1st · 2026-04-08T06:26:46 1775629606

The percussionist Trilok Gurtu has said the same thing about listening to many recordings as he was growing up. He just assumed that all the percussion was played by one person, all at once, and so he figured out ways to do it, even when it was 2 or even 3 people with overdubs.

dizhn · 2026-04-07T21:14:48 1775596488

He tells it everywhere. (Also demonstrates the thumb vs fingers playing independently everywhere.)

jrop · 2026-03-30T04:03:19 1774843399

As others have said, the fact that they're letting the ecosystem settle before including something out-of-the box is beneficial in some sense. It's allowed time for experiments (including my own "how would I do UI in Neovim: morph.nvim [1]").

For some, this stage of a project attracts tinkerers and builders, and lets the community shape how things are done in the future. It's not always practical, but it does have a certain appeal.

[1] https://github.com/jrop/morph.nvim

jrop · 2026-03-18T19:43:53 1773863033

I assume that you've tried Termux and somehow that doesn't meet your needs? (Also, you didn't specify whether you are on Android/iOS)

jrop · 2026-03-17T19:13:40 1773774820

Right? That's the only reason that "coding with LLMs" works at all (believe me, all at the same time, I am wowed by LLMs, and carry a healthy level of skepticism with respect to their ability as well). You can prompt all you want, let an Agent spin in a Ralph loop, or whatever, but at the end of the day, what you're checking into Git is not the prompts, but the formalized, codified artifact that is the bi-product of all of that process.

jrop · 2026-03-13T03:53:41 1773374021

We have been rewatching Clone Wars as a family, and I, for one, find this terminology hilarious given the use of it in the series towards the separatist droids.

jrop · 2026-02-08T01:16:50 1770513410

This sounds like a really cool project. What challenges have you encountered so far?

pixelsort · 2026-02-08T01:26:49 1770514009

Thanks. The hardest part has been slogging through the segfaults and documenting all the unprincipled things I've had to add. Post-bootstrap, I have to undo it all because my IR is a semantically rich JSON format that is turing-incomplete by design. I'm building a substrate for rich applications over bounded computation, like eBPF but for applications and inference.

jrop · 2026-02-03T18:10:34 1770142234

I don't buy this. I've long wondered if the larger models, while exhibiting more useful knowledge, are not more wasteful as we greedily explore the frontier of "bigger is getting us better results, make it bigger". Qwen3-Coder-Next seems to be a point for that thought: we need to spend some time exploring what smaller models are capable of.

Perhaps I'm grossly wrong -- I guess time will tell.

bityard · 2026-02-03T19:02:44 1770145364

You are not wrong, small models can be trained for niche use cases and there are lots of people and companies doing that. The problem is that you need one of those for each use case whereas the bigger models can cover a bigger problem space.

There is also the counter-intuitive phenomenon where training a model on a wider variety of content than apparently necessary for the task makes it better somehow. For example, models trained only on English content exhibit measurably worse performance at writing sensible English than those trained on a handful of languages, even when controlling for the size of the training set. It doesn't make sense to me, but it probably does to credentialed AI researchers who know what's going on under the hood.

dagss · 2026-02-03T22:16:19 1770156979

Not an AI researcher and I don't really know, but intuitively it makes a lot of sense to me.

To do well as an LLM you want to end up with the weights that gets furthest in the direction of "reasoning".

So assume that with just one language there's a possibility to get stuck in local optima of weights that do well on the English test set but which doesn't reason well.

If you then take the same model size but it has to manage to learn several languages, with the same number of weights, this would eliminate a lot of those local optima because if you don't manage to get the weights into a regime where real reasoning/deeper concepts is "understood" then it's not possible to do well with several languages with the same number of weights.

And if you speak several languages that would naturally bring in more abstraction, that the concept of "cat" is different from the word "cat" in a given language, and so on.

beachy · 2026-02-03T21:24:58 1770153898

Is that counterintuitive? If I had a model trained on 10 different programming languages, including my target language, I would expect it to do better than a model trained only on my target language, simply because it has access to so much more code/algorithms/examples then my language alone.

i.e. there is a lot of commonality between programming languages just as there is between human languages, so training on one language would be beneficial to competency in other languages.

dagss · 2026-02-03T22:10:27 1770156627

> simply because it has access to so much more code/algorithms/examples then my language alone

I assumed that is what was catered for with "even when controlling for the size of the training set".

I.e. assuming I am reading it right: That it is better to get the same data as 25% in 4 languages, than 100% in one language.

sally_glance · 2026-02-04T15:59:20 1770220760

Cool, I didn't know about this phenomenon. Reading up a little it seems like training multilingual forces the model to optimize it's internal "conceptual layer" weights better instead of relying solely on English linguistics. Papers also mention issues arising from overdoing it, so my guess is even credentialed AI researchers are currently limited to empirical methods here.

segmondy · 2026-02-03T19:23:39 1770146619

eventually we will have smarter smaller models, but as of now, larger models are smarter by far. time and experience has already answered that.

adastra22 · 2026-02-03T21:15:36 1770153336

Eventually we might have smaller but just as smart models. There is no guarantee. There are information limits to smaller models of course.

jrop · 2026-01-22T16:34:09 1769099649

Between GLM-4.7-Flash and this announcement, THIS is what I'm excited to see in this space: pushing the capabilities of _small_ models further and further. It really feels like we're breaking into a space where models that can run on hardware that I actually own is getting better and better, and that has me excited.

jrop · 2025-11-20T15:24:21 1763652261

Wow, I really want a slide rule watch now.