Hacker News

What hardware do you have it running on? Do you feel you could replace the frontier models with it for everyday coding? Would/will you?


Around 20 tokens/s with a 6-bit quant at very long context lengths on my AMD Ryzen AI Max+ 395.

I’m trying to use local models whenever possible. Still need to lean on the frontier models sometimes.
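As a rough sanity check on what a 6-bit quant needs (the figures below are assumptions, not from the thread), a weights-only footprint is roughly params × bits / 8 bytes. A minimal sketch:

```python
def quant_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough weights-only memory estimate: params * bits / 8 bytes.

    Ignores the KV cache, activations, and runtime overhead, all of
    which grow with context length, so treat this as a lower bound.
    """
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical example: a 30B-parameter model at a 6-bit quant
print(quant_footprint_gb(30e9, 6.0))  # 22.5 (GB of weights alone)
```

This is why a long-context session on a 30B-class model fits comfortably on a 128 GB unified-memory machine but is a squeeze on a typical consumer GPU.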


I'm getting ~30 tok/s on the A3B model with my 3070 Ti and 32k context.

> Do you feel you could replace the frontier models with it for everyday coding? Would/will you?

Probably not yet, but it's really good at composing shell commands; for scripting and one-liner generation, the A3B shines. Its web-development skills are markedly better than Qwen's prior models in this parameter range, too.
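For the shell one-liner use case, local servers such as llama.cpp's llama-server or Ollama expose an OpenAI-compatible chat endpoint. A minimal sketch of building such a request (the endpoint URL, port, and model name are assumptions, not from the thread):

```python
def shell_oneliner_request(task: str, model: str = "qwen3-30b-a3b") -> dict:
    """Build an OpenAI-compatible chat payload that asks for a single
    shell command. POST it to a local server, e.g.
    http://localhost:8080/v1/chat/completions (hypothetical endpoint).
    """
    return {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Reply with exactly one POSIX shell command, no prose."},
            {"role": "user", "content": task},
        ],
        "temperature": 0.2,  # low temperature keeps one-liners predictable
    }

payload = shell_oneliner_request("delete all .log files older than 7 days")
print(payload["messages"][1]["content"])
```

The strict system prompt matters more than the model size here: it keeps the reply pipeable straight into a terminal instead of wrapped in explanation.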


That seems oddly low, a fair amount slower than what I get on my M4 (~45 tok/s, I believe).

What quant are you using? How much RAM does it have?


60 to 70 tok/s on a 5080, but I'm only tinkering for now. The smaller models seem exceptionally good for what they are, and some can even do OCR reliably.


What quantization are you running on the 5080? I'm waiting to receive mine.




