
LM Studio in API mode, then literally any frontend that speaks the OpenAI API.

Or just use the LM Studio frontend itself; it's better than anything else I've used for desktop use.
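To make the "any frontend that speaks the OpenAI API" point concrete, here's a minimal sketch of hitting LM Studio's local server from Python with nothing but the standard library. It assumes LM Studio's server is running at its default address (http://localhost:1234/v1); the model name is a placeholder for whatever model you have loaded.

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address

def build_chat_request(prompt: str, model: str = "local-model") -> urllib.request.Request:
    """Build an OpenAI-style chat completion request for a local server."""
    payload = {
        "model": model,  # LM Studio serves whichever model is loaded
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually send it (requires LM Studio's server to be running):
# with urllib.request.urlopen(build_chat_request("Hello!")) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Any client that can POST that shape of JSON works the same way, which is why basically every OpenAI-compatible frontend just plugs in.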

I get 35 t/s with Gemma 3 12B at Q8, but that's on a 3090; with less VRAM you'll need a smaller quant, probably Gemma 3 12B at Q4_K_L.
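The rough math behind that quant recommendation: weight memory is about parameter count times bits per weight divided by 8. This is a hedged back-of-the-envelope sketch (the effective bits-per-weight figures for Q8_0 and Q4_K quants are approximations, and it ignores KV cache and runtime overhead, so treat it as a floor):

```python
# Rough VRAM floor for quantized model weights:
# bytes ~= parameter_count * bits_per_weight / 8.
# Ignores KV cache, activations, and runtime overhead.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Gemma 3 12B at approximate effective bits-per-weight:
print(f"Q8_0 : {weight_gb(12, 8.5):.1f} GB")  # ~13 GB, fits a 24 GB 3090
print(f"Q4_K : {weight_gb(12, 4.8):.1f} GB")  # ~7 GB, fits 8-12 GB cards
```

So Q8 of a 12B model is comfortable on a 24 GB card, while Q4 is what brings it into reach of more common GPUs.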


