I have to wonder if its missing forest for the trees: do you perceive GPT-OSS-120b as an emotionally warm model?
(FWIW this reply may be beneath your comment, but not necessarily voiced to you, the quoted section jumped over it too, direct from 5 isn't warm, to 4o-non-reasoning is, to the math on self-hosting a reasoning model)
Additionally, author: I maintain a llama.cpp-based app on several platforms for a couple years now, I am not sure how to arrive at 4096 tokens = 3 GB, it's off by an OOM AFAICT.
I was going off of what I could directly observe on my M3 Max MacBook Pro running Ollama. I was comparing the model weights file on disk with the amount that `ollama ps` reported with a 4k context window.
> I have to wonder if its missing forest for the trees: do you perceive GPT-OSS-120b as an emotionally warm model?
I haven't needed it to be "emotionally warm" for the use cases I use it for, but I'm guessing you could steer it via the developer/system messages to be sufficiently warm, depending on exactly what use case you had in mind.
(FWIW this reply may be beneath your comment, but not necessarily voiced to you, the quoted section jumped over it too, direct from 5 isn't warm, to 4o-non-reasoning is, to the math on self-hosting a reasoning model)
Additionally, author: I maintain a llama.cpp-based app on several platforms for a couple years now, I am not sure how to arrive at 4096 tokens = 3 GB, it's off by an OOM AFAICT.