I have created a "semi-interactive" AI coding workflow.
I write what I want, the LLM responds with edits, and my ~100 lines of Python apply them to the project. It can edit any number of files in one LLM call, which is very nice (and very cheap and fast).
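The "100 lines of Python" part can be sketched roughly like this -- note the edit-block format (`EDIT path` plus a SEARCH/replace pair) is a hypothetical convention for illustration, not the author's actual protocol:

```python
import pathlib
import re

# Hypothetical format the LLM is prompted to emit for each edit:
#   EDIT path/to/file.py
#   <<<SEARCH
#   old text
#   ===
#   new text
#   >>>
EDIT_RE = re.compile(
    r"EDIT (?P<path>\S+)\n<<<SEARCH\n(?P<old>.*?)\n===\n(?P<new>.*?)\n>>>",
    re.DOTALL,
)

def apply_edits(response: str, root: pathlib.Path) -> int:
    """Apply every edit block found in the LLM response; return how many applied."""
    applied = 0
    for m in EDIT_RE.finditer(response):
        target = root / m.group("path")
        text = target.read_text()
        old, new = m.group("old"), m.group("new")
        if old in text:  # only apply when the search text matches exactly
            target.write_text(text.replace(old, new, 1))
            applied += 1
    return applied
```

Because one response can contain many `EDIT` blocks across different paths, a single LLM call edits any number of files, which is what makes the loop cheap and fast.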
I tested this with Kimi K2 on Groq (also 1000 tok/s) and was very impressed.
I'd say this is the best use case for fast models -- that frictionlessness in your workflow -- though it burns through tokens quickly working like that! (Agentic use is even more nuts about how fast it burns tokens on fast models, so the $50 is actually pretty great value.)