LM Studio puts stats at the bottom of each reply, like: 2.09 tok/sec, 346 tokens, 1.74s to first token. That was for a 259-word response, so ~0.75 words/token. If that ratio holds, you might be getting around 8 tok/sec on your M4 Max?
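For anyone who wants to redo that conversion with their own numbers, here's a rough sketch of the arithmetic. The function names are just made up for illustration, and it assumes the words/token ratio from my one reply carries over to other models and prompts, which it may not.

```python
def words_per_token(words: int, tokens: int) -> float:
    """Ratio of whitespace-separated words to model tokens for one reply."""
    return words / tokens


def tokens_per_sec(words_per_sec: float, ratio: float) -> float:
    """Convert a measured words/sec rate into an estimated tok/sec rate."""
    return words_per_sec / ratio


# Numbers from the LM Studio reply above: 259 words, 346 tokens.
ratio = words_per_token(259, 346)
print(f"{ratio:.2f} words/token")  # prints "0.75 words/token"

# Going the other way: how many words/sec an 8 tok/sec estimate implies.
words_per_sec_at_8 = 8 * ratio
print(f"{words_per_sec_at_8:.1f} words/sec")  # prints "6.0 words/sec"
```

So 8 tok/sec and roughly 6 words/sec are the same speed under that ratio. Word counts depend on how you split the text, so treat the result as a ballpark.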
Looks like LM Studio is available for ARM-based Macs, so if you want to give it a try, that'd be one way to get these stats. LM Studio also exposes some parameters to play around with and keeps a record of past conversations, if that appeals to you.