> And since they're also getting dramatically cheaper, it's becoming increasingly compelling to actually run these models in real-life applications.
They're not really cheaper than the SOTA open models on third-party inference platforms, and they're generally dumber. I suppose they're still worth it if you must minimize latency for any given level of smarts, but not really otherwise.