Except there are providers that serve both Chinese models AND Opus as well. On t...

Shakahs · 2026-03-10T10:45:29 1773139529

AWS and GCP both have their own custom inference chips, so a better example for hosting Opus on commodity hardware would be DigitalOcean.

giancarlostoro · 2026-03-10T11:29:47 1773142187

And Microsoft's Azure. It's on all three major cloud providers, which tells me they can make a profit through these cloud providers without having to pay for any hardware. They just take a small enough cut.

https://code.claude.com/docs/en/microsoft-foundry

https://www.anthropic.com/news/claude-in-microsoft-foundry

re-thc · 2026-03-10T11:37:14 1773142634

> Both Amazon and Google serve Opus at roughly ~1/2 the speed of the chinese models

We were responding to a claim of about 10x, not 0.5x.

x86 and arm64 can perform differently. The Chinese models could be optimized for different hardware, which could produce massive differences.

atq2119 · 2026-03-10T12:55:08 1773147308

These providers do not run models on CPUs, so x86 vs. Arm is irrelevant.

re-thc · 2026-03-10T18:17:12 1773166632

They run on Nvidia and Huawei hardware, for example. And mine was just an example.

raggi · 2026-03-10T12:49:51 1773146991

Deployments like Bedrock have nowhere near SOTA operational efficiency; they are 1-2 orders of magnitude behind. The hardware is much closer, but optimizations to pipelining, scheduling, caching, recomposition, routing, etc. blow naive end-to-end architectures out of the water.

Analemma_ · 2026-03-10T17:12:50 1773162770

Do you have evidence for any of this, or are you repeating a bunch of buzzwords you’ve heard breathlessly repeated on Twitter?

raggi · 2026-03-11T14:02:20 1773237740

Many techniques are documented in papers, particularly those coming out of the Asian teams. I know of work going on at Western providers that is similarly advanced. In short, read the papers.

nullstyle · 2026-03-10T14:51:45 1773154305

Evidence?